
Thread: Running kernel on host vs device

  1. #1
    Junior Member
    Join Date
    Nov 2012
    Posts
    6

    Running kernel on host vs device

    In the CodeProject example:

    // create data for the run
    float* data = new float[DATA_SIZE];

    // Create the device memory vectors
    input = clCreateBuffer(context, CL_MEM_READ_ONLY, sizeof(float) * count, NULL, NULL);

    // Transfer the input vector into device memory
    err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);

    // Set the arguments to the compute kernel
    err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);

    // Execute the kernel
    err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);

    My question: if I can choose between CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_CPU, how does the kernel use data on the host when it runs on the CPU? It seems to me that clSetKernelArg always points the kernel at &input, which is on the device, and that doesn't make sense when running on the CPU.

    Any clarification is much appreciated.
    -J

  2. #2
    Senior Member
    Join Date
    Dec 2011
    Posts
    168

    Re: Running kernel on host vs device

    With AMD and Intel OpenCL platform drivers, you can select OpenCL devices that are the CPU instead of the GPU.

    The rest of OpenCL works just like it would with a GPU.

    With your code, clEnqueueWriteBuffer copies data from CPU memory to another part of CPU memory, and then when you execute your kernel on the CPU, it accesses that memory.

    If you know you are running on the CPU, using clEnqueueMapBuffer can be faster because the memory isn't copied; only ownership changes. While the buffer is mapped, your host code can access it; once it is unmapped, kernels can. The map and unmap calls themselves are fast.

  3. #3
    Junior Member
    Join Date
    Nov 2012
    Posts
    6

    Re: Running kernel on host vs device

    Dithermaster, thanks very much for your response.

    So only when the device type is CL_DEVICE_TYPE_GPU does clEnqueueWriteBuffer actually copy the data to the device over PCIe, causing the long delay?

    Thx

  4. #4
    Senior Member
    Join Date
    Dec 2011
    Posts
    168

    Re: Running kernel on host vs device

    Yes. For a GPU, clEnqueueWriteBuffer enqueues a command that copies the data over the PCIe bus. The copy is asynchronous unless blocking_write is CL_TRUE, as in your example, in which case the call does not return until the data has been copied out of your host array. If speed is paramount, read the vendor documentation on how to maximize transfer speed, for example by using pinned buffers. You could also switch to a model where you use clEnqueueMapBuffer, which always runs at full PCIe bandwidth.

  5. #5
    Junior Member
    Join Date
    Nov 2012
    Posts
    6

    Re: Running kernel on host vs device

    Dithermaster, thanks very much for the explanation.

  6. #6
    Senior Member
    Join Date
    Oct 2012
    Posts
    165

    Re: Running kernel on host vs device

    Dithermaster,

    where is your information about the full PCIe bandwidth from? Do you have a source for that?

    Thanks in advance,
    Clint3112

  7. #7
    Senior Member
    Join Date
    Dec 2011
    Posts
    168

    Re: Running kernel on host vs device

    From each manufacturer's OpenCL documentation. They each have guides that recommend the fastest way to transfer data to their devices.

  8. #8
    Senior Member
    Join Date
    Oct 2012
    Posts
    165

    Re: Running kernel on host vs device

    And where did you get those specs? I could not find them on the NVIDIA page. I need to know what binary I will get from CL_PROGRAM_BINARIES.

Similar Threads

  1. explicit copy from host to device
    By sajis997 in forum OpenCL
    Replies: 1
    Last Post: 03-11-2013, 01:06 AM
  2. Device-host memory communication
    By jbasic in forum OpenCL
    Replies: 2
    Last Post: 10-07-2009, 10:48 AM
