Thread: Running the same kernel on multiple devices

  1. #11
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    I've run into one more obstacle in the specification, even when trying to implement a "secure" data transfer method with multiple buffers.
    Since clCreateKernel returns one object for all the devices the program has been built for, it's impossible to use clSetKernelArg with different buffers for different devices. This forces one to create multiple cl_program objects (one per device), build each program for its device, and create separate kernels. Ugly.
    clSetKernelArg could take an optional cl_device_id parameter, since having one kernel object for all devices limits what you can do with the kernel, as in this case.

    That said, nVidia's OpenCL SDK offers a multi-GPU example with the following illogical solution to the problem:
    Code :
        for(unsigned int i = 0; i < ciDeviceCount; ++i ) 
        {
            workSize[i] = ...;
     
            // Input buffer
            d_Data[i] = clCreateBuffer(cxGPUContext, CL_MEM_READ_ONLY, workSize[i] * sizeof(float), NULL, &ciErrNum);
     
            // Copy data from host to device
            ciErrNum = clEnqueueCopyBuffer(commandQueue[i], h_DataBuffer, d_Data[i], workOffset[i] * sizeof(float), 0, workSize[i] * sizeof(float), 0, NULL, NULL);        
     
            // Output buffer
            d_Result[i] = clCreateBuffer(cxGPUContext, CL_MEM_WRITE_ONLY, ACCUM_N * sizeof(float), NULL, &ciErrNum);
     
            // Create kernel
            reduceKernel[i] = clCreateKernel(cpProgram, "reduce", &ciErrNum);
     
            // Set the args values and check for errors
            ciErrNum |= clSetKernelArg(reduceKernel[i], 0, sizeof(cl_mem), &d_Result[i]);
            ciErrNum |= clSetKernelArg(reduceKernel[i], 1, sizeof(cl_mem), &d_Data[i]);
            ciErrNum |= clSetKernelArg(reduceKernel[i], 2, sizeof(int), &workSize[i]);
     
            workOffset[i + 1] = ...;
        }

    The way this example uses reduceKernel[i] and clSetKernelArg makes no sense to me.

  2. #12
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    You can create multiple kernels from the same program and set their arguments differently instead of duplicating the program.

    However, you should be able to enqueue a kernel with one set of arguments, change the arguments and enqueue it again. The runtime should take care of keeping track of which arguments each enqueued kernel should use. Given that, you wouldn't even need to create multiple kernels unless it's more convenient for your program.
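A toy model in plain C of the set/enqueue/change/enqueue pattern described above (no OpenCL headers needed). ToyKernel, ToyQueue, toy_set_arg, toy_enqueue and run_on_two_devices are hypothetical stand-ins for cl_kernel, cl_command_queue, clSetKernelArg and clEnqueueNDRangeKernel; the model simply assumes, as the post says, that each enqueue keeps its own copy of the arguments, and it does no real device work.

```c
#include <assert.h>
#include <string.h>

#define MAX_ARGS 3
#define MAX_CMDS 8

/* Hypothetical stand-in for a cl_kernel: only its current argument values. */
typedef struct {
    int args[MAX_ARGS];
} ToyKernel;

/* Hypothetical stand-in for a per-device command queue. Each enqueued
   command keeps its own copy of the arguments it was enqueued with. */
typedef struct {
    int captured[MAX_CMDS][MAX_ARGS];
    int count;
} ToyQueue;

/* Plays the role of clSetKernelArg: overwrite one argument of the kernel. */
static void toy_set_arg(ToyKernel *k, int index, int value) {
    assert(index >= 0 && index < MAX_ARGS);
    k->args[index] = value;
}

/* Plays the role of enqueueing: snapshot the kernel's current arguments,
   so later toy_set_arg calls cannot change this command. */
static void toy_enqueue(ToyQueue *q, const ToyKernel *k) {
    assert(q->count < MAX_CMDS);
    memcpy(q->captured[q->count], k->args, sizeof k->args);
    q->count++;
}

/* One kernel object, two devices: set args, enqueue, change args, enqueue.
   buffer_ids[i] and work_sizes[i] stand in for d_Data[i] and workSize[i]. */
static void run_on_two_devices(ToyKernel *k, ToyQueue queues[2],
                               const int buffer_ids[2], const int work_sizes[2]) {
    for (int i = 0; i < 2; ++i) {
        toy_set_arg(k, 0, buffer_ids[i]);
        toy_set_arg(k, 1, work_sizes[i]);
        toy_enqueue(&queues[i], k); /* queue i now owns a copy of these args */
    }
}
```

With buffer ids {10, 20} and sizes {512, 256}, queue 0 ends up holding (10, 512) and queue 1 holds (20, 256), even though the single kernel object itself is left with the second set.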

  3. #13
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    So calling clSetKernelArg and then enqueueing the kernel on a specific device tells OpenCL which device should get the data set by clSetKernelArg... interesting, though illogical and never mentioned in the spec.

    More interestingly, this example sets the kernel args on several instances of the same kernel (reduceKernel[i]) *without* enqueueing directly after setting them. All the reduceKernel[i] are the same kernel, since clCreateKernel has no device argument... Makes no sense. Am I missing something?

  4. #14
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    A kernel is defined to run on any of the devices for which it was built. You can then enqueue it on multiple devices. Each time you do that you specify the device to which you want to enqueue it via the command queue passed in to clEnqueueNDRangeKernel.

    Think of cl_kernel as keeping track of which arguments are set for an instance of a kernel in a program. You can have as many of these argument sets (cl_kernels) as you want.

    The arguments for a kernel stay the same until you change them. When you enqueue a kernel it is the runtime's responsibility to keep track of which arguments were used for that enqueuing so after you have done that you can change them. It's not complicated, but it is a bit unclear that you can change the arguments after enqueuing the kernel.
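A toy model of the "cl_kernel as an argument set" view, laid out like the two loops in the NVIDIA sample: the first loop configures one kernel object per device and the second enqueues them. ToyKernel, ToyLaunch, toy_set_arg, toy_enqueue and configure_then_launch are hypothetical names; the integers only stand in for the d_Result[i], d_Data[i] and workSize[i] arguments, and nothing runs on a device.

```c
#include <assert.h>
#include <string.h>

#define NUM_DEVICES 2
#define MAX_ARGS 3

/* Hypothetical stand-in for a cl_kernel: one independent set of arguments. */
typedef struct {
    int args[MAX_ARGS];
} ToyKernel;

/* Hypothetical record of one launch on one device. */
typedef struct {
    int device;
    int args[MAX_ARGS];
} ToyLaunch;

/* Plays the role of clSetKernelArg. */
static void toy_set_arg(ToyKernel *k, int index, int value) {
    assert(index >= 0 && index < MAX_ARGS);
    k->args[index] = value;
}

/* Plays the role of enqueueing on `device`: the launch copies the
   arguments the kernel object holds at this moment. */
static ToyLaunch toy_enqueue(int device, const ToyKernel *k) {
    ToyLaunch launch;
    launch.device = device;
    memcpy(launch.args, k->args, sizeof launch.args);
    return launch;
}

/* Loop 1 configures one kernel object per device; loop 2 enqueues them.
   Nothing is enqueued between the set calls, which is fine because each
   kernel object keeps its own arguments. */
static void configure_then_launch(ToyKernel kernels[NUM_DEVICES],
                                  const int results[NUM_DEVICES],
                                  const int inputs[NUM_DEVICES],
                                  const int work_sizes[NUM_DEVICES],
                                  ToyLaunch launches[NUM_DEVICES]) {
    for (int i = 0; i < NUM_DEVICES; ++i) {
        toy_set_arg(&kernels[i], 0, results[i]);    /* like d_Result[i] */
        toy_set_arg(&kernels[i], 1, inputs[i]);     /* like d_Data[i] */
        toy_set_arg(&kernels[i], 2, work_sizes[i]); /* like workSize[i] */
    }
    for (int i = 0; i < NUM_DEVICES; ++i)
        launches[i] = toy_enqueue(i, &kernels[i]);
}
```

Because every device gets its own kernel object, the order of set calls and enqueues across devices does not matter in this model; only the arguments held by each object at its own enqueue count.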

  5. #15
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    I know that, but you're not following me. To repeat myself once more, nVidia's example contradicts what we said.

    Quote Originally Posted by dbs2
    A kernel is defined to run on any of the devices for which it was built.
    Contradiction: reduceKernel[i] = clCreateKernel(cpProgram, "reduce", &ciErrNum);

    Quote Originally Posted by dbs2
    The arguments for a kernel stay the same until you change them. When you enqueue a kernel it is the runtime's responsibility to keep track of which arguments were used for that enqueuing so after you have done that you can change them.
    Contradiction: clSetKernelArg gets called in a loop without any enqueue call after it.
    That loop is followed by a second loop that calls clEnqueueNDRangeKernel for each device and kernel.

  6. #16
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    What Nvidia is doing is fine. They appear to want to have N kernels with N different arguments so they can run them all at once. You're right that I'm not following you here. I can build a program for all devices in a context and then enqueue a kernel from that program on any device I want. I can also create as many kernels as I want. For example, if I wanted to have different arguments (as they appear to be doing) then I could either create multiple kernels with different arguments or setarg/enqueue multiple times. There's nothing wrong with what Nvidia is doing, and, in fact, if you want to quickly re-enqueue the kernel with the same arguments this is a fine way to do it. My point is just that this is not necessary. If you build the program for all devices in a context, assuming it builds without errors, you can then execute kernels created from that program on any of the devices.

    Does that make sense? If not then I'm definitely not following you. :)
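A toy comparison of the two options from the post when the same launches are repeated over several iterations: one kernel object per device configured once, versus one shared kernel object whose argument is reset before every enqueue. All names (ToyKernel, toy_set_arg, toy_enqueue, per_device_kernels, shared_kernel) are hypothetical; the model only counts argument-setting calls and sums the captured arguments as a checksum, it does not measure real driver overhead.

```c
#include <assert.h>

#define NUM_DEVICES 2

/* Hypothetical stand-in for a cl_kernel with a single argument, plus a
   counter of how often that argument was (re)set. */
typedef struct {
    int arg;
    int set_calls;
} ToyKernel;

/* Plays the role of clSetKernelArg. */
static void toy_set_arg(ToyKernel *k, int value) {
    k->arg = value;
    k->set_calls++;
}

/* Plays the role of enqueueing on a device: returns the argument value the
   launch captured at enqueue time. */
static int toy_enqueue(int device, const ToyKernel *k) {
    assert(device >= 0 && device < NUM_DEVICES);
    return k->arg;
}

/* Option 1: one kernel object per device, configured once, then re-enqueued
   every iteration without touching its argument. Returns a checksum of the
   captured arguments. */
static long per_device_kernels(int iterations, const int buffers[NUM_DEVICES],
                               int *set_calls) {
    ToyKernel k[NUM_DEVICES];
    long sum = 0;
    *set_calls = 0;
    for (int d = 0; d < NUM_DEVICES; ++d) {
        k[d].set_calls = 0;
        toy_set_arg(&k[d], buffers[d]);
    }
    for (int it = 0; it < iterations; ++it)
        for (int d = 0; d < NUM_DEVICES; ++d)
            sum += toy_enqueue(d, &k[d]);
    for (int d = 0; d < NUM_DEVICES; ++d)
        *set_calls += k[d].set_calls;
    return sum;
}

/* Option 2: a single shared kernel object whose argument is reset before
   every enqueue. Returns the same kind of checksum. */
static long shared_kernel(int iterations, const int buffers[NUM_DEVICES],
                          int *set_calls) {
    ToyKernel k = {0, 0};
    long sum = 0;
    for (int it = 0; it < iterations; ++it)
        for (int d = 0; d < NUM_DEVICES; ++d) {
            toy_set_arg(&k, buffers[d]);
            sum += toy_enqueue(d, &k);
        }
    *set_calls = k.set_calls;
    return sum;
}
```

With buffers {3, 4} and 5 iterations both options launch with the same arguments (checksum 35), but option 1 sets arguments only NUM_DEVICES times while option 2 sets them on every launch. That is the "quickly re-enqueue with the same arguments" case; either choice is valid.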

  7. #17
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    Thanks, I see what you mean... Calling clCreateKernel for the *same* kernel *multiple* times gives you *different* cl_kernel instances. With different instances, you can set args without enqueueing directly afterwards. Having multiple instances of the same thing makes some sense, but looking at it again, it makes no sense at all. The spec should also say that creating a kernel with the same name multiple times returns different cl_kernel instances, or maybe this is implementation-defined...
    I'll stick to one kernel, and let OpenCL keep track of arg setting and queuing.

  8. #18
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Running the same kernel on multiple devices

    I haven't read the spec for a while (and remember that it's aimed at implementors, not users) but you can assume that on all implementations calling clCreateKernel multiple times will create multiple kernels safely. I would suggest you use whatever is most comfortable for your implementation as the overhead of dealing with multiple kernels should be negligible compared to the actual enqueuing and computation.

  9. #19

    Re: Running the same kernel on multiple devices

    Just to make sure I've understood this correctly.

    I have a class that encapsulates a set of kernels, and I want each instance of this class to have its own set of parameters sent to each kernel invocation. I have done this by having a static cl_program in the class, and instance variables for the cl_kernels. The first time an instance is created, the constructor compiles the .cl source and the static cl_program is initialized. Each instance calls clCreateKernel to get local copies of the kernel objects, and then clSetKernelArg to set the kernel arguments for this particular instance.

    Is this a reasonable way to achieve what I want?
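A plain-C sketch of the design described in this post, as a toy model: one shared, lazily built program (the "static cl_program") and one kernel object with its own arguments per instance. ToyProgram, ToyKernel, Solver, get_program and solver_init are hypothetical names; the comments mark which OpenCL call each step stands in for, but no OpenCL call is made.

```c
#include <assert.h>

/* Hypothetical stand-in for the shared cl_program. */
typedef struct {
    int built;
    int build_count; /* how many times the "compile" step actually ran */
} ToyProgram;

/* Hypothetical stand-in for a per-instance cl_kernel. */
typedef struct {
    const ToyProgram *program; /* the program this kernel was created from */
    int args[2];
} ToyKernel;

/* One program shared by every instance, like a static class member. */
static ToyProgram g_program = {0, 0};

/* Builds the program the first time any instance needs it. Not thread-safe
   as written. */
static const ToyProgram *get_program(void) {
    if (!g_program.built) {
        g_program.build_count++; /* stands in for clCreateProgramWithSource + clBuildProgram */
        g_program.built = 1;
    }
    return &g_program;
}

/* The "class": each instance owns its kernel object and argument values. */
typedef struct {
    ToyKernel kernel;
} Solver;

static void solver_init(Solver *s, int input_buffer, int size) {
    assert(size > 0);
    s->kernel.program = get_program(); /* like clCreateKernel on the shared program */
    s->kernel.args[0] = input_buffer;  /* like clSetKernelArg(kernel, 0, ...) */
    s->kernel.args[1] = size;          /* like clSetKernelArg(kernel, 1, ...) */
}
```

Two instances built this way share one program (built once) but keep independent arguments. In real code each instance would also release its own kernels with clReleaseKernel, and the shared program would need a single clReleaseProgram once no instance uses it.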

  10. #20
    Member
    Join Date
    Sep 2009
    Posts
    35

    Re: Running the same kernel on multiple devices

    Yes, it is reasonable; nVidia's example confirms it.


Similar Threads

  1. CreateBuffer for Multiple Devices.
    By Seshadri in forum OpenCL
    Replies: 5
    Last Post: 05-06-2011, 02:30 PM
  2. clEnqueueCopyBuffer and multiple devices
    By Banjobeni in forum OpenCL
    Replies: 7
    Last Post: 04-15-2011, 04:04 PM
  3. EnqueueWriteBuffer for multiple Devices
    By centershock in forum OpenCL
    Replies: 0
    Last Post: 03-30-2011, 07:55 AM
