
Thread: Threads more than local work size

  1. #1
    Junior Member
    Join Date
    Feb 2014
    Posts
    15

    Threads more than local work size

    Hi,
    I have a simple question. What happens if I use more threads than the local work size constraint allows?

    Is it normal that kernel gives me random values?

    I hope to be clear, thank you!

  2. #2
    Senior Member
    Join Date
    Dec 2011
    Posts
    170
    If you specify a work-group size larger than your hardware or kernel supports, the clEnqueueNDRangeKernel call should fail and return an error code (CL_INVALID_WORK_GROUP_SIZE).
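    A plain-C sketch of the validation the runtime performs may make this concrete. The function name and the max-size parameter here are illustrative; the real checks happen inside clEnqueueNDRangeKernel against limits such as CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_KERNEL_WORK_GROUP_SIZE:

    ```c
    #include <stddef.h>

    /* Illustrative error codes; -54 matches CL_INVALID_WORK_GROUP_SIZE. */
    enum { WG_OK = 0, WG_INVALID_WORK_GROUP_SIZE = -54 };

    /* Mirrors the runtime's checks: the local size must be nonzero, must
     * not exceed the device/kernel maximum, and (in OpenCL 1.x) must
     * divide the global size evenly. */
    int check_local_size(size_t global, size_t local, size_t max_local)
    {
        if (local == 0 || local > max_local)
            return WG_INVALID_WORK_GROUP_SIZE;
        if (global % local != 0)
            return WG_INVALID_WORK_GROUP_SIZE;
        return WG_OK;
    }
    ```

    So with a device maximum of 256, asking for a local size of 512 should be rejected up front rather than producing garbage output.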

  3. #3
    Junior Member
    Join Date
    Feb 2014
    Posts
    15
    That doesn't happen. What's the mistake? The kernel keeps running but gives back wrong values!

  4. #4
    Junior Member
    Join Date
    May 2011
    Posts
    17
    Your question isn't very clear. Presumably by "threads" you mean "work-items", but it isn't clear whether you mean within a single work-group or across the whole NDRange. If you ask for more than the maximum size of a work-group, the runtime should give you an error. If you use more work-items in the NDRange, then you will simply have more than one work-group.

    There is no synchronization defined in OpenCL between work-groups, which means that if you don't construct your code carefully you may see unexpected behaviour if you expect them to run in any particular order. If, for example, you rely on a barrier in your code, that barrier only affects one work-group, not all of them, so your synchronization would be invalid.

    The reason for this is that a GPU, like a CPU, can only actually hold a certain number of thread contexts at a time, and this number is abstracted away in OpenCL. Instead, the model is based on streaming work-items over that underlying set of thread contexts. You may not have enough capacity in the machine to run everything concurrently, and hence it is always valid to serialize the set of work-groups. There can, therefore, be no global synchronization in the model.
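    This failure mode can be simulated in plain C. The sketch below (not real OpenCL; the sizes and the "write then read a neighbour group's slot" kernel are made up for illustration) runs the work-groups one after another, which is always a valid schedule, and counts how many work-items read a slot belonging to a group that hadn't run yet:

    ```c
    #include <string.h>

    #define GROUPS 2
    #define LOCAL  4
    #define GLOBAL (GROUPS * LOCAL)

    /* Each work-item writes gid+1 to its own slot, hits a barrier that
     * only spans its own group, then reads the slot LOCAL positions
     * ahead (wrapping). Returns how many items observed a slot that was
     * still 0, i.e. owned by a group that had not executed yet. */
    int count_stale_reads(void)
    {
        int buf[GLOBAL];
        int stale = 0;
        memset(buf, 0, sizeof buf);
        for (int g = 0; g < GROUPS; ++g) {       /* serial schedule */
            for (int l = 0; l < LOCAL; ++l)      /* phase 1: writes */
                buf[g * LOCAL + l] = g * LOCAL + l + 1;
            /* barrier(CLK_GLOBAL_MEM_FENCE) would sit here: it orders
             * only the items of group g, not the other groups */
            for (int l = 0; l < LOCAL; ++l) {    /* phase 2: reads */
                int gid = g * LOCAL + l;
                if (buf[(gid + LOCAL) % GLOBAL] == 0)
                    ++stale;
            }
        }
        return stale;
    }
    ```

    With groups run in order 0, 1, every item of group 0 reads stale data from group 1's slots; run the schedule in the opposite order and the stale reads land in group 1 instead. Either outcome is legal, which is exactly why cross-group dependencies are invalid.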

  5. #5
    Junior Member
    Join Date
    Sep 2013
    Posts
    7
    Hi,

    I am going to piggy back on this thread to ask another simple question. Is there a maximum number of work groups? I know there is a max number of items per group, but is there a similar value for work groups? Or can I make as many groups as I would like?

  6. #6
    Senior Member
    Join Date
    Dec 2011
    Posts
    170
    > If you specify a work-group size larger than your hardware or kernel supports, the clEnqueueNDRangeKernel call should fail and return an error code.
    That would be nice, but I don't think you can reliably expect that from every driver. Some might crash or return incorrect results.

    > Is there a maximum number of work groups?
    The limit is pretty big. Ideally the global work size would be limited only by the largest number that fits in a size_t, but more likely the driver limits it to something smaller; I'm guessing somewhere between 2^16 and 2^31. If your global work size is larger than the maximum work-group size, the runtime will run as many work-groups as necessary to get the work done. They might run in parallel, in serial, or a combination. Practically, on some platforms you will find the limit is time: the OS will kill the kernel if it takes more than a few seconds to run over your global work size.
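    The "as many work-groups as necessary" arithmetic is just ceiling division, sketched here in plain C (the padding helper reflects the common host-side workaround, since OpenCL 1.x rejects a global size that isn't a multiple of the local size; function names are my own):

    ```c
    #include <stddef.h>

    /* Number of work-groups needed to cover a global size with a given
     * local size (ceiling division). */
    size_t num_groups(size_t global, size_t local)
    {
        return (global + local - 1) / local;
    }

    /* Smallest multiple of the local size that covers the global size;
     * the kernel then guards against the padded extra work-items. */
    size_t padded_global(size_t global, size_t local)
    {
        return num_groups(global, local) * local;
    }
    ```

    For example, 1000 items with a local size of 256 needs 4 work-groups, so the host would enqueue a padded global size of 1024 and the kernel would early-out for global ids >= 1000.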

  7. #7
    Junior Member
    Join Date
    Sep 2013
    Posts
    7
    Thanks for the answer, Dithermaster.

    I am looking at a bitonic sort example and as far as I can see each work item only works on one element in the sequence it is sorting. So if the upper limit is 2^31 and the sequence is larger, then some more trickery will have to be done in order to make it work. It seems to be a pretty basic implementation though.
