
Thread: question on work-items?

  1. #1
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    question on work-items?

    Hi,
    I have a question about one of the basic concepts in OpenCL, work-items. Suppose you have an array of 1,000,000 elements and you want to execute some code on each item which is completely independent from other items. Now you can have two scenarios to do so:
    1- You can have a work-item for each element, which adds up to 1,000,000 work-items. As the GPU likely would not have this number of PEs to assign to each work-item, I think some work-items will have to wait until the others are completed. Am I correct? How are work-items mapped to PEs during runtime?
    2- Now suppose you want to unroll the parallel algorithm, such that each work-item deals with more than just one element. For example if total number of PEs is 100, then each work-item is responsible for processing 10,000 elements. How can I achieve this goal assuming that I don't know the number of PEs in GPU?

    I will really appreciate any kind of suggestions!

  2. #2

    Re: question on work-items?

    Quote Originally Posted by Arian
    Hi,
    I have a question about one of the basic concepts in OpenCL, work-items. Suppose you have an array of 1,000,000 elements and you want to execute some code on each item which is completely independent from other items. Now you can have two scenarios to do so:
    1- You can have a work-item for each element, which adds up to 1,000,000 work-items. As the GPU likely would not have this number of PEs to assign to each work-item, I think some work-items will have to wait until the others are completed. Am I correct? How are work-items mapped to PEs during runtime?
    Work-items in the same work-group are executed in parallel.

    Quote Originally Posted by Arian
    2- Now suppose you want to unroll the parallel algorithm, such that each work-item deals with more than just one element. For example if total number of PEs is 100, then each work-item is responsible for processing 10,000 elements. How can I achieve this goal assuming that I don't know the number of PEs in GPU?!
    Why don't you know the number of PEs? You can query the device limits at run time with clGetDeviceInfo. For instance, you can divide the number of data elements by CL_DEVICE_MAX_WORK_GROUP_SIZE, the maximum number of work-items in a work-group.

    But the question is, why do you want to "unroll"?
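    For reference, the split described in the question (each work-item owning a contiguous block of elements) can be sketched in plain C as below. This is only an illustration, not anything posted in the thread: `chunk_t` and `chunk_for` are hypothetical names, and inside a real kernel `id` and `num_items` would be assumed to come from `get_global_id(0)` and `get_global_size(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of elements owned by one work-item. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Split n elements into contiguous chunks across num_items work-items.
 * Hypothetical helper: in a kernel, id would be get_global_id(0) and
 * num_items would be get_global_size(0). */
chunk_t chunk_for(size_t id, size_t num_items, size_t n)
{
    assert(num_items > 0 && id < num_items);

    /* Ceiling division so that every element is covered even when
     * n is not a multiple of num_items. */
    size_t per_item = (n + num_items - 1) / num_items;

    chunk_t c;
    c.begin = id * per_item;
    if (c.begin > n)
        c.begin = n;            /* trailing work-items may get nothing */
    c.end = c.begin + per_item;
    if (c.end > n)
        c.end = n;              /* last non-empty chunk may be shorter */
    return c;
}
```

    With 100 work-items and 1,000,000 elements, work-item 0 gets [0, 10000) and work-item 99 gets [990000, 1000000), which matches the 10,000 elements per work-item in the question.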

  3. #3
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: question on work-items?

    Quote Originally Posted by Arian
    Hi,
    I have a question about one of the basic concepts in OpenCL, work-items. Suppose you have an array of 1,000,000 elements and you want to execute some code on each item which is completely independent from other items. Now you can have two scenarios to do so:
    1- You can have a work-item for each element, which adds up to 1,000,000 work-items. As the GPU likely would not have this number of PEs to assign to each work-item, I think some work-items will have to wait until the others are completed. Am I correct? How are work-items mapped to PEs during runtime?
    2- Now suppose you want to unroll the parallel algorithm, such that each work-item deals with more than just one element. For example if total number of PEs is 100, then each work-item is responsible for processing 10,000 elements. How can I achieve this goal assuming that I don't know the number of PEs in GPU?

    I will really appreciate any kind of suggestions!
    PE isn't an OpenCL term; perhaps you mean CU, not that it is terribly important.

    1. It just runs in batches of the size the hardware can run concurrently, until they're all done. The number it can run concurrently depends on the hardware: how many CUs there are on the card, how many threads each can run at the same time, etc.

    2. Err, write a loop? You answer your own question here.
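    A minimal sketch of the "write a loop" idea, assuming a strided loop in which each work-item starts at its own global ID and steps by the total number of work-items. It is written as plain C so it runs on the host: `work_item` and `run_all` are hypothetical names, the doubling is a placeholder operation, and in an actual kernel `gid` and `global_size` would be assumed to come from `get_global_id(0)` and `get_global_size(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Body of one work-item. It handles elements gid, gid + global_size,
 * gid + 2 * global_size, ... so any number of work-items covers all n
 * elements without knowing how many PEs the device has. */
void work_item(size_t gid, size_t global_size, float *data, size_t n)
{
    assert(global_size > 0);
    for (size_t i = gid; i < n; i += global_size)
        data[i] = data[i] * 2.0f;   /* placeholder per-element work */
}

/* Host-side stand-in for the device: runs every work-item in turn.
 * A real device would run many of them concurrently. */
void run_all(size_t global_size, float *data, size_t n)
{
    for (size_t gid = 0; gid < global_size; ++gid)
        work_item(gid, global_size, data, n);
}
```

    Because the loop bound is `n` and the stride is the launch size, the same code works whether it is launched with 100 work-items or 1,000,000; only the number of iterations per work-item changes.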

  4. #4
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: question on work-items?

    Quote Originally Posted by notzed
    PE isn't an OpenCL term; perhaps you mean CU, not that it is terribly important.
    Oops, got that wrong; for some reason I thought it was a vendor-specific term like 'wavefront' is.

    Still, the number of PEs cannot be queried from code and is more of an implementation detail. The number of CUs can be queried (CL_DEVICE_MAX_COMPUTE_UNITS), and thus used to fit an algorithm to a device.

  5. #5
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    Re: question on work-items?

    Thanks for your replies, but I didn't actually get my answer.
    Let me ask my question in a different way: according to the specification there is no limit on the global number of work-items, but logically at any given time there should be a limit on the number of work-items that can be mapped to real hardware processing elements. In this regard, what is the maximum number of work-items that can run simultaneously on a GPU?

    Thanks,

  6. #6
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: question on work-items?

    Quote Originally Posted by Arian
    Thanks for your replies, but I didn't actually get my answer.
    Let me ask my question in a different way: according to the specification there is no limit on the global number of work-items, but logically at any given time there should be a limit on the number of work-items that can be mapped to real hardware processing elements. In this regard, what is the maximum number of work-items that can run simultaneously on a GPU?

    Thanks,
    I answered that. "batches of the size the hardware can run concurrently"

    How big this number is depends entirely on the hardware (we don't know what you're using, and even if we did, we wouldn't know enough about it to tell you accurately) and on your code (which we know nothing about). So there's no possible way to be any more specific than that.

    You have to study the details of the specific hardware and vendor implementation, and your own code to determine this. Or at least run it on a given piece of hardware and see what the profiler tells you it did.
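    As a rough illustration of the "batches" picture, the arithmetic is just a ceiling division. The sketch below is an assumption-laden simplification: `rounds_needed` is a hypothetical helper, the concurrency figure has to be supplied by you (from vendor documentation or a profiler), and real devices typically dispatch new work-groups as earlier ones finish rather than in strict rounds.

```c
#include <assert.h>
#include <stddef.h>

/* Number of back-to-back batches needed if the device could run
 * concurrent_items work-items at once and ran them in strict rounds.
 * concurrent_items is an input you must measure or look up; it is not
 * something this function can discover. */
size_t rounds_needed(size_t total_items, size_t concurrent_items)
{
    assert(concurrent_items > 0);
    /* Ceiling division: a partially filled last batch still counts. */
    return (total_items + concurrent_items - 1) / concurrent_items;
}
```

    For example, with 1,000,000 work-items and a made-up figure of 10,000 concurrently resident work-items, `rounds_needed(1000000, 10000)` gives 100.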

  7. #7
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    Re: question on work-items?

    Thanks notzed,

    I wasn't asking for the number of work-items on my GPU; of course that is hardware dependent and different for each device. I was asking a more general question about work-item assignment and scheduling. However, I think I found my answer.
