Thread: Sub-compute units

  1. #1
    Junior Member
    Join Date
    Jan 2010
    Posts
    3

    Sub-compute units

    Hi, I was wondering whether there is any way to find out how many processors/stream processors/threads there are inside a single compute unit. I know that, for example, on the Radeon 4870 one OpenCL compute unit does not correspond to one thread/stream processor, i.e. one compute unit can run more than one thread at a time.

    It would be useful to have some sort of mechanism for seeing how many sub-processors there are in a compute unit. For example, if I have an array of length 100 and 10 compute units, how do I know how to distribute the array across them? If, for example, there were 100 threads in one compute unit, it would be best to put the whole array onto the first compute unit (local work size = 100 and global work size = 100) and use the other 9 for something else.

    If I were to divide the 100 elements equally across the 10 compute units, each compute unit would waste its remaining 90 threads for that execution, right?

    I hope you can see my problem. Any suggestions for overcoming this? Are there any plans to add a device query for the maximum number of sub-compute units?

    I know that on some Nvidia cards one stream processor seems to correspond to one OpenCL compute unit.

    Thanks

  2. #2
    Junior Member
    Join Date
    Dec 2009
    Posts
    22

    Re: Sub-compute units

    I don't think you need to worry about this. Have a look at 'warps' on Nvidia hardware.

  3. #3
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: Sub-compute units

    As long as you've got far more work-items than total compute cores (which you always should; say, more than about 2000 work-items), they should be nicely distributed. Determining the optimal work-group size is a secondary optimization, and adjusting for architectural specifics like this is a tertiary one. (I'm not saying it is unneeded, just that it's not something to worry about until later on.) In general, you want your work-group sizes to be multiples of 16 for Nvidia and multiples of 64 for AMD.
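
    A minimal sketch of how the "multiple of 16 or 64" advice is usually applied on the host side. In OpenCL 1.x the global work size passed to clEnqueueNDRangeKernel has to be evenly divisible by the local work size, so the problem size is typically rounded up to the next multiple and the kernel skips the padding work-items (for example with a check such as get_global_id(0) < n). The helper name round_up_global_size is hypothetical, not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a problem size up to the next multiple of
 * the chosen work-group (local) size, so the result can be used as the
 * global work size. The kernel is assumed to ignore the padding items. */
size_t round_up_global_size(size_t n_items, size_t local_size)
{
    assert(local_size > 0);  /* a zero local size is meaningless here */

    /* Number of work-groups needed to cover n_items (ceiling division). */
    size_t groups = (n_items + local_size - 1) / local_size;

    return groups * local_size;
}
```

    With the array of 100 elements from the first post and a local size of 64, this gives a global size of 128, i.e. two work-groups with 28 padding work-items in the second.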

  4. #4
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: Sub-compute units

    Quote Originally Posted by dbs2
    In general, you want your local work-groups to be multiples of 16 for Nvidia
    I thought it would be 32 on Nvidia GPUs, because that's the warp size, so the work-items in a work-group are executed in batches of 32. Don't you waste resources if your work-group size is less than 32?
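
    To put a rough number on that question, here is a small sketch of the arithmetic, assuming the hardware runs each work-group in fixed-width batches (32 for the warp size mentioned above, 64 for the AMD figure from post #3) and that lanes in a partly filled batch sit idle. The function names are hypothetical, and real scheduling has more to it than this simple model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: lanes occupied by one work-group when it is
 * executed in batches of batch_width work-items (e.g. a warp of 32). */
size_t lanes_occupied(size_t group_size, size_t batch_width)
{
    assert(batch_width > 0);

    /* Partly filled batches still occupy a full batch of lanes. */
    size_t batches = (group_size + batch_width - 1) / batch_width;

    return batches * batch_width;
}

/* Fraction of the occupied lanes that do useful work, from 0.0 to 1.0. */
double lane_utilization(size_t group_size, size_t batch_width)
{
    size_t lanes = lanes_occupied(group_size, batch_width);

    return lanes == 0 ? 0.0 : (double)group_size / (double)lanes;
}
```

    Under this model a work-group of 16 on a batch width of 32 uses half of the lanes it occupies, a work-group of 100 uses 100 of 128, and the first post's split into work-groups of 10 would use 10 of 64 lanes on a batch width of 64.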
