
Thread: Some newbie questions about workitems and workgroup sizes.

  1. #1
    Junior Member
    Join Date
    Sep 2011
    Posts
    2

    Some newbie questions about workitems and workgroup sizes.

Please go easy on me and help me understand some things. I have read a lot of documentation but am still confused about some parts, and I hope you can help break it down into simpler terms for me.

    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?

    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?

    On my machine, 512 seems to be the minimum number of work-items. Would I queue up 512 instances of the kernel (workitems), and have the last 279 instances do nothing? Thanks ahead of time, and I appreciate any well-thought-out responses.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some newbie questions about workitems and workgroup size

    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?
    This is how it works: each compute unit in your hardware can execute one work-group at a time. The number of work-groups you choose to execute depends on the amount of computation that your algorithm requires. If you have a lot of computation to do, you will typically need a lot of work-groups.

    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?
    The application chooses the number of work-items to execute when calling clEnqueueNDRangeKernel(). Do you see the global_work_size parameter? That's how you choose how many work-items you want to run.

    On my machine, 512 seems to be the minimum number of work-items.
That's probably the maximum number of work-items per work-group, not the minimum. The minimum is one on any hardware.
Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.

  3. #3
    Junior Member
    Join Date
    Sep 2011
    Posts
    2

    Re: Some newbie questions about workitems and workgroup size

    Quote Originally Posted by david.garcia
    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?
    The application chooses the number of work-items to execute when calling clEnqueueNDRangeKernel(). Do you see the global_work_size parameter? That's how you choose how many work-items you want to run.

On my machine, 512 seems to be the minimum number of work-items.
That's probably the maximum number of work-items per work-group, not the minimum. The minimum is one on any hardware.

    This is what is confusing me. I'm following along with the hello.c program listed here: http://developer.apple.com/library/mac/ ... llo_c.html
On my machine, if I set global_work_size to point to a value of 512, 1024, or 2048 (etc.), it works fine. But any value that is not a power of two, or that is less than 512, produces errors.

That code is written to square 1024 floats. What if I only wanted to square 900 floats? If I simply change 1024 to 900 in that code, I get nothing but errors. Thanks for your patience, I appreciate it.

  4. #4
    Member
    Join Date
    Jul 2011
    Location
    Moscow, Russia
    Posts
    41

    Re: Some newbie questions about workitems and workgroup size

    Quote Originally Posted by david.garcia
    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?
    This is how it works: each compute unit in your hardware can execute one work-group at a time.
Actually, no. AMD and NVIDIA GPUs can run several work-groups on a single compute unit at the same time.

  5. #5
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some newbie questions about workitems and workgroup size

That code is written to square 1024 floats. What if I only wanted to square 900 floats? If I simply change 1024 to 900 in that code, I get nothing but errors.
    That's because the code you linked to is explicitly specifying a work-group size when it calls clEnqueueNDRangeKernel(). Notice that global_work_size must always be a multiple of local_work_size.

There are three ways to solve that issue. Either you pass a local work size that evenly divides the global work size you want, or you pass NULL as the local work size (letting the implementation pick one), or you do something like this inside your kernel:

    Code :
    __kernel void foo(..., uint max_size)
    {
        if(get_global_id(0) < max_size)
        {
            // Kernel code here.
        }
    }
Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.
