
Thread: Some newbie questions about workitems and workgroup sizes.

  1. #1
    Junior Member
    Join Date
    Sep 2011
    Posts
    2

    Some newbie questions about workitems and workgroup sizes.

    Please go easy on me and help me understand some things. I have read a lot of documentation but am still confused on some parts and I hope you can help break it down into simpler terms for me.

    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?

    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?

    On my machine, 512 seems to be the minimum number of work-items. Would I queue up 512 instances of the kernel (workitems), and have the last 279 instances do nothing? Thanks ahead of time, and I appreciate any well-thought-out responses.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some newbie questions about workitems and workgroup size

    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?
    This is how it works: each compute unit in your hardware can execute one work-group at a time. The number of work-groups you choose to execute depends on the amount of computation that your algorithm requires. If you have a lot of computation to do, you will typically need a lot of work-groups.

    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?
    The application chooses the number of work-items to execute when calling clEnqueueNDRangeKernel(). Do you see the global_work_size parameter? That's how you choose how many work-items you want to run.
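
    For 233 work-items, the call would look roughly like this (an untested sketch; queue and kernel stand for a cl_command_queue and cl_kernel you have already created):

    Code :
    /* Rough, untested fragment. 'queue' and 'kernel' are placeholders for
       objects created earlier in your program. */
    size_t global_work_size = 233;   /* run 233 work-items in total */

    cl_int err = clEnqueueNDRangeKernel(
        queue,               /* command queue */
        kernel,              /* kernel to execute */
        1,                   /* work_dim: a 1-dimensional NDRange */
        NULL,                /* global_work_offset */
        &global_work_size,   /* global_work_size: number of work-items */
        NULL,                /* local_work_size: let the runtime choose */
        0, NULL, NULL);      /* no event wait list, no event returned */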

    On my machine, 512 seems to be the minimum number of work-items.
    That's probably the maximum number of work-items per work-group, not the minimum. The minimum is one on any hardware.
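
    You can query that limit yourself with clGetDeviceInfo(). A quick, untested sketch (device stands for your cl_device_id):

    Code :
    /* Untested fragment: 'device' is assumed to be a valid cl_device_id. */
    size_t max_wg_size = 0;
    cl_int err = clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                                 sizeof(max_wg_size), &max_wg_size, NULL);
    /* max_wg_size now holds the largest work-group size the device supports,
       e.g. 512 on your machine. */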
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Sep 2011
    Posts
    2

    Re: Some newbie questions about workitems and workgroup size

    Quote Originally Posted by david.garcia
    2) How does one queue an arbitrary number of workitems on a GPU? For example, say my algorithm requires me to execute 233 instances of a kernel in parallel, using the GPU. How is this typically done?
    The application chooses the number of work-items to execute when calling clEnqueueNDRangeKernel(). Do you see the global_work_size parameter? That's how you choose how many work-items you want to run.

    On my machine, 512 seems to be the minimum number of work-items.
    That's probably the maximum number of work-items per work-group, not the minimum. The minimum is one on any hardware.

    This is what is confusing me. I'm following along with the hello.c program listed here: http://developer.apple.com/library/mac/ ... llo_c.html
    On my machine, if I set global_work_size to point to a value of 512, 1024, or 2048 (and so on), it works fine. But any value that is not a power of two, or any value less than 512, produces errors.

    That code is written to square 1024 floats. What if I only wanted to square 900 floats? If I simply change 1024 to 900 in that code, I get nothing but errors. Thanks for your patience, I appreciate it.

  4. #4
    Member
    Join Date
    Jul 2011
    Location
    Moscow, Russia
    Posts
    41

    Re: Some newbie questions about workitems and workgroup size

    Quote Originally Posted by david.garcia
    1) Does the number of work-groups affect execution in any meaningful way? Or, are they simply there to provide an optional means of simplifying a problem for the developer?
    This is how it works: each compute unit in your hardware can execute one work-group at a time.
    Actually, no. AMD and NVIDIA GPUs run several work-groups on a single compute unit.
    Blog (in Russian)

  5. #5
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Some newbie questions about workitems and workgroup size

    That code is written to square 1024 floats. What if I only wanted to square 900 floats? If I simply change 1024 to 900 in that code, I get nothing but errors.
    That's because the code you linked to is explicitly specifying a work-group size when it calls clEnqueueNDRangeKernel(). Notice that global_work_size must always be a multiple of local_work_size.

    There are three ways to solve that issue. Either you pass a local work size that evenly divides the global work size you want, or you pass NULL as the local work size and let the runtime choose one, or you round the global work size up to a multiple of the local work size and add a bounds check like this inside your kernel:

    Code :
    __kernel void foo(..., uint max_size)
    {
        if(get_global_id(0) < max_size)
        {
            // Kernel code here.
        }
    }
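
    On the host side, that third option would look roughly like this (untested sketch; the variable names and the argument index are only illustrative and depend on your actual kernel):

    Code :
    /* Untested fragment. 'queue' and 'kernel' are placeholders, and 3 is just
       a guess at the index of the max_size argument in your real kernel. */
    size_t wanted      = 900;    /* elements you actually need to process */
    size_t local_size  = 512;    /* chosen work-group size */
    size_t global_size = ((wanted + local_size - 1) / local_size) * local_size;  /* rounds 900 up to 1024 */

    cl_uint max_size = (cl_uint)wanted;
    clSetKernelArg(kernel, 3, sizeof(max_size), &max_size);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_size, &local_size,
                           0, NULL, NULL);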
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.
