Thread: A few quick questions from a newbie

  1. #1
    Junior Member
    Join Date
    Jun 2011
    Posts
    3

    A few quick questions from a newbie

    1. Are wavefront size and wavefront granularity on AMD equivalent to warp size and warp granularity on nVidia?

    2. When creating a new variable in a kernel without explicitly using an address space qualifier ("private/local/global/constant/...") in the declaration, for example "float newVar;", in which memory is it created and what is the priority? Is it automatically global?

    4. Let's say that I want to operate on many small vectors of length 64, and the optimal work-group size for my platform is 256. Is it a bad idea (performance-wise) to set the group size to 32 or 64? Is it very important not to go too far below 256, and instead to spread one work-group over several vectors? I ask because splitting the work-group up like that could be bad for some aspects of my implementation.

    3. A question regarding flow control. I read the AMD Accelerated Parallel Processing OpenCL Programming Guide (section 1.3.2) and have a question about this statement:
    If work-items within a wavefront diverge, all paths are executed
    serially. For example, if a work-item contains a branch with two paths, the
    wavefront first executes one path, then the second path. The total time to
    execute the branch is the sum of each path time. An important point is that even
    if only one work-item in a wavefront diverges, the rest of the work-items in the
    wavefront execute the branch.
    This can't possibly mean that all work-items in a wavefront are automatically synchronized, right? Only that every path of the branch is executed by each work-item. If not, it seems that the "barrier" command would be useless.

  2. #2
    Junior Member
    Join Date
    Jun 2011
    Posts
    3

    Re: A few quick questions from a newbie

    5. Is there any way to estimate how much private memory I have on my GPU (nVidia GTX 470 and ATI HD5850)?

  3. #3
    Junior Member
    Join Date
    Jun 2011
    Posts
    3

    Re: A few quick questions from a newbie

    6. Is there any particular reason to use 2D or 3D work groups, other than it might be easier/prettier to map the threads to the work space? Performance gain for example?

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: A few quick questions from a newbie

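    2. When creating a new variable in a kernel without explicitly using an address space qualifier ("private/local/global/constant/...") in the declaration, for example "float newVar;", in which memory is it created and what is the priority? Is it automatically global?
    It's private by default. A variable declared inside a kernel function without an address space qualifier lives in the __private address space, so each work-item gets its own copy.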

    4. Let's say that I want to operate on many small vectors of length 64, and the optimal work-group size for my platform is 256. Is it a bad idea (performance-wise) to set the group size to 32 or 64? Is it very important not to go too far below 256, and instead to spread one work-group over several vectors? I ask because splitting the work-group up like that could be bad for some aspects of my implementation.
    Very small work-group sizes have significantly lower performance. Why not process four 64-element vectors in each work-group to reach the ideal work-group size of 256?
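
    A minimal sketch of that packing, written as plain host-side C so the index arithmetic can be checked without a device. The names `map_work_item`, `WorkItemSlot`, and the constants are hypothetical. In a real kernel, `group_id` and `local_id` would come from `get_group_id(0)` and `get_local_id(0)`.

```c
#include <stddef.h>

/* Sizes taken from the question: 64-element vectors, 256 work-items per group. */
enum { VEC_LEN = 64, GROUP_SIZE = 256, VECS_PER_GROUP = GROUP_SIZE / VEC_LEN };

/* Hypothetical helper type: which vector and which element a work-item owns. */
typedef struct {
    size_t vector;   /* index of the 64-element vector */
    size_t element;  /* index of the element inside that vector */
} WorkItemSlot;

/* Map (group id, local id) to a vector/element pair so that each
 * work-group of 256 work-items covers four consecutive vectors. */
WorkItemSlot map_work_item(size_t group_id, size_t local_id)
{
    WorkItemSlot slot;
    slot.vector  = group_id * VECS_PER_GROUP + local_id / VEC_LEN;
    slot.element = local_id % VEC_LEN;
    return slot;
}
```

    With this mapping, work-items 0 to 63 of a group handle its first vector, 64 to 127 the second, and so on. Any per-vector reduction would then need to stay within its own 64-work-item slice of the group.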

    This can't possibly mean that all work-items in a wavefront are automatically synchronized, right? Only that every path of the branch is executed by each work-item. If not, it seems that the "barrier" command would be useless.
    On these GPUs it is true that the work-items inside a warp/wavefront execute in lockstep, so they are implicitly synchronized. A work-group usually contains multiple warps/wavefronts, so the barrier function is still needed to synchronize work-items that sit in different warps/wavefronts.

    However, this is an implementation detail. If you write your code assuming that it holds universally, it may not run correctly on other OpenCL implementations. I strongly recommend not making implementation-specific tweaks like these.
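
    To make the quoted statement about divergence concrete, here is a toy cost model in plain C. It is only an illustration of "the total time is the sum of each path time", not a description of how any particular GPU schedules work. The function name and the cost units are made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: takes_if[i] says whether work-item i of one wavefront takes
 * the "if" path. A path costs time only if at least one work-item takes
 * it, and a diverged wavefront pays for both paths, one after the other. */
unsigned wavefront_branch_cost(const bool *takes_if, size_t n,
                               unsigned cost_if, unsigned cost_else)
{
    bool any_if = false;
    bool any_else = false;
    for (size_t i = 0; i < n; ++i) {
        if (takes_if[i]) {
            any_if = true;
        } else {
            any_else = true;
        }
    }
    return (any_if ? cost_if : 0u) + (any_else ? cost_else : 0u);
}
```

    In this model a single work-item taking the other path is enough to make the whole wavefront pay for both paths, which is the point the guide is making.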

    5. Is there any way to estimate how much private memory I have on my GPU (nVidia GTX 470 and ATI HD5850)?
    Only by reading the programming guides from those two hardware vendors. As far as I can recall, there's no standard way to query the amount of private memory.

    6. Is there any particular reason to use 2D or 3D work groups, other than it might be easier/prettier to map the threads to the work space? Performance gain for example?
    It's simply to make it easier to map to your particular problem domain.
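
    As a rough illustration of why the choice is mostly a convenience, a 2D index can always be flattened to a 1D one and recovered again. The sketch below is plain C with hypothetical helper names. In a kernel, `x`, `y`, and `width` would typically come from `get_global_id(0)`, `get_global_id(1)`, and `get_global_size(0)`.

```c
#include <stddef.h>

/* Row-major flattening of a 2D index into a 1D index. */
size_t flatten_2d(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Inverse mapping: recover the 2D index from a 1D index,
 * as a kernel launched with a 1D range would have to do by hand. */
void unflatten_2d(size_t gid, size_t width, size_t *x, size_t *y)
{
    *x = gid % width;
    *y = gid / width;
}
```

    A 2D launch just saves you from writing the division and modulo yourself.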
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.
