Thread: Low number of compute units?

  1. #1
    Junior Member
    Join Date
    Mar 2010
    Posts
    14

    Low number of compute units?

    According to the official specs of my graphics card (see the "Specifications" tab), it should have 16 computation units. However, querying MAX_COMPUTE_UNITS on the device returns only "2". Can anyone explain where the other 14 units might have gone?

    FYI: I use PyOpenCL as a wrapper.

    Thanks

  2. #2
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: Low number of compute units?

    I guess the 16 computation units refer to the 16 cores on your GPU. On current NVidia GPUs, cores are grouped in sets of 8 into what NVidia calls a streaming multiprocessor (SM).
    That's probably why the OpenCL implementation reports two compute units.
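
    As a rough sketch of that arithmetic, assuming 8 cores per SM as described above (the function name and constant are made up for illustration and are not part of any OpenCL API):

```python
# Assumption: 8 cores per streaming multiprocessor (SM), as on the
# NVidia GPUs discussed in this thread. Other architectures differ.
CORES_PER_SM = 8


def compute_units_from_cores(total_cores, cores_per_sm=CORES_PER_SM):
    """Return the number of SMs (OpenCL compute units) for a given core count."""
    if total_cores % cores_per_sm != 0:
        raise ValueError("core count is not a multiple of cores per SM")
    return total_cores // cores_per_sm


# 16 advertised cores grouped 8 per SM give 2 compute units.
print(compute_units_from_cores(16))  # prints 2
```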

  3. #3
    Junior Member
    Join Date
    Mar 2010
    Posts
    14

    Re: Low number of compute units?

    Interesting. But what does that mean from a parallelization point of view? Will my kernel only be executed on two cores simultaneously, or will it automatically be distributed across all 16 units?

  4. #4
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: Low number of compute units?

    Each workgroup is scheduled onto a single compute unit. The workitems within that workgroup, however, are distributed across the cores of the compute unit.
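
    One consequence is that a launch with fewer workgroups than compute units cannot keep every compute unit busy. A minimal sketch of that check, assuming the global size is a multiple of the local size (the helper names are hypothetical, and the real scheduling order is up to the implementation):

```python
def work_group_count(global_size, local_size):
    """Number of workgroups in a 1D launch."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of local size")
    return global_size // local_size


def idle_compute_units(global_size, local_size, compute_units):
    """Lower bound on compute units left without any workgroup.

    Each workgroup runs on exactly one compute unit, so fewer
    workgroups than compute units means some units stay idle.
    """
    groups = work_group_count(global_size, local_size)
    return max(0, compute_units - groups)


# One workgroup of 64 workitems on a device with 2 compute units:
print(idle_compute_units(64, 64, 2))    # prints 1
# Sixteen workgroups of 64 workitems on the same device:
print(idle_compute_units(1024, 64, 2))  # prints 0
```

    This only tells you whether the launch is large enough to occupy every compute unit; it does not prove that the hardware actually used them.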

  5. #5
    Junior Member
    Join Date
    Mar 2010
    Posts
    14

    Re: Low number of compute units?

    Can I somehow verify that all 16 compute units (cores) are used? It worries me that OpenCL only returns "2" when I ask for MAX_COMPUTE_UNITS, and my running times also match suspiciously well a situation where only two cores are used. I would like to verify that this is not the case.

  6. #6
    Member
    Join Date
    Nov 2009
    Location
    Scotland
    Posts
    72

    Re: Low number of compute units?

    I can't think of a way to verify how many cores are used, but it really is normal for MAX_COMPUTE_UNITS on NVidia GPUs to return the number of SMs rather than the number of cores. On an NVidia Tesla S1070, which has 240 cores, it returns 30, because that's the number of SMs on that chip.
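
    For reference, here is a hedged sketch of how the value can be read with PyOpenCL. The function name is made up for illustration; it returns an empty list if PyOpenCL is not installed or no OpenCL platform is found:

```python
def list_compute_units():
    """Return (device name, compute units) pairs for every OpenCL device.

    Returns an empty list when PyOpenCL is not installed, and whatever
    was collected so far if the OpenCL runtime reports an error.
    """
    try:
        import pyopencl as cl
    except ImportError:
        return []

    results = []
    try:
        for platform in cl.get_platforms():
            for device in platform.get_devices():
                # max_compute_units corresponds to CL_DEVICE_MAX_COMPUTE_UNITS.
                results.append((device.name, device.max_compute_units))
    except cl.Error:
        # For example, no OpenCL platform is available on this machine.
        pass
    return results


for name, units in list_compute_units():
    print(f"{name}: {units} compute units")
```

    On the NVidia GPUs discussed here, the number printed should be the SM count, not the core count.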

    What exactly do you mean by "my running times also match suspiciously well to a situation where only two cores are used"? There can be several reasons why your program doesn't show the expected speedup; for example, it could be bandwidth-limited.