
Thread: warp size vs # of SPs per SM

  1. #1
    Junior Member
    Join Date
    Nov 2012
    Posts
    6

    warp size vs # of SPs per SM

    My GPU has 384 cores and 8 compute units (streaming multiprocessors), so there are 384/8 = 48 streaming processors (SPs) on each compute unit. Given that the NVIDIA warp size is 32, meaning 32 threads execute in lockstep, doesn't that mean 48 - 32 = 16 SPs sit idle on each cycle? That doesn't make sense to me. Can someone help clarify?
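
    To make the arithmetic concrete, here it is written out as a minimal Python sketch. The variable names are just illustrative, and `naive_idle` encodes the assumption being asked about (one 32-thread warp per SM per cycle), not a statement of how the hardware really schedules work.

```python
# Figures from the post: 384 cores spread over 8 compute units.
total_cores = 384
compute_units = 8
warp_size = 32  # threads per NVIDIA warp

# Streaming processors available on each compute unit.
cores_per_sm = total_cores // compute_units

# Naive estimate: if exactly one warp ran per cycle, this many SPs would idle.
naive_idle = cores_per_sm - warp_size

print(cores_per_sm, naive_idle)
```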

    Thanks,
    J

  2. #2

    Re: warp size vs # of SPs per SM

    I'm guessing you have a second-generation Fermi card (compute capability 2.1). The scheduling on those is a little unusual and I don't fully understand it myself, but the appendix of the CUDA C Programming Guide that covers Fermi explains it.

  3. #3

    Re: warp size vs # of SPs per SM

    On Fermi, each warp is physically executed as two half-warps of 16 threads; compute capability 2.1 devices can effectively run 3 half-warps at once, which accounts for all 48 SPs. (It is actually more complex, because the device can issue more than one independent instruction per cycle, but that's the gist of it.)
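
    As a rough illustration of that gist, here is a toy Python model. It is not cycle-accurate and ignores the multiple-issue detail mentioned above; the function names are made up for this sketch, and the 48 comes from the 384/8 figure in the original question.

```python
WARP_SIZE = 32
HALF_WARP = WARP_SIZE // 2  # 16 threads executed together as one half-warp


def half_warps_in_flight(cores_per_sm: int, half_warp: int = HALF_WARP) -> int:
    """Number of half-warps that fit across an SM's cores at the same time."""
    return cores_per_sm // half_warp


def idle_cores(cores_per_sm: int, half_warp: int = HALF_WARP) -> int:
    """Cores left over once whole half-warps have been placed."""
    return cores_per_sm % half_warp


# 48 SPs per SM, as computed in the original question (384 / 8).
print(half_warps_in_flight(48), idle_cores(48))

# For comparison: a 32-core SM (compute capability 2.0) fits two half-warps.
print(half_warps_in_flight(32), idle_cores(32))
```

    In this simplified view the 48 SPs divide evenly into three groups of 16, so none of them is left over the way the 48 - 32 estimate suggests.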

  4. #4

    Re: warp size vs # of SPs per SM

    Quote Originally Posted by Bilog
    On Fermi, each warp is physically executed as two half-warps of 16 threads; compute capability 2.1 devices can effectively run 3 half-warps at once, which accounts for all 48 SPs. (It is actually more complex, because the device can issue more than one independent instruction per cycle, but that's the gist of it.)

    Thanks guys!
