Thread: Constant Memory latency

  1. #1

    Constant Memory latency

    We know that on GPUs (Nvidia specifically) global memory access is a *lot* slower than local storage. Does anybody know how the other memory spaces, in particular constant memory, compare?

    I have some routines that calculate values from tables declared as static constants in the source, and I'm wondering whether copying these into local memory would give a speed increase. The OpenCL spec says that constants are allocated in a region of global memory, so should I expect to need the same caching techniques I use for global memory, or do constants get loaded into a faster store?

  2. #2
    Senior Member (joined Jul 2009, Northern Europe, 311 posts)

    Re: Constant Memory latency

    I'd suggest looking at the Nvidia programming guides. I don't remember off-hand where constants are stored in hardware, but it isn't the same physical location as global memory.

    My understanding is that, to get the absolute maximum performance for MADs, you need one source operand to come from local memory, one from registers, and one from constant memory, which suggests constant memory is a separate store.

    You could always write a simple copy kernel to see what is fastest. :)

  3. #3
    Member (joined Sep 2009, 35 posts)

    Re: Constant Memory latency

    Paul,

    NVIDIA's OpenCL Best Practices Guide, section 3.2.5 (Constant Memory), says:
    ... The constant memory space is cached. As a result, a read from constant memory costs one memory read from device memory only on a cache miss; otherwise, it just costs one read from the constant cache. For all threads of a half warp, reading from the constant cache is as fast as reading from a register as long as all threads read the same address. ...
    I also recommend reading the whole of the Dr. Dobb's article "CUDA, Supercomputing for the Masses": http://www.ddj.com/architect/207200659.

  4. #4

    Re: Constant Memory latency

    ... For all threads of a half warp, reading from the constant cache is as fast as reading from a register as long as all threads read the same address. ...
    Thanks for the link. That sounds like it could be what's causing my problem: each work-item (intentionally) accesses these tables at random, so I'll be missing the cache. It sounds like it's worth an experiment moving things to local memory.
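
A rough way to reason about the random-access case: NVIDIA's guides describe reads of different constant addresses within a half warp as being serialized, so the cost likely grows with the number of distinct addresses rather than only with cache misses. The plain C sketch below is a host-side model, not device code. It assumes a half warp of 16 threads and that serialization rule, and `distinct_addresses` and `HALF_WARP` are hypothetical names chosen for illustration. It counts how many separate constant-cache reads one half warp would issue for a given set of table indices.

```c
#define HALF_WARP 16  /* assumed number of threads sharing one constant-cache request */

/* Count the distinct table indices requested by one half warp.
 * Under the assumed serialization rule, this is the number of
 * constant-cache reads the half warp pays for:
 * 1 means a single broadcast, HALF_WARP is the worst case. */
int distinct_addresses(const unsigned idx[HALF_WARP])
{
    int distinct = 0;
    for (int i = 0; i < HALF_WARP; ++i) {
        int seen = 0;
        /* Has an earlier thread already asked for this index? */
        for (int j = 0; j < i; ++j) {
            if (idx[j] == idx[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            ++distinct;
        }
    }
    return distinct;
}
```

With all 16 indices equal the function returns 1, which is the broadcast case from the quoted passage. With 16 different indices it returns 16, and that is the access pattern where a copy of the table in local memory is worth measuring against the constant version.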
