Thread: Multiple access to global memory

  1. #1
    Junior Member
    Join Date
    Aug 2012
    Posts
    5

    Multiple access to global memory

    Dear all,

    In my code, several threads need to read a large number of variables from global memory, and all threads read the same addresses. Unfortunately, the variables are too large to fit entirely in local memory. As a consequence, reading them takes 80% of the run time, even though the reads account for less than 5% of the instructions.
    Can anyone suggest a way to speed up the access to these shared variables?

    (My procedure is somewhat similar to the multiplication of two matrices.)

    Thank you

  2. #2

    Re: Multiple access to global memory

    Optimization is very specific to the hardware you're targeting, and also to the problem. Without much more detail you're only going to get vague answers. Some of the things that are generally a good idea on a GPU when accessing global memory:
    • Ensure that memory accesses are coalesced. That means that each thread should access memory that immediately follows that of the previous thread.
    • If the GPU doesn't have an L1 cache (e.g. NVIDIA prior to Fermi), copy a chunk of data into shared memory and then work on it before loading another chunk. This is particularly useful if the memory is reused, as occurs in matrix multiplication.
    • Put the data in an image and access it through a sampler.
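
    To make the coalescing point concrete, here is a minimal sketch in plain C (host-side arithmetic, not kernel code) that counts how many aligned memory segments a group of work-items touches for a given stride. The function name `segments_touched`, the 128-byte segment size, and the group of 32 work-items are illustrative assumptions; the actual coalescing rules differ between GPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: work-item i reads the float at element index i * stride.
   Returns the number of distinct aligned segments of segment_bytes bytes that
   the whole group touches. stride == 1 is the coalesced pattern. */
size_t segments_touched(size_t work_items, size_t stride, size_t segment_bytes)
{
    assert(segment_bytes > 0);

    size_t count = 0;
    size_t last_segment = (size_t)-1;   /* sentinel: no segment seen yet */

    for (size_t i = 0; i < work_items; ++i) {
        size_t byte_addr = i * stride * sizeof(float);
        size_t segment = byte_addr / segment_bytes;
        /* Addresses never decrease with i, so comparing against the
           previous segment is enough to detect a new one. */
        if (segment != last_segment) {
            ++count;
            last_segment = segment;
        }
    }
    return count;
}
```

    Assuming 4-byte floats, 32 work-items reading consecutive elements all fall into one 128-byte segment, while a stride of 32 elements puts every work-item in its own segment, so this model predicts 32 memory transactions instead of 1.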

    Depending on how similar your operation is to matrix multiply, try reading some of the papers on it, e.g. search for "Volkov matrix multiply" or "Nakasato matrix multiply".

  3. #3
    Senior Member
    Join Date
    Dec 2011
    Posts
    154

    Re: Multiple access to global memory

    bmerry is correct: this will take some work.

    First (and it seems you've done this), code it to use global memory, to work out the algorithm.

    Then, figure out how to use shared memory, up to its limited size.

    If you can't fit everything you need, figure out some subset that will be useful.

    There are great examples of using shared memory for array multiplies. Find them and study them to figure out how to make the best use of shared memory.
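
    As a starting point, the following is a rough sketch of the tiling idea in plain C, run on the CPU so the access pattern is easy to follow. It is not an OpenCL kernel: `matmul_tiled`, `TILE`, `tile_a` and `tile_b` are hypothetical names, and the small arrays only stand in for local (shared) memory. In a real kernel the tiles would be declared `__local`, each work-item would load one element, and the group would synchronize with `barrier(CLK_LOCAL_MEM_FENCE)` between the staging and compute steps. Only a TILE x TILE subset of each input is held at a time, which is the point when the full data does not fit.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4  /* hypothetical tile edge; on a GPU it is limited by local memory size */

/* Computes c = a * b for n x n row-major matrices, n a multiple of TILE.
   Each (bi, bj) block plays the role of one work-group. */
void matmul_tiled(const float *a, const float *b, float *c, size_t n)
{
    assert(n % TILE == 0);

    for (size_t bi = 0; bi < n; bi += TILE) {
        for (size_t bj = 0; bj < n; bj += TILE) {
            float acc[TILE][TILE] = {{0.0f}};   /* per-work-item accumulators */

            for (size_t bk = 0; bk < n; bk += TILE) {
                float tile_a[TILE][TILE];       /* stand-ins for __local buffers */
                float tile_b[TILE][TILE];

                /* Stage one chunk of each input from the large arrays. */
                for (size_t i = 0; i < TILE; ++i) {
                    for (size_t j = 0; j < TILE; ++j) {
                        tile_a[i][j] = a[(bi + i) * n + (bk + j)];
                        tile_b[i][j] = b[(bk + i) * n + (bj + j)];
                    }
                }
                /* A GPU kernel would need a barrier here before reading the tiles. */

                /* Reuse: every staged value is read TILE times from the tiles. */
                for (size_t i = 0; i < TILE; ++i)
                    for (size_t j = 0; j < TILE; ++j)
                        for (size_t k = 0; k < TILE; ++k)
                            acc[i][j] += tile_a[i][k] * tile_b[k][j];
                /* A GPU kernel would need a second barrier before the next staging step. */
            }

            /* Write the finished block of the result. */
            for (size_t i = 0; i < TILE; ++i)
                for (size_t j = 0; j < TILE; ++j)
                    c[(bi + i) * n + (bj + j)] = acc[i][j];
        }
    }
}
```

    In this scheme each element of `a` and `b` is fetched from the large arrays n / TILE times rather than n times as in the straightforward triple loop, which is where the saving in global memory traffic comes from.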
