Thread: Regarding memory sections in OpenCL

  1. #1
    Newbie
    Join Date
    Aug 2013
    Posts
    9

    Hi,
    I would like to clarify the points below about memory regions in OpenCL. Please let me know whether I am correct.
    1) Variables defined inside a kernel are stored in private memory, that is, any variable declared without the __local qualifier.
    2) If I declare a constant variable inside a kernel, it will be stored in private memory.
    3) Constant variables are stored in the global memory region.
    4) Why is it necessary to pass a local buffer as a kernel parameter when it can also be declared inside the kernel?

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Posts
    110
    1) Yes, private is the default address space when a variable has no address space qualifier.
    2) No, it will be stored in global memory (or optimized away by the compiler).
    3) Yes.
    4) This is useful when the size of the local buffer is variable and set by the host (the host passes the size in bytes to clSetKernelArg with a NULL value).

  3. #3
    Newbie
    Join Date
    Aug 2013
    Posts
    9
    Thank you very much for your reply. I would like one more clarification regarding local memory usage.
    1) Inside the kernel, each work-item has to load its part of local memory from global memory; the data cannot be copied in a single shot (like a memcpy for all the work-items).
    2) I used local memory but saw no improvement in performance. In my case I read each pixel only once. Is that the reason? Does local memory help only from the second access onward, once the first access has brought the data into the cache?

  4. #4
    Senior Member
    Join Date
    Oct 2012
    Posts
    110
    1) You can, but this would be poor coding. Local memory acts as a user-managed cache shared by the work-items of a work-group. If you want a single work-item to make the global-to-local copy, you have to put it in a conditional section such as 'if (get_local_id(0) == 0)', which causes execution flow divergence and leaves the other work-items waiting at the barrier that must follow the copy.
    2) It depends on your algorithm. Local memory pays off only if you can amortize the cost of the global-to-local copy.
    In the classical case of matrix multiplication, the size of the data is O(n^2) while the number of operations is O(n^3). Caching data is then worthwhile, since you can expect an O(n) improvement (n being here the width of a block).
    In the case of a vector-to-vector addition, for example, caching data in local memory would be pointless because each element is accessed only once and the cached data is never reused.
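
To make the amortization argument concrete, here is a minimal CPU-side sketch in plain C. It is not OpenCL code and not anything posted in this thread: it only counts how many times a naive and a tiled matrix multiplication would read from "global" memory. The names `N`, `TILE`, `gload`, `matmul_naive`, `matmul_tiled` and `count_loads` are all hypothetical, and the small `ta`/`tb` arrays merely stand in for `__local` buffers.

```c
#include <assert.h>

#define N    8   /* matrix width (assumed demo size) */
#define TILE 4   /* block width, standing in for the work-group edge */

static long global_loads;   /* number of simulated global-memory reads */

/* Every read of an input matrix goes through here and is counted. */
static float gload(const float *m, int row, int col)
{
    global_loads++;
    return m[row * N + col];
}

/* Naive version: each output element re-reads a full row and a full column. */
static void matmul_naive(const float *a, const float *b, float *c)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += gload(a, i, k) * gload(b, k, j);
            c[i * N + j] = acc;
        }
}

/* Tiled version: each TILE x TILE block stages one tile of a and one tile
   of b into small arrays (the stand-in for local memory), then reuses them. */
static void matmul_tiled(const float *a, const float *b, float *c)
{
    for (int bi = 0; bi < N; bi += TILE)
        for (int bj = 0; bj < N; bj += TILE) {
            float acc[TILE][TILE] = {{0}};
            for (int bk = 0; bk < N; bk += TILE) {
                float ta[TILE][TILE], tb[TILE][TILE];
                /* Staging: one element of each tile per simulated work-item. */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++) {
                        ta[i][j] = gload(a, bi + i, bk + j);
                        tb[i][j] = gload(b, bk + i, bj + j);
                    }
                /* All arithmetic below touches only the staged tiles. */
                for (int i = 0; i < TILE; i++)
                    for (int j = 0; j < TILE; j++)
                        for (int k = 0; k < TILE; k++)
                            acc[i][j] += ta[i][k] * tb[k][j];
            }
            for (int i = 0; i < TILE; i++)
                for (int j = 0; j < TILE; j++)
                    c[(bi + i) * N + bj + j] = acc[i][j];
        }
}

/* Fills fixed inputs, runs one version, returns its global-load count. */
static long count_loads(int use_tiles, float *c)
{
    float a[N * N], b[N * N];
    assert(N % TILE == 0);   /* the sketch assumes whole tiles only */
    for (int i = 0; i < N * N; i++) {
        a[i] = (float)(i % 7);
        b[i] = (float)(i % 5);
    }
    global_loads = 0;
    if (use_tiles)
        matmul_tiled(a, b, c);
    else
        matmul_naive(a, b, c);
    return global_loads;
}
```

With N = 8 and TILE = 4, `count_loads(0, c)` returns 2 * 8^3 = 1024 and `count_loads(1, c)` returns 1024 / 4 = 256, so the saving equals the block width. A vector addition has no equivalent of the inner `k` loop, so staging its data would add a copy without removing any global read.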
