Thread: Runtime differences __global, __constant, __local

  1. #1
    Junior Member
    Join Date
    Nov 2010
    Posts
    9

    Runtime differences __global, __constant, __local

    Imagine I have some sort of filter algorithm.
    - in_array has the input data (vectorized, for faster access)
    - out_array gets the result of the filtering
    - filter is the filter itself.

    The kernel would look something like this:
    __kernel void vec_iii_1d(__global float4 *filter,
                             __global float4 *in_array,
                             __global float4 *out_array)
    {
        ...
        out_array[tid] = in_array[tid] * filter[fid];
        ...
    }

    Questions:
    1) If I change "__global float4 *filter" to "__constant float4 *filter", would the data then be automatically cached in the constant cache + kept there for all subsequent kernel calls (the kernel is called several times) ?

    2) If I change "__global float4 *filter" to "__local float4 *filter" - what will happen then?
    2a) Is the data in global memory first, and then copied automatically to local memory when the kernel is executed?

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Runtime differences __global, __constant, __local

    > If I change "__global float4 *filter" to "__constant float4 *filter", would the data then be automatically cached in the constant cache + kept there for all subsequent kernel calls (the kernel is called several times) ?

    The short answer is "probably yes".

    It's up to the OpenCL driver to see if it can keep it there or not between kernel calls. In other words, it depends on the implementation and environmental factors (other kernels running, etc).
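
    For reference, a minimal sketch of the __constant variant of the kernel. The original post elides how tid and fid are computed, so the indexing and the filter_len parameter here are assumptions made purely for illustration. It also assumes the global work size equals the number of float4 elements in in_array.

    ```c
    // Sketch only: filter_len and the tid/fid mapping are hypothetical,
    // because the original kernel does not show how they are derived.
    __kernel void vec_iii_1d_const(__constant float4 *filter,
                                   __global const float4 *in_array,
                                   __global float4 *out_array,
                                   const uint filter_len)
    {
        size_t tid = get_global_id(0);
        size_t fid = tid % filter_len;   // hypothetical mapping

        // Reads of filter go through the __constant address space; whether
        // they stay cached between kernel launches is up to the driver.
        out_array[tid] = in_array[tid] * filter[fid];
    }
    ```

    Note that the amount of data you can pass through __constant arguments is limited per device; the limit can be queried with clGetDeviceInfo and CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE.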

    > If I change "__global float4 *filter" to "__local float4 *filter" - what will happen then?
    >
    > 2a) Is the data in global memory first, and then copied automatically to local memory when the kernel is executed?

    These are very good questions.

    I'll start with 2a. No, data is not copied automatically to local memory. The contents of local memory are lost every time a new work-group starts to execute. This means that each work-group is responsible for filling the local memory with useful data before it reads from it. In your case, each work-group would copy the filter data from global (or constant) memory to local memory, and only then can the filtering operation start.
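
    The copy-then-filter pattern described above could look roughly like this. It is a sketch, not the original poster's kernel: filter_local, filter_len and the tid/fid mapping are hypothetical names introduced for illustration, and it assumes the global work size equals the number of float4 elements so that every work-item reaches the barrier.

    ```c
    __kernel void vec_iii_1d_local(__global const float4 *filter,
                                   __global const float4 *in_array,
                                   __global float4 *out_array,
                                   __local float4 *filter_local, // sized by the host
                                   const uint filter_len)        // hypothetical
    {
        size_t tid = get_global_id(0);
        size_t lid = get_local_id(0);
        size_t lsz = get_local_size(0);

        // Each work-item copies a strided share of the filter into local memory.
        for (size_t i = lid; i < filter_len; i += lsz)
            filter_local[i] = filter[i];

        // Wait until every work-item in the group has finished its copies.
        barrier(CLK_LOCAL_MEM_FENCE);

        size_t fid = tid % filter_len;   // hypothetical mapping
        out_array[tid] = in_array[tid] * filter_local[fid];
    }
    ```

    Every work-group repeats the copy, so this only pays off if each filter element is read from local memory many times per group.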

    Now back to question 2. I would expect __constant to be faster than __local. Please let me know if you find the opposite is true on some platform.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Nov 2010
    Posts
    9

    Re: Runtime differences __global, __constant, __local

    Thanks David.

    I am surprised that it is possible at all to use __local variables in a kernel function interface, since I've never seen that so far in sample code.

    What they usually do is manually copy data from __global memory to __local memory + use something like barrier(CLK_LOCAL_MEM_FENCE) before they actually use that data.

    Can you tell me why they never use "my" approach?

    cheers,
    F.

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Runtime differences __global, __constant, __local

    > I am surprised that it is possible at all to use __local variables in a kernel function interface, since I've never seen that so far in sample code.

    This feature is defined in sections 5.7.2 (page 127) and 6.5.2 (page 185) of the OpenCL specification.

    > What they usually do is manually copy data from __global memory to __local memory + use something like barrier(CLK_LOCAL_MEM_FENCE) before they actually use that data.

    Yes, that is still necessary even if you pass the __local variable as a kernel argument.
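
    To illustrate the two ways of obtaining local memory, here is a sketch that declares the buffer inside the kernel body, which is what most sample code does, instead of taking it as an argument. FILTER_LEN and the indexing are assumptions for illustration only. Either way, the buffer starts out uninitialized and has to be filled by the work-group.

    ```c
    #define FILTER_LEN 16   // hypothetical compile-time filter length

    __kernel void vec_iii_1d_fixed(__global const float4 *filter,
                                   __global const float4 *in_array,
                                   __global float4 *out_array)
    {
        // Local memory declared in the kernel body: size fixed at compile time.
        // With a __local pointer argument instead, the host chooses the size by
        // calling clSetKernelArg(kernel, arg_index, size_in_bytes, NULL).
        __local float4 filter_local[FILTER_LEN];

        // The explicit copy and barrier are needed in both variants.
        size_t lid = get_local_id(0);
        for (size_t i = lid; i < FILTER_LEN; i += get_local_size(0))
            filter_local[i] = filter[i];
        barrier(CLK_LOCAL_MEM_FENCE);

        size_t tid = get_global_id(0);
        out_array[tid] = in_array[tid] * filter_local[tid % FILTER_LEN];
    }
    ```

    The argument form is mainly useful when the buffer size is only known at run time.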

    > Can you tell me why they never use "my" approach?

    What is your approach?

  5. #5
    Junior Member
    Join Date
    Nov 2010
    Posts
    9

    Re: Runtime differences __global, __constant, __local

    > Yes, that is still necessary even if you pass the __local variable as a kernel argument.

    Uhh! So would the first function call in my kernel then be a barrier call? Like this:

    __kernel void vec_iii_1d(__local float4 *filter,
                             __global float4 *in_array,
                             __global float4 *out_array)
    {
        barrier(CLK_LOCAL_MEM_FENCE);
        ...
    }

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Runtime differences __global, __constant, __local

    No, the first instruction would not be a barrier. A local barrier is needed after the global-to-local copy is done.

    I was replying affirmatively to this:

    > What they usually do is manually copy data from __global memory to __local memory + use something like barrier(CLK_LOCAL_MEM_FENCE) before they actually use that data.

    By the way, CL 1.1 introduced some built-in functions, such as async_work_group_copy(), to help with these sorts of use cases.
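
    A sketch of how async_work_group_copy() could replace the hand-written copy loop and barrier. As before, filter_local, filter_len and the indexing are hypothetical names for illustration, and the global work size is assumed to match the array length.

    ```c
    __kernel void vec_iii_1d_async(__global const float4 *filter,
                                   __global const float4 *in_array,
                                   __global float4 *out_array,
                                   __local float4 *filter_local, // sized by the host
                                   const uint filter_len)        // hypothetical
    {
        // Every work-item in the group must reach this call with the same
        // arguments; the copy is performed on behalf of the whole work-group.
        event_t ev = async_work_group_copy(filter_local, filter, filter_len, 0);

        // Block until the copy has completed before reading filter_local.
        wait_group_events(1, &ev);

        size_t tid = get_global_id(0);
        out_array[tid] = in_array[tid] * filter_local[tid % filter_len];
    }
    ```

    Whether this is faster than the manual loop depends on the implementation, so it is worth measuring both.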

  7. #7
    Senior Member
    Join Date
    Sep 2002
    Location
    Santa Clara
    Posts
    105

    Re: Runtime differences __global, __constant, __local

    David, w.r.t. the following comment posted by you on 7 Jan 2011, 1:21pm

    "Now back to question 2. I would expect __constant to be faster than __local. Please let me know if you find the opposite is true in some platform."

    __local should almost always be faster than __constant if __local really is dedicated local memory (typically some form of SRAM) vs. just mapped as __global memory. I assume you meant "Now back to question 2. I would expect __constant to be faster than __global..." which should be the case for most platforms.

  8. #8
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Runtime differences __global, __constant, __local

    Without going into details, let's say that the performance of the different address spaces will be implementation-dependent and that Frizz should try both.