
Thread: Precomputing array of 16 elements

  1. #1

    Precomputing array of 16 elements

    Hi,

    Before I start, I just want to say that (almost) everything works fine, so this is only a question about what you think of this particular subject.
    Let's say that I have a 1D buffer of size N that I "cover" with 1D thread blocks of 128 threads (local size). Each thread divides the angle range [0, 2*pi] into 16 sectors. For each thread, I do something like this:
    Code :
    const float sector_size = 2.f * M_PI_F / 16;
    for (int i=0; i<16; ++i) {
        float sin_i = sin(i*sector_size);
        float cos_i = cos(i*sector_size);
        (...)
    }
    As you can see, it's pretty straightforward. No need to add more details.
    Then I thought it was pretty stupid to compute sin and cos many times. I can just pre-compute them and put them into a local array that I fill in parallel using the first 16 threads of my group:
    Code :
    __local float sin_array[16];
    __local float cos_array[16];
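    // thread_id is assumed to be the local ID within the group, e.g. get_local_id(0)
    if (thread_id<16) {
        const float sector_size = 2.f * M_PI_F / 16;
        sin_array[thread_id] = sin(thread_id * sector_size);
        cos_array[thread_id] = cos(thread_id * sector_size);
    }
    // every thread of the group waits here until both arrays are filled
    barrier(CLK_LOCAL_MEM_FENCE);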
     
    (...)
     
    for (int i=0; i<16; ++i) {
        float sin_i = sin_array[i];
        float cos_i = cos_array[i];
        ...
    }
    This works fine but it doesn't speed up a thing. I'm used to being surprised in GPU coding. I guess that here the barrier cancels the benefit of precomputing the array values.
    Then I thought "why don't I initialize the array by hand?". So I tried the following approach:

    Code :
    __local float sin_array[16] = { 0.000000f,  0.382683f,  0.707107f,  0.923880f,
                                    1.000000f,  0.923880f,  0.707107f,  0.382683f,
                                    0.000000f, -0.382683f, -0.707107f, -0.923880f,
                                   -1.000000f, -0.923880f, -0.707107f, -0.382683f};
     
    __local float cos_array[16] = { 1.000000f,  0.923880f,  0.707107f,  0.382683f,
                                    0.000000f, -0.382683f, -0.707107f, -0.923880f,
                                   -1.000000f, -0.923880f, -0.707107f, -0.382683f,
                                   -0.000000f,  0.382683f,  0.707107f,  0.923880f};
     
    (...)
     
    for (int i=0; i<16; ++i) {
        float sin_i = sin_array[i];
        float cos_i = cos_array[i];
        ...
    }
    We don't have a barrier here, so it should be faster, no? Problem: sin_array and cos_array are not filled correctly. So this is my main question: why?

    The second question, if this one is solved, is: is it better to leave these arrays in __local memory, or should I put them in __constant memory (for instance, declared before the function definition)?

    Many thanks,

    Vincent

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Posts
    109

    Re: Precomputing array of 16 elements

    As stated in the OpenCL specification, variables allocated in the __local address space inside a kernel function cannot be initialized.

    So, your code should not even compile (it does not with either the NVIDIA or the Intel OpenCL compiler on my computer).

    However, this does indeed seem to be the perfect case for using __constant memory.
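
    For reference, a minimal sketch of what the __constant variant could look like, with the tables declared at program scope before the function that uses them. It is written so that the same text also compiles as plain C on the host. The TABLE_QUALIFIER macro and the max_projection helper are made up for illustration (the loop body in post #1 is elided, so this one is only a stand-in), and the table values are the ones typed in post #1.

```c
/* __OPENCL_VERSION__ is predefined by OpenCL C compilers. The plain C
   branch exists only so this sketch can be compiled and checked on the host. */
#ifdef __OPENCL_VERSION__
#define TABLE_QUALIFIER __constant
#else
#define TABLE_QUALIFIER static const
#endif

/* Program-scope tables: sin and cos of i * 2*pi/16 (values from post #1). */
TABLE_QUALIFIER float sin_table[16] = {
     0.000000f,  0.382683f,  0.707107f,  0.923880f,
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f };

TABLE_QUALIFIER float cos_table[16] = {
     1.000000f,  0.923880f,  0.707107f,  0.382683f,
     0.000000f, -0.382683f, -0.707107f, -0.923880f,
    -1.000000f, -0.923880f, -0.707107f, -0.382683f,
    -0.000000f,  0.382683f,  0.707107f,  0.923880f };

/* Hypothetical loop body: largest projection of the vector (x, y)
   onto the 16 sector directions. No barrier is needed because the
   tables are read-only and already initialized. */
float max_projection(float x, float y)
{
    float best = x * cos_table[0] + y * sin_table[0];
    for (int i = 1; i < 16; ++i) {
        float p = x * cos_table[i] + y * sin_table[i];
        if (p > best) {
            best = p;
        }
    }
    return best;
}
```

    A kernel would call max_projection(x, y) once per work-item. Unlike the __local arrays, an initializer is allowed here because the tables live in the __constant address space at program scope.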

  3. #3

    Re: Precomputing array of 16 elements

    Thank you for your answer. The code did compile though. Strange.
    Anyway, I tried the "__constant" version and it doesn't speed anything up.
    It's like the more I code in OpenCL the less I understand...

  4. #4
    Senior Member
    Join Date
    Oct 2012
    Posts
    109

    Re: Precomputing array of 16 elements

    The compiler generally unrolls the loop, so it detects that sin() and cos() are now computed on constant values, and it probably caches the results in... a constant array.
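
    A hand-written illustration of that guess, shrunk to 4 sectors so it stays short. This is not real compiler output: the function name and the loop body (a maximum over projections) are hypothetical, and the folded values are rounded to the exact quarter-turn results.

```c
/* Form as it would be written in the kernel (4 sectors here):
 *
 *     const float step = 2.f * M_PI_F / 4;
 *     for (int i = 0; i < 4; ++i) {
 *         float p = x * cos(i * step) + y * sin(i * step);
 *         if (p > best) best = p;
 *     }
 *
 * The trip count is a compile-time constant, so the loop can be fully
 * unrolled. In each copy of the body, i is then a literal, which makes
 * every cos()/sin() argument a constant that can be evaluated at
 * compile time. What is left could look like the function below. */
float max_projection_4_folded(float x, float y)
{
    float best = x *  1.0f + y *  0.0f;        /* i = 0: angle 0      */
    float p;

    p = x *  0.0f + y *  1.0f;                 /* i = 1: angle pi/2   */
    if (p > best) best = p;

    p = x * -1.0f + y *  0.0f;                 /* i = 2: angle pi     */
    if (p > best) best = p;

    p = x *  0.0f + y * -1.0f;                 /* i = 3: angle 3*pi/2 */
    if (p > best) best = p;

    return best;
}
```

    If the compiler does something along these lines, there is no trigonometry left to execute at run time, which would be consistent with the hand-made tables not being any faster.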

  5. #5

    Re: Precomputing array of 16 elements

    That's my guess too. Compilers are too smart nowadays.
