Thread: Array size at runtime

  1. #1
    Junior Member
    Join Date
    May 2010
    Posts
    5

    Array size at runtime

    I have a question, probably a simple one. I need to create a "work array" that is only needed during the execution of one "work item". My understanding is that I can simply declare it as a sort of local variable inside of the kernel, for example:

    float temp[1024];

    This works just fine, so long as the array size is a compile-time constant. But as soon as I size it with a variable, such as...

    float temp[bufferSize];

    I start to get "out of resource" errors raised when I try to read the results back from the buffers.

    My question is: what is the best way to create a small temp buffer? Does it need to be a parameter to the kernel? I've gotten it to work that way, but then I have to create one very large buffer with enough space for all of my work items.

  2. #2
    Junior Member
    Join Date
    May 2010
    Posts
    5

    Re: Array size at runtime

    Okay, I think I found the answer to my own question. When I run this on an AMD device, I actually get a compile error. (NVIDIA did not generate a compile error; it just did not work properly.)

    Line 81: error: expression must have a constant value
    float test[inputSize];

    However, I still wonder: if I need a small buffer that is the same size for every "work item" but can change from one run of the program to the next, what is the best way to create such a temp buffer?

  3. #3
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Array size at runtime

    If you know that the maximum size of that buffer is reasonably small, you could always allocate that maximum.

    Other than that, you could compile multiple versions of the program with different maximum sizes for that array (you can use a #define for that). Since the array is private to each work-item, performance will suffer more the larger that buffer is.
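
    A minimal host-side sketch of the #define approach, assuming the size is known before the program is built. The macro name TEMP_SIZE, the helper make_build_options, and the toy kernel body are hypothetical names chosen for illustration; only the "-D name=value" build option and clBuildProgram itself come from OpenCL.

```c
#include <stdio.h>
#include <string.h>

/* Kernel source kept as a string. TEMP_SIZE (a hypothetical macro name) is
   deliberately not defined here: it is supplied through the build options,
   so the compiler sees a constant array size. */
const char *const kernel_source =
    "__kernel void foo(__global float *out)\n"
    "{\n"
    "    float temp[TEMP_SIZE];\n"
    "    for (int i = 0; i < TEMP_SIZE; ++i)\n"
    "        temp[i] = (float)i;\n"
    "    out[get_global_id(0)] = temp[TEMP_SIZE - 1];\n"
    "}\n";

/* Writes "-D TEMP_SIZE=<temp_size>" into out.
   Returns 0 on success, -1 if the output buffer is too small. */
int make_build_options(char *out, size_t out_size, unsigned temp_size)
{
    int n = snprintf(out, out_size, "-D TEMP_SIZE=%u", temp_size);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

    The resulting string would be passed as the options argument, for example clBuildProgram(program, 0, NULL, options, NULL, NULL). Each distinct size needs its own build, so it probably makes sense to cache the built programs by size if the same sizes come up repeatedly.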

    Alternatively, you could use local memory and manually make each work-item index into that local memory so that there's no overlap between different work-items. Local memory is not as fast as private memory (the methods above) but it saves you the work of recompiling the program. It would look something like this:

    Code :
    __kernel void foo(uint buffer_elements_per_work_item, __local float* buffer_start)
    {
        __local float* work_item_private_buffer = buffer_start + buffer_elements_per_work_item * get_local_id(0);
     
        // From here on you can use work_item_private_buffer as if it were declared like this:
        // float work_item_private_buffer[buffer_elements_per_work_item];
    }
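
    On the host side, the size of the __local buffer has to cover every work-item in the work-group. A rough sketch of that arithmetic follows; the helper names local_buffer_bytes, slice_offset and fits_in_local_memory are hypothetical and not part of the OpenCL API.

```c
#include <stddef.h>

/* Bytes of __local memory one work-group needs:
   one slice of elements_per_work_item floats for each work-item. */
size_t local_buffer_bytes(size_t work_group_size, size_t elements_per_work_item)
{
    return work_group_size * elements_per_work_item * sizeof(float);
}

/* Element offset of a work-item's slice. This mirrors the pointer arithmetic
   in the kernel: buffer_start + buffer_elements_per_work_item * get_local_id(0). */
size_t slice_offset(size_t local_id, size_t elements_per_work_item)
{
    return local_id * elements_per_work_item;
}

/* Returns 1 if the request fits in the device's local memory, else 0.
   device_local_mem_size is the value the host queried from the device. */
int fits_in_local_memory(size_t bytes, unsigned long long device_local_mem_size)
{
    return (unsigned long long)bytes <= device_local_mem_size;
}
```

    The byte count would then be given to clSetKernelArg with a NULL pointer, which is how a __local argument is sized: clSetKernelArg(kernel, 1, local_buffer_bytes(local_size, n), NULL). The device limit can be queried with clGetDeviceInfo and CL_DEVICE_LOCAL_MEM_SIZE. If the request does not fit, a smaller work-group size is one way to bring it down.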
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer.
