Results 1 to 2 of 2

Thread: Array dimension on Kernel - AMD Histogram OpenCl example

  1. #1
    Junior Member
    Join Date
    Nov 2012
    Posts
    3

    Array dimension on Kernel - AMD Histogram OpenCl example

    First of all I want to present myself, I am from Chile and this is my first post on the forum. I am Ph.D. in Physics and some time ago I started this adventure on OpenCL.

    The question is the following. I am studying an example of a Histogram from AMD webpage ( http://developer.amd.com/tools/heteroge ... les-demos/ ) and I have an "inner conflict" with the dimensions of the data array on the kernel, the code is the following:

    #define LINEAR_MEM_ACCESS
    #pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable

    #define BIN_SIZE 256

    /**
    * @brief Calculates block-histogram bin whose bin size is 256
    * @param data input data pointer
    * @param sharedArray shared array for thread-histogram bins
    * @param binResult block-histogram array
    */
    __kernel
    void histogram256(__global const uint* data,
    __local uchar* sharedArray,
    __global uint* binResult)
    {
    size_t localId = get_local_id(0);
    size_t globalId = get_global_id(0);
    size_t groupId = get_group_id(0);
    size_t groupSize = get_local_size(0);

    /* initialize shared array to zero */
    for(int i = 0; i < BIN_SIZE; ++i)
    sharedArray[localId * BIN_SIZE + i] = 0;

    barrier(CLK_LOCAL_MEM_FENCE);

    /* calculate thread-histograms */
    for(int i = 0; i < BIN_SIZE; ++i)
    {
    #ifdef LINEAR_MEM_ACCESS
    uint value = data[groupId * groupSize * BIN_SIZE + i * groupSize + localId];
    #else
    uint value = data[globalId * BIN_SIZE + i];
    #endif // LINEAR_MEM_ACCESS
    sharedArray[localId * BIN_SIZE + value]++;
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    /* merge all thread-histograms into block-histogram */
    for(int i = 0; i < BIN_SIZE / groupSize; ++i)
    {
    uint binCount = 0;
    for(int j = 0; j < groupSize; ++j)
    binCount += sharedArray[j * BIN_SIZE + i * groupSize + localId];

    binResult[groupId * BIN_SIZE + i * groupSize + localId] = binCount;
    }
    }

    In the "/*calculate thread-histograms*/" part each element of the data array is called in the following way:

    data[groupId * groupSize * BIN_SIZE + i * groupSize + localId]

    Now comes my conflict. The dimensions of the data array is 1024*1024=1048576, the number of workgroups is 4096 and the local size is 128 (as defined by the host program). If one calculates the index for the "last thread" of groupId=4095 and localId=127 one would have an index of 4095*128*256+i*128+127=134185087+i*128, which clearly exceeds the dimension of the data array. So how does the program handle this issue?

    Well, a last question, what does LINEAR_MEM_ACCESS mean?, I have been trying to find the answer without luck... thanks for your help!

  2. #2
    Junior Member
    Join Date
    Nov 2012
    Posts
    3

    Re: Array dimension on Kernel - AMD Histogram OpenCl example

    OK, I realized that I had a conceptual error and now I understand the dimensions in the problem. The global number of Threads is 4096 and the group size is 128, therefore there are 32 work groups... now everything fits...

Similar Threads

  1. Replies: 2
    Last Post: 06-24-2012, 07:21 AM
  2. Replies: 1
    Last Post: 05-03-2011, 05:08 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •