Results 1 to 5 of 5

Thread: Maximum array index limited to 16 bits ?

  1. #1

    Maximum array index limited to 16 bits ?

    I'm doing a standard sum-of-ints algorithm. The minor twist is that I do this on an 2 dimensional matrix in which I sum each column individually.

    The matrix is laid out in global memory in a column after column fashion.

    __kernel void reduce(__global ushort *g_matrix,__global uint *g_sums,__local uint *s_local)

    I use something like : g_matrix[col*number_of_rows+row] to index the g_matrix.

    If I run my algorithm with a small input matrix of say 32*32 ushorts then it works just fine : I get an array with 32 sums, one for each column. If I scale things up though, e.g. 512*512 ushorts, then I seem to get a wrap-around after column 127. Now is 128 columns * 512 rows per column of course equal to 64 k, and as usual in any computing environment strange things happen if you cross the 16 bit boundary (and usually 32 bit as well)

    I've tried to find a limit in the OpenCL definition which would explain this behavior but couldn't find anything I considered related. What did I overlook ?

    Thanks in advance

  2. #2
    Junior Member
    Join Date
    Jul 2011
    Posts
    5

    Re: Maximum array index limited to 16 bits ?

    What is your local group size?
    How do you calculate your local memory index?

  3. #3

    Re: Maximum array index limited to 16 bits ?

    My global workgroup size is 256*512 and my local workgroup size is 256*2. The problem doesn't seem to be in indexing the local memory, but in global memory. When I try to index the g_matrix with anything over 65536 it seems to wrap around to reading at index 0 again. I've included the kernel code below.

    Code :
    __kernel void reduce(__global ushort *g_matrix,__global uint *g_line,__local uint *s_column)
    {
      // g_matrix  : 0..511,0..511
      // g_line    : 0..511
      // s_column  : 0..255,0..1
     
      size_t g_row=get_global_id(0);                  // 0..255
      size_t g_row_work_size=get_global_size(0);      // 256
      size_t g_column=get_global_id(1);               // 0..511
      size_t g_column_work_size=get_global_size(1);   // 512
     
      size_t l_row=get_local_id(0);                   // 0..255
      size_t l_row_work_size=get_local_size(0);       // 256
      size_t l_column=get_local_id(1);                // 0..1
      size_t l_column_work_size=get_local_size(1);    // 2
     
      ushort offset=l_row_work_size;                  // 256,128,64,32,16,8,4,2,1,0
     
      // copy current columns from global to shared memory for faster access and
      // do the first addition within a column in a single stroke
      s_column[l_column*l_row_work_size+l_row]=
          g_matrix[g_column*g_row_work_size*2+g_row]+
          g_matrix[g_column*g_row_work_size*2+g_row+offset];
     
      offset>>=1;
     
      barrier(CLK_LOCAL_MEM_FENCE);
     
      // do the remaining additions in the columns in shared memory
      while(offset>0)
      {
        if(l_row<offset)
        {
          s_column[l_column*l_row_work_size+l_row]+=
              s_column[l_column*l_row_work_size+l_row+offset];
        }
     
        offset>>=1;
     
        barrier(CLK_LOCAL_MEM_FENCE);
      }
     
      // copy sum to global result memory
      if(l_row==0)
      {
        g_line[g_column]=s_column[l_column*l_row_work_size];
      }
    }

    Everything works fine for the first 128 columns, but after that things seem to wrap back again to column 0 : the results are identical although the matrix data at column 128 and beyond is different from that at column 0 (for testing I'm providing data which starts with 0 at (0,0) and increments by 1 for each next cell, row first).

  4. #4
    Junior Member
    Join Date
    Jul 2011
    Posts
    5

    Re: Maximum array index limited to 16 bits ?

    // copy current columns from global to shared memory for faster access and
    // do the first addition within a column in a single stroke
    s_column[l_column*l_row_work_size+l_row]=
    g_matrix[g_column*g_row_work_size*2+g_row]+
    g_matrix[g_column*g_row_work_size*2+g_row+offset];
    Are you sure you have to multiply by 2 here? With buffer size 256x512 (columns by rows I think) jump out of memory after 128.

  5. #5

    Re: Maximum array index limited to 16 bits ?

    Hi,

    g_matrix is not 256*512, but 512*512 in size, but I'm throwing 256 threads at a time at each column of it, each of which copies two elements from global memory, 256 cells apart, adds the two values and stores the result in local memory.

    So g_row_work_size is half the size of a column, and therefore I have to multiply by two to get the column size.

    You can find the sizes of the arguments right after the kernel function definition.

Similar Threads

  1. Maximum size for index buffer and how to go around it
    By dcantor in forum Developers Coding:Beginner
    Replies: 3
    Last Post: 08-11-2010, 12:40 AM
  2. Constraints of limited RAM on GPGPUs
    By sine in forum OpenCL
    Replies: 2
    Last Post: 06-21-2010, 01:59 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •