Results 1 to 4 of 4

Thread: OpenCL Binary | Effective Memory Usage

  1. #1

    OpenCL Binary | Effective Memory Usage

    Hi Community!

    Right know I try to handle a problem, and my google-Research did not work out. I do some calculations of Voronoi-Cells (nearest neighbor-problem). Therefore, I write into my Result array, which is quite big (Number of Points > 250.000).

    I only want to store True/False to keep the information of closest neighbors. (i.e. Pnt 12 is Neighbour of Pnt 13 => 12/13 = 1)

    I compressed this information to a uint8 linear array (Values 0-255) and I do bit manipulations to store 8 Neighbor Informations.

    Furthermore the array grows with the Gaussian sum:


    Matrix looks like:

    0 1 2 3

    0

    1 0

    2 0 1

    3 0 1 0





    Source:

    nachbar = zeros(array_size, dtype=uint

    nachbar_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.USE_HOST_PTR, hostbuf=nachbar)



    int index = (cur_compare_pnt-1)*cur_compare_pnt/2 + cur_pnt;

    int main_index = index / 8;

    int sub_index = index % 8;



    nachbar_buf[main_index] |= 1<<sub_index;




    I am not sure, if this was understandable… I hope so. My problem is, that this process is not vectorizable, since I write with different Point Tupels into the same uint8 Entry.

    My question is: Is there a way to use a bool-linear- array in OpenCL? Or is there a better way at all?

  2. #2
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: OpenCL Binary | Effective Memory Usage

    Is it even worth compressing? 250k is nothing on modern hardware. Of course, if this grouping makes the next stage of processing that much more efficient that is another matter.

    But you have some options:
    - do a full bytes worth in each thread. less parallelism but with a big enough problem it should still be ok
    - use atomic_or to merge results from each group of N threads
    - use a reduction algorithm to perform the bit merging without atomics

    For atomics you probably want to use local memory as a staging area, as with the reduction algorithm. The reduction algorithm might be better as atomics can be slow - although most hardware has specialised units which make certain operations fast.

  3. #3

    Re: OpenCL Binary | Effective Memory Usage

    Hi notzed,

    thanks for your answer. The Problem is not the 250k Points, the problem is that i have to store the nearest neighbor information. The amout of data is given by
    size = 1/2 * N * ( N -1). This is for 250k ca. 31G Connections (so. ca. 31 GB)


    Right now I try to work with semaphors and atomic in generell but somehow I am to unskilled to be successful I simply dont understand the idea of barriers and local groups.

    Does somebody has a nice tutorial for this kind of problems?

  4. #4

    Re: OpenCL Binary | Effective Memory Usage

    Hi notzed,


    "do a full bytes worth in each thread. less parallelism but with a big enough problem it should still be ok"


    I was able to apply this idea to some other code, which creates large matrizes parallely.

    Source:
    __kernel void create_big_ones( __global unsigned char *big_ones)
    {
    int main_index = get_global_id(0);
    for( int sub_index = 0; sub_index < 8; sub_index++)
    {
    int index = sub_index + main_index * 8;



    float id1_temp = sqrt(2 * index + 0.25f) + 0.5f;
    int id1 = (int) id1_temp;
    int id0 = index - (id1-1)*id1/2;

    //printf("%d %d %d\\n",index,main_index,sub_index);
    //printf("Points to check: %d %d %d\\n", id0, id1, index);

    int main_index = index / 8;
    if(id0 == 1)
    {
    big_ones[main_index] |= 1<<sub_index;
    }
    }
    }


    This code work perfectly parallel. But this idea is not appliable for my Nearest Neighbour problem, since I cannot check a specific connection (i.e. Point 12 to Point 14 ? Neighbors). I start the code with a specific point and get a list of Neighbors. So, I do not know which byte will be modified in advance.

    I hope that this is somehow understandable to anouther person

    Ciao and thanks
    Timo

Similar Threads

  1. Probable usage of texture memory
    By sajis997 in forum OpenCL
    Replies: 0
    Last Post: 03-11-2013, 12:46 PM
  2. Compile OpenCL code to binary
    By Rui in forum OpenCL
    Replies: 2
    Last Post: 05-31-2010, 09:46 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •