Thread: global memory coalescing question

  1. #1
    Junior Member
    Join Date
    Feb 2010
    Posts
    10

    global memory coalescing question

    Hi,

    If I run this test kernel, where input_data and output_data are pointers to floats in global memory:

    Code :
    float test;
    for ( int i = 0 ; i < 1024; i++ )
    {
      barrier( CLK_GLOBAL_MEM_FENCE);
      float f = *(input_data + i);
      test = (float)get_local_id(0);
    }
    barrier( CLK_GLOBAL_MEM_FENCE);
    *(output_data + get_local_id(0)) = test;

    then openclprof tells me that my global memory accesses are coalesced.

    But if I run this kernel:

    Code :
    float test;
    for ( int i = 0 ; i < 1024; i++ )
    {
      barrier( CLK_GLOBAL_MEM_FENCE);
      float f = *(input_data + i);
      //test = (float)get_local_id(0);
      test = f;
    }
    barrier( CLK_GLOBAL_MEM_FENCE);
    *(output_data + get_local_id(0)) = test;

    then my global accesses aren't coalesced. This has me stumped. As far as I can tell, all my threads read the same input_data address at the same time, and all my threads write to adjacent output_data addresses at the same time. I believe the addresses are aligned correctly in both cases as well. So why does assigning f to test inside the loop make my accesses uncoalesced?

    What am I missing here?

  2. #2
    Junior Member
    Join Date
    Feb 2010
    Posts
    10

    Re: global memory coalescing question

    Whoops, I forgot to add: I'm running this code on a compute capability 1.1 board.

  3. #3
    Junior Member
    Join Date
    Feb 2010
    Posts
    10

    Re: global memory coalescing question

    Hmmm, if I change the code to:

    Code :
    float test;
    for ( int i = 0 ; i < 1024; i++ )
    {
      barrier( CLK_GLOBAL_MEM_FENCE);
      float f = *(input_data + get_local_id(0)); // indexing off tid instead of loop counter
      //test = (float)get_local_id(0);
      test = f;
    }
    barrier( CLK_GLOBAL_MEM_FENCE);
    *(output_data + get_local_id(0)) = test;

    then the memory accesses are coalesced. I had thought if all threads accessed the same address then it was a special case of coalesced access, but openclprof tells me I'm wrong. Maybe what happened in my "coalesced code" version was the compiler saw I wasn't using the input_data and so didn't do the access?

  4. #4
    Senior Member
    Join Date
    Jul 2009
    Location
    Northern Europe
    Posts
    311

    Re: global memory coalescing question

    Quote (originally posted by openclnewb):
        Maybe what happened in my "coalesced code" version was the compiler saw I wasn't using the input_data and so didn't do the access?

    That was my guess when I saw the example.
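
The profiler results in this thread are consistent with the coalescing rule NVIDIA documents for compute capability 1.0/1.1 devices: a half-warp of 16 threads reading 4-byte words is served by a single 64-byte transaction only if thread k reads word k of one 64-byte-aligned segment. Otherwise the read is split into 16 separate transactions, and that includes the case where every thread reads the same address. Below is a minimal host-side sketch of that rule in plain C, assuming all 16 threads take part in the access. The function names (is_coalesced_cc11, transactions_cc11, broadcast_addresses, per_thread_addresses) are hypothetical helpers made up for illustration; this is not how openclprof computes its counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HALF_WARP  16   /* threads examined together on CC 1.x hardware */
#define WORD_BYTES 4    /* size of one float */

/* Hypothetical model of the CC 1.0/1.1 rule for 4-byte reads:
 * thread k must read word k of a single 64-byte-aligned segment. */
static bool is_coalesced_cc11(const uintptr_t addr[HALF_WARP])
{
    uintptr_t base = addr[0];
    if (base % (HALF_WARP * WORD_BYTES) != 0)
        return false;                       /* segment is not 64-byte aligned */
    for (int k = 0; k < HALF_WARP; k++) {
        if (addr[k] != base + (uintptr_t)k * WORD_BYTES)
            return false;                   /* thread k is not on word k */
    }
    return true;
}

/* One transaction when coalesced, otherwise one per thread. */
static int transactions_cc11(const uintptr_t addr[HALF_WARP])
{
    return is_coalesced_cc11(addr) ? 1 : HALF_WARP;
}

/* Addresses for `*(input_data + i)`: every thread reads the same word. */
static void broadcast_addresses(uintptr_t input_data, int i,
                                uintptr_t out[HALF_WARP])
{
    assert(i >= 0);
    for (int k = 0; k < HALF_WARP; k++)
        out[k] = input_data + (uintptr_t)i * WORD_BYTES;
}

/* Addresses for `*(input_data + get_local_id(0))`: thread k reads word k. */
static void per_thread_addresses(uintptr_t input_data,
                                 uintptr_t out[HALF_WARP])
{
    for (int k = 0; k < HALF_WARP; k++)
        out[k] = input_data + (uintptr_t)k * WORD_BYTES;
}
```

With a 64-byte-aligned input_data, the model reports 16 transactions for the loop-counter-indexed read and 1 for the read indexed by get_local_id(0), matching the second and third kernels above. The first kernel presumably reports only coalesced accesses because the unused load is removed by the compiler, leaving just the coalesced store to output_data.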
