global memory coalescing question



openclnewb
03-30-2010, 01:33 PM
Hi,

If I run this test kernel, where input_data and output_data are pointers to global floats:



float test;
for (int i = 0; i < 1024; i++)
{
    barrier(CLK_GLOBAL_MEM_FENCE);
    float f = *(input_data + i);
    test = (float)get_local_id(0);
}
barrier(CLK_GLOBAL_MEM_FENCE);
*(output_data + get_local_id(0)) = test;


then openclprof tells me that my global memory accesses are coalesced.

But if I run this kernel:



float test;
for (int i = 0; i < 1024; i++)
{
    barrier(CLK_GLOBAL_MEM_FENCE);
    float f = *(input_data + i);
    //test = (float)get_local_id(0);
    test = f;
}
barrier(CLK_GLOBAL_MEM_FENCE);
*(output_data + get_local_id(0)) = test;


then my global accesses aren't coalesced. This has me stumped -- I think all my threads are reading the same input_data addresses at the same time, and all my threads are writing data to adjacent output_data addresses at the same time too. I think my addresses are aligned correctly in both cases as well. So why does making that assignment to test inside the loop make my accesses uncoalesced?

What am I missing here?

openclnewb
03-30-2010, 01:35 PM
Whoops, I forgot to add: I'm running this code on a compute capability 1.1 board.

openclnewb
03-30-2010, 02:04 PM
Hmmm, if I change the code to:



float test;
for (int i = 0; i < 1024; i++)
{
    barrier(CLK_GLOBAL_MEM_FENCE);
    float f = *(input_data + get_local_id(0)); // indexing off tid instead of loop counter
    //test = (float)get_local_id(0);
    test = f;
}
barrier(CLK_GLOBAL_MEM_FENCE);
*(output_data + get_local_id(0)) = test;


then the memory accesses are coalesced. I had thought if all threads accessed the same address then it was a special case of coalesced access, but openclprof tells me I'm wrong. Maybe what happened in my "coalesced code" version was the compiler saw I wasn't using the input_data and so didn't do the access?
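
For what it's worth, the profiler result matches the coalescing rule documented for compute capability 1.0/1.1 devices: a half-warp's access is coalesced only if work-item k touches word k of a properly aligned segment, so every work-item reading the same address does not qualify. Below is a rough host-side C model of that rule, simplified to 4-byte accesses with all 16 work-items of a half-warp active. It is only a sketch under those assumptions, not what openclprof actually computes, and the function names (transactions_cc11, same_address_read, per_work_item_access) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HALF_WARP  16   /* work-items serviced together on CC 1.x hardware */
#define WORD_BYTES 4u   /* sizeof(float) on the device */

/* Simplified CC 1.0/1.1 rule for 4-byte accesses (assumption: all 16
 * work-items are active). Returns 1 if work-item k touches word k of a
 * 64-byte-aligned segment, i.e. the access is coalesced into a single
 * transaction. Otherwise returns 16, one transaction per work-item. */
static int transactions_cc11(const uint32_t byte_addr[HALF_WARP])
{
    uint32_t base = byte_addr[0];

    if (base % (HALF_WARP * WORD_BYTES) != 0u)
        return HALF_WARP;                  /* segment start is misaligned */

    for (uint32_t k = 0; k < HALF_WARP; k++) {
        if (byte_addr[k] != base + k * WORD_BYTES)
            return HALF_WARP;              /* work-item k is not on word k */
    }
    return 1;
}

/* Second kernel's read: every work-item reads input_data[i]. */
static int same_address_read(uint32_t buffer_base, int i)
{
    uint32_t addr[HALF_WARP];

    assert(i >= 0);
    for (int k = 0; k < HALF_WARP; k++)
        addr[k] = buffer_base + (uint32_t)i * WORD_BYTES;
    return transactions_cc11(addr);
}

/* Third kernel's read (and the output_data write in all three kernels):
 * work-item k accesses element k of the buffer. */
static int per_work_item_access(uint32_t buffer_base)
{
    uint32_t addr[HALF_WARP];

    for (uint32_t k = 0; k < HALF_WARP; k++)
        addr[k] = buffer_base + k * WORD_BYTES;
    return transactions_cc11(addr);
}
```

In this model, same_address_read(0, i) comes out as 16 transactions for any i, while per_work_item_access(0) comes out as 1. Later hardware (compute capability 1.2 and up) relaxed the rule, so the same-address case is much cheaper there.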

dbs2
03-31-2010, 10:47 AM
"Maybe what happened in my "coalesced code" version was the compiler saw I wasn't using the input_data and so didn't do the access?"

That was my guess when I saw the example.