PDA

View Full Version : problem understanding the behaviour of my kernel!



dukeleto
02-25-2011, 07:18 AM
Hello,

I am having problems understanding what a simple 1D kernel
is doing with local memory.
The kernel does nothing useful at the moment, I simplified it for
this post, but nevertheless exhibits what seems to me strange
behaviour.

Kernel code:


#define STENCILSIZE(2)
#define x11_size_x (20)

__kernel void fluxes_x11_fast3(
__global double* rhoe_in,
__global double* rhoe_out )
{
int ind, num_point;

// 1D version
int main_ref = get_global_id(0);

//Identification of workgroup
int i = get_group_id(0);

//Identification of work item inside workgroup
int idX = get_local_id(0);
int sizeX = get_local_size(0);

__local double lrhoe[x11_size_x+2*STENCILSIZE];

//Copy submatrixes to local memory. One element copied per work item

lrhoe[idX+STENCILSIZE] = rhoe_in[main_ref];

// "Edge" effects on the left
if(idX == 0 && i >= 1) {

for (ind = 0; ind <= STENCILSIZE-1; ind++ ) {
lrhoe[idX+ind] = rhoe_in[main_ref-STENCILSIZE+ind];
}
}


// "Edge" effects on the right
if(idX == x11_size_x-1 && i <= (int)get_num_groups(0)-2 ) {

for (ind=1; ind<=STENCILSIZE; ind++){
lrhoe[idX+STENCILSIZE+ind] = rhoe_in[main_ref+ind];
}
}

// synchronise all the local memory
barrier(CLK_LOCAL_MEM_FENCE);

// main computation bit

if ( main_ref >= STENCILSIZE && main_ref <=DATA_SIZE_X-1-STENCILSIZE ) {

rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];

}

}



The kernel has one input array, and one output array.
All I'm trying to do for the moment is pull the input array into local
memory, while taking "edge" effects into account (i.e. for each point
in the input array, the local array would have neighbours of this point on
each side).
In the output array, I insert a value from the local array.

What I don't understand is the values I get from the local array, see last line of
actual code:
- if I take lrhoe[idX+STENCILSIZE], I get the desired result, i.e. the input value.
- if I take lrhoe[idX+STENCILSIZE-1], I expect the input array shifted by one to the
right, but instead I get the input array shifted by 6.

Can anyone explain this behaviour, or suggest what I might be doing wrong?
I can also post host code if this is deemed useful.

Many thanks!

Olivier

david.garcia
02-25-2011, 04:15 PM
That sounds odd. Also, shouldn't the last line read like this?

rhoe_out[main_ref-STENCILSIZE] = lrhoe[idX+STENCILSIZE];

dukeleto
02-25-2011, 04:58 PM
Hi David, and thanks alot for answering.
The kernel here is a much simplified finite-difference kernel,
which I have got working properly in 1 and 2D using only global memory.
The input and output arrays represent my entire domain, and thus "main_ref"
is the index into the entire domain. The line you asked about,
rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
represents the computation of a new value rhoe from its neighbouring
values; at the moment it is simply a duplication (i.e. lrhoe[idX+STENCILSIZE] should equal
rhoe_in[main_ref]) but the idea is be to be able to do something like


double rhoe_out_private = lrhoe[idX+STENCILSIZE];
for (index=1;index<=STENCILSIZE;index++) {
rhoe_out_private +=
coef[index] * (lrhoe[idX+STENCILSIZE+index] - lrhoe[+STENCILSIZE-index]);
}
rhoe_out[main_ref] = rhoe_out_private;


So as I see it, the index I want into the out array is main_ref.
I only update if stencilsize<main_ref<global_size-stencilsize due to a different treatment applied at boundaries.
Does that make sense?
Thanks again, I am at a loss!
Olivier

dukeleto
02-26-2011, 08:14 AM
To add to my confusion, the kernel does exactly what I expect it to do, when run
on the cpu with the amd sdk, that's to say it duplicates the input array
with the following line


rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];

and shifts it by one to the right with the following line


rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE-1];


Does this point to a bug in the nvidia implementation, or am I perhaps misusing
some aspect of local memory to which the amd cpu implementation is less
sensitive?
Many thanks,
Olivier

david.garcia
02-26-2011, 10:06 AM
It looks like a possible bug in NVidia's compiler. I suggest sending it to their customer support.

dukeleto
02-26-2011, 02:06 PM
OK, I'll do that and update this thread when I get a response from NVIDIA.
Thanks alot for the help,
Olivier