dukeleto

02-25-2011, 06:18 AM

Hello,

I am having trouble understanding what a simple 1D kernel is doing with local memory.

The kernel does nothing useful at the moment because I simplified it for this post, but it nevertheless exhibits behaviour that seems strange to me.

Kernel code:

    // double precision needs this extension on OpenCL 1.0/1.1 devices
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    #define STENCILSIZE (2)
    #define x11_size_x (20)
    // DATA_SIZE_X is not defined in this excerpt; it is assumed to be
    // supplied elsewhere (e.g. as a build option)

    __kernel void fluxes_x11_fast3(
        __global double* rhoe_in,
        __global double* rhoe_out )
    {
        int ind, num_point;

        // 1D version
        int main_ref = get_global_id(0);

        // Identification of workgroup
        int i = get_group_id(0);

        // Identification of work item inside workgroup
        int idX = get_local_id(0);
        int sizeX = get_local_size(0);

        __local double lrhoe[x11_size_x+2*STENCILSIZE];

        // Copy submatrices to local memory. One element copied per work item
        lrhoe[idX+STENCILSIZE] = rhoe_in[main_ref];

        // "Edge" effects on the left
        if (idX == 0 && i >= 1) {
            for (ind = 0; ind <= STENCILSIZE-1; ind++) {
                lrhoe[idX+ind] = rhoe_in[main_ref-STENCILSIZE+ind];
            }
        }

        // "Edge" effects on the right
        if (idX == x11_size_x-1 && i <= (int)get_num_groups(0)-2) {
            for (ind = 1; ind <= STENCILSIZE; ind++) {
                lrhoe[idX+STENCILSIZE+ind] = rhoe_in[main_ref+ind];
            }
        }

        // synchronise all the local memory
        barrier(CLK_LOCAL_MEM_FENCE);

        // main computation bit
        if (main_ref >= STENCILSIZE && main_ref <= DATA_SIZE_X-1-STENCILSIZE) {
            rhoe_out[main_ref] = lrhoe[idX+STENCILSIZE];
        }
    }

The kernel has one input array and one output array.

For the moment, all I am trying to do is pull the input array into local memory while taking "edge" effects into account (i.e. for each point in the input array, the local array should also hold the neighbours of this point on each side).

I then write a value from the local array into the output array.

What I don't understand is the values I get from the local array (see the last assignment in the kernel code above):

- If I read lrhoe[idX+STENCILSIZE], I get the desired result, i.e. the input value.

- If I read lrhoe[idX+STENCILSIZE-1], I expect the input array shifted by one to the right, but instead I get the input array shifted by 6.

Can anyone explain this behaviour, or suggest what I might be doing wrong?

I can also post host code if this is deemed useful.

Many thanks!

Olivier
