Local memory allocation



pplaszew
06-13-2010, 03:41 AM
Hi everyone,
I've read somewhere (on a forum I can't recall right now) that allocating local ("shared" in NVIDIA CUDA nomenclature) memory statically, like below, should be avoided because it's implementation dependent:

__local float s_elData[32];
Dynamic allocation, through a kernel argument sized with clSetKernelArg, should be used instead:

__kernel void kernelP1(
__local float* s_elData,
//...
and (in host code):

clSetKernelArg(kernel, 0, 32 * sizeof(float), NULL); // s_elData is the first parameter (index 0) of kernelP1
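A fuller sketch of the kernel-argument approach, assuming a 1-D launch whose work-group size equals the number of floats reserved on the host. The `in` and `out` buffers and the neighbour-read computation are hypothetical, added only to show the usual load, barrier, compute, store pattern. On the host side, passing NULL as the value with a non-zero size to `clSetKernelArg` reserves that many bytes of local memory per work-group for a `__local` pointer argument.

```c
/* OpenCL C (device code): the local buffer is supplied by the host. */
__kernel void kernelP1(__local float* s_elData,   /* arg 0: size set by host    */
                       __global const float* in,  /* arg 1: hypothetical input  */
                       __global float* out)       /* arg 2: hypothetical output */
{
    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);
    const size_t n   = get_local_size(0);  /* assumed <= floats reserved by host */

    s_elData[lid] = in[gid];               /* stage one element per work-item */
    barrier(CLK_LOCAL_MEM_FENCE);          /* wait until the whole group has staged */

    /* illustrative computation: read the neighbouring work-item's element */
    out[gid] = s_elData[(lid + 1) % n];
}

/* Host side (C): reserve 32 floats of local memory for argument 0,
 *   clSetKernelArg(kernel, 0, 32 * sizeof(cl_float), NULL);
 * then enqueue with a local work size of 32 so that n matches the reservation. */
```

Because the buffer size is only known at enqueue time, the same kernel can be launched with different work-group sizes, at the cost of the compiler not knowing the array length when it builds the kernel.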

Unfortunately, when I use the latter method my register usage increases from 14 to 19; nothing else in the code changes, only the way the array is allocated. So I would rather stick to the former (static) method. Is it safe, or should it really be avoided?
Thanks!

matrem
06-14-2010, 01:30 AM
Where do you define this variable?
At program scope I guess?

pplaszew
06-14-2010, 04:32 AM
Where do you define this variable?
At program scope I guess?

You mean when allocating statically? In the kernel, like this:


__kernel void K(
//.. kernel args
){
//definition of s_el
__local float s_el[32];
//.. download data from global to s_el, make computations in parallel, store results from s_el back to global
}


So, kernel scope. I only need it to load some data from global memory, perform a lot of computation on it in the kernel, and store the results back to global memory.
It works (on NVIDIA's OpenCL implementation), and register consumption is lower than when I allocate dynamically through a kernel argument and clSetKernelArg (the s_el array always has the same size, so I don't need dynamic allocation). Is this way of declaring variables in local memory all right?
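A complete version of the kernel-scope declaration described above might look like the sketch below. It assumes a 1-D launch; `in`, `out`, `S_EL_SIZE` and the computation are hypothetical. The `reqd_work_group_size` attribute is a standard OpenCL C kernel attribute: with it, clEnqueueNDRangeKernel rejects launches whose local work size does not match, which is one way to keep a fixed-size `__local` array and the launch configuration consistent.

```c
/* OpenCL C (device code): fixed-size local array at kernel function scope. */
#define S_EL_SIZE 32

__kernel __attribute__((reqd_work_group_size(S_EL_SIZE, 1, 1)))
void K(__global const float* in,    /* hypothetical input  */
       __global float* out)         /* hypothetical output */
{
    __local float s_el[S_EL_SIZE];  /* one slot per work-item in the group */

    const size_t lid = get_local_id(0);
    const size_t gid = get_global_id(0);

    s_el[lid] = in[gid];            /* load from global into local memory */
    barrier(CLK_LOCAL_MEM_FENCE);   /* all loads finish before any reads */

    /* illustrative computation: combine own and neighbouring elements */
    const float v = s_el[lid] + s_el[(lid + 1) % S_EL_SIZE];
    out[gid] = v;                   /* store the result back to global memory */
}
```

Here the array length is a compile-time constant, so the compiler sees the whole local allocation when it builds the kernel; the trade-off is that every launch must use a work-group size of exactly `S_EL_SIZE`.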

matrem
06-14-2010, 07:04 AM
I guess it's right.
Perhaps it's a bug in NVIDIA's implementation.

david.garcia
07-13-2010, 03:47 PM
Hi everyone,
I've read somewhere (on a forum I can't recall right now) that allocating local ("shared" in NVIDIA CUDA nomenclature) memory statically, like below, should be avoided because it's implementation dependent

You are doing things correctly. Declaring a __local variable at kernel scope is perfectly legal; see section 6.5.2 of the OpenCL 1.1 specification, which even includes an example. Don't worry about it.
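The rules in that section of the OpenCL 1.1 specification can be sketched as follows: `__local` variables inside a kernel must be declared at kernel function scope (not inside a nested block) and cannot be given an initializer. The kernel below is a hypothetical illustration; the commented-out lines show the forms the specification disallows.

```c
/* OpenCL C (device code): hypothetical kernel illustrating __local scope rules. */
__kernel void scope_demo(__global float* out)
{
    __local float a;          /* OK: single float in local memory       */
    __local float b[10];      /* OK: array of 10 floats in local memory */
    /* __local float c = 1.0f;   not allowed: __local cannot be initialized */

    if (get_local_id(0) == 0) {
        /* __local float d;      not allowed: not at kernel function scope */
        a = 1.0f;             /* one work-item initializes the shared values */
        b[0] = 2.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);  /* make the writes visible to the whole group */
    out[get_global_id(0)] = a + b[0];
}
```

Since initializers are not allowed, values in `__local` memory have to be written explicitly by the work-items, followed by a barrier before other work-items read them.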