
passing constant value to kernel



duanmu
10-26-2009, 09:53 AM
OpenCL 1.0 does not allow a scalar, e.g. __constant int val, to be passed to a kernel in constant memory. Only pointers to the constant address space may be passed. As a result, one has to pass the number as a private value (__private const int val). This clearly increases register pressure. I wonder why this is the case.

In NVIDIA's implementation, constant memory is cached and is as fast as local registers if there is no cache miss.

dbs2
10-26-2009, 11:43 AM
I don't quite follow. Are you talking about passing values into kernels via clSetKernelArg, or about calling functions from within a kernel? If it's the former, I'm a bit confused, because you can't pass in private values: the private space is per work-item, not per kernel launch. Constant values can be accessed by all work-items in the kernel launch, but private values are unique to one work-item and cannot be shared. If it's the latter, then you should be able to pass a private variable between function calls within a single work-item.

duanmu
10-26-2009, 01:26 PM
Suppose we have a kernel like this:


__kernel void foo(const int val, ...)
{ ... }

In this example, the same value of "val" is passed from the host program to each work-item, but it is located in the private space. In other words, each work-item has a register holding the same "val". This is a waste of private space. It would be nice to change "const int val" to "__constant int val", but OpenCL 1.0 does not allow that.
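
For comparison, here is a minimal sketch of the two ways OpenCL 1.0 offers to get the same scalar into a kernel: by value, as above, or through a pointer to the constant address space backed by a one-element buffer. The kernel names scale_by_value and scale_by_constant, the out buffer, and the plain-C shim at the top (including fake_gid) are illustrative assumptions and not part of the original post. Which variant uses fewer registers or runs faster depends on the implementation, so this only shows the syntax.

```c
/* Plain-C shim, an assumption for quick host-side checking only.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant const
static size_t fake_gid = 0;   /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }
#endif

/* Variant A: scalar passed by value, so it lives in the private space.
   Host side: clSetKernelArg(kernel, 0, sizeof(cl_int), &val); */
__kernel void scale_by_value(const int val, __global int *out)
{
    size_t i = get_global_id(0);
    out[i] = val * (int)i;
}

/* Variant B: the same scalar read through a pointer to the constant
   address space, which OpenCL 1.0 does accept as a kernel argument.
   Host side: create a one-element read-only cl_mem buffer holding val,
   then clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf); */
__kernel void scale_by_constant(__constant int *val, __global int *out)
{
    size_t i = get_global_id(0);
    out[i] = val[0] * (int)i;
}
```

Variant B costs an extra buffer object and a memory read per work-item, which is the trade-off being asked about here.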

jbasic
10-26-2009, 01:46 PM
__constant variables are allocated in global memory, and the constant memory space is cached. Accessing private memory is faster, so it should remain the only place to store function arguments (they are the first thing you want to access from inside a function).

duanmu
10-26-2009, 03:34 PM
Quote (jbasic): "__constant variables are allocated in global memory, and the constant memory space is cached. Accessing private memory is faster, so it should remain the only place to store function arguments (they are the first thing you want to access from inside a function)."

I understand your argument. However, at least in NVIDIA's implementation, constant memory is as fast as private memory as long as the access hits the cache. So I wonder why the standard does not allow it, since it would do no harm.