OpenCL 1.0 does not allow a scalar in constant memory, e.g. __constant int val, to be passed to a kernel as an argument. Only pointers into the constant address space may be passed. As a result, one has to pass the number as a private argument (__private const int val). This clearly increases register pressure. I wonder why this is the case.
In NVIDIA's implementation, constant memory is cached, and a read that hits the cache is as fast as a register read (for a single scalar, all work-items read the same address, which is the fast case).
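To make the two options concrete, here is a minimal sketch of both kernel signatures. The kernel names (scale_const_ptr, scale_by_value) and the scaling operation are made up for illustration. The shim block at the top, including the fake_gid variable, is only an assumption that lets the file compile as plain C on the host; an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
/* Host-side shims (illustrative only): let this sketch build as plain C.
   A real OpenCL compiler defines __OPENCL_VERSION__ and skips this block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __constant
static size_t fake_gid = 0;  /* hypothetical stand-in for the work-item id */
static size_t get_global_id(unsigned dim) { (void)dim; return fake_gid; }
#endif

/* Variant A: the scalar lives in constant memory and is passed by pointer.
   This is the only way to hand a kernel something in the constant space. */
__kernel void scale_const_ptr(__global float *data, __constant int *val)
{
    size_t i = get_global_id(0);
    data[i] *= (float)(*val);  /* every work-item reads the same address */
}

/* Variant B: the scalar is passed by value, so it is a private argument. */
__kernel void scale_by_value(__global float *data, const int val)
{
    size_t i = get_global_id(0);
    data[i] *= (float)val;
}
```

On the host side, variant A needs a one-element buffer created with clCreateBuffer and bound with clSetKernelArg(kernel, 1, sizeof(cl_mem), &buf), while variant B binds the value directly with clSetKernelArg(kernel, 1, sizeof(cl_int), &val). Where a by-value argument actually ends up (registers, constant memory, or elsewhere) is up to the implementation, so the register cost of variant B may differ between vendors.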