I have the following situation:

Two threads each handle one OpenCL device, and both devices share the same context. Each thread loads a different version of the OpenCL device code, creates a cl::Program instance and builds the code for its specific cl::Device. The build succeeds, but the subsequent createKernels call fails with error code -47, which the specification describes as:

CL_INVALID_KERNEL_DEFINITION if the function definition for __kernel function given by kernel_name such as the number of arguments, the argument types are not the same for all devices for which the program executable has been built.

With multiple cl::Context instances (one per device) this worked well. Looking at the OpenCL class diagram, I don't see why I should not be able to use multiple programs with multiple kernels within one context, since the kernels are clearly distinguishable via their associated programs.

I'm using Nvidia's OpenCL implementation from the CUDA SDK 5.5 (on multiple Tesla M2090 cards). The question that arises for me is:

Is this a general misunderstanding of the OpenCL structure on my part, and is there a rule that every kernel within a context must have a unique name? Or is this one of Nvidia's ways of handling this particular use case that does not conform to the OpenCL standard?

I really want multiple devices within one context so that I can copy from one cl::Buffer to another, even if their memory resides on different devices.