I'm at a loss as to why the following doesn't work in OpenCL, since it works in ordinary C and in CUDA. I want to take a character pointer to global memory, reinterpret it as a pointer to an unsigned short, read the value it points to, convert that value back to a char, and write it back to the same location. The following code gives a CL_INVALID_COMMAND_QUEUE error, which, as far as I've learned, basically means that the kernel itself failed. If I uncomment the commented line, however, it works fine. That is what confuses me, since I don't see what difference it makes. (I've added the last line only to demonstrate that the problem is not some kind of self-reference.)

__kernel void test(__global char *test)
{
    unsigned long idx = (get_global_id(1)*get_global_size(0) + get_global_id(0));
    unsigned short *x;
    *x = *(unsigned short*)(test+idx);
    //*x = 300;
    *(test+idx)=(char) *x;
    *(test+idx)=(char) (unsigned short) *(test+idx);
}
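
For comparison, here is a minimal host-side C sketch of the per-element operation described above. It is only an illustration under assumptions, not the kernel itself: the helper name narrow_at and its len parameter are hypothetical, and it deliberately stores the intermediate result in a plain unsigned short value and copies the bytes with memcpy instead of dereferencing a cast pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the kernel above): reads the unsigned
   short stored at byte offset idx, narrows it to char, and writes the
   result back to buf[idx]. */
static void narrow_at(char *buf, size_t len, size_t idx)
{
    unsigned short x;                 /* a value with its own storage, not a pointer */

    assert(idx + sizeof x <= len);    /* the multi-byte read must stay in bounds */
    memcpy(&x, buf + idx, sizeof x);  /* byte-wise copy; no alignment requirement */
    buf[idx] = (char)x;               /* keep only the low byte of x */
}
```

On a little-endian host the low byte of x is buf[idx] itself, so a call such as narrow_at(buf, sizeof buf, 0) leaves the buffer unchanged; on a big-endian host it would copy buf[idx+1] into buf[idx].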