I ran into the same problem on OS X 10.8.4 with both a GeForce 320M and a Radeon HD 6970M :(
As a workaround, I replaced the by-value struct argument with a by-reference one (that is, a buffer object).
That way I could read the member values correctly. However, if the struct contains an array member, I couldn't read the individual elements of the array.
I know this thread is no longer new, but I'm posting this report anyway :)
The problem still exists.
I'm having the same problem on a Retina MacBook Pro with a GeForce GT 650M running 10.8.5. Passing in a struct with four float4 members freezes my machine. On an older MacBook with an ATI 6750M it doesn't crash, but the values passed in are incorrect.
You cannot pass structs as kernel arguments. There is nothing in the spec that says you can. You can only pass basic types, vector types, and memory objects (cl_mem). To pass structs, you need to upload them as buffers or, as another poster suggested, use a vector type (I've used float16 to pass in 16 float parameters).
There's nothing in the spec that says you can't. In fact, the major OpenCL implementations on Windows (Intel, NVIDIA, AMD) can pass structure arguments to a kernel, so this looks like a bug in Apple's implementation.
Passing structure arguments is covered by §5.7.2, "Setting Kernel Arguments", of the OpenCL 1.2 spec, under the "other kernel arguments" case: arg_value should be a pointer to the structure data and arg_size the size of the structure.
Technically, passing a structure as an argument amounts to copying its contents into a __global or __constant buffer and passing that buffer to the kernel transparently. So if an implementation can pass a memory buffer containing an array of structures to a kernel, it can also pass a structure by value.
I can confirm that structs still do not work with Apple's OpenCL. If I pass a struct with 4 ints, the kernel receives the first 2 ints correctly but the other 2 members are zero. This is with the Iris Pro GPU on OS X 10.9.
My kernels typically have between 10 and 20 int-sized parameters, and it's very convenient to be able to share the same structure type between CPU and GPU. So this bug really sucks. As a workaround, I'm now looking into casting my struct into a cl_ulong16. It has enough space and seems to work even with Apple, but it's terribly ugly.
I'm wondering, though: what is considered the best way to pass kernel parameters of this size? From what I understand, kernel parameters map into __private space, which is scarce. And __private maps to registers, which are per-thread, while the kernel parameters are identical and constant for all threads in a work-group. Or does the compiler recognise this and use shared memory?
Or is uploading my struct to __constant memory a better solution? That requires an enqueueWriteBuffer for each kernel execution, plus waiting for the upload-completion event, which seems inefficient for only about 100 bytes.