View Full Version : memory alignment for struct members



fangq
04-11-2011, 03:57 PM
I have had some trouble using a structure in constant memory to pass kernel parameters. I posted my code on nvidia's forum at

http://forums.nvidia.com/index.php?showtopic=197734

but so far nobody has replied. In the meantime I have searched quite a bit, including reading the specification, but I was unable to find an example of how to use a struct in a kernel.

Can someone take a look at my example and tell me what I should do in this case? I tried replacing all member types with cl_float4/cl_float etc. in my host code, but it does not work on an nvidia card either :(

Your comments on this are very much appreciated.

david.garcia
04-11-2011, 04:38 PM
I don't see anything wrong with your code and you already said that the number of constant kernel arguments is 4 so that's not an issue either.

Between that and the fact that it works on ATI, it looks quite clearly like a bug in NVidia's OpenCL drivers.

I'm sorry I don't have any advice on how to work around the issue. You could try commenting out some of the struct fields and see if at some point the problem goes away.

fangq
04-11-2011, 07:23 PM
Great, thank you for confirming this. I feel a lot better now :)

fangq
04-12-2011, 12:46 PM
Hi David,

I printed sizeof(KParam) on both the host and the device and found that the two sizes differ for the code I posted on nvidia's forum: it is 180 in the host code and 192 in the cl kernel. I prefixed all type names with cl_ in the host definition, and now the sizes are the same.

In your opinion, if I don't prefix the types with cl_, will there be misalignment when passing the 180-byte host struct to the 192-byte device struct? Where does the padding go? Is it at the very end of the struct, or can it fall between two elements?

I also found out that the segfault may not be caused solely by the constant parameter, but also by some bugs in nvidia's compiler in handling nested if-statements. I am still investigating this.

david.garcia
04-12-2011, 02:00 PM
Ah, I missed that. Yes, you must always use cl_xxx types on the API side, as they are guaranteed to match the size of their cousins in OpenCL C (except for size_t and bool). For example, cl_long on the API side is equivalent to long in OpenCL C (a 64-bit signed integer).
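To illustrate why the host and device sizes can disagree, here is a minimal sketch in plain C11. It does not use the OpenCL headers: float4_like is a hypothetical stand-in for cl_float4 (16 bytes, 16-byte aligned), and ParamPlain/ParamAligned are made-up structs, not the KParam from the original post. It assumes a 4-byte float, as on common platforms.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for cl_float4: 16 bytes with 16-byte alignment.
   The real type is declared in CL/cl.h. */
typedef struct { alignas(16) float s[4]; } float4_like;

/* Host struct written with plain C types: float[4] only needs 4-byte alignment. */
typedef struct {
    float scale;
    float origin[4];
} ParamPlain;

/* Same fields, but the vector member carries 16-byte alignment,
   which is how a float4 member is laid out in OpenCL C. */
typedef struct {
    float scale;
    float4_like origin;
} ParamAligned;

/* Small helpers so the layout can be inspected or printed by a caller. */
size_t plain_size(void)            { return sizeof(ParamPlain); }
size_t aligned_size(void)          { return sizeof(ParamAligned); }
size_t aligned_origin_offset(void) { return offsetof(ParamAligned, origin); }
```

With these definitions the plain struct is 20 bytes while the aligned one is 32 bytes, because origin is pushed to offset 16. Copying the 20-byte layout into a buffer that the kernel reads with the 32-byte layout would shift every member after scale.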

Padding can happen either between struct members or at the end of the struct. This comes from C99 actually.
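Both kinds of padding can be measured with offsetof and sizeof. The sketch below uses a made-up Record struct (not from the original code) with an explicit alignas(8) so that the layout does not depend on the platform's default alignment for 64-bit integers.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct used only to show where padding lands. */
typedef struct {
    uint8_t tag;                /* offset 0 */
    alignas(8) uint64_t count;  /* offset 8: 7 padding bytes precede it */
    uint8_t flag;               /* offset 16 */
} Record;                       /* size is rounded up to a multiple of 8 */

/* Bytes of padding between tag and count. */
size_t interior_padding(void) {
    return offsetof(Record, count) - (offsetof(Record, tag) + sizeof(uint8_t));
}

/* Bytes of padding after the last member, flag. */
size_t trailing_padding(void) {
    return sizeof(Record) - (offsetof(Record, flag) + sizeof(uint8_t));
}
```

Here the struct occupies 24 bytes for 10 bytes of data: 7 bytes of padding sit between tag and count, and another 7 follow flag. Printing the same offsets on the host and inside the kernel is one way to find the first member where two layouts diverge.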