View Full Version : passing in array of struct to kernels (and updating VBO)



memo
10-30-2009, 10:16 AM
Hi All,

I've been working on an openCl particle system and have encountered a problem / query. My original example is here
http://www.memo.tv/opencl_in_openframeworks_example_1_milion_particles

in that example I have:
- a struct Particle which contains physics information for a single particle
- an opencl buffer which contains an array of struct Particles
- a VBO for vertex positions, which also doubles as an opencl buffer
- a kernel which reads the Particle buffer, updates particles and writes new positions to the position buffer (which was created from the VBO)
so this way I can update all particles using their mass, velocity etc., and write straight to the VBO for quick rendering.

This all works perfectly and pretty damn fast, so I wanted to enhance this example by adding color. First I tried a similar setup:
- a struct Particle which contains physics information for single particle
- an opencl buffer which contains an array of struct Particles
- a VBO which contains vertex positions and color, interleaved (i.e. struct ParticleRenderData { float2 pos; float4 col; }), and this again doubles as an opencl buffer.
So I set up my glVertexPointer and glColorPointer to point to the same buffer, with a stride of sizeof(ParticleRenderData), and gave an offset of 8 to the colorPointer. This all worked perfectly.

Finally, I wanted to try having a single buffer contain everything. This would be easier for code management (I wouldn't have position in two different buffers), and I also wanted to see if it would affect performance (caching etc.). So I wrote the code below:

in my OpenCL program:


struct __attribute__ ((packed)) Particle {
    float2 pos;
    float4 col;
    float2 homePos;
    float2 vel;
    float mass;
};

__kernel void update(__global struct Particle* particles) {
    int id = get_global_id(0);
    __global struct Particle *p = &particles[id];
    // process info and update p->pos, p->col, etc.
}



in my main app I have:




#pragma pack(push, 1)
struct float2 {
    float x, y;
};
struct float4 {
    float x, y, z, w;
};
struct Particle {
    float2 pos;
    float4 col;
    float2 homePos;
    float2 vel;
    float mass;
};
#pragma pack(pop)


Particle particles[NUM_PARTICLES]; // contains info required for particles for init
char* posOffset; //number of bytes offset from beginning of Particle to where the position is stored
char* colOffset; //number of bytes offset from beginning of Particle to where the color is stored

glGenBuffersARB(1, vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo[0]);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Particle) * NUM_PARTICLES, particles, GL_DYNAMIC_COPY_ARB);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

posOffset = (char*) ( (char*)(&particles[0].pos) - (char*)(&particles[0]) );
colOffset = (char*) ( (char*)(&particles[0].col) - (char*)(&particles[0]) );


clFlush(queue);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo[0]);
glVertexPointer(2, GL_FLOAT, sizeof(Particle), posOffset);
glColorPointer(4, GL_FLOAT, sizeof(Particle), colOffset);
glDrawArrays(GL_POINTS, 0, NUM_PARTICLES);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);


I've obviously omitted loads of code above, but that should be enough to explain what I'm trying to do. So I have one big buffer which contains all data for the particles, interleaved, and when I'm rendering them I set the stride and offsets appropriately. I have debugged those values and they are correct (sizeof(Particle) returns 44, posOffset is 0, colOffset is 8).

My problem is that the rendered output is wrong. It's hard to say exactly what is wrong, but everything is pseudo-randomly sprayed out. E.g. I initialize 100 particles in a 10x10 grid with gradiented colors, and I do not get a grid; I just get pseudo-randomly placed particles with colors that are not from the gradient. I tried manually padding my Particle struct, and saw that every time I add a dummy float to the struct (in both my app declaration and the OpenCL program) I would get a new pseudo-random layout + colors. I found that if I add 3 floats to my struct (bringing the size of the Particle struct to 56 bytes) then my particle grid would appear correct. My question is: why!? What is the robust solution to this problem?

I also found that if I didn't add the 'packed' attribute to the struct in the OpenCL program, the app would freeze - clearly memory issues, but that makes sense. I thought that by tightly packing the struct in both my host app and the OpenCL program these problems would be avoided. Now I don't get any freezes, but clearly my data isn't lining up properly. Any tips? Or alternative suggestions?

memo
10-30-2009, 10:18 PM
As an update: I noticed that using a float[3] dummy at the end doesn't work like I thought it did above. I also have to insert a float2 between pos and col, so the following worked:


// DEVICE:
struct __attribute__ ((packed)) Particle {
    float2 pos;
    float2 dummy;
    float4 col;
    float2 homePos;
    float2 vel;
    float mass;
    float dummy2[3];
};

// HOST:
#pragma pack(push, 1)

struct float2 {
    cl_float x, y;
};

struct float4 {
    cl_float x, y, z, w;
};

struct Particle {
    float2 pos;
    float2 dummy;
    float4 col;
    float2 homePos;
    float2 vel;
    cl_float mass;
    float dummy2[3];
};

#pragma pack(pop)



I understand why I need to make this change: col is a float4 and is thus aligned to 16 bytes. But what is the best way to deal with this?

I thought I could get rid of the first dummy by swapping pos and col so tried this struct:


struct __attribute__ ((packed)) Particle {
    float4 col;
    float2 pos;
    float2 homePos;
    float2 vel;
    float mass;
    float dummy2[3];
};

but that didn't work either; I had to make dummy2 a single float instead of 3, so this worked:

struct __attribute__ ((packed)) Particle {
    float4 col;
    float2 pos;
    float2 homePos;
    float2 vel;
    float mass;
    float dummy2;
};


So my question is: what is the best way to deal with this situation? How do I make sure the data is packed and aligned the same on both host and device? I thought that's what #pragma pack on the host and __attribute__ ((packed)) on the device were for?
Any alternative approaches to this kind of situation are much appreciated.

Cheers,

Memo.