Hi All,

I've been working on an openCl particle system and have encountered a problem / query. My original example is here
http://www.memo.tv/opencl_in_openframew ... _particles

in that example I have:
- a struct Particle which contains physics information for a single particle
- an opencl buffer which contains an array of struct Particles
- a VBO for vertex positions, which also doubles as an opencl buffer
- a kernel which reads the Particle buffer, updates particles and writes new positions to the position buffer (which was created from the VBO)
so this way i can update all particles using their mass, velocity etc, and write straight to the VBO for quite render.

This all works perfectly pretty damn fast and I wanted to enhance this example by adding color. First I tried a similar setup:
- a struct Particle which contains physics information for single particle
- an opencl buffer which contains an array of struct Particles
- a VBO which contains vertex positions and color, interleaved (i.e. struct ParticleRenderData { float2 pos; float4 col } ) and this again doubles as an opencl buffer.
So I setup my glVertexPointer and glColorPointer to point to the same buffer, with stride of sizeof(ParticleRenderData) and gave offset of 8 to the colorPointer. This all worked perfectly.

Finally, I wanted to try having a single buffer to contain everything. This would be easier for code management (I wouldn't have position in two different buffers), and also wanted to see if it would affect performance (caching etc.) So I wrote the code below:

in my OpenCL program:
Code :
struct __attribute__ ((packed)) Particle {
	float2 pos;
	float4 col;
	float2 homePos;
	float2 vel;
	float mass;
};
 
__kernel void update(__global struct Particle* particles) {
	int id = get_global_id(0);
	__global struct Particle *p = &particles[id];
	// process info and update p->pos; p->col; etc.
}

in my main app I have:

Code :
 
#pragma pack(push, 1)
struct float2 {
	float x, y;
};
struct float4 {
	float x, y, z, w;
};
struct Particle {
	float2 pos;
	float4 col;
	float2 homePos;
	float2 vel;
	float mass;
};
#pragma pack(pop)
 
 
	Particle		particles[NUM_PARTICLES];	// contains info required for particles for init
	char*		posOffset;					//number of bytes offset from beginning of Particle to where the position is stored
	char*		colOffset;					//number of bytes offset from beginning of Particle to where the color is stored
 
	glGenBuffersARB(1, vbo);
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo[0]);
	glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeof(Particle) * NUM_PARTICLES, particles, GL_DYNAMIC_COPY_ARB);
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
 
	posOffset = (char*) ( (char*)(&particles[0].pos) - (char*)(&particles[0]) );
	colOffset = (char*) ( (char*)(&particles[0].col) - (char*)(&particles[0]) );
 
 
	clFlush(queue);
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo[0]);
	glVertexPointer(2, GL_FLOAT, sizeof(Particle), posOffset);
	glColorPointer(4, GL_FLOAT, sizeof(Particle), colOffset);		
	glDrawArrays(GL_POINTS, 0, NUM_PARTICLES);
	glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);

I've obviously omitted loads of code above, but that should be enough to explain what i'm trying to do. So I have one big buffer which contains all data for particles interleaved, and when I'm rendering them I set the stride and offsets appropiately. I have debugged those values and they are correct (sizeof(Particle) returns 44, posOffset is 0, colOffset is . My problem is, that the rendered output is wrong. It's hard to say exactly what is wrong but everything is pseudo-randomly sprayed out. e.g. I initialize 100 particles in a 10x10 grid with gradiented colors, and I do not get a grid, I just get pseudo-randomly placed particles with colors that are not from the gradient. I tried manually padding my Particle struct, and saw that every time I add a dummy float to the struct (in both my app declaration and the OpenCL program) I would get a new pseudo-random layout + colors. I found that if I add 3 floats to my struct (bringing the size of the Particle struct to 56 bytes) then my particle grid would appear correct. My question is why!? What is the robust solution to this problem? I also found that if I didn't add the 'packed' attribute to the struct in the opencl program, the app would freeze - clearly memory issues, but that makes sense. I thought by tightly packing the struct in both my host app and the opencl program these problems would be avoided. Now I don't get any freezes, but clearly my data isn't lining up properly. Any tips? or alternative suggestions?