caching variables from arrays



memo
11-10-2009, 11:51 AM
Hi all, I have the following kernel code


typedef struct {
    float2 vel;
    float mass;
    float life;
} Particle;

typedef struct {
    float2 pos;
    float ejectForce;
    float attractForce;
    float waveAmp;
    float waveFreq;
} Node;

__kernel void update(__global Particle *particles, // 0
                     __global float2 *posBuffer,   // 1
                     __global float4 *colBuffer,   // 2
                     __global Node *nodes,         // 3
                     const int numNodes            // 4 (no trailing comma before the closing parenthesis)
                     ) {

    int id = get_global_id(0);

    // Pointers into global memory; nothing is copied here
    __global Particle *p = &particles[id];
    __global Node *n = &nodes[id % numNodes];

    // Copies of the values held in private variables
    float mass = particles[id].mass;
    float2 pos = posBuffer[vboIndex]; // vboIndex is not declared in this excerpt

In the rest of the kernel I use p, n, mass and pos directly. I've been trying to determine whether caching values in local variables like this is faster than indexing the arrays each time, but the results seem roughly the same. Can anyone who understands the architecture better than I do comment on the theoretical performance difference? (NVIDIA 9600GT in a MacBook Pro.)

dbs2
11-11-2009, 12:30 PM
I don't think it should matter. The compiler loads the value from memory into a register, and if it can reuse that load later on, it will. So in effect you get the private-variable version either way whenever the compiler thinks it is better. Where you can get a speedup is when you load a lot of data and reuse it, particularly across the work-items of a work-group. In that case you can manually copy the data into local memory and read it from there, using local memory as a software-managed cache.
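
To illustrate the local-memory case, here is a minimal sketch of a kernel in which every work-item reads every node, so the nodes are staged once per work-group. It is not the kernel from the question: the kernel name sumAttraction, the nodeCache and forceOut arguments, and the force formula are all made up for illustration. It also assumes that the whole nodes array fits in local memory and that the global size equals the particle count and is a multiple of the work-group size.

```c
typedef struct {
    float2 pos;
    float ejectForce;
    float attractForce;
    float waveAmp;
    float waveFreq;
} Node;

// Hypothetical kernel: each work-item loops over ALL nodes, so the node
// data is reused by every work-item in the work-group.
__kernel void sumAttraction(__global const float2 *posBuffer, // 0
                            __global const Node *nodes,       // 1
                            const int numNodes,               // 2
                            __local Node *nodeCache,          // 3 (size set by the host)
                            __global float2 *forceOut)        // 4
{
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    int lsize = get_local_size(0);

    // Cooperative load: each work-item copies a strided subset of the
    // nodes from global memory into local memory.
    for (int i = lid; i < numNodes; i += lsize) {
        nodeCache[i] = nodes[i];
    }

    // Every work-item in the group must reach this barrier, so there is
    // no early return above it. After it, all copies are visible.
    barrier(CLK_LOCAL_MEM_FENCE);

    float2 pos = posBuffer[gid];
    float2 force = (float2)(0.0f, 0.0f);

    // All reads of node data now hit local memory instead of global memory.
    for (int i = 0; i < numNodes; ++i) {
        float2 d = nodeCache[i].pos - pos;      // illustrative formula only
        force += d * nodeCache[i].attractForce;
    }

    forceOut[gid] = force;
}
```

On the host, the __local argument gets a size and a NULL pointer, for example clSetKernelArg(kernel, 3, numNodes * sizeof(Node), NULL). In the kernel from the question each work-item touches only its own particle and a single node, so there is little reuse to exploit and this pattern would probably not help there.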

memo
11-13-2009, 05:01 PM
OK, makes sense, thanks.