I'm new to OpenCL. I'm reviewing somebody else's code, and it looks like this:

Code :
struct Scene
{
    __global float* vertics;
    ...
};

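"vertics" is an array of floats, but it holds both the POSITION and the NORMAL of each vertex. Each vertex takes 8 floats: the position starts at offset 0 and the normal at offset 4.
To get the position, we have this function: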


Code :
inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    // Each vertex occupies 8 floats; the position is in the first three.
    __global float* offset = s->vertics + vertexID * 8;

    return (float4)(*offset,
                    *(offset + 1),
                    *(offset + 2),
                    1.0f);
}

and to get the normal:


Code :
inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    // The normal starts 4 floats after the start of the vertex.
    __global float* offset = s->vertics + vertexID * 8;

    return (float4)(*(offset + 4),
                    *(offset + 5),
                    *(offset + 6),
                    0.0f);
}


I know that in HLSL it's better to use float4 directly whenever we can, so I tried this simple change to see whether it is better here too:


Code :
struct Scene
{
    __global float4* vertics;
    ...
};


Code :
inline float4 GetVertexPosition(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2];
}



Code :
inline float4 GetVertexNormal(__local struct Scene *s, uint vertexID)
{
    return s->vertics[vertexID * 2 + 1];
}




I profiled each version. The first one is faster. Not a huge difference, but still faster. I thought it should be faster to use float4* directly instead of reading through a float* and building a float4 from the values.

I use the same buffer in both cases, so the alignment should be the same. The only things I changed are the ones shown above.
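
To make the layout concrete, here is a minimal host-side sketch in plain C (not the kernel code itself) that mimics both ways of reading the same interleaved buffer. The `Float4` struct and the helper names are hypothetical stand-ins that I made up for illustration; only the 8-float stride and the offsets 0 and 4 come from the code above. It suggests that the two versions are not strictly identical: the float* version reads three floats and sets w itself, while the float4* version loads all four floats, so w is whatever is stored in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's float4 (four packed floats). */
typedef struct { float x, y, z, w; } Float4;

/* Stride taken from the original code: 8 floats per vertex. */
#define FLOATS_PER_VERTEX 8

/* float* version: three scalar reads, w is forced to 1.0f. */
static Float4 position_scalar(const float *vertics, unsigned vertexID)
{
    assert(vertics != NULL);
    const float *p = vertics + (size_t)vertexID * FLOATS_PER_VERTEX;
    Float4 r = { p[0], p[1], p[2], 1.0f };
    return r;
}

/* float* version: normal starts 4 floats into the vertex, w forced to 0.0f. */
static Float4 normal_scalar(const float *vertics, unsigned vertexID)
{
    assert(vertics != NULL);
    const float *p = vertics + (size_t)vertexID * FLOATS_PER_VERTEX;
    Float4 r = { p[4], p[5], p[6], 0.0f };
    return r;
}

/* float4* version: element vertexID * 2, all four floats are loaded. */
static Float4 position_vector(const float *vertics, unsigned vertexID)
{
    assert(vertics != NULL);
    Float4 r;
    memcpy(&r, vertics + ((size_t)vertexID * 2) * 4, sizeof r);
    return r;
}

/* float4* version: element vertexID * 2 + 1, all four floats are loaded. */
static Float4 normal_vector(const float *vertics, unsigned vertexID)
{
    assert(vertics != NULL);
    Float4 r;
    memcpy(&r, vertics + ((size_t)vertexID * 2 + 1) * 4, sizeof r);
    return r;
}

/* Returns 1 when the x, y and z components of a and b are equal. */
static int xyz_match(Float4 a, Float4 b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}
```

With this sketch, x, y and z always agree between the two versions, but w only agrees if the buffer really stores 1.0f after each position and 0.0f after each normal.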

Can somebody explain why it's faster to use float*?

Thanks