I'm currently porting some iPhone and iPad apps that use a large number of partially overlapping point sprites from OpenGL ES 1.1 to OpenGL ES 2.0. I don't really understand GLSL yet, but I thought I'd run some quick timing tests on the iPad with a minimal, untextured shader. I wanted to see whether it was as fast as my old point-rendering code with texturing disabled.

However, I pushed the point size up until the app became fill-limited (3,500 points at a point size of 40, equivalent to filling the screen about 7 times). At that point the shader version was considerably slower than fixed function: 25 fps versus 40 fps.
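To sanity-check the "about 7 times" figure, here is the overdraw arithmetic written out. It is a minimal sketch with a few assumptions. The screen is assumed to be 1024x768, the original iPad's resolution, which the post doesn't state. Every sprite is assumed to lie fully on screen. The helper names are hypothetical. The sketch also converts the two measured frame rates into the fragment throughput each version is sustaining.

```python
# Overdraw / fill-rate arithmetic for the test scene described above.
# ASSUMPTION: a 1024x768 display (original iPad) and every sprite fully on
# screen; the point count, point size and frame rates come from the post.

N_POINTS = 3500
POINT_SIZE = 40                 # pixels per side of each square point sprite
SCREEN_W, SCREEN_H = 1024, 768  # assumed, not stated in the post


def fragments_per_frame(n_points: int, point_size: int) -> int:
    """Fragments rasterized per frame by n square point sprites."""
    return n_points * point_size * point_size


def overdraw(fragments: int, screen_pixels: int) -> float:
    """How many times the frame's fragments would cover the whole screen."""
    return fragments / screen_pixels


def fill_rate(fragments: int, fps: float) -> float:
    """Fragments per second needed to sustain the given frame rate."""
    return fragments * fps


frags = fragments_per_frame(N_POINTS, POINT_SIZE)
screen = SCREEN_W * SCREEN_H
print(f"fragments/frame: {frags}")                                          # 5600000
print(f"overdraw: {overdraw(frags, screen):.2f}x")                          # 7.12x
print(f"ES 2.0 shader @25 fps: {fill_rate(frags, 25) / 1e6:.0f} Mfrag/s")   # 140
print(f"ES 1.1 fixed  @40 fps: {fill_rate(frags, 40) / 1e6:.0f} Mfrag/s")   # 224
```

On these assumptions the fixed-function path is pushing roughly 224 million fragments per second against about 140 million for the shader.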

I thought fixed-function rendering on this hardware was just converted to shaders internally anyway. So I'm guessing that my shader, although it produces the same visual output, is doing the work inefficiently.

My shader code is as follows. The app is fill-limited, but I've included the vertex shader too, since for all I know it may affect interpolation:

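Vertex Shader

```glsl
attribute vec4 position;
attribute vec4 color;
varying vec4 colorVarying;

// Fixed scale applied to every incoming vertex position
const vec4 scale = vec4(1.0, 0.66667, 1.0, 1.0);

void main()
{
    gl_PointSize = 40.0;             // same point size as the ES 1.1 version
    gl_Position = position * scale;
    colorVarying = color;            // per-vertex color handed to the fragment shader
}
```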

Fragment Shader

```glsl
varying lowp vec4 colorVarying;

void main()
{
    // Write the interpolated per-vertex color unchanged
    gl_FragColor = colorVarying;
}
```

I wondered whether the shader might be needlessly interpolating colors (Gouraud shading), whereas the fixed-function version does not (it's just a point, after all). I couldn't work out how to specify flat shading. So, purely as a test, I replaced colorVarying in the fragment shader with a constant. That raised the frame rate to 30 fps, but still not to the 40 fps of the fixed-function version.
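Converting the three measured frame rates into per-frame times makes the comparison easier to reason about. In a fill-limited scene, times add up in a way frame rates don't. This is a minimal sketch with two assumptions. Frame time is assumed to be dominated by fill. All 5,600,000 fragments (3,500 sprites of 40x40 pixels) are assumed to be drawn. The helper names are hypothetical.

```python
# Per-frame time for the three configurations measured above.
# ASSUMPTION: the app is entirely fill-limited, so frame time tracks
# per-fragment cost; all 3,500 40x40 sprites are assumed fully on screen.

FRAGMENTS = 3500 * 40 * 40

measured_fps = {
    "fixed function (ES 1.1)": 40,
    "shader, constant color": 30,
    "shader, colorVarying": 25,
}


def frame_ms(fps: float) -> float:
    """Milliseconds spent per frame at a given frame rate."""
    return 1000.0 / fps


def ns_per_fragment(fps: float, fragments: int = FRAGMENTS) -> float:
    """Average nanoseconds per fragment if fill dominates the frame."""
    return frame_ms(fps) * 1e6 / fragments


for name, fps in measured_fps.items():
    print(f"{name:26s} {frame_ms(fps):5.1f} ms/frame "
          f"{ns_per_fragment(fps):4.2f} ns/fragment")

# Time recovered by dropping the varying vs. the gap still left to fixed function
saved = frame_ms(25) - frame_ms(30)      # about 6.7 ms
remaining = frame_ms(30) - frame_ms(40)  # about 8.3 ms
print(f"saved: {saved:.1f} ms, remaining gap: {remaining:.1f} ms")
```

Read this way, the constant-color test recovered roughly 6.7 ms of the 15 ms gap, leaving about 8.3 ms per frame unexplained.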

I've checked, and the points in both versions are definitely being rendered at the same size. As far as I can tell, the GL state is the same in both versions, unless some defaults differ between ES 1.1 and ES 2.0 that I'm not setting explicitly. Can anyone tell me why my shader version might be slower? Thanks.