Recently, with the release of OpenGL 4, a few extensions have appeared that could really improve efficiency.

Drawing data generated by OpenGL, or by external APIs such as OpenCL, without CPU intervention would be a really good thing to have here, especially because CPUs are very constrained in the devices that use OpenGL ES.
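
To make "without CPU intervention" a bit more concrete: this presumably refers to indirect drawing (glDrawArraysIndirect in OpenGL 4.0), where the draw parameters are read from a buffer object instead of being passed as function arguments, so a GPU pass or an OpenCL kernel can write them. The sketch below only mirrors the 16-byte record layout from the desktop spec on the CPU side; the Python field names and the `pack_command` helper are my own, for illustration, and nothing here talks to a real GL context.

```python
import ctypes


class DrawArraysIndirectCommand(ctypes.Structure):
    """Four tightly packed 32-bit unsigned ints, as the indirect draw
    call expects to find them in the buffer (names adapted from the spec)."""
    _fields_ = [
        ("count", ctypes.c_uint32),                  # number of vertices to draw
        ("prim_count", ctypes.c_uint32),             # number of instances
        ("first", ctypes.c_uint32),                  # index of the first vertex
        ("reserved_must_be_zero", ctypes.c_uint32),  # reserved in OpenGL 4.0
    ]


def pack_command(count: int, prim_count: int = 1, first: int = 0) -> bytes:
    """Hypothetical helper: build the raw bytes of one draw record.

    In the scenario from the post, a GPU pass or an OpenCL kernel would
    write these bytes into the buffer, not the CPU.
    """
    cmd = DrawArraysIndirectCommand(count, prim_count, first, 0)
    return bytes(cmd)
```

The point is that once such a record lives in GPU memory, the CPU never needs to read back how many vertices were generated before issuing the draw.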
The following should also be added, because they allow performance improvements that count for a lot on the devices where OpenGL ES is used:

* shader subroutines for significantly increased programming flexibility;
* performance improvements, including instanced geometry shaders, instanced arrays, and a new timer query.

The timer query and the new shader flexibility allow for mini-benchmarking and, when well written, self-optimizing programs: try and time different approaches at runtime, choose the most efficient one, and keep improving until the program reaches its maximum efficiency.
Of course, everything that purely improves performance should be added (new functionality such as tessellation does not belong here).
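
A minimal sketch of that try-and-time idea, assuming each approach can be wrapped in a callable. On desktop GL the measurement would wrap the draw between glBeginQuery(GL_TIME_ELAPSED, id) and glEndQuery(GL_TIME_ELAPSED) and read the result in nanoseconds with glGetQueryObjectui64v. Here `pick_fastest`, `measure_ns` and `cpu_measure_ns` are names I made up, and the CPU clock is only a stand-in so the example runs without a GL context.

```python
import time
from typing import Callable, Dict


def cpu_measure_ns(draw: Callable[[], object]) -> int:
    """Stand-in timer: wall-clock nanoseconds spent in one call.

    A real renderer would use a GPU timer query here instead.
    """
    start = time.perf_counter_ns()
    draw()
    return time.perf_counter_ns() - start


def pick_fastest(variants: Dict[str, Callable[[], object]],
                 measure_ns: Callable[[Callable[[], object]], int] = cpu_measure_ns,
                 runs: int = 5) -> str:
    """Time every variant `runs` times and return the name of the one
    with the lowest median time."""
    if not variants:
        raise ValueError("need at least one variant to choose from")
    best_name = None
    best_time = None
    for name, draw in variants.items():
        samples = sorted(measure_ns(draw) for _ in range(runs))
        median = samples[len(samples) // 2]  # median is less noisy than the mean
        if best_time is None or median < best_time:
            best_name, best_time = name, median
    return best_name
```

A program could call something like `pick_fastest({"path_a": draw_a, "path_b": draw_b})` once at startup (or again whenever the workload changes) and then stick with the winner; with shader subroutines the variants could be different subroutine selections in the same program.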

Please also add seamless cube map filtering from OpenGL 3.2
and shader fragment coordinate convention control.
(I didn't find either of them when searching the PDF of the spec.)


Please consider as many as possible (preferably all) of the features described here.
Drawing without CPU intervention and the other performance improvements would be especially welcome in OpenGL ES.