I ran into a case where it would be really useful to reverse the components of a vector. To write fully compliant BLAS functions, one must handle the case where the step between elements (e.g., incx) is negative. Since hardware vendors may not support implicit vectorization, elements may need to be explicitly packed into vector types.

To handle negative and non-unit increments, I copy the relevant elements from low-to-high global memory addresses into low-to-high local memory. I then do a vload from local memory into private memory, where I currently shuffle (if needed), compute, and shuffle again (if needed) before doing a vstore to local memory and copying back into global memory.
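To make the packing step concrete, here is a minimal sketch in plain C rather than OpenCL C, so it runs on the host. The names pack_ascending and reverse_in_place are hypothetical, and it assumes x points at the lowest-addressed element of the strided vector. It follows the reference BLAS convention that, for a negative incx, logical element 0 sits at the highest address, so a buffer packed in ascending address order is in reversed logical order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the n strided elements of x into the contiguous
 * buffer buf in ascending address order (low-to-high), as described above.
 * Assumes x points at the lowest-addressed element.
 * Returns 1 when incx is negative, i.e. when buf now holds the elements in
 * reversed logical order and needs a reversal before use. */
int pack_ascending(const float *x, size_t n, int incx, float *buf)
{
    assert(incx != 0);
    size_t step = (size_t)(incx < 0 ? -incx : incx);
    for (size_t i = 0; i < n; ++i)
        buf[i] = x[i * step];
    return incx < 0;
}

/* Hypothetical helper: stands in for the shuffle (or a .rev suffix) that
 * restores logical order in private memory. */
void reverse_in_place(float *buf, size_t n)
{
    for (size_t i = 0; i < n / 2; ++i) {
        float tmp = buf[i];
        buf[i] = buf[n - 1 - i];
        buf[n - 1 - i] = tmp;
    }
}
```

In a kernel, the loop in pack_ascending would correspond to the global-to-local copy, and reverse_in_place to the shuffle applied after the vload.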

Since the OpenCL specification already includes .hi, .lo, .even, and .odd, I think .rev would be a natural addition. Of course I can continue to use the built-in shuffle function, but then I need to create a reverse mask for each vector length. With the other four suffixes already in the spec, there seems to be a reasonable argument for including .rev as well.
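For illustration, here is what the per-length reverse masks amount to, again sketched in plain C with hypothetical names (make_reverse_mask, shuffle_emulated). The emulation assumes the core shuffle behaviour out[i] = in[mask[i]]. In actual OpenCL C the masks would be vector constants such as (uint4)(3, 2, 1, 0), one written out for each vector width in use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill mask with n-1, ..., 1, 0, the index pattern a
 * reversal needs for a vector of width n. */
void make_reverse_mask(unsigned *mask, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        mask[i] = (unsigned)(n - 1 - i);
}

/* Host-side emulation of a shuffle: each output component is the input
 * component selected by the matching mask entry. */
void shuffle_emulated(const float *in, const unsigned *mask,
                      float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        assert(mask[i] < n);
        out[i] = in[mask[i]];
    }
}
```

A .rev suffix would make both the mask and the explicit shuffle call unnecessary for this one pattern, in the same way that .even and .odd avoid the corresponding masks.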