I have not seen this mentioned elsewhere.

A function or mechanism with semantics similar to async_work_group_copy(...), but performing a reduce (very useful) or a map (somewhat useful) across the work-group, would be quite valuable for my particular work (scientific computing).
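To make the reduce case concrete, here is a minimal sketch of the tree reduction a work-group currently has to write by hand over a buffer in __local memory. It is plain C so it can run on the host. The function name work_group_reduce_sum_emulated is hypothetical, and the sketch assumes the buffer length is a power of two. The outer loop stands for the barrier-separated steps, and the inner loop stands for the work-items active in each step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: host-side emulation of a work-group sum reduction
 * over a __local buffer. Each outer iteration is one step that, in a real
 * kernel, would end with barrier(CLK_LOCAL_MEM_FENCE). The inner loop plays
 * the role of the work-items with get_local_id(0) < stride. */
static float work_group_reduce_sum_emulated(float *local_buf, size_t n)
{
    /* Assumption: n is a nonzero power of two, as work-group sizes often are. */
    assert(n > 0 && (n & (n - 1)) == 0);

    for (size_t stride = n / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid) {
            /* Each active work-item folds in its partner's partial sum. */
            local_buf[lid] += local_buf[lid + stride];
        }
        /* Real kernel: barrier(CLK_LOCAL_MEM_FENCE); */
    }
    return local_buf[0]; /* work-item 0 ends up holding the total */
}
```

A built-in with async_work_group_copy-like semantics would let the implementation choose the reduction strategy instead of every kernel reimplementing this loop and its barriers.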

Also, atomic operations on floats would be wonderful, but I'd imagine that's more of a hardware issue.
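Lacking native support, the usual workaround is to emulate an atomic float add with a compare-and-swap loop on the float's 32-bit pattern. The following is a minimal sketch using C11 <stdatomic.h>. The helper names are hypothetical, and it assumes float is 32 bits wide. In an OpenCL kernel the same loop would be written with atomic_cmpxchg on an unsigned int pointer, using as_uint/as_float for the reinterpretation.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float and uint32_t have the same size, so the float can be
 * stored as its raw bit pattern inside an atomic integer. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: store a float into the atomic bit-pattern slot. */
static void atomic_store_float(_Atomic uint32_t *target, float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    atomic_store(target, bits);
}

/* Hypothetical helper: read the float back out of the slot. */
static float atomic_load_float(_Atomic uint32_t *target)
{
    uint32_t bits = atomic_load(target);
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* CAS-loop emulation of an atomic float add; returns the previous value,
 * mirroring the convention of integer atomic add. */
static float atomic_add_float_cas(_Atomic uint32_t *target, float value)
{
    uint32_t old_bits = atomic_load(target);
    uint32_t new_bits;
    float old_val, new_val;
    do {
        memcpy(&old_val, &old_bits, sizeof old_val);  /* bits -> float */
        new_val = old_val + value;
        memcpy(&new_bits, &new_val, sizeof new_bits); /* float -> bits */
        /* On failure, old_bits is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(target, &old_bits, new_bits));
    return old_val;
}
```

Under contention the loop may retry several times, which is why native float atomics would still be preferable.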