The half data type is part of the core OpenCL 1.0 specification, but only as a "data storage format". However, async_work_group_copy is only extended to work on halfs by the cl_khr_fp16 extension. Isn't async_work_group_copy just a bit-wise copy? Why would it need special hardware support to work on halfs?
Is it dangerous for me to just cast my half pointers to short pointers so I can use async_work_group_copy?
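
For concreteness, here is a minimal sketch of the workaround I have in mind. The kernel name, its arguments, and TILE are made up for illustration, and it assumes the host sizes the __local buffer to hold at least TILE halfs and that the input length is a multiple of TILE. Whether the cast itself is well-defined is exactly what I am asking about.

```c
// Sketch only: scale_halfs, its arguments, and TILE are hypothetical names.
#define TILE 256

__kernel void scale_halfs(__global const half *src,
                          __global float *dst,
                          __local half *tile,   // host must allocate TILE * sizeof(cl_half) bytes
                          float factor)
{
    // Offset of this work-group's tile within src.
    size_t base = get_group_id(0) * TILE;

    // The cast in question: treat the 16-bit half storage as short so the
    // core (non-fp16) overload of async_work_group_copy can be used.
    event_t ev = async_work_group_copy((__local short *)tile,
                                       (__global const short *)src + base,
                                       TILE, 0);

    // Every work-item in the group waits for the copy to finish.
    wait_group_events(1, &ev);

    // Read the data back as halfs via the core vload_half builtin,
    // which converts each stored half to float.
    for (size_t i = get_local_id(0); i < TILE; i += get_local_size(0))
        dst[base + i] = factor * vload_half(i, tile);
}
```

The data is only ever interpreted as half through vload_half; the short pointers exist solely to select an overload of async_work_group_copy that core OpenCL provides.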