In the OpenCL spec, there are two versions of this kind of built-in function for the half type. The only difference I found is that they have different alignment requirements. Does this mean that vloada_half() will have higher performance? And what is the purpose of this distinction? Thanks!