vload_half and vloada_half



shawn_zhou
12-23-2009, 01:42 AM
Hi all
In the OpenCL spec, there are two versions of this kind of built-in function for the half type: vload_half and vloada_half. The only difference I found is that they have different alignment requirements. Does that mean vloada_half() will have higher performance? And what is the purpose of this distinction? Thanks!

affie
12-23-2009, 10:51 AM
vload_halfn allows you to load a 1-, 2-, 4-, 8- or 16-component half vector, where the alignment requirement is that p be aligned to a 16-bit boundary, i.e. the size of a scalar half.

vloada_halfn allows you to load a 1-, 2-, 4-, 8- or 16-component half vector, where the alignment requirement is that p be aligned to the size of the whole half vector (2 * n bytes for n components). vloada_halfn should, in most cases, give you better memory access performance than the unaligned vload_halfn version.