Quote Originally Posted by david.garcia
Considering that a device can be simultaneously a GPU, CPU, and accelerator (notice that cl_device_type is a bitfield), then multiple of these macros may be enabled at once?
I've been thinking about this issue, and it dawned on me that predefined macros would not be as good as a get_device_type() function whose result can change dynamically as the kernel is shifted around basic hardware architectures during execution.
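To make the bitfield point concrete, here is a rough sketch in plain C of what testing such a value could look like. The DEV_TYPE_* names, device_type_t, has_type() and type_count() are all made up for illustration; the bit positions follow the CL_DEVICE_TYPE_* values in CL/cl.h, but they are redefined here so the snippet builds without the OpenCL headers. It is not an existing kernel-side API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the host-side cl_device_type bits.
   The bit positions follow CL/cl.h; the names are invented here. */
typedef uint64_t device_type_t;
#define DEV_TYPE_CPU         ((device_type_t)1 << 1)
#define DEV_TYPE_GPU         ((device_type_t)1 << 2)
#define DEV_TYPE_ACCELERATOR ((device_type_t)1 << 3)

/* Nonzero if the given flag is set in the device type bitfield. */
int has_type(device_type_t type, device_type_t flag)
{
    return (type & flag) != 0;
}

/* Number of the three architecture bits that are set at once. */
int type_count(device_type_t type)
{
    return has_type(type, DEV_TYPE_CPU)
         + has_type(type, DEV_TYPE_GPU)
         + has_type(type, DEV_TYPE_ACCELERATOR);
}
```

With a value like DEV_TYPE_CPU | DEV_TYPE_GPU, type_count() returns 2, which is exactly the case where more than one predefined macro would have to be enabled at the same time.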

I've also been wondering how one would determine the preferred_work_group_multiple for a device like the one you describe. The least common multiple of the preferred_work_group_multiple of each subdevice? Better hope they're not all distinct prime numbers.
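For what it's worth, the least-common-multiple idea is easy to sketch in plain C. This is only an illustration of the arithmetic, not anything the spec defines: gcd_size() and lcm_multiples() are hypothetical helpers, and the per-subdevice values are assumed to have been gathered somewhere else.

```c
#include <stddef.h>

/* Greatest common divisor by Euclid's algorithm. */
size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of n per-subdevice multiples.
   Returns 0 if any entry is 0 (no meaningful multiple exists).
   Overflow is not checked in this sketch. */
size_t lcm_multiples(const size_t *multiples, size_t n)
{
    size_t result = 1;
    for (size_t i = 0; i < n; ++i) {
        if (multiples[i] == 0)
            return 0;
        /* Divide before multiplying to keep intermediates small. */
        result = result / gcd_size(result, multiples[i]) * multiples[i];
    }
    return result;
}
```

With power-of-two multiples such as 32, 64 and 16 the result is just the largest one, 64. With distinct primes such as 7, 11 and 13 it is their product, 1001, which is the bad case.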