I have this kernel:
__kernel void some_kernel(__global int* in_array, __global int* out_array)
{
    uint tid = get_global_id(0);
    out_array[tid] = in_array[tid] * 3;
}
in_array is initialized at the beginning of my program and never changes.
1) Would it be beneficial to change the kernel parameter to "__global const int* in_array"? And if so, why?
2) Is it OK to process one element per work-item, or would it be better to have each work-item loop over several elements?
Or in a few words: What's the best way to minimize the effects of memory latency here?