Imagine I have some sort of filter algorithm.
- in_array has the input data (vectorized, for faster access)
- out_array gets the result of the filtering
- filter is the filter itself.
The kernel would look something like this:
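__kernel void vec_iii_1d(__global float4 *filter, __global float4 *in_array, __global float4 *out_array) {
    size_t tid = get_global_id(0);  // index of this work-item in the 1D range
    out_array[tid] = in_array[tid] * filter[fid];  // fid: index into the filter (how it is computed is omitted here)
}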
1) If I change "__global float4 *filter" to "__constant float4 *filter", would the data automatically be cached in the constant cache and stay there for all subsequent kernel calls (the kernel is called several times)?
2) If I change "__global float4 *filter" to "__local float4 *filter", what happens then?
2a) Is the data in global memory first, and then copied automatically to local memory when the kernel is executed?
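To make 2) and 2a) concrete, here is a rough sketch of the __local variant I have in mind. As far as I understand, a __local kernel argument only reserves scratch space per work-group (the host passes a size and a NULL pointer to clSetKernelArg), so this version copies the filter in by hand. The names vec_iii_1d_local, filter_global and filter_len, and the way fid is computed, are placeholders of mine and not part of the real code.

```opencl
// Sketch only: assumes filter_len > 0 and that the host sets argument 1 with
// clSetKernelArg(kernel, 1, filter_len * sizeof(cl_float4), NULL).
__kernel void vec_iii_1d_local(__global const float4 *filter_global,
                               __local float4 *filter,
                               const uint filter_len,
                               __global const float4 *in_array,
                               __global float4 *out_array)
{
    const size_t tid = get_global_id(0);   // index in the whole 1D range
    const size_t lid = get_local_id(0);    // index inside the work-group
    const size_t lsz = get_local_size(0);  // work-group size

    // Each work-item copies a strided share of the filter into local memory.
    for (size_t i = lid; i < filter_len; i += lsz)
        filter[i] = filter_global[i];

    // Wait until the whole work-group has finished copying.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Placeholder filter index, only here so the sketch is complete.
    const size_t fid = tid % filter_len;
    out_array[tid] = in_array[tid] * filter[fid];
}
```

If this is right, every work-group repeats the copy on every kernel call, so it would only pay off when each filter element is read several times per work-group.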