My kernel is inefficient because it performs too many global memory reads, so I would like to copy the global array into a local ("shared") array inside the kernel. The code I have does not work, though. How should I change it so that the barrier works properly and the assignment is correct? Currently, I am getting random values in the array.
Code:

    // correctly getting index
    local float* temp = new float[SIZE];
    temp[index] = input[index]; // input == the array passed to the kernel
    barrier(CLK_GLOBAL_MEM_FENCE);
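As far as I understand it, the usual staging pattern looks something like the sketch below, which is not code from my project. OpenCL C has no `new`, so the local buffer is declared with a compile-time size. Local memory is per work-group, so the buffer is indexed with `get_local_id(0)` instead of the global index, and the barrier uses `CLK_LOCAL_MEM_FENCE` because the writes being synchronized are to local memory. The kernel name `stage_in_local`, the tile size `TILE`, the `output` and `n` parameters, and the neighbour sum are all assumptions made up for illustration. The `#ifndef __OPENCL_VERSION__` part is only a host-side stand-in (one work-item) so the sketch also compiles as plain C; an OpenCL compiler skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins (illustration only): emulate a single work-item
   so the kernel body can be compiled and called as ordinary C. */
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define barrier(flags) ((void)0)
static size_t get_global_id(unsigned int dim)  { (void)dim; return 0; }
static size_t get_local_id(unsigned int dim)   { (void)dim; return 0; }
static size_t get_local_size(unsigned int dim) { (void)dim; return 1; }
#endif

/* Hypothetical tile size: must be at least the work-group size used at enqueue. */
#define TILE 64

__kernel void stage_in_local(__global const float *input,
                             __global float *output,
                             const unsigned int n)
{
    /* One buffer per work-group, sized at compile time. */
    __local float temp[TILE];

    const size_t gid = get_global_id(0);  /* index into the global array */
    const size_t lid = get_local_id(0);   /* index into this group's tile */

    /* Every work-item writes its own slot; out-of-range items pad with 0.
       No early return here: all work-items in the group must reach the barrier. */
    temp[lid] = (gid < n) ? input[gid] : 0.0f;

    /* Make the local writes visible to the whole work-group. */
    barrier(CLK_LOCAL_MEM_FENCE);

    if (gid < n) {
        /* Example use of the tile: read a slot written by another work-item. */
        const size_t next = (lid + 1) % get_local_size(0);
        output[gid] = temp[lid] + temp[next];
    }
}
```

If the tile size is only known at run time, the buffer can instead be a `__local float *` kernel argument whose size is set from the host with `clSetKernelArg(kernel, arg_index, bytes, NULL)`. Either way, the whole global array only fits if it is smaller than the device's local memory, so normally each work-group stages just its own slice.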