Overlap when Using Local Memory
Background info: I'm working on an image debayering algorithm in OpenCL. Basically, it takes single-channel image data from a camera (which can only capture one color at each point), interpolates the other two color values at each pixel from the adjacent color values, and stores the result as a three-channel image. The particular image size I'm working with is 3280 by 4904, so a fairly large image. My original program used only global memory and processed a 2x2 square per kernel invocation. The program makes a lot of memory accesses, and I was wondering whether local memory could be used to improve the run time. Right now it runs in about 0.22 seconds; I'd like to get it below 0.1 seconds.
So essentially the problem I'm dealing with is that I have a data array that's way too large to read into local memory at once, and by doing it in a segmented fashion (as I am now) I end up rereading data that has already been read into local memory (each 2 by 2 square needs the surrounding 12 pixels to debayer the image, which results in an overlap). Plus, I can't think of a way to read the data into local memory in a coalesced fashion, since the data each work group needs is spread across multiple rows of image data. So in my situation, is local memory even a viable solution? I'm new to local memory and I'm sure I'm missing a lot, so any help would be greatly appreciated. I think part of my problem is that I don't really understand the connection between creating local memory objects and the work group size, if there even is one. Any other general ideas on how performance could be improved would also be great. Here's the code.
OK, so this is my attempted implementation of the algorithm using local memory. It might be a little more helpful than the non-local-memory code for figuring out where my thinking has gone wrong.
It runs a little faster, but not by much, and the output image is jagged (the non-local-memory version did not produce a jagged image).
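To put rough numbers on the overlap and on the link between work group size and local memory, here is a small sketch in plain C. It is host-side arithmetic only, not the kernel, and it rests on assumptions: each work-item debayers one 2x2 square, the 12 surrounding pixels form a 1-pixel border around that square, and the work group loads its whole tile cooperatively with a strided loop. The names `TileInfo`, `tile_info`, and `loads_for_item` are made up for illustration. Since `__local` memory is allocated per work group, the tile footprint computed here is what a local array would have to hold.

```c
#include <stdio.h>

/* Hypothetical helper type: footprint of one work group's input tile. */
typedef struct {
    int tile_w;   /* tile width in pixels, including the border  */
    int tile_h;   /* tile height in pixels, including the border */
    int tile_px;  /* pixels the group must read into local memory */
    int out_px;   /* pixels the group actually debayers           */
} TileInfo;

/* Assumption: a group of wg_w x wg_h work-items, each handling a 2x2
 * square, covers (2*wg_w) x (2*wg_h) pixels and needs one extra pixel
 * on every side, giving (2*wg_w + 2) x (2*wg_h + 2) pixels to load. */
TileInfo tile_info(int wg_w, int wg_h)
{
    TileInfo t;
    t.tile_w  = 2 * wg_w + 2;
    t.tile_h  = 2 * wg_h + 2;
    t.tile_px = t.tile_w * t.tile_h;
    t.out_px  = (2 * wg_w) * (2 * wg_h);
    return t;
}

/* Emulates a strided cooperative load: work-item `lid` copies tile
 * elements lid, lid + wg_size, lid + 2*wg_size, ... and this returns
 * how many elements that item copies. Neighbouring work-items touch
 * neighbouring tile elements, which within one tile row are also
 * neighbouring addresses in the source image. */
int loads_for_item(int lid, int wg_size, int tile_px)
{
    int n = 0;
    for (int i = lid; i < tile_px; i += wg_size)
        n++;
    return n;
}

/* Prints the footprint and the extra reads caused by the border. */
void print_tile_report(int wg_w, int wg_h)
{
    TileInfo t = tile_info(wg_w, wg_h);
    printf("tile %dx%d = %d px for %d output px (%.1f%% extra reads)\n",
           t.tile_w, t.tile_h, t.tile_px, t.out_px,
           100.0 * (t.tile_px - t.out_px) / t.out_px);
}
```

Under these assumptions a 16x16 work group needs a 34x34 tile, that is 1156 pixels read for 1024 pixels debayered, so the border costs roughly 13% extra reads per group, and each of the 256 work-items copies 4 or 5 tile elements. A kernel using this scheme would also need a barrier between the load and the interpolation, plus clamping at the image edges; neither is shown here.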
See my post below for updated information.