I'm trying to optimize my OpenCL program, and to that end I thought of precompiling the OpenCL kernel so that I can load it with clCreateProgramWithBinary and then run the program. However, doing so makes no noticeable difference to the execution time. I'm running OpenCL on an Nvidia GTX 295, so the precompiled binary I'm creating is a .ptx file. Is my expectation naive? Should a precompiled kernel run faster, or am I missing the point completely?
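For reference, the host-side loading step described above might look like the following minimal C sketch. It reads the whole precompiled file (for example the .ptx file) into memory so that its bytes and length can be passed to clCreateProgramWithBinary. The helper name `load_binary` and the file name `kernel.ptx` are hypothetical. The OpenCL calls appear only in a comment, because compiling them requires `<CL/cl.h>`, an OpenCL runtime, and a valid context and device.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the OpenCL API): reads the entire file at
 * `path` into a malloc'd buffer and stores its size in *out_len.
 * Returns NULL on any failure. The caller must free() the buffer. */
unsigned char *load_binary(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    /* Determine the file size by seeking to the end. */
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    /* +1 leaves room for a terminator and keeps an empty file a valid allocation. */
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    size_t n = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (n != (size_t)size) {
        free(buf);
        return NULL;
    }

    buf[size] = '\0';          /* convenient for text PTX; not counted in *out_len */
    *out_len = (size_t)size;
    return buf;
}

/* Sketch of the intended use (not compiled here; assumes `ctx` and `device`
 * were obtained earlier via the usual OpenCL setup calls):
 *
 *   size_t len;
 *   unsigned char *bin = load_binary("kernel.ptx", &len);
 *   const unsigned char *bins[] = { bin };
 *   cl_int bin_status, err;
 *   cl_program prog = clCreateProgramWithBinary(ctx, 1, &device, &len,
 *                                               bins, &bin_status, &err);
 *   // A program created from a binary must still be built before use.
 *   err = clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
 *   free(bin);
 */
```

The file is opened in binary mode (`"rb"`) so that the bytes handed to the runtime are exactly what was saved, with no newline translation.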