Question about Kernel Argument Speed



Gauge
04-21-2011, 01:11 PM
Would it be faster to declare variables in my kernel and assign the passed arguments to them if I'm going to be using them a lot?

Or is using the arguments directly just as fast?

For example, if I pass a float as an argument and it is located in memory, would using that memory directly be just as fast as making a local variable in the kernel and assigning the value to it?

I guess the real question is: do stream processors have local cache memory or register memory, and if they do, do kernels use it?

My 8800GTS is supposed to get 200+ gigaflops and I'm getting about 1.6, lol. I know I won't get anywhere near 200, since my algorithm does much more than just floating-point operations, but 1.6 compared to 200 suggests my kernels could be sped up quite a bit.

david.garcia
04-21-2011, 02:56 PM
The first step of performance tuning in any language is measuring where time is being spent.
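As a rough illustration of turning a measurement into the kind of number quoted above (1.6 versus 200+ gigaflops), here is a minimal C sketch. It assumes you already have start and end timestamps in nanoseconds for one kernel launch, for example from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on a queue created with profiling enabled, and that you have counted the floating-point operations by hand. The helper names achieved_gflops and fraction_of_peak are made up for this example.

```c
#include <stdint.h>

/* Achieved throughput in GFLOP/s for one kernel launch.
 * start_ns, end_ns: device timestamps in nanoseconds (assumed to come from
 *                   OpenCL event profiling or a similar timer).
 * flop_count:       floating-point operations the launch performed
 *                   (assumed to be counted by hand from the kernel source). */
static double achieved_gflops(uint64_t start_ns, uint64_t end_ns, double flop_count)
{
    if (end_ns <= start_ns) {
        return 0.0;                       /* no usable measurement */
    }
    double elapsed_ns = (double)(end_ns - start_ns);
    return flop_count / elapsed_ns;       /* FLOP per nanosecond equals GFLOP/s */
}

/* Fraction of the advertised peak actually reached (0.0 to 1.0). */
static double fraction_of_peak(double achieved, double peak)
{
    return peak > 0.0 ? achieved / peak : 0.0;
}
```

With the figures from the first post, fraction_of_peak(1.6, 200.0) comes out below one percent. That only says there is a gap; it does not say where the time goes, which is what a profiler is for.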

You mention you are using an NVIDIA platform. Why not give Visual Profiler (http://developer.nvidia.com/object/visual-profiler.html) a look? (The page seems to be down, maybe due to AWS's downtime.)

It's also a good idea to read some general guides on how to write OpenCL code, such as NVIDIA's OpenCL programming guide (http://www.nvidia.com/content/cudazone/download/OpenCL/NVIDIA_OpenCL_ProgrammingGuide.pdf) or AMD's OpenCL programming guide (http://developer.amd.com/gpu_assets/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf).

As for the other questions, I can't quite make sense of them. Try rephrasing them in terms of "kernel arguments", "kernel scope variables", "global memory", "local memory", etc.
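To make those terms concrete, here is a small sketch, not taken from the original poster's code, that labels each kind of storage in a made-up kernel called scale_and_square. The body is OpenCL C. The #ifndef block at the top is only a host-side stand-in (an assumption for experimentation, not part of OpenCL) so the same text also compiles as plain C.

```c
#ifndef __OPENCL_VERSION__
/* Host-side stand-ins so this sketch also builds as plain C.
 * These are illustration-only assumptions, not OpenCL definitions. */
#include <stddef.h>
#define __kernel
#define __global
static size_t current_gid = 0;
static size_t get_global_id(unsigned dim) { (void)dim; return current_gid; }
#endif

/* scale:     kernel argument passed by value.
 * data, out: kernel arguments that point into global memory. */
__kernel void scale_and_square(float scale,
                               __global const float *data,
                               __global float *out)
{
    size_t i = get_global_id(0);

    /* Kernel scope variable (private to this work-item): the value is read
     * from global memory once here and reused twice below. */
    float x = data[i];

    out[i] = scale * x * x;
}
```

In OpenCL C, a by-value argument such as scale is already in the private address space, so copying it into another variable is unlikely to change anything. Whether caching data[i] in x helps depends on the compiler and the hardware, and compilers often do this themselves, so treat the sketch as vocabulary rather than as a tuning recipe and let the profiler decide.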