Hello guys,

I'm new to OpenCL and I'm running into a strange issue. I have a reduction kernel that I launch repeatedly. When I profile the kernel with OpenCL events, the elapsed time (queued -> end) stays almost the same from launch to launch, with only a slight increase. But when I measure the elapsed time in the C++ host code around the "clEnqueueNDRangeKernel" call (together with the "clFinish" that follows it), the time grows much faster from one iteration to the next. I have attached both the code and the profiling output.

Code :
	// Execute the kernel
	globalWorkSize[0] = this->reduction_NumBlocks * this->reduction_NumThreads;
	localWorkSize[0] = this->reduction_NumThreads;

	// Start the host-side timer
	ttt.start();

	clErrNum = clEnqueueNDRangeKernel(clCommandQueue, kernelReduction, 1, 0,
			globalWorkSize, localWorkSize, 0, NULL, &timing_event);
	// Check whether enqueueing the kernel generated an error
	oclCheckError(clErrNum, CL_SUCCESS);

	// Wait for the kernel to finish, then stop the host-side timer
	clFinish(clCommandQueue);
	ttt.stop();

	// Read the profiling timestamps (in nanoseconds) and print both timings
	clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_QUEUED,
			sizeof(time_start), &time_start, NULL);
	clGetEventProfilingInfo(timing_event, CL_PROFILING_COMMAND_END,
			sizeof(time_end), &time_end, NULL);
	cout<<"GpuExecutionTime:"<<(time_end - time_start)/1000<<"us\tC++ElapsedTime:"<<ttt.getElapsedTimeInMicroSec()<<endl;

Output:
    GeForce GTX 550 Ti
    Device Timer Resolution:1000ns
    GpuExecutionTime:160us	C++ElapsedTime:177
    GpuExecutionTime:156us	C++ElapsedTime:167
    GpuExecutionTime:156us	C++ElapsedTime:166
    GpuExecutionTime:189us	C++ElapsedTime:242
    GpuExecutionTime:158us	C++ElapsedTime:215
    ...
    GpuExecutionTime:156us	C++ElapsedTime:253
    GpuExecutionTime:162us	C++ElapsedTime:261
    GpuExecutionTime:157us	C++ElapsedTime:262
    GpuExecutionTime:156us	C++ElapsedTime:254
    GpuExecutionTime:157us	C++ElapsedTime:254
    GpuExecutionTime:160us	C++ElapsedTime:261
    GpuExecutionTime:167us	C++ElapsedTime:279
    GpuExecutionTime:157us	C++ElapsedTime:264
    ...
    GpuExecutionTime:159us	C++ElapsedTime:263
    GpuExecutionTime:157us	C++ElapsedTime:261
    GpuExecutionTime:157us	C++ElapsedTime:260
    GpuExecutionTime:157us	C++ElapsedTime:263
    GpuExecutionTime:183us	C++ElapsedTime:287
    GpuExecutionTime:159us	C++ElapsedTime:275
    GpuExecutionTime:158us	C++ElapsedTime:285
    GpuExecutionTime:184us	C++ElapsedTime:289
    GpuExecutionTime:163us	C++ElapsedTime:271
    GpuExecutionTime:264us	C++ElapsedTime:384
    ..
    GpuExecutionTime:156us	C++ElapsedTime:304
    GpuExecutionTime:161us	C++ElapsedTime:314
    GpuExecutionTime:157us	C++ElapsedTime:308
    GpuExecutionTime:160us	C++ElapsedTime:305
    GpuExecutionTime:158us	C++ElapsedTime:311
    GpuExecutionTime:156us	C++ElapsedTime:308
    GpuExecutionTime:157us	C++ElapsedTime:307
    GpuExecutionTime:164us	C++ElapsedTime:320
    GpuExecutionTime:159us	C++ElapsedTime:328
    GpuExecutionTime:157us	C++ElapsedTime:306
    GpuExecutionTime:157us	C++ElapsedTime:309
    GpuExecutionTime:157us	C++ElapsedTime:312
    ...
    GpuExecutionTime:157us	C++ElapsedTime:326
    GpuExecutionTime:158us	C++ElapsedTime:326
    GpuExecutionTime:159us	C++ElapsedTime:330
    GpuExecutionTime:158us	C++ElapsedTime:328
    GpuExecutionTime:158us	C++ElapsedTime:335
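
In case it helps narrow this down, here is a minimal sketch of how the numbers could be broken down further. It assumes all four profiling timestamps (CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END) are read from the event with clGetEventProfilingInfo; the struct and function names are made up for illustration and are not part of my actual code.

```cpp
#include <cassert>
#include <cstdint>

// Device timestamps in nanoseconds, as reported by clGetEventProfilingInfo
// for CL_PROFILING_COMMAND_QUEUED / _SUBMIT / _START / _END.
// (Hypothetical helper type, not part of the OpenCL API.)
struct ProfilingTimestampsNs {
    std::uint64_t queued;
    std::uint64_t submitted;
    std::uint64_t started;
    std::uint64_t ended;
};

// Per-stage durations in microseconds.
struct StageTimesUs {
    double queuedToSubmit;  // command enqueued, not yet submitted to the device
    double submitToStart;   // submitted to the device, not yet running
    double startToEnd;      // kernel execution itself
    double queuedToEnd;     // the interval printed in the loop above
};

// Split the queued -> end interval into its three stages.
StageTimesUs splitStages(const ProfilingTimestampsNs& t) {
    // The timestamps are expected to be in increasing order.
    assert(t.queued <= t.submitted);
    assert(t.submitted <= t.started);
    assert(t.started <= t.ended);

    const double nsPerUs = 1000.0;
    return StageTimesUs{
        (t.submitted - t.queued) / nsPerUs,
        (t.started - t.submitted) / nsPerUs,
        (t.ended - t.started) / nsPerUs,
        (t.ended - t.queued) / nsPerUs
    };
}

// Host-measured time minus the profiled queued -> end time: the part of the
// wall-clock time that the event profiling does not account for.
double hostOverheadUs(double hostElapsedUs, double profiledQueuedToEndUs) {
    return hostElapsedUs - profiledQueuedToEndUs;
}
```

Applied to the output above, hostOverheadUs(177, 160) gives 17 us for the first line and hostOverheadUs(335, 158) gives 177 us for the last one, so it is this gap that keeps growing while the profiled time stays flat.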

Any kind of help is appreciated.

P.S. The size of the input and the other related variables are fixed!