
Can enqueueNDRangeKernel leak memory even without events?



wwmm
02-08-2013, 07:11 AM
Hi,

I have posted this https://devtalk.nvidia.com/default/topic/529018/cuda-programming-and-performance/memory-leak-in-opencl-under-linux-when-the-number-of-kernels-calls-is-huge/ on the NVIDIA forum but I still could not find a solution. I am facing a memory leak in OpenCL that only happens when I call the kernel a large number of times, around 10^8 times. I am aware that memory leaks are expected when events are used and not released, but can this happen when not using them? My code is fairly long, but I can post parts of it here if it helps. I am using the C++ wrapper and calling the kernel this way



a.global_size = {p.N_first_reduction};
a.local_size = {p.local_size};
p.queue.enqueueNDRangeKernel(p.E_int, a.kernel_offset, a.global_size, a.local_size);


As I said on the NVIDIA forum, I tried



a.global_size = {p.N_first_reduction};
a.local_size = {p.local_size};
p.queue.enqueueNDRangeKernel(p.E_int, a.kernel_offset, a.global_size, a.local_size, NULL, NULL);


But nothing has changed. Is there a way to find what is going wrong?

Thanks in advance

Dithermaster
02-09-2013, 08:13 AM
I found issues with older drivers where they leaked small amounts of memory that could accumulate over very large numbers of kernel calls. Try to figure out roughly how many bytes per kernel call you are losing (by dividing the leak size by the number of kernel calls after running many kernels). Then try changing aspects of how you are calling the kernel to see if you can get the number to change. See if it is the same or different between vendors (e.g., AMD and NVIDIA).
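One way to do the bytes-per-call estimate is sketched below. It is only an illustration under assumptions: it is Linux-only, it reads the resident set size from /proc/self/statm, and the helper names resident_bytes and bytes_per_call are made up for this example and are not part of OpenCL or its C++ wrapper.

```cpp
#include <cstddef>
#include <fstream>
#include <unistd.h>

// Hypothetical helper: resident set size of this process in bytes,
// read from /proc/self/statm (Linux only). Returns 0 if it cannot be read.
std::size_t resident_bytes() {
    std::ifstream statm("/proc/self/statm");
    std::size_t total_pages = 0, resident_pages = 0;
    // The first field is total program size, the second is resident pages.
    if (!(statm >> total_pages >> resident_pages)) return 0;
    return resident_pages * static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Hypothetical helper: average memory growth per kernel call.
// Negative if memory shrank; 0 if no calls were made.
double bytes_per_call(std::size_t before, std::size_t after, std::size_t calls) {
    if (calls == 0) return 0.0;
    return (static_cast<double>(after) - static_cast<double>(before)) /
           static_cast<double>(calls);
}

// Intended use (sketch):
//   std::size_t before = resident_bytes();
//   ... enqueue the kernel `calls` times and wait for the queue to finish ...
//   double leak = bytes_per_call(before, resident_bytes(), calls);
```

Resident memory only changes in whole pages, so the estimate is meaningful only over a large number of calls.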

wwmm
02-09-2013, 01:50 PM
Hi,

I will look into that, but I don't know if such a calculation will be possible. As far as top can tell, the used memory stays the same for almost the whole run (around 2 hours), and then it suddenly explodes: within a few seconds it goes from 70 MB to around 4 GB, and then the executable is killed by the kernel.
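If the growth is a sudden step rather than a steady leak, it may help to record memory usage every so many kernel calls and then find where the jump starts. The following is a minimal sketch, assuming the samples are already collected in a vector; the function name first_jump and the threshold parameter are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the first sample whose increase over the
// previous sample exceeds threshold_bytes, or samples.size() if none does.
std::size_t first_jump(const std::vector<std::size_t>& samples,
                       std::size_t threshold_bytes) {
    for (std::size_t i = 1; i < samples.size(); ++i) {
        // Compare before subtracting so unsigned arithmetic cannot wrap.
        if (samples[i] > samples[i - 1] &&
            samples[i] - samples[i - 1] > threshold_bytes) {
            return i;
        }
    }
    return samples.size();
}
```

Multiplying the returned index by the sampling interval gives an approximate kernel call count at which the memory started to grow.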

Unfortunately, I do not have an AMD card to perform tests. I will see if I can test the code on the CPU, but it could take days to reach the same point there.