Does Nvidia's OpenCL implementation grow the command queue without bound unless clFinish() is called?
I have a simulator that runs under all of the major OpenCL implementations, but if I run it for a long time under Nvidia it eventually runs out of memory. My code repeatedly calls the same kernel function. After about 10^8 calls, the Nvidia machine kills the process (a self-protective measure) because it has run out of memory. I suspect that the API is keeping some kind of handle to each of my previous kernel calls, either in the command queue (as completed tasks) or as the events returned when writing buffers and running kernels. I think it's probably the command queue. Does anybody have any experience of this? I am not currently calling clFinish() anywhere in my code, because I only do blocking writes and reads, and they are performed before and after each kernel run, so under the other implementations I definitely don't need to call it. But I've read elsewhere that Nvidia interprets that part of the API spec differently from everyone else.
I'm going to add calls to clFinish() to my code now, and perhaps that will solve my problem. But my code takes 36 hours to reach the crash point, so I'd also really like a definitive answer on this if anyone has one.
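For reference, here is a minimal sketch of the kind of change I mean by adding the clFinish() calls. It is not my actual code: the callback types, the function name run_with_periodic_finish, and the counting stubs are all made up for illustration. In the real program the enqueue callback would wrap clEnqueueNDRangeKernel() and the finish callback would wrap clFinish(), both of which return CL_SUCCESS (0) on success. Whether draining the queue like this actually stops the memory growth is exactly what I don't know yet.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types (assumptions, not part of OpenCL):
   enqueue_fn would wrap clEnqueueNDRangeKernel(),
   finish_fn would wrap clFinish(). Both return 0 on success. */
typedef int (*enqueue_fn)(void *ctx);
typedef int (*finish_fn)(void *ctx);

/* Launches the kernel `iterations` times, draining the queue after every
   `sync_every` launches, and once more at the end if any launches are
   still unsynchronised. Returns 0 on success, or the first nonzero
   status reported by a callback. */
int run_with_periodic_finish(unsigned long iterations,
                             unsigned long sync_every,
                             enqueue_fn enqueue, finish_fn finish,
                             void *ctx)
{
    assert(sync_every > 0);
    unsigned long pending = 0;

    for (unsigned long i = 0; i < iterations; ++i) {
        int status = enqueue(ctx);
        if (status != 0)
            return status;

        if (++pending == sync_every) {
            status = finish(ctx);   /* block until the queue is empty */
            if (status != 0)
                return status;
            pending = 0;
        }
    }
    return pending ? finish(ctx) : 0;
}

/* Counting stubs so the sketch can be tried without an OpenCL device. */
struct counters { unsigned long enqueued, finished; };

int stub_enqueue(void *ctx)
{
    ((struct counters *)ctx)->enqueued++;
    return 0;
}

int stub_finish(void *ctx)
{
    ((struct counters *)ctx)->finished++;
    return 0;
}
```

Calling run_with_periodic_finish(10, 4, stub_enqueue, stub_finish, &c) with a zeroed struct counters c performs 10 launches and 3 drains (after launches 4 and 8, plus one at the end for the remaining 2). Draining every N launches rather than after every launch is just a design choice to keep the synchronisation overhead low; N = 1 would be the most conservative test.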
Thanks very much,