CL_OUT_OF_RESOURCES on NVIDIA GPUs only.



Elphinstone
06-16-2013, 03:09 AM
Hi all!

I already started a thread in the NVIDIA forums (https://devtalk.nvidia.com/default/topic/547231/cuda-programming-and-performance/bug-cl_out_of_resource-when-using-switch-or-if-else-/), but it looks like no one there is interested in anything other than CUDA. Here is my problem again:

Some months ago I started working on a fractal raytracer (https://synthverse.wordpress.com/) in C++/OpenCL. I have already run into a lot of bugs in the NVIDIA OpenCL compiler (access violations when declaring variables without using them), but I was always able to find a workaround. This time it looks more serious. In my raytracer I have to select the color of the nearest shape. I implemented this with a switch, but I always get a CL_OUT_OF_RESOURCES error when reading the output buffer (or CL_INVALID_COMMAND_QUEUE if I call clFinish() first). This happens only on NVIDIA GPUs; the same code works correctly on an AMD GPU and an Intel CPU. This is the important part of the code:


TracerOut Trace(CameraOut in, SceneParams params)
{
    float dist[2];
    TracerOut out;

    Mandelbulb1_OO Mandelbulb1_oo = Mandelbulb1_Object(in, params);
    dist[0] = distance(Mandelbulb1_oo.intersection, in.origin);
    Mandelbulb2_OO Mandelbulb2_oo = Mandelbulb2_Object(in, params);
    dist[1] = distance(Mandelbulb2_oo.intersection, in.origin);

    uint nearestId = 0;
    float nearestDist = 10000000.0f;
    for (uint i = 0; i < 2; i++)
    {
        if (dist[i] < nearestDist)
        {
            nearestDist = dist[i];
            nearestId = i;
        }
    }

    // Trick needed to avoid access violation bug
    Mandelbulb1_SO Mandelbulb1_so;
    Mandelbulb1_so.color.x = 0.0f;
    Mandelbulb2_SO Mandelbulb2_so;
    Mandelbulb2_so.color.x = 0.0f;

    switch (nearestId)
    {
        case 0:
            Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
            out.color = Mandelbulb1_so.color;
            break;
        case 1:
            Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);
            out.color = Mandelbulb2_so.color;
            break;
        default:
            out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);
            break;
    }

    return out;
}



I suspected that the switch construct was causing the problem, so I tried plain if statements instead:


TracerOut Trace(CameraOut in, SceneParams params)
{
    //...

    Mandelbulb1_SO Mandelbulb1_so;
    Mandelbulb1_so.color.x = 0.0f;
    Mandelbulb2_SO Mandelbulb2_so;
    Mandelbulb2_so.color.x = 0.0f;

    out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);

    Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
    Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);

    if (nearestId == 0)
        out.color = Mandelbulb1_so.color;

    if (nearestId == 1)
        out.color = Mandelbulb2_so.color;

    return out;
}



And I still have the same problem. Removing one or both of the if statements solves the problem:


TracerOut Trace(CameraOut in, SceneParams params)
{
    //...
    out.color = (float4)(0.0f, 0.0f, 0.0f, 0.0f);

    Mandelbulb1_so = Mandelbulb1_Shader(in, Mandelbulb1_oo, params);
    Mandelbulb2_so = Mandelbulb2_Shader(in, Mandelbulb2_oo, params);

    out.color = Mandelbulb1_so.color;

    if (nearestId == 1)
        out.color = Mandelbulb2_so.color;

    return out;
}

Removing the declarations of the structs also lets it run fine:


TracerOut Trace(CameraOut in, SceneParams params)
{
    //...

    switch (nearestId)
    {
        case 0:
            out.color = Mandelbulb1_Shader(in, Mandelbulb1_oo, params).color;
            break;
        case 1:
            out.color = Mandelbulb2_Shader(in, Mandelbulb2_oo, params).color;
            break;
        default:
            break;
    }

    return out;
}

But of course this is not what I want. I know that conditionals are bad for GPU performance, but at the moment I don't have another solution (does anyone have suggestions? :) ); optimization will come later. This code is supposed to work, so I believe this is a bug in the NVIDIA OpenCL driver, right? Has anyone had a similar problem? Is a fix coming?
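
One direction that might be worth trying is to drop the switch and the if statements entirely, and instead select the nearest index with a ternary and pick the color by table lookup. The sketch below is plain C rather than OpenCL C so that it compiles without any OpenCL headers. Color4 is a hypothetical stand-in for float4, and nearest_index and pick_color are illustrative names that do not exist in the raytracer. Whether this form avoids the NVIDIA error is untested.

```c
#include <assert.h>

/* Hypothetical stand-in for OpenCL's float4; not the real type. */
typedef struct { float x, y, z, w; } Color4;

/* Return the index of the smallest distance.
 * Uses a ternary instead of an if so the compiler can emit a select.
 * Ties keep the earlier index, like the strict '<' in the original loop. */
static unsigned nearest_index(const float *dist, unsigned count)
{
    assert(count > 0);
    unsigned nearest = 0;
    for (unsigned i = 1; i < count; i++) {
        nearest = (dist[i] < dist[nearest]) ? i : nearest;
    }
    return nearest;
}

/* Pick a color by index from a table of already computed shader results.
 * An out-of-range id falls back to transparent black, which mirrors the
 * default case of the switch. */
static Color4 pick_color(const Color4 *colors, unsigned count, unsigned id)
{
    const Color4 black = { 0.0f, 0.0f, 0.0f, 0.0f };
    return (id < count) ? colors[id] : black;
}
```

With this shape, both shader results would be stored in a small array and the final color becomes pick_color(colors, 2, nearest_index(dist, 2)). The cost is that every shader is evaluated for every ray, as in the if-based variant above.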

I tried to run my program on different PCs. I can run it without problems on the following devices:

CPU Intel i7 2600K
CPU Intel i7 920
CPU Intel i7 2620M
GPU Intel HD Graphics 3000
GPU AMD HD 6470M

I get the CL_OUT_OF_RESOURCES / CL_INVALID_COMMAND_QUEUE errors on:

GPU NVIDIA GTX 680 (EVGA), driver 320.18
GPU NVIDIA GTX 560 Ti OC (Gigabyte), driver 320.18
GPU NVIDIA GTX 470 (Zotac), driver 320.18
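
For anyone who only sees the numeric return values, the two errors above correspond to -5 and -36 in the Khronos CL/cl.h header. A minimal sketch of a decoder follows, written without including the OpenCL headers so it compiles anywhere. The function name cl_error_name is hypothetical, and it covers only the codes discussed in this post.

```c
/* Map the OpenCL status codes mentioned in this post to their names.
 * The numeric values are those defined in the Khronos CL/cl.h header.
 * cl_error_name is an illustrative helper, not an OpenCL API function. */
static const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    default:  return "unknown (not covered by this sketch)";
    }
}
```

Logging the name after every enqueue, read, and clFinish() call makes it easier to see which call first reports the failure.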


Last remark: some parts of the code may look badly written. This is because I'm not writing the OpenCL code directly: I'm writing a program that assembles OpenCL scripts dynamically and then runs them.

Thank you!
Mattia.