OpenCL error at clEnqueueReadBuffer



DP007
11-12-2009, 08:56 AM
Hi,

My program crashes when I call clEnqueueReadBuffer() to read data back from the device. If I comment out some lines in the .cl file, the copy finishes successfully, even though those lines compile without errors. This seems a bit weird to me, because I check the error code after the call, but the returned code isn't one listed in the OpenCL specification.
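One way to see which value actually came back is to translate it into its symbolic name. The sketch below is hypothetical (the helpers `cl_error_name` and `report_cl_status` are invented here). To stay compilable without an OpenCL SDK, it hardcodes a subset of the numeric codes from the Khronos `cl.h` header instead of including it; real host code would `#include <CL/cl.h>` and switch on the named constants.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an OpenCL status code to its name.
 * Values are a subset copied from the Khronos cl.h header;
 * real code should include <CL/cl.h> and use the constants. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    case -57: return "CL_INVALID_EVENT_WAIT_LIST";
    case -59: return "CL_INVALID_OPERATION";
    default:  return "unknown (not in this table)";
    }
}

/* Print a readable message to stderr if a call did not return CL_SUCCESS.
 * Returns the status unchanged so it can wrap a call inline. */
static int report_cl_status(const char *call, int err)
{
    if (err != 0)
        fprintf(stderr, "%s failed: %d (%s)\n", call, err, cl_error_name(err));
    return err;
}
```

Typical use would be `report_cl_status("clEnqueueReadBuffer", err);` right after the call, so an unexpected value is at least printed with its number.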

Any tips for me? Thanks in advance.

Daniel

dbs2
11-12-2009, 11:34 AM
Sometimes the next command you execute picks up an error from a previous command that wasn't reported correctly. This is especially likely with a blocking read, which forces the kernel enqueued before it to finish first. I suspect your kernel is writing out of bounds and causing a problem that only shows up at the next attempt to access the data. Try running on the CPU and see whether your kernel triggers an access violation.
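To test the out-of-bounds theory, a common trick is to allocate a buffer larger than the kernel needs, fill the extra slots with a sentinel value, and check after the kernel (or a CPU emulation of it) whether any sentinel changed. Below is a minimal host-side sketch in plain C; `GUARD`, `SENTINEL`, `guard_fill`, and `guard_check` are hypothetical names. It assumes the code under test writes through `buf + GUARD`, so both underruns and overruns land in a guard region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define GUARD 16            /* hypothetical: guard slots on each side */
#define SENTINEL 0x7EADBEEF /* hypothetical marker value (fits in int) */

/* Layout: [GUARD sentinels][n payload ints][GUARD sentinels].
 * buf must hold n + 2 * GUARD ints. */
static void guard_fill(int *buf, size_t n)
{
    for (size_t i = 0; i < GUARD; ++i) {
        buf[i] = SENTINEL;
        buf[GUARD + n + i] = SENTINEL;
    }
}

/* Returns the number of guard slots that were overwritten,
 * printing each one to stderr. */
static size_t guard_check(const int *buf, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < GUARD; ++i) {
        if (buf[i] != SENTINEL) {
            fprintf(stderr, "underrun: leading guard slot %zu changed\n", i);
            ++hits;
        }
        if (buf[GUARD + n + i] != SENTINEL) {
            fprintf(stderr, "overrun: trailing guard slot %zu changed\n", i);
            ++hits;
        }
    }
    return hits;
}
```

For example, with `n = 4` and `data = buf + GUARD`, a stray write to `data[4]` is reported as an overrun of trailing guard slot 0 instead of silently corrupting neighbouring memory.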

DP007
11-13-2009, 01:26 AM
Ah, thanks. Is there a way to emulate the kernel on the CPU, like in CUDA? Or do I have to write my own CPU function for testing?
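If the platform exposes a CPU device (`CL_DEVICE_TYPE_CPU`), the same kernel can simply be run there. Otherwise, one option is to port the kernel body to plain C and call it in a loop, with the loop index standing in for `get_global_id(0)`. The sketch below ports only the top/middle/bottom ordering from the test kernel posted further down in this thread. The `Point2D` layout (two floats) is an assumption, since the thread never shows its definition, and `order_by_y` and `emulate_launch` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout; the real Point2D definition is not shown in the thread. */
typedef struct { float x, y; } Point2D;

/* Plain-C port of the kernel's ordering step: sorts three lightmap
 * coordinates by y into top (largest y), middle and bottom. */
static void order_by_y(Point2D l0, Point2D l1, Point2D l2,
                       Point2D *top, Point2D *middle, Point2D *bottom)
{
    if (l0.y >= l1.y) {
        if (l0.y >= l2.y) {
            *top = l0;
            if (l1.y >= l2.y) { *middle = l1; *bottom = l2; }
            else              { *middle = l2; *bottom = l1; }
        } else {
            *top = l2; *middle = l0; *bottom = l1;
        }
    } else {
        if (l1.y >= l2.y) {
            *top = l1;
            if (l0.y >= l2.y) { *middle = l0; *bottom = l2; }
            else              { *middle = l2; *bottom = l0; }
        } else {
            *top = l2; *middle = l1; *bottom = l0;
        }
    }

    /* Same tie-break as the kernel: on a flat bottom edge,
     * keep the point with the smaller x in bottom. */
    if (middle->y == bottom->y && bottom->x > middle->x) {
        Point2D tmp = *bottom;
        *bottom = *middle;
        *middle = tmp;
    }
}

/* Emulated launch: the loop index plays the role of get_global_id(0).
 * tri holds 3 points per triangle; indices >= num_triangles are skipped,
 * mirroring the kernel's bounds check. */
static void emulate_launch(const Point2D *tri, size_t num_triangles,
                           size_t global_size, Point2D *tops)
{
    for (size_t index = 0; index < global_size; ++index) {
        if (index >= num_triangles)
            continue;
        Point2D top, middle, bottom;
        order_by_y(tri[3 * index], tri[3 * index + 1], tri[3 * index + 2],
                   &top, &middle, &bottom);
        tops[index] = top;
    }
}
```

Because this runs as ordinary host code, a debugger or a memory checker can catch a bad index directly, which is the point of emulating on the CPU.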

DP007
11-13-2009, 01:36 AM
The thing is, my kernel doesn't write to the device memory that I later copy back to the host. If I comment out the following lines, it works:

middle.x = round(middle.x * (res_x-1));
middle.y = round(middle.y * (res_y-1));

It seems more like an instruction limit to me, but that wouldn't make sense either.
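These two lines only map a lightmap coordinate to a pixel index, assuming (as the `res_x-1` factor suggests) that the coordinate is normalized to [0, 1]. Running the same arithmetic on the host is a quick way to confirm the math itself is harmless. The sketch uses hypothetical helpers `round_half_away` and `to_pixel`; the rounding rule written out by hand is the one OpenCL C's `round()` uses (halfway cases away from zero, same as C's `roundf`).

```c
#include <assert.h>

/* Round halfway cases away from zero, like OpenCL C's round()
 * and C's roundf(). Valid for |v| < 2^63. */
static float round_half_away(float v)
{
    float t = (float)(long long)v;   /* truncate toward zero */
    float frac = v - t;              /* exact fractional part */
    if (frac >= 0.5f)
        t += 1.0f;
    else if (frac <= -0.5f)
        t -= 1.0f;
    return t;
}

/* Host-side equivalent of middle.x = round(middle.x * (res_x-1)):
 * maps a normalized coordinate u in [0, 1] to a pixel index in [0, res-1]. */
static float to_pixel(float u, int res)
{
    return round_half_away(u * (float)(res - 1));
}
```

With `res = 9`, the coordinates 0, 0.5 and 1 map to pixel indices 0, 4 and 8.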

DP007
11-13-2009, 03:31 AM
Hi,

I duplicated a few lines of the working code (just some additions and assignments), and after a few duplicated lines the copy fails again. So there isn't any error or exception in the CL code itself. What else could this be? Any suggestions?

affie
11-13-2009, 03:09 PM
Can you post the test and kernel source that shows the problem? What platform does this error occur on?

DP007
11-14-2009, 06:49 AM
Hi,

this is my test kernel code:


__kernel void
calc_triangle_mapping(__global const Mesh* mesh, __global Triangle* triangles,
                      const int res_x, const int res_y, __global int* triangle_mapping) {

    int index = get_global_id(0);
    __global Triangle *t;
    Point2D l0, l1, l2;
    Point2D top, bottom, middle;
    float inc, inc_bottom_top, inc_bottom_middle, inc_middle_top;
    bool right_side = false;
    int begin, end;

    // check bounds
    if (index >= mesh->num_triangles)
        return;

    // get the triangle of the current thread
    t = &triangles[mesh->triangles_offset + index];

    // detect top, middle and bottom lightmap coordinate
    if (t->l0.y >= t->l1.y) {
        if (t->l0.y >= t->l2.y) {
            top = t->l0;
            if (t->l1.y >= t->l2.y) {
                middle = t->l1;
                bottom = t->l2;
            }
            else {
                middle = t->l2;
                bottom = t->l1;
            }
        }
        else {
            top = t->l2;
            middle = t->l0;
            bottom = t->l1;
        }
    }
    else {
        if (t->l1.y >= t->l2.y) {
            top = t->l1;
            if (t->l0.y >= t->l2.y) {
                middle = t->l0;
                bottom = t->l2;
            }
            else {
                middle = t->l2;
                bottom = t->l0;
            }
        }
        else {
            top = t->l2;
            middle = t->l1;
            bottom = t->l0;
        }
    }

    // on a flat bottom edge, keep the point with the smaller x in bottom
    if ((middle.y == bottom.y) && (bottom.x > middle.x)) {
        Point2D tmp = bottom;
        bottom = middle;
        middle = tmp;
    }

    // for testing purposes repeat a round operation
    // (this test version never writes to triangle_mapping)
    top.x = round(top.x * (res_x-1));
    top.y = round(top.y * (res_y-1));
    top.x = round(top.x * (res_x-1));
    top.y = round(top.y * (res_y-1));
    top.x = round(top.x * (res_x-1));
    top.y = round(top.y * (res_y-1));
}

I cannot imagine that the instruction limit is that low. Maybe I'm doing something wrong when building the program? Can you specify the compute capability before building the program? Any suggestions?

Daniel