PDA

View Full Version : Can't write in buffer



wrx
03-16-2011, 02:35 AM
Hello,

A simple write in buffer fail, host :


cl_mem gpu_buffer_in = clCreateBuffer ( context,
CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
sizeof(unsigned char)*SIZE,
&cpu_buffer_in,
&error);
...
const size_t global_work_size[2] = {WIDTH, HEIGHT};

// NDRANGEKERNEL
result = clEnqueueNDRangeKernel ( command_queue,
kernel,
2,
NULL,
&global_work_size[0],
NULL,
0, NULL, NULL);


kernel :



__kernel void Closing(__global const unsigned char *a, __global unsigned char *c)
{
int row = get_global_id(0); //width
int col = get_global_id(1); //height

a[row*WIDTH + col] = 100 ; // writing problem
c[row*WIDTH+col] = a[row*WIDTH + col];
}


The problem occurs when I use :
a[row*WIDTH + col] = /*something*/ 100 ;

(c[row*WIDTH+col] = /*something*/ 100; ) works well.

Did you have any experience and knowledges about this problem?
Thanks!

wrx
03-16-2011, 04:32 AM
I thought that was :


__kernel void Closing(__global const unsigned char *a, __global unsigned char *c)
{...}

but


__kernel void Closing(__global unsigned char *a, __global unsigned char *c)

doesn't work too.

wrx
03-16-2011, 04:58 AM
I think it works without const.

Sorry, for the post. Maybe can be helpful.

david.garcia
03-16-2011, 11:07 AM
Notice also that "row" and "col" are inverted.

get_global_id(0) will return values between 0 and WIDTH -1, so it should be assigned to "col".

wrx
03-17-2011, 01:12 AM
Yes, you're right. thanks. It works cause height and width have same size.

Furthermore, I think I can continue here, cause the code than I exposed was a part of the problem.
I'm working on dilate, erode, closing, opening filter. Dilatation and erosion works well.But closing operation give partial non-treated (or something else) pixels lines:

http://img638.imageshack.us/img638/6127/newclosing.png (http://img638.imageshack.us/i/newclosing.png/)

If I re-execute the same code, the new image have other lines with the problem :
http://img37.imageshack.us/img37/6127/newclosing.png (http://img37.imageshack.us/i/newclosing.png/)

writing to buffer (CL_MEM_READ_WRITE) is done after barrier:



__kernel void Closing(__global unsigned char *a, __global unsigned char *c)
{
int row = get_global_id(1);
int col = get_global_id(0);
unsigned char pixel,neighborhood[9];

//*
if(row > 0 && row < HEIGHT-1 && col > 0 && col < WIDTH-1)
{
neighborhood[0]= a[(row-1)*WIDTH + col-1];
neighborhood[1]= a[(row-1)*WIDTH + col];
neighborhood[2]= a[(row-1)*WIDTH + col+1];

neighborhood[3]= a[row*WIDTH + col-1];
neighborhood[4]= a[row*WIDTH + col];
neighborhood[5]= a[row*WIDTH + col+1];

neighborhood[6]= a[(row+1)*WIDTH + col-1];
neighborhood[7]= a[(row+1)*WIDTH + col];
neighborhood[8]= a[(row+1)*WIDTH + col+1];

pixel = Max(neighborhood, 9); //dilate
}
else
{
pixel = a[row*WIDTH+col];
}
//*/

barrier(CLK_GLOBAL_MEM_FENCE);
a[row*WIDTH + col] = pixel ; // writing problem ? or barrier problem or maybe not here
barrier(CLK_GLOBAL_MEM_FENCE);

//*
pixel = a[row*WIDTH+col];

if(row > 0 && row < HEIGHT-1 && col > 0 && col < WIDTH-1)
{
neighborhood[0]= a[(row-1)*WIDTH + col-1];
neighborhood[1]= a[(row-1)*WIDTH + col];
neighborhood[2]= a[(row-1)*WIDTH + col+1];

neighborhood[3]= a[row*WIDTH + col-1];
neighborhood[4]= a[row*WIDTH + col];
neighborhood[5]= a[row*WIDTH + col+1];

neighborhood[6]= a[(row+1)*WIDTH + col-1];
neighborhood[7]= a[(row+1)*WIDTH + col];
neighborhood[8]= a[(row+1)*WIDTH + col+1];

pixel = Min(neighborhood, 9); //erode

}
else
{
pixel = a[row*WIDTH+col];
}
//*/
c[row*WIDTH+col] = pixel;
}



Did you have an opinion on this issue?
Thanks!

wrx
03-17-2011, 07:27 AM
I tested it several time and the problem occurs when I write something to buffer.

For exemple, the following code create correct image:



__kernel void Dilate(__global unsigned char *a, __global unsigned char *c)
{
int row = get_global_id(1);
int col = get_global_id(0);
unsigned char pixel,neighborhood[9];

//*
if(row > 0 && row < HEIGHT-1 && col > 0 && col < WIDTH-1)
{
neighborhood[0]= a[(row-1)*WIDTH + col-1];
neighborhood[1]= a[(row-1)*WIDTH + col];
neighborhood[2]= a[(row-1)*WIDTH + col+1];

neighborhood[3]= a[row*WIDTH + col-1];
neighborhood[4]= a[row*WIDTH + col];
neighborhood[5]= a[row*WIDTH + col+1];

neighborhood[6]= a[(row+1)*WIDTH + col-1];
neighborhood[7]= a[(row+1)*WIDTH + col];
neighborhood[8]= a[(row+1)*WIDTH + col+1];

pixel = Max(neighborhood, 9); //dilate
}
else
{
pixel = a[row*WIDTH+col];
}
//*/

// a[row*WIDTH + col] = pixel ; // writing problem ? or barrier problem or maybe not here
c[row*WIDTH+col] = pixel;
}


If I decomment :


// a[row*WIDTH + col] = pixel ;

, it create the problem cited previously.

david.garcia
03-17-2011, 03:51 PM
What you are seeing is correct. In other words, the algorithm is doing exactly what you told it to do.

Notice that in the boundary between two work-group, your algorithm is not doing what you think it's doing. After one work-group has finished executing, the pixels in that region of the image have already been altered. When another work-group reads from those pixels, the values are not what they were originally in the image.

In other words, as long as you try to do the image operation in place you will see artifacts. What you should do instead is to read from one image and write into a different image. Then the artifacts will disappear.

wrx
03-18-2011, 12:53 AM
Notice that in the boundary between two work-group, your algorithm is not doing what you think it's doing. After one work-group has finished executing, the pixels in that region of the image have already been altered. When another work-group reads from those pixels, the values are not what they were originally in the image.


Isn't barrier supposed to synchronize work-item in work-group together, or it does it on work-group internally?



In other words, as long as you try to do the image operation in place you will see artifacts. What you should do instead is to read from one image and write into a different image. Then the artifacts will disappear.


I tried to do before and I tried now, with 3th argument in kernel like (brief code):


__kernel void Closing(__global unsigned char *a,__global unsigned char *b, __global unsigned char *c)
{
....
pixel = dilated_pixel(...); // reading "a" buffer, like doing previous post
b[row*WIDTH + col] = pixel ;
barrier(CLK_GLOBAL_MEM_FENCE);
pixel = eroded_pixel(...); // reading "b" buffer
c[row*WIDTH + col] = pixel ;
}

, and same problem occurs.

The following code create black image, probably because of big size which can't be allocate :


__kernel void Closing(__global unsigned char *a, __global unsigned char *c)
{
...
unsigned char b[HEIGHT*WIDTH];
pixel = dilated_pixel(...); // reading "a" buffer, like doing previous post
b[row*WIDTH + col] = pixel ;
barrier(CLK_GLOBAL_MEM_FENCE);
pixel = eroded_pixel(...); // reading "b" buffer
c[row*WIDTH + col] = pixel ;
}

wrx
03-18-2011, 01:03 AM
You can no longer edit or delete that post.

-> P.S.: Should I do dilate, then erode operation one after another in 2 step , I mean transferring datas to GPU 2 times, first for dilate, second for erode?

wrx
03-18-2011, 07:04 AM
So, the way to do by send datas to GPU 2 times works. My goal was to do it in 1 time on the kernel. If this is the only way to do that , it's OK for me. Else I'm open to suggestions.

Thank you for guiding me. :wink:

david.garcia
03-18-2011, 03:42 PM
Okay, since the explanation didn't work I will show one simple way to make it work (there are better ways):



__kernel void Closing_first(const __global unsigned char *in,
__global unsigned char *out)
{
int row = get_global_id(1);
int col = get_global_id(0);
unsigned char pixel,neighborhood[9];

// Read from input image
if(row > 0 && row < HEIGHT-1 && col > 0 && col < WIDTH-1)
{
neighborhood[0]= in[(row-1)*WIDTH + col-1];
neighborhood[1]= in[(row-1)*WIDTH + col];
neighborhood[2]= in[(row-1)*WIDTH + col+1];
...
pixel = Max(neighborhood, 9); //dilate
}
else
{
pixel = in[row*WIDTH+col];
}

// Write into destination image.
// Do not write into input image! You would destroy it and cause artifacts.
out[row*WIDTH + col] = pixel ;
}


__kernel void Closing_second(const __global unsigned char *in,
__global unsigned char *out)
{
int row = get_global_id(1);
int col = get_global_id(0);
unsigned char pixel,neighborhood[9];

pixel = in[row*WIDTH+col];

// Read from input image
if(row > 0 && row < HEIGHT-1 && col > 0 && col < WIDTH-1)
{
neighborhood[0]= in[(row-1)*WIDTH + col-1];
neighborhood[1]= in[(row-1)*WIDTH + col];
neighborhood[2]= in[(row-1)*WIDTH + col+1];
...

pixel = Min(neighborhood, 9); //erode
}
else
{
pixel = in[row*WIDTH+col];
}

// Write into destination image.
// Do not write into input image! You would destroy it and cause artifacts.
out[row*WIDTH+col] = pixel;
}


From the API side you have to enqueue two kernels: one for Closing_first and then another for closing_second. Make sure that the output from Closing_first becomes the input of Closing_second.

Once you get that version working, you can try a better algorithm using local memory. The idea is to store the intermediate result in local memory instead of having two enqueue two kernels.

wrx
03-21-2011, 01:36 AM
Yes, that's exactly what I meant by "transfer datas to GPU 2 times" and I did exactly what you wrote, and it had worked well.



Once you get that version working, you can try a better algorithm using local memory. The idea is to store the intermediate result in local memory instead of having two enqueue two kernels.


For now, I try to do some example on global memory only. Moreover, I have a real question about this, generally:
- , should I do lot of example on global memory first for understand how kernels work, then improve them by working on local memory?
- or, should I work on local memory as soon as possible, and not waste time. ?

(I can ask the question on more appropriate thread ?)

david.garcia
03-21-2011, 05:31 AM
should I do lot of example on global memory first for understand how kernels work, then improve them by working on local memory?

Yes, that's what I would do. It's easier to write a first version that works using global memory only. Also, it may be fast enough for what you need.

This is a general rule for writing software that has worked well in the past: always start with the smallest and simplest version of your program that you can imagine. After you get that one working you can add features and complexity later.

wrx
03-21-2011, 06:44 AM
OK, thank you!

And I'll remember (this kind of) trick you told me with local memory when I'll try to improve my code.