Results 1 to 9 of 9

Thread: Block edges updates

  1. #1
    Junior Member
    Join Date
    Mar 2011
    Posts
    14

    Block edges updates

    Hello,

    I am working on an algorithm where each pixel needs its four neighbors.

    So for each 16*16 block of threads, I want to copy the global memory into 18*18 shared memory blocks (This way pixels on block edges will have correct neighbors) :

    Code :
            -> [z z z z z]
    [x x x] -> [z x x x z]
    [x x x] -> [z x x x z]
    [x x x] -> [z x x x z]
            -> [z z z z z]

    This is my simple code:

    Code :
    __kernel void test(__global float* u,  __local volatile float* uLocal)
    {
        // Block index
        int bi = get_group_id(1);
        int bj = get_group_id(0);
     
        // Local coordinates
        int li = get_local_id(1)+1;
        int lj = get_local_id(0)+1;
     
    	// Global coordinates
    	int gi = get_global_id(1);
    	int gj = get_global_id(0);
     
    	// Local height and width
    	int lHeight = get_local_size(1)+2;
    	int lWidth  = get_local_size(0)+2;
     
    	// Global height and width
    	int gHeight = get_global_size(1);
    	int gWidth  = get_global_size(0);
     
    	if( (gi-1<0) || (gi+1>gHeight-1) || (gj-1<0) || (gj+1>gWidth-1) )
    		return;
     
    	uLocal[li*lWidth+lj] = u[gi*gWidth+gj];
     
    	for(int i=0 ; i<N ; i++)
    	{
    		// put edges in local memory
    		if(li-1 == 0)
    			uLocal[(li-1)*lWidth+lj] = u[(gi-1)*gWidth+gj];
    		else if(li+1 == lHeight-1)
    			uLocal[(li+1)*lWidth+lj] = u[(gi+1)*gWidth+gj];
     
    		if(lj-1 == 0)
    			uLocal[li*lWidth+lj-1] = u[gi*gWidth+gj-1];
    		else if(lj+1 == lWidth-1)
    			uLocal[li*lWidth+lj+1] = u[gi*gWidth+gj+1];
     
    		//do something with uLocal...
    		uLocal[li*lWidth+lj] = 0.5*uLocal[li*lWidth+lj+1] - 0.5*uLocal[li*lWidth+lj-1];
     
    		barrier(CLK_LOCAL_MEM_FENCE);
     
    		u[gi*gWidth+gj] = uLocal[li*lWidth+lj];
    		barrier(CLK_GLOBAL_MEM_FENCE);
    	}
    }

    It does not work (even without the for loop).

    It is different from the oclMedianFilter example (which also read and write edges), I really don't understand the point of this complex code.

    What should I do ??
    Help !

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Block edges updates

    Could you elaborate a bit on how the code doesn't work? Do you have sample inputs and outputs? How are the outputs different from what you expected?
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Mar 2011
    Posts
    14

    Re: Block edges updates

    Well, I have 3 problems, but what I want to know is: is my code correct ? Why does the "edge update" differ from the oclMedianFilter example ?

    On of my problem is that with my previous code, I get some vertical black lines between the blocks depending on which thread has been executed first.



    My second problem:

    I tried this:

    Code :
    __kernel void test2(__global float* u,  __local volatile float* uLocal)
    {
            // same init...
     
    	uLocal[li*lWidth+lj] = 0;
     
    	if( bi == 2 && bj == 2 )
    		uLocal[li*lWidth+lj] = u[gi*gWidth+gj];
     
    	if( bi == 2 && bj == 4 )
    	{
    		uLocal[li*lWidth+lj] = u[gi*gWidth+gj];
    		if(li-1 == 0)
    			uLocal[(li-1)*lWidth+lj] = 1;
    		else if(li+1 == lHeight-1)
    			uLocal[(li+1)*lWidth+lj] = 1;
     
    		if(lj-1 == 0)
    			uLocal[li*lWidth+lj-1] = 1;
    		else if(lj+1 == lWidth-1)
    			uLocal[li*lWidth+lj+1] = 1;
     
    		// do something....
     
    		if(li-1 == 0)
    			u[(gi-1)*gWidth+gj] = uLocal[(li-1)*lWidth+lj];
    		else if(li+1 == lHeight-1)
    			u[(gi+1)*gWidth+gj] = uLocal[(li+1)*lWidth+lj];
     
    		if(lj-1 == 0)
    			u[gi*gWidth+gj-1] =  uLocal[li*lWidth+lj-1];
    		else if(lj+1 == lWidth-1)
    			u[gi*gWidth+gj+1] = uLocal[li*lWidth+lj+1];
    	}
     
    	u[gi*gWidth+gj] = uLocal[li*lWidth+lj];
    }

    And the bottom edge of the second block does note appear.
    (I know that this program is not a perfect example because execution paths diverge)



    And my third problem concerns the for loop:
    How to synchronize threads when I must update edges (the threads on blocks' edges will have more work) in the for loop. The barrier() and if() else() statements are not compatible and the whole program is skipped when threads are not synchronized (the GPU does nothing, the execution time is almost 0).

    Maybe I must remove the for loop from the kernel and put the kernel into a for loop, but I guess that would require many clEnqueueRead/WirteBuffer and slow down the algorithm.

    Thanks for helping !
    Arthur

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Block edges updates

    Thanks for the additional info. Notice that you are changing the image in place. What measures are you taking to prevent a previous work-group from updating the image before you load the border pixels into local memory?

    Let's say that your work-groups are of size 3x3 and your original image looks like this:

    Code :
    a a a b b b
    a a a b b b
    a a a b b b

    What you want is each work group to load a tile of data into local memory like this:

    Code :
    work-group 'a' wants to load:
     
    x x x x x
    x a a a b
    x a a a b
    x a a a b
    x x x x x
     
    work-group 'b' wants to load
     
    x x x x x
    a b b b x
    a b b b x
    a b b b x
    x x x x x

    However, with the code you've shown, when work-group a (3x3) is executed the image is updated and now looks like this:

    Code :
    c c c b b b
    c c c b b b
    c c c b b b

    ...so when it's the turn of work-group 'b' it will read this in local memory:

    Code :
    x x x x x
    c b b b x
    c b b b x
    c b b b x
    x x x x x

    instead of

    Code :
    x x x x x
    a b b b x
    a b b b x
    a b b b x
    x x x x x

    oclMedianFilter possibly gets away with it because the values from 'a' and 'c' look very similar (it's computing a median filter after all). However, in your example 'a' and 'c' will look very different and that's why you notice the artifacts.

    You will have to read from an image X and write into a different image Y if you wan to avoid the artifacts.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  5. #5
    Junior Member
    Join Date
    Mar 2011
    Posts
    14

    Re: Block edges updates

    Thank you very much for this clear answer !
    If I understand correctly, this problem can not be fixed with barrier or mem_fence functions since it would only synchronize memory within blocks ?

    Anyway I tried to write in another variable, that works well, thanks.

    I still have the other problems though

  6. #6
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Block edges updates

    If I understand correctly, this problem can not be fixed with barrier or mem_fence functions since it would only synchronize memory within blocks ?
    That's right. There's no way in OpenCL to communicate between work-groups.

    I'll look at the other questions after work
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  7. #7
    Junior Member
    Join Date
    Mar 2011
    Posts
    14

    Re: Block edges updates

    ahah, thank you

    For my loop problem, I guess there is no other way than executing the kernel n times (instead of having the for loop inside the kernel).

    It works but it really slows down the program...

    And I still don't know why the local edges did not update (my second problem), I tried to write the result in another variable and that did not fix the problem...

  8. #8
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Block edges updates

    It works but it really slows down the program...
    Does it make such a big difference? What size is the image and how do you make the calls to repeat the operation multiple times?

    And I still don't know why the local edges did not update (my second problem)
    It's basically the same problem as your first one, right? The white borders you are drawing in work group (2,4) are overwritten by the other work groups. This code is causing the issue:

    Code :
          if(li-1 == 0)
             u[(gi-1)*gWidth+gj] = uLocal[(li-1)*lWidth+lj];
          else if(li+1 == lHeight-1)
             u[(gi+1)*gWidth+gj] = uLocal[(li+1)*lWidth+lj];
     
          if(lj-1 == 0)
             u[gi*gWidth+gj-1] =  uLocal[li*lWidth+lj-1];
          else if(lj+1 == lWidth-1)
             u[gi*gWidth+gj+1] = uLocal[li*lWidth+lj+1];

    If you want it to go away then comment the last line in the kernel:

    Code :
    //u[gi*gWidth+gj] = uLocal[li*lWidth+lj];
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  9. #9
    Junior Member
    Join Date
    Mar 2011
    Posts
    14

    Re: Block edges updates

    Yes that's right, this is the same problem. I realized it when I woke up this morning

Similar Threads

  1. Convolution with Wrapped Edges
    By nelsong in forum OpenCL
    Replies: 3
    Last Post: 04-20-2011, 06:42 AM
  2. Re: block edges
    By arthur.sw in forum OpenCL
    Replies: 0
    Last Post: 04-07-2011, 05:22 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •