Results 1 to 7 of 7

Thread: OpenCL workgroup synch min/max values for a group

  1. #1
    Newbie
    Join Date
    Nov 2013
    Posts
    4

    OpenCL workgroup synch min/max values for a group

    Hello,
    I'm trying to work the following problem.
    I am trying to create a tiled version of an image where per tile i need to find the min/max luminance value.
    To start with i began with a 1024x512 image with tiles of 16x16 each.

    Based on what i've been reading this is the kernel i came up with using barriers to synch the local threads in a workgroup. I assumed that by synchronising it and creating a __local variable for Min and Max they could actually come up with a correct value for it, but it's not the case.. My results are every frame the image changes in a way i can't really understand what values it comes up with:


    on the host side here's what i'm doing:

    1024*512 work items in workgroups of 16x16

    Code :
        clProgram->Set( 0, clTexMem );  // 1024x512 res input
        clProgram->Set( 1, clMinMaxLumMem ); // 64x32 res output
        clProgram->Set( 2, 0.0f ); // default min value
        clProgram->Set( 3, 1.0f ); // default max value
        clProgram->Set( 4, tileSize ); // tile size (default: 16)
        clProgram->Set( 5, AppParams::showTileMaxZ?1:0 ); // a condition in the kernel that renders the min OR max just for debugging
        int workItemsX = Math::Min( (int)clDevice->info.maxWorkGroupSize, tex->GetWidth()/tileSize );
        int workItemsY = Math::Min( (int)clDevice->info.maxWorkGroupSize, tex->GetHeight()/tileSize );
        clDevice->Run2D( workItemsX*tileSize, workItemsY*tileSize, tileSize, tileSize );



    Here's the kernel code:


    Code :
    __kernel void Image_GetMinMaxLum( read_only image2d_t srcImage, write_only image2d_t dstImage, float nearClipPlane, float farClipPlane, int tileSize, int showTileMaxZ )
    {
        const float3 lumColor = (float3)( 0.30, 0.59, 0.11 );
     
        int uStart = get_global_id(0);
        int vStart = get_global_id(1);
        int lx = get_local_id(0);
        int ly = get_local_id(1);
     
        float4 color;
     
        // input image coords
        int2 coords = (int2)( uStart, vStart );
     
        // output image coords
        int2 coordsDst = (int2)( get_global_id(0)/tileSize, get_global_id(1)/tileSize );
     
        // local work group vars that store min/max values
        __local float minZ;
        __local float maxZ;
     
        if( showTileMaxZ > 0 )
        {
            maxZ = nearClipPlane;
     
            color = read_imagef( srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords );
            float Y = dot( lumColor, color.rgb );  // Luminance
            maxZ = max(maxZ, Y);
            color.rgb = (float3)( maxZ );
        }
    //    barrier( CLK_LOCAL_MEM_FENCE );
     
        if( !showTileMaxZ  )
        {
               // attempt to reset minZ to max value when first thread runs (assuming its the first one that runs??)
    //        if( (lx+ly) == 0 )
                minZ = farClipPlane;
     
            color = read_imagef( srcImage, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, coords );
            float Y = dot( lumColor, color.rgb );  // Luminance
            minZ = min(minZ, Y);
            color.rgb = (float3)( minZ );
        }
        barrier( CLK_LOCAL_MEM_FENCE );
     
        write_imagef( dstImage, coordsDst, color );
    }


    I'm pretty new with OpenCL but the current tests above where all based on what i read online and found in sources around the web.

    I have to say that i did a test with FOR loops and it seems to work, but the thing is it's probably not the way to do it.
    Basically its a 16x16 loop to compute the min/max of all items in a block of pixels and save it. With this method i create less work items, but also have 16x16 image reads more.
    I'm looking for advice on this, hopefully someone can help ?

    Thanks
    f

  2. #2
    Newbie
    Join Date
    Nov 2013
    Posts
    4
    No one? No ideas?

  3. #3
    Senior Member
    Join Date
    Oct 2012
    Posts
    109
    This is called "reduction" and it is such a usual operation that OpenCL C 2.0 now has built-in functions for it.

    You will find a good introduction to reduction with OpenCL here:

    http://developer.amd.com/resources/d...le-reductions/

  4. #4
    Newbie
    Join Date
    Nov 2013
    Posts
    4
    Thanks for the help, I'll have a read an try it.

  5. #5
    Senior Member
    Join Date
    Oct 2012
    Posts
    165
    Your access to the local variables seems to be incorrect. you arent synchronizing the access and all threads of your local workgroup access the same variable. so t1 till write maxZ, t2 will write maxZ and then you sync both threads. you should use an array like __local maxZ[get_local_size(0) * get_local_size(1)], write the max in each element and use the reduction. because your calculations are to simple to use the local memory for such a simple preoblem you could otherwise use the scan reduction utnapishtim talked about.

  6. #6
    Newbie
    Join Date
    Nov 2013
    Posts
    4
    Hi Clint,

    I understood that, but what you telling me now sounds confusing.

    So the idea is to create a min/max value from all the local threads, let's say i have 32x32 workgroup = 1024 threads.
    Each will read a pixel and from all those 1024 i need the minimum and maximum values, from 1024 i need 2 values

    What would be the reason to save a min/max value per thread? How would that work? Not really clear for me

    I also did not understand completely on how to work the reduction part there too.

  7. #7
    Senior Member
    Join Date
    Dec 2011
    Posts
    161
    For a large enough image I'd just make each work item process a 16x16 section (using a pair of nested for loops) and output the single min/max pair. This would avoid doing reduction and using atomics. It would be simple and achieve sufficient parallelism given a large enough image.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •