
Thread: Convolution Example/Tutorial from AMD

  1. #1
    Senior Member (joined Jul 2009, Northern Europe, 311 posts)

    Convolution Example/Tutorial from AMD

    Udeepta Bordoloi at AMD has posted the following convolution tutorial for OpenCL:

    http://developer.amd.com/gpu/ATIStreamS ... penCL.aspx

    The tutorial focuses only on the CPU, but it includes a good description of how to vectorize your kernel, as well as a performance comparison with OpenMP. Unfortunately, the example does not use local memory, which is really important for performance on the GPU, but it is still a good place to look for a non-trivial OpenCL example program.
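    For readers who have not seen the tutorial, here is a minimal scalar reference for a 2D image convolution in plain C. It is not AMD's code: the single-channel row-major float image, the odd (2r+1) x (2r+1) filter and the clamp-to-edge border handling are all assumptions, and `convolve2d` is a hypothetical name. Like most image-processing code it applies the filter without flipping it (correlation form), which is identical for symmetric filters. Every output pixel is computed independently, which is why the two outer loops map naturally onto OpenCL work-items.

    ```c
    #include <assert.h>

    /* Clamp an index into [0, n-1] (clamp-to-edge borders, an assumption). */
    static int clamp_index(int i, int n)
    {
        if (i < 0) return 0;
        if (i >= n) return n - 1;
        return i;
    }

    /* Hypothetical reference convolution over a single-channel, row-major
     * float image. filter is (2r+1) x (2r+1), also row-major, and is applied
     * without flipping (correlation form). */
    void convolve2d(const float *in, float *out, int width, int height,
                    const float *filter, int r)
    {
        assert(r >= 0 && width > 0 && height > 0);
        int fw = 2 * r + 1;
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                float sum = 0.0f;
                for (int fy = -r; fy <= r; ++fy) {
                    int sy = clamp_index(y + fy, height);
                    for (int fx = -r; fx <= r; ++fx) {
                        int sx = clamp_index(x + fx, width);
                        sum += filter[(fy + r) * fw + (fx + r)] * in[sy * width + sx];
                    }
                }
                out[y * width + x] = sum;
            }
        }
    }
    ```

    A GPU version would typically first copy a tile of the input (plus an r-pixel halo) into local memory and run the inner loops from there, which is the piece the tutorial leaves out.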

  2. #2
    Junior Member (joined Oct 2009, 5 posts)

    Re: Convolution Example/Tutorial from AMD

    Agreed about local memory; it is on my to-do list.

  3. #3
    Senior Member (joined Jul 2009, Northern Europe, 311 posts)

    Re: Convolution Example/Tutorial from AMD

    A friend of mine suggested that it would be better to vectorize by processing N pixels at a time, rather than vectorizing within each pixel. This would also let you use 16-wide vectors and leave it to the compiler to map them onto the native vector width of the hardware.
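    To make the suggestion concrete, here is a minimal sketch in plain C of a 1D convolution that processes 16 consecutive output pixels per iteration, one pixel per lane. The function name, the single-channel (planar) input and the caller-padded "valid" border convention are assumptions for illustration; in OpenCL the `acc` array would become a `float16` and the inner lane loops would become vector loads and multiply-adds. Because each lane is a separate pixel, the vector width no longer depends on the filter size or on the number of colour components.

    ```c
    #include <assert.h>

    #define LANES 16 /* hypothetical vector width; float16 in OpenCL */

    /* 1D "valid" convolution: out[i] = sum over k of w[k] * in[i + k],
     * where in holds n + taps - 1 samples (the caller pads the borders). */
    void convolve_rows_by16(const float *in, float *out, int n,
                            const float *w, int taps)
    {
        assert(n >= 0 && taps > 0);
        int i = 0;
        /* Main loop: 16 consecutive output pixels per iteration, one per
         * lane. For each tap the 16 inputs are contiguous, so a compiler
         * can turn the lane loop into one vector load and one multiply-add
         * with a broadcast weight. */
        for (; i + LANES <= n; i += LANES) {
            float acc[LANES] = {0.0f};
            for (int k = 0; k < taps; ++k) {
                for (int lane = 0; lane < LANES; ++lane)
                    acc[lane] += w[k] * in[i + k + lane];
            }
            for (int lane = 0; lane < LANES; ++lane)
                out[i + lane] = acc[lane];
        }
        /* Scalar tail for the last n % LANES pixels. */
        for (; i < n; ++i) {
            float sum = 0.0f;
            for (int k = 0; k < taps; ++k)
                sum += w[k] * in[i + k];
            out[i] = sum;
        }
    }
    ```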

  4. #4

    Re: Convolution Example/Tutorial from AMD

    Quote Originally Posted by dbs2
    A friend of mine suggested that it would be better to vectorize by processing N pixels at a time, rather than vectorizing for each pixel. This would also allow you to use 16-length vectors and let the compiler take care of mapping it to the right size for the hardware.
    The problem is that it all depends on how your data is laid out in memory. Assuming your colour components are interleaved (as is normal), reading the red components of 16 pixels into a single vector requires gathering from non-contiguous locations, and writing the results back similarly requires a scatter.

    I suspect there would be a penalty for that on various architectures.
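    The access patterns in question can be sketched in plain C as follows. The function names and the RGB float layout are hypothetical, and the sketch says nothing about how large the penalty is on any particular device; it only shows why interleaved data turns a 16-wide load into a strided gather (and the store into a strided scatter), while planar data keeps it contiguous.

    ```c
    #include <assert.h>

    #define LANES 16

    /* Interleaved RGB (RGBRGB...): the red component of pixel p is at 3*p.
     * Filling 16 lanes with reds is a stride-3 gather spanning floats
     * 3*p0 .. 3*p0 + 45, i.e. 46 floats for 16 useful values. */
    void gather_red_interleaved(const float *rgb, int p0, float red[LANES])
    {
        assert(p0 >= 0);
        for (int lane = 0; lane < LANES; ++lane)
            red[lane] = rgb[3 * (p0 + lane)];
    }

    /* Planar layout (RRR...GGG...BBB...): the same 16 reds are contiguous,
     * so this is a single unit-stride load (vload16-style in OpenCL). */
    void load_red_planar(const float *red_plane, int p0, float red[LANES])
    {
        assert(p0 >= 0);
        for (int lane = 0; lane < LANES; ++lane)
            red[lane] = red_plane[p0 + lane];
    }

    /* Writing back to interleaved storage is the mirror image: a stride-3
     * scatter that leaves the green and blue components untouched. */
    void scatter_red_interleaved(float *rgb, int p0, const float red[LANES])
    {
        assert(p0 >= 0);
        for (int lane = 0; lane < LANES; ++lane)
            rgb[3 * (p0 + lane)] = red[lane];
    }
    ```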

    Vectorising so that each vector holds n whole m-component pixels (e.g. 5 × 3-component or 4 × 4-component pixels in a 16-wide vector such as float16) might get you the best of both worlds.
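    One way to read the "4 × 4-component" idea, sketched in plain C under stated assumptions: for a horizontal convolution over an interleaved RGBA row, each 16-float vector holds 4 whole pixels, and shifting the contiguous load window by 4 floats moves every lane exactly one pixel while keeping it on the same colour component. Every load stays contiguous, so no gather or scatter is needed. The function name, the caller-padded "valid" borders and the requirement that the pixel count be a multiple of 4 are assumptions made to keep the sketch short.

    ```c
    #include <assert.h>

    #define LANES 16
    #define CHANNELS 4 /* RGBA: LANES / CHANNELS = 4 whole pixels per vector */

    /* Horizontal "valid" convolution over one interleaved row.
     * in holds (npix + taps - 1) pixels (borders padded by the caller);
     * out holds npix pixels. Lane j of a vector is component j % CHANNELS. */
    void convolve_interleaved_row(const float *in, float *out, int npix,
                                  const float *w, int taps)
    {
        assert(npix % (LANES / CHANNELS) == 0 && taps > 0);
        int nfloats = npix * CHANNELS;
        for (int j = 0; j < nfloats; j += LANES) {
            float acc[LANES] = {0.0f};
            for (int k = 0; k < taps; ++k) {
                /* Window shifted by k whole pixels: contiguous, and each
                 * lane still sees the same colour component. */
                const float *src = in + j + k * CHANNELS;
                for (int lane = 0; lane < LANES; ++lane)
                    acc[lane] += w[k] * src[lane];
            }
            for (int lane = 0; lane < LANES; ++lane)
                out[j + lane] = acc[lane];
        }
    }
    ```

    Setting CHANNELS to 3 and LANES to 15 gives the 5 × 3-component variant, which leaves one lane of a 16-wide vector unused.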

