Results 1 to 4 of 4

Thread: Allocating memory for kernels

  1. #1
    Junior Member
    Join Date
    Jun 2008
    Posts
    13

    Allocating memory for kernels

    Hi,

    two fast questions:

    1] is it possible to allocate dynamically memory in a kernel? (i suppose that the answer is
    no and that only constant memory can be allocated at compile time).

    2] Suppose you have to write a kernel that scans a gray-level image and store the coordinates of every pixel whose value is > 128. You don't know in advance how many pixel will meet that conditions. So you have to allocate a buffer of W*H*sizeof(pixel), where W is the width of the image, h is the height and pixel is a pod struct:

    struct pixel
    {
    int x, y;
    }

    Is this the only way to pass an output buffer to the kernel?

    3] Suppose that you pass to the kernel a buffer object of size K*sizeof(pixel), where K is a parameter decided at run-time (it can be < W*H). Is there a way to say to the kernel: ok, you have to stop know since we does not have enough memory left???

    Thank you.

  2. #2
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Allocating memory for kernels

    1] is it possible to allocate dynamically memory in a kernel?
    No, it's not possible.

    2] Suppose you have to write a kernel that scans a gray-level image and store the coordinates of every pixel whose value is > 128. You don't know in advance how many pixel will meet that conditions. So you have to allocate a buffer of W*H*sizeof(pixel), where W is the width of the image, h is the height and pixel is a pod struct:

    struct pixel
    {
    int x, y;
    }

    Is this the only way to pass an output buffer to the kernel?
    You got the general idea right. You can use a parallel prefix sum to efficiently compute this.

    3] Suppose that you pass to the kernel a buffer object of size K*sizeof(pixel), where K is a parameter decided at run-time (it can be < W*H). Is there a way to say to the kernel: ok, you have to stop know since we does not have enough memory left?
    Sure, you can pass an argument to the kernel specifying the size of that buffer.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

  3. #3
    Junior Member
    Join Date
    Jun 2008
    Posts
    13

    Re: Allocating memory for kernels

    Ok. To sum up: you allocate enough memory on the host (for example, if the maximum objects you expect to find is M, you allocate a buffer to contain M objects), run the GPU algorithm. You could end up with a buffer with empty elements ([O1, X, X, O2, X, O3..]), for this reason run a compaction algorithm (prefix-sum) to compact the buffer (empty slots will begin only after the last object). Is this right?

    Ok, i see that patterns in parallel algorithm are popping out everytime. A lot of things that are "natural" to do with a single CPU becomes less natural when switching to parallel algorithms.

    Is there something like the bible of patterns when dealing with parallel algorithms?
    Or better, a book that shows how to write a parallel version of the most common image processing algorithms (the most interesting at least). There is a lot of info on the net, and it is not simple to select what to read first. For example, Mark Harris wrote a presentation entitled "Data Parallel Algorithm on the GPUs". I think is a worth reading, since it presents "the building blocks", but it is only a brief overview.

    Thanks!

  4. #4
    Senior Member
    Join Date
    May 2010
    Location
    Toronto, Canada
    Posts
    845

    Re: Allocating memory for kernels

    Ok. To sum up: you allocate enough memory on the host (for example, if the maximum objects you expect to find is M, you allocate a buffer to contain M objects), run the GPU algorithm. You could end up with a buffer with empty elements ([O1, X, X, O2, X, O3..]), for this reason run a compaction algorithm (prefix-sum) to compact the buffer (empty slots will begin only after the last object). Is this right?
    That's right!

    Ok, i see that patterns in parallel algorithm are popping out everytime. A lot of things that are "natural" to do with a single CPU becomes less natural when switching to parallel algorithms.
    Yeah, parallelism is an additional layer of difficulty on top of whatever you are trying to solve.

    Is there something like the bible of patterns when dealing with parallel algorithms?
    If there is, I haven't found it yet. The work by Mark Harris and the earlier work by Daniel Hillis is the best I've found so far.
    Disclaimer: Employee of Qualcomm Canada. Any opinions expressed here are personal and do not necessarily reflect the views of my employer. LinkedIn profile.

Similar Threads

  1. Multiple kernels and constant memory
    By eugine in forum OpenCL
    Replies: 1
    Last Post: 09-03-2010, 12:59 PM
  2. Replies: 5
    Last Post: 04-23-2010, 08:50 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •