Thread: Stream compaction

  1. #1
    Junior Member
    Join Date
    Feb 2014
    Posts
    12

    Stream compaction

    I'm facing a stream compaction problem, exactly as described in sect. 39.3.1 of

    http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html

    My vector is in global memory, and I have to compact it and place the result back in global memory. The article cited above states:
    "The addition of a native scatter in recent GPUs makes stream compaction considerably more efficient"
    Still, I can't work out exactly what that sentence means. Are there native OpenCL C instructions that compact streams in global memory? More generally, what is the best way to compact a vector?
    Thanks
    Last edited by khronos; 03-19-2014 at 08:51 AM. Reason: Fixed broken link

  2. #2
    Senior Member
    Join Date
    Oct 2012
    Posts
    115
    "Recent GPU" probably means less than 10 years old here...

    Gather means that the GPU can do random-access loads, while scatter means that the GPU can do random-access stores.

    The distinction dates from the time when vertex shaders could not read data other than that related to the vertex being processed (i.e. no texture fetch capability) and fragment shaders could not write data unrelated to the fragment being processed.

    Such a GPU would not be OpenCL-compatible anyway.
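
    To make the role of scatter concrete, here is a minimal CPU reference in plain C++ of the usual scan-then-scatter compaction. It is only a sketch, not an OpenCL kernel: the names exclusive_scan_flags and compact_scan_scatter are hypothetical, and the keep flags are assumed to have been computed beforehand. Each iteration of the final loop stands in for one work-item performing a random-access store.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum over the keep flags:
// out[i] = number of kept elements strictly before index i.
std::vector<std::size_t> exclusive_scan_flags(const std::vector<int>& keep) {
    std::vector<std::size_t> out(keep.size());
    std::size_t running = 0;
    for (std::size_t i = 0; i < keep.size(); ++i) {
        out[i] = running;
        running += keep[i] ? 1 : 0;
    }
    return out;
}

// Compaction by scan + scatter (hypothetical helper, CPU model only).
// Every kept element i is stored at output slot offsets[i]; that store
// to a computed address is the "scatter".
std::vector<float> compact_scan_scatter(const std::vector<float>& in,
                                        const std::vector<int>& keep) {
    assert(in.size() == keep.size());
    const std::vector<std::size_t> offsets = exclusive_scan_flags(keep);
    // Total kept = last offset plus the last flag.
    const std::size_t count =
        in.empty() ? 0 : offsets.back() + (keep.back() ? 1 : 0);
    std::vector<float> out(count);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (keep[i]) out[offsets[i]] = in[i];  // random-access store
    }
    return out;
}
```

    Without scatter, each output slot would instead have to search for the input element that belongs to it, which is why the article calls the scatter version considerably more efficient.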

  3. #3
    Junior Member
    Join Date
    Feb 2014
    Posts
    12
    Thanks, I suspected that this was the answer, but now I'm sure.
    In the meantime I went on with my stream compaction implementation. I think it is impossible to compact a stream "in place" using multiple work-groups: there is no guarantee on their execution order, so a data race can occur in which one work-group overwrites a portion of the input array before the work-group in charge of that portion has read it. For this reason I will use an auxiliary buffer where the compacted elements will be written. Since the compacted stream is needed only as input to another kernel, I will copy it back to the original buffer with cl::CommandQueue::enqueueCopyBuffer (I need the auxiliary buffer to compact many streams). So I won't need host memory for this buffer: is there a way to allocate a buffer only on the GPU without allocating host memory?
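
    The auxiliary-buffer approach described above can be modelled on the CPU as follows. This is a hedged sketch in plain C++, not the poster's actual kernel: compact_by_groups is a hypothetical name, "keep" is assumed to mean "nonzero", and each outer loop iteration stands in for one work-group. The last pass deliberately visits the groups in reverse order to show that the result does not depend on scheduling once the output goes to a separate buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of a multi-work-group compaction that keeps
// the nonzero elements of `in` and writes them to an auxiliary buffer.
std::vector<int> compact_by_groups(const std::vector<int>& in,
                                   std::size_t group_size) {
    assert(group_size > 0);
    const std::size_t n = in.size();
    const std::size_t groups = (n + group_size - 1) / group_size;

    // Pass 1: each group counts the elements it keeps.
    // base[g + 1] temporarily holds the count of group g.
    std::vector<std::size_t> base(groups + 1, 0);
    for (std::size_t g = 0; g < groups; ++g) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(n, begin + group_size);
        for (std::size_t i = begin; i < end; ++i)
            if (in[i] != 0) ++base[g + 1];
    }

    // Pass 2: a running sum turns the counts into offsets, so that
    // base[g] is the first output slot owned by group g.
    for (std::size_t g = 0; g < groups; ++g) base[g + 1] += base[g];

    // Pass 3: each group writes its kept elements to the auxiliary
    // buffer. Groups run in reverse here; any order gives the same
    // result because no group ever writes into the input.
    std::vector<int> aux(base[groups]);
    for (std::size_t g = groups; g-- > 0;) {
        const std::size_t begin = g * group_size;
        const std::size_t end = std::min(n, begin + group_size);
        std::size_t dst = base[g];
        for (std::size_t i = begin; i < end; ++i)
            if (in[i] != 0) aux[dst++] = in[i];
    }
    return aux;
}
```

    If pass 3 wrote into the input array instead, a group's output could land in the region of an earlier group that has not yet been read, which is exactly the race described above.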

  4. #4
    Senior Member
    Join Date
    Dec 2011
    Posts
    170
    Yes, clCreateBuffer will create a GPU buffer without allocating host memory (as far as you know; an implementation could if it wanted). I'd suggest starting with some of the OpenCL examples to get the hang of the easy stuff before attempting something more difficult.

  5. #5
    Junior Member
    Join Date
    Feb 2014
    Posts
    12
    Thanks, I have already checked some examples, read a book and experimented a bit. But understanding buffer creation is, in my opinion, the hardest thing for beginners, especially the meaning of CL_MEM_USE_HOST_PTR et al. The web is full of threads asking for clarification about memory allocation, and some of them even contradict each other in some respects... I have yet to find a definitive reference on this topic.

  6. #6
    Senior Member
    Join Date
    Oct 2012
    Posts
    115
    Quote Originally Posted by snack14 View Post
    So I won't need host memory for this buffer: is there a way to allocate a buffer only on the GPU without allocating host memory?
    Just use clCreateBuffer() with the CL_MEM_READ_WRITE flag. You can also add the hint flag CL_MEM_HOST_NO_ACCESS if your device supports OpenCL 1.2.

  7. #7
    Junior Member
    Join Date
    Feb 2014
    Posts
    12
    Quote Originally Posted by utnapishtim View Post
    Just use clCreateBuffer() with the CL_MEM_READ_WRITE flag. You can also add the hint flag CL_MEM_HOST_NO_ACCESS if your device supports OpenCL 1.2.
    Nice tip, thanks. Unfortunately I'm targeting NVIDIA GPUs at the moment, so I cannot rely on that flag...

  8. #8
    Senior Member
    Join Date
    Oct 2012
    Posts
    115
    Quote Originally Posted by snack14 View Post
    Nice tip, thanks. Unfortunately I'm targeting NVIDIA GPUs at the moment, so I cannot rely on that flag...
    The CL_MEM_READ_WRITE flag will create a buffer in device memory. CL_MEM_HOST_NO_ACCESS is just an optional hint, so you do not need it.
