Thread: Global memory "bitfield" technique

  1. #1
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    Global memory "bitfield" technique

    My main question is whether it is possible to create a bit-field in global memory.

    I know that, for various reasons, the Khronos Group rejected bit-fields, so the only way is to "emulate" them with a larger data type. The most suitable is probably a byte (uchar), so I would use a byte array as storage but treat it as a bit-field with some bitwise magic.
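
    The "bitwise magic" mentioned above can be sketched as follows. This is a minimal host-side illustration in plain C, not OpenCL kernel code; the helper names (bit_byte_index, bit_mask, bit_test, bit_set_unsafe) are hypothetical. Note that the set operation is an ordinary read-modify-write of the whole byte, so it is only safe when no other thread touches the same byte at the same time.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers: bit i lives in byte i / 8, at position i % 8. */
static inline size_t bit_byte_index(size_t i) { return i >> 3; }
static inline uint8_t bit_mask(size_t i) { return (uint8_t)(1u << (i & 7u)); }

/* Read one flag from the packed byte array. */
static inline int bit_test(const uint8_t *bits, size_t i)
{
    return (bits[bit_byte_index(i)] & bit_mask(i)) != 0;
}

/* Non-atomic set: reads the byte, ORs in the mask, writes the byte back.
   Assumption: no concurrent writer touches the same byte. */
static inline void bit_set_unsafe(uint8_t *bits, size_t i)
{
    bits[bit_byte_index(i)] |= bit_mask(i);
}
```

    For example, setting bits 0 and 9 in a zeroed two-byte array leaves the first byte equal to 1 and the second equal to 2.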

    The problem is how to access that storage correctly. Because there is no synchronization at the global level, I can't safely read and write the same buffer in a single kernel run; it could corrupt the data. (At least I think so: I might read a value, and right away another work-item could write there, making the value I just read stale.) So reading the byte, setting the desired bit, and writing the byte back could lose updates.

    I can come up with only one solution: write only that one bit and leave the others untouched. But I don't think that's possible on current hardware.

    Of course, one can simply use a whole byte per bit, but with small memory sizes and even more limited buffer sizes, the number of flags that fit may be insufficient.

    Any suggestions? I would also appreciate it if someone has a different idea for how to pack results that consist only of "true" and "false".

  2. #2
    Junior Member
    Join Date
    Nov 2011
    Posts
    10

    Re: Global memory "bitfield" technique

    Hi!

    Yeah, I had the same problem. I think it depends on the problem that you'd like to solve.
    I wrote a program which used Schönhage's algorithm to calculate the factorial of a number
    in the fastest way. Using the long16 type and playing tricks with bits worked nicely for that problem, but
    I don't think that would fit yours: mine didn't parallelize well, so I could write my own "bitfield".
    As you said, careless access to the data can cause inconsistency, so it's really hard to use the bitfield technique.

  3. #3
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    Re: Global memory "bitfield" technique

    Quote Originally Posted by SAdam
    Hi!
    Yeah I had the same problem, I think it depends on the problem that you'd like to solve.

    I agree. In a scenario where each work-item index maps to one bit, such that { idx0 -> bit0, idx1 -> bit1 }, I could use what I've already done: specify the local work-group size as a multiple of 8 (which is needed for performance reasons anyway, with sizes that are multiples of 32), and appoint the first work-item of each octet to build the whole byte from the bits and save the result to memory.
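
    The octet-packing step described above could look roughly like this. It is a host-side C sketch with hypothetical names (pack_octet, pack_flags); in an actual kernel the eight flags would presumably sit in local memory, with a barrier before the first work-item of the octet reads them.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: combine 8 per-item flags (each zero or nonzero)
   into one byte, with flags[k] landing in bit k. */
static inline uint8_t pack_octet(const uint8_t flags[8])
{
    uint8_t byte = 0;
    for (int k = 0; k < 8; ++k) {
        if (flags[k])
            byte |= (uint8_t)(1u << k);
    }
    return byte;
}

/* Pack n_flags flags into n_flags / 8 bytes.
   Assumption: n_flags is a multiple of 8. */
static inline void pack_flags(const uint8_t *flags, size_t n_flags, uint8_t *out)
{
    for (size_t g = 0; g < n_flags / 8; ++g)
        out[g] = pack_octet(flags + 8 * g);
}
```

    Because each output byte has exactly one writer, no two writers ever share a byte and no synchronization on global memory is needed.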

    However, now I need something that works with random access too. Starting with all elements initialized to false, the only requirement is that once a bit is set to true, it never changes its value again.

    I think this is impossible without global synchronization (which would be unsuitable anyway; just imagine the access pattern) or finer-grained access down to the level of bits. But maybe people with more experience have worked around this with another solution, other than changing the data structure, of course.

  4. #4
    Junior Member
    Join Date
    Jul 2012
    Posts
    3

    Re: Global memory "bitfield" technique

    At last I have found a solution to this:

    http://www.khronos.org/registry/cl/sdk/ ... ic_or.html

    I remembered reading about the usefulness of atomics when operating on global memory, but hadn't investigated this possibility until now.
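
    As an illustration of the idea, here is a host-side sketch using C11's atomic_fetch_or, which plays the same role as OpenCL's atomic_or (a read-modify-write OR that returns the old value). The helper name bit_set_atomic is hypothetical, and the sketch assumes the bit-field is stored in 32-bit words, since OpenCL's atomic_or works on 32-bit integers rather than on bytes.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

/* Atomically set bit i in an array of 32-bit words.
   Returns nonzero if the bit was already set before this call. */
static inline int bit_set_atomic(_Atomic uint32_t *words, size_t i)
{
    uint32_t mask = UINT32_C(1) << (i & 31u);           /* bit within the word */
    uint32_t old = atomic_fetch_or(&words[i >> 5], mask); /* word = i / 32 */
    return (old & mask) != 0;
}
```

    Since the OR only ever turns bits on, concurrent writers to the same word cannot undo each other's updates, which matches the "once true, stays true" requirement.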

  5. #5
    Senior Member
    Join Date
    Aug 2011
    Posts
    271

    Re: Global memory "bitfield" technique

    Quote Originally Posted by Dark_Raven
    At last I have found solution to this:

    http://www.khronos.org/registry/cl/sdk/ ... ic_or.html

    I had in memory that I read about usefulness of atomics while operating with global memory, but haven't investigated this possibility until now.

    Global atomics are really slow on some hardware, and even if they weren't, doing a lot of them can still be a real bottleneck.

    It might be faster to write at the smallest atomic write size (which might not be a byte, although you could always use a byte image), and then have a subsequent pass that compresses the result back to bits, storing them in at least 32-bit ints.
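
    The compression pass could be sketched like this in host-side C. The name compress_flags is hypothetical, and the sketch assumes the first pass wrote one flag per byte; in a kernel, each output word would presumably be handled by a separate work-item.

```c
#include <stdint.h>
#include <stddef.h>

/* Second pass: compress one-flag-per-byte storage into 32-bit words.
   Flag i ends up in bit (i % 32) of words[i / 32].
   words must have room for (n + 31) / 32 entries. */
static inline void compress_flags(const uint8_t *flags, size_t n, uint32_t *words)
{
    size_t n_words = (n + 31) / 32;
    for (size_t w = 0; w < n_words; ++w) {
        uint32_t word = 0;
        for (size_t b = 0; b < 32 && w * 32 + b < n; ++b) {
            if (flags[w * 32 + b])
                word |= UINT32_C(1) << b;
        }
        words[w] = word;  /* exactly one writer per output word */
    }
}
```

    Each output word is produced by a single writer, so this pass needs no atomics at all.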

    Or if the data is really sparse, create an edit list which is executed sequentially afterwards.
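
    A minimal sketch of applying such an edit list, in host-side C. The post does not say how the list is built, so this only shows the sequential apply step; the name apply_edit_list and the representation of an edit as a plain bit index are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Apply a list of "set this bit" edits one after another.
   Assumption: each edit is the index of a bit to set in a
   bit-field stored as 32-bit words. */
static inline void apply_edit_list(uint32_t *words, const uint32_t *edits, size_t n_edits)
{
    for (size_t k = 0; k < n_edits; ++k) {
        uint32_t i = edits[k];
        words[i >> 5] |= UINT32_C(1) << (i & 31u);
    }
}
```

    Duplicate edits are harmless here, because setting an already-set bit changes nothing.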
