
Thread: Memory and Speed issues

  1. #11
    Member
    Join Date
    Aug 2004
    Location
    SCEJ - Tokyo, Japan
    Posts
    34
    As a reference: my binary format is only about 2-3x faster than Collada.
    Only 2 to 3 times faster? Hmmm, most teams would kill to have a 10% speed increase in any part of their system and you're sneezing at 200-300%???

    byte-order independence
    Is a non-issue. Define what it is in the spec and the issue is solved

    I would be willing to make a bet that by the end of the next-generation cycle you'll find a text-based format for these large chunks of data to be a bad thing. If it's such a good thing then I propose we make Collada also store the textures. RGBA, one ASCII float per channel per pixel. Clearly that's the argument here. By your standards that would be "fast enough".

    All I'm arguing for is a little foresight. I'm not suggesting a single thing that would make collada less generic, less useful, less anything. I'm only suggesting a simple optimization.

    os wrote:
    greggman wrote:
    The Unreal 3 site claims 100 million polys in their outdoor levels so at 3 seconds for 0.5 million polys that would take 10 minutes to load.


    That's the source data, before the detail bump-map generation! That's not what you see in the runtime.
    Yes, and if I needed to write my own bump-map generator I'd need to export the file to some format, all 100 million polys of it. The proposal on this board is that that format be Collada. Why limit its use from day one? I haven't heard a single actual argument against the optimization. Only that you're not complaining about the speed today. That's not an actual argument against the suggestion.

  2. #12
    Quote Originally Posted by greggman
    byte-order independence
    Is a non-issue. Define what it is in the spec and the issue is solved.
    Even if the file format is standardized on, say big-endian, you still need to convert ints floats etc. if you are on a little-endian machine (and vice-versa). So, I'm not sure what you mean here.
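To make the byte-order point concrete, here is a minimal Python sketch, assuming the spec fixed the on-disk order as big-endian (an illustrative choice, not something the thread decided). The helper names are hypothetical. Both sides have a point: the file is unambiguous once the spec picks an order, but a little-endian reader still pays for a byte swap on every value.

```python
import struct
import sys

def pack_floats_big_endian(values):
    """Serialize floats as 32-bit IEEE 754, big-endian, whatever the host order is."""
    return struct.pack(">%df" % len(values), *values)

def unpack_floats_big_endian(data):
    """Read big-endian 32-bit floats; struct swaps bytes when the host is little-endian."""
    count = len(data) // 4
    return list(struct.unpack(">%df" % count, data))

# Hypothetical sample values, chosen because they are exact in 32-bit floats.
payload = pack_floats_big_endian([1.0, -2.5, 0.25])
print(sys.byteorder, payload.hex(), unpack_floats_big_endian(payload))
```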

    Quote Originally Posted by greggman
    ...If it's such a good thing then I propose we make collada also store the textures.
    There are already good standard 2D formats out there...

    Quote Originally Posted by greggman
    All I'm arguing for is a little foresight. I'm not suggesting a single thing that would make collada less generic, less useful, less anything. I'm only suggesting a simple optimization.
    We're not arguing with your suggestion (we like suggestions and we're reviewing it), just trying to put things in perspective:
    Collada will not be the bottleneck for a while:
    The real bottlenecks are the current commercial modelers. Some take as much as 60 (yes six-zero) times longer to load a file from their own binary format than it takes the reference Collada implementation to load the same scene.
    Of course that doesn't mean that we are not trying to optimize Collada even further.

    I'm not convinced that your binary-in-encoded-ASCII float representation would be any faster to read than our optimized float parser.
    You'd need to do a lot of this: add (or OR), bit-shift, store on a 4-byte aligned slot, pointer-cast, read from memory etc., instead of just using a float register...
    If you're suggesting inlining the actual binary value of the float, that would of course break XML parsers...
    I'd love to see a speed comparison if you've done one.
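For readers unsure what "binary-in-encoded-ASCII" could mean in practice, the sketch below shows one possible reading: each float's IEEE 754 bit pattern written as 8 hex digits, next to ordinary decimal text. The hex layout and the function names are assumptions for illustration, not the format actually proposed in the thread, and nothing here measures speed.

```python
import struct

def floats_to_hex_text(values):
    """Encode each float as the 8 hex digits of its big-endian 32-bit pattern."""
    return " ".join(struct.pack(">f", v).hex() for v in values)

def hex_text_to_floats(text):
    """Decode hex tokens back to floats: hex -> 4 bytes -> reinterpret as float."""
    return [struct.unpack(">f", bytes.fromhex(tok))[0] for tok in text.split()]

def decimal_text_to_floats(text):
    """Plain decimal parsing, the approach the current text format relies on."""
    return [float(tok) for tok in text.split()]

print(floats_to_hex_text([1.0, 0.5]))
print(hex_text_to_floats("3f800000 3f000000"))
print(decimal_text_to_floats("1.0 0.5"))
```

The decode path shows the steps mentioned above: assemble the bytes, then reinterpret them as a float.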

  3. #13
    Member
    Join Date
    Aug 2004
    Location
    SCEJ - Tokyo, Japan
    Posts
    34
    Quote Originally Posted by gabor_nagy
    If you're suggesting inlining the actual binary value of the float, that would of course break XML parsers...
    Yes, I'm suggesting inlining the actual binary value of the float, int, etc. It would not break any XML parsers. XML supports the CDATA format exactly for the purpose of storing binary data inside the XML file.

  4. #14
    Quote Originally Posted by greggman
    Yes, I'm suggesting inlining the actual binary value of the float, int, etc. It would not break any XML parsers. XML supports the CDATA format exactly for the purpose of storing binary data inside the XML file.
    You can't put arbitrary bytes in CDATA, because, for example a '<' character will make a parser think that it's the beginning of an XML tag.
    We had this issue when inlining Cg shaders.
    This line can't be in a CDATA section (and it's not even binary data!):
    <code>
    ...
    LDiffuse = LDiffuse < 0.0 ? 0.0 : LDiffuse;
    ...
    </code>

    These 5 characters must be "escaped" in XML content and here's the encoding:

    &lt; < less than
    &gt; > greater than
    &amp; & ampersand
    &apos; ' apostrophe
    &quot; " quotation mark

    See for example:
    http://www.fawcette.com/vsm/2002_11/onl ... _11_05_02/


    So the example would look like this in the file:
    <code>
    ...
    LDiffuse = LDiffuse &lt; 0.0 ? 0.0 : LDiffuse;
    ...
    </code>

    Fortunately XML libraries like LibXML do the encoding/decoding for you, but there is a slight overhead (in addition to the conversion overhead I mentioned earlier) and I'm not sure they could handle binary data there.
    Not to mention that text/XML editors would go crazy if you tried to edit a partially binary file.
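As a small illustration of the escaping described above, this sketch uses Python's standard `xml.sax.saxutils` helpers rather than LibXML. By default `escape` handles `&`, `<` and `>`. The extra mapping for the two quote characters is passed in explicitly.

```python
from xml.sax.saxutils import escape, unescape

shader_line = "LDiffuse = LDiffuse < 0.0 ? 0.0 : LDiffuse;"

# escape() covers &, < and > on its own; quotes need an explicit mapping.
quote_entities = {"'": "&apos;", '"': "&quot;"}
escaped = escape(shader_line, quote_entities)

# Reverse mapping for decoding the two quote entities.
restored = unescape(escaped, {"&apos;": "'", "&quot;": '"'})

print(escaped)
print(restored == shader_line)
```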


    Regards,

    Gabor

  5. #15
    Member
    Join Date
    Aug 2004
    Location
    SCEJ - Tokyo, Japan
    Posts
    34
    Quote Originally Posted by gabor_nagy
    Quote Originally Posted by greggman
    Yes, I'm suggesting inlining the actual binary value of the float, int, etc. It would not break any XML parsers. XML supports the CDATA format exactly for the purpose of storing binary data inside the XML file.
    You can't put arbitrary bytes in CDATA, because, for example a '<' character will make a parser think that it's the beginning of an XML tag.
    That's a bug in your XML parser

    From the XML spec


    2.7 CDATA Sections
    [Definition: CDATA sections MAY occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]

    CDATA Sections
    [18] CDSect ::= CDStart CData CDEnd
    [19] CDStart ::= '<![CDATA['
    [20] CData ::= (Char* - (Char* ']]>' Char*))
    [21] CDEnd ::= ']]>'

    Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

    An example of a CDATA section, in which "<greeting>" and "</greeting>" are recognized as character data, not markup:

    <![CDATA[<greeting>Hello, world!</greeting>]]>
    If you wanted your shaders not to need to be escaped all you needed to do was change them to this

    Code :
    <code> 
    <![CDATA[
    LDiffuse = LDiffuse < 0.0 ? 0.0 : LDiffuse; 
    ]]>
    </code>

    Then no escaping is necessary. You'll see lots of examples of this in RSS syndications. Example
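A quick way to check the CDATA behaviour quoted from the spec is to feed such a document to a standard parser. The sketch below uses Python's `xml.etree.ElementTree` (expat underneath) as a stand-in for whatever parser a Collada tool would use. It shows the unescaped `<` coming through as plain text.

```python
import xml.etree.ElementTree as ET

# The shader line sits inside a CDATA section, so '<' is not escaped.
document = """<code><![CDATA[
LDiffuse = LDiffuse < 0.0 ? 0.0 : LDiffuse;
]]></code>"""

element = ET.fromstring(document)
shader_text = element.text.strip()
print(shader_text)
```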

  6. #16
    Thank you for the reference, but I think it's easier to "escape" 5 individual characters than keep checking that you don't have the CDATA terminator in your string...

    I still wouldn't want to inline binary data in an XML file. Besides the fact that it just feels dirty, it would make the file un-editable by most text editors.
    However unlikely, it is possible that arbitrary binary data would produce the CDATA terminating sequence.
    When doing recursive "grep"s, I have many times found matching "strings" in random data segments of binary files...


    Even with strings, it's a funny thing. Let's say you want to include an "Essay about XML" in a CDATA section.
    Of course the strings for starting and ending a CDATA section would be "taboo", so you'd be referring to them as "you know, that string, I can't say it here, but I'll spell it...".

    The only bullet-proof way I know for storing arbitrary byte sequences directly is in a binary file and with the length of the sequence defined/stored...
    Maybe you can enlighten me.
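The length-prefixed approach mentioned above can be sketched as follows. The 4-byte big-endian length header and the function names are assumptions made for this example, not part of any Collada proposal. Because the reader never scans for a terminator, the payload may contain any byte sequence, including one that looks like a CDATA end marker.

```python
import struct

def write_chunk(payload):
    """Prefix the payload with its length as a 4-byte big-endian unsigned int."""
    return struct.pack(">I", len(payload)) + payload

def read_chunk(buffer, offset=0):
    """Return (payload, next_offset); the length header says where the chunk ends."""
    (length,) = struct.unpack_from(">I", buffer, offset)
    start = offset + 4
    end = start + length
    if end > len(buffer):
        raise ValueError("chunk is truncated")
    return buffer[start:end], end

# Two chunks back to back; the first contains bytes that look like "]]>".
stream = write_chunk(b"\x00]]>\xff") + write_chunk(b"second")
first, next_offset = read_chunk(stream)
second, _ = read_chunk(stream, next_offset)
print(first, second)
```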

    Regards,

    Gabor

  7. #17
    Member
    Join Date
    Aug 2004
    Location
    SCEJ - Tokyo, Japan
    Posts
    34
    Maybe you need to go back and read the original proposal.

    I'm not suggesting you can't use text. I'm suggesting the default be binary for speed reasons but that text would still work.

    I'm also not suggesting you have to search for the end of the CDATA. The format would be defined in the standard, and so would the number of elements in the piece in question. So if you know, for example, that you are reading an array marked as binary of 6000 ints, then once you see <![CDATA[ you can read exactly 24000 bytes directly into an array of ints. If you get another array marked as binary with 107237 floats then again, once you see <![CDATA[ you know you can read 428948 bytes directly into memory. At that point you can check whether the next byte sequence is ]]>. If it's not, your XML is malformed.
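The count-driven read described above might look like the sketch below. It assumes 4-byte elements stored little-endian and an element count that is already known from the surrounding markup. The byte order and every name here are illustrative assumptions. Note that this is a hand-written reader working on raw bytes, not something a stock XML library does.

```python
import struct

CDATA_START = b"<![CDATA["
CDATA_END = b"]]>"

def read_binary_array(buffer, count, fmt_char="i"):
    """Read `count` fixed-size items after the CDATA start, then verify the end marker."""
    if not buffer.startswith(CDATA_START):
        raise ValueError("expected CDATA start marker")
    item_size = struct.calcsize("<" + fmt_char)  # 4 for both "i" and "f"
    start = len(CDATA_START)
    end = start + count * item_size
    # No scanning: the count alone tells us where the terminator must be.
    if buffer[end:end + len(CDATA_END)] != CDATA_END:
        raise ValueError("malformed: CDATA end marker not where the count says")
    return list(struct.unpack_from("<%d%s" % (count, fmt_char), buffer, start))

# The byte counts quoted in the post follow from 4 bytes per element.
print(6000 * struct.calcsize("<i"), 107237 * struct.calcsize("<f"))
```

A payload that happens to contain the bytes of `]]>` is read correctly by this routine, but a generic XML parser would still stop at the first such sequence.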

    If you want to edit your arrays in a text editor then you can run it through some tool that spits it back out as text or pick "export as text" in your exporter. That will let you have your text but still have the default be much faster to load the file.

    The company that has the largest level sizes to date, Naughty Dog, refused to use a text format in the past precisely because they felt with their data being so big and always getting bigger that that would be an issue. Supporting a binary option would address similar and well grounded fears.

    I feel like this issue is one of those things like using 2 digits for the year or 32 bits for the number of seconds since Jan 1st, 1970. It sounds good at the time since you assume someone will fix it later. Like you assume parsing will be fast enough or you assume CPUs will get faster. Of course that's never been true in the past. We always manage to need more power than we have and we end up having to do stuff to compensate. Why work around this later? Make it fast now and the problem goes away.

    As far as I can tell your only real objection is "ew, that sounds icky to me".

    The ability to edit these files in a text editor should be a non-issue. You shouldn't be editing middle format files because your edits will get lost every time your artists re-export. Being able to edit as text is great for getting it all working or testing out simple ideas or looking for bugs but that shouldn't be the deciding factor since 99.999% of the time that's not the point.

  8. #18
    Quote Originally Posted by greggman
    ...I'm suggesting that given that you know the format since it would be defined in the standard and you know the number of elements in the piece in question since that would also be defined in the standard, then if you know for example you are reading an array, marked as binary of 6000 ints you know that once you see <![CDATA[ you can read exactly 24000 bytes directly into an array of ints
    ...
    If you get another array marked as binary with 107237 floats then again, once you see <![CDATA[ you know you can read 428948 bytes directly into memory. At that point you can check if the next byte sequence is ]]> If it's not your XML is malformed.
    Unfortunately an XML library would not know about this construct, so this would eliminate the major advantage of using a standard format (XML) and force people to implement the whole XML parser from scratch and hard wire it for Collada.

    While it's not rocket science (I sure have written my share of XML parsers), many people like to use libXML2, because it does a big chunk of the work

    This sounds an awful lot like Microsoft's "embrace and extend" scheme which you can't possibly be advocating (look what they did to Java...)...

    If a compliant XML parser can't parse the file with 100% certainty, then in my mind it's not an XML file.

    For example, you would not be able to look at the file in Mozilla's graphical XML viewer which we use quite frequently to examine the structure of a file (you can collapse elements etc.).

    This is what I meant by "dirty". No, it is not a "feeling". It means that it is technically unacceptable, because it breaks things and it is unreliable.


    So, do we agree that straight binary in XML is bad?


    If so, we can go back to the ASCII encoding...

  9. #19
    Member
    Join Date
    Aug 2004
    Location
    SCEJ - Tokyo, Japan
    Posts
    34
    so, do we agree that straight binary in XML is bad?
    No, we don't agree. CDATA was designed specifically to allow binary in XML so suggesting putting binary in XML is suggesting we use XML as it was designed to be used.

    Maybe I'm missing something, but so far, the way the Collada schema is designed, when I use any XML library I've used to date and look up an array element from an XML file, all I'm going to get is a pointer to a large string which I then have to parse manually on my own. Under my suggestion, that string would be binary data which I could copy directly into an array of whatever type the array is.

    Or does libxml2 actually convert array to a contiguous array of floats for you? If not there is no difference between a proprietary string of ascii data and a proprietary piece of binary data as far as an off the shelf XML lib is concerned.
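The manual step being described, where the XML library hands back one big string and the application converts it to numbers, can be sketched like this. The `array` element name and `count` attribute are placeholders for this example, not taken from the Collada schema, and Python's `xml.etree.ElementTree` stands in for libxml2.

```python
import xml.etree.ElementTree as ET

# Hypothetical element: the library only knows it as text content.
document = '<array count="3">1.0 -2.5 0.25</array>'

element = ET.fromstring(document)
raw_text = element.text                 # what the XML library gives you: a string
declared_count = int(element.get("count"))

# The application-side parse that the XML library does not do for you.
values = [float(token) for token in raw_text.split()]

print(type(raw_text).__name__, declared_count, values)
```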

    As for your comment that binary would break looking at the files in Mozilla, don't you think first and foremost the concentration should be on whether or not the format facilitates making games, not on whether or not it can be looked at in some non-game-related browser?

  10. #20
    Quote Originally Posted by greggman
    so, do we agree that straight binary in XML is bad?
    No, we don't agree. CDATA was designed specifically to allow binary in XML so suggesting putting binary in XML is suggesting we use XML as it was designed to be used.
    Well, designs are sometimes flawed...
    Unfortunately it is not fully reliable, as many people in the field know:
    http://builder.com.com/5100-6374-1050529.html
    http://webservices.xml.com/pub/a/ws/200 ... oints.html
    etc. (just do a Web search... )
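One concrete reliability problem can be reproduced with a standard parser. XML 1.0 restricts which characters may appear anywhere in a document, CDATA sections included, and most control bytes are excluded. The sketch below uses Python's expat-based `xml.etree.ElementTree` as an example parser. The `parses` helper is a name made up for this test.

```python
import xml.etree.ElementTree as ET

def parses(document_bytes):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(document_bytes)
        return True
    except ET.ParseError:
        return False

# Markup-like text inside CDATA is fine.
text_ok = parses(b"<a><![CDATA[x < y && y > z]]></a>")

# Raw control bytes, as raw binary data would contain, are not legal XML characters.
binary_ok = parses(b"<a><![CDATA[\x00\x01\x02\x03]]></a>")

print(text_ok, binary_ok)
```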

    As I said, you need to know the length of a binary chunk to parse/skip it.
    The terminating sequence scheme would only work if CDATA would let ME (or the XML library) specify it when I save the data. That way I could pick a sequence that my binary chunk does not contain for sure.
    I guess the designers of XML did not think of this.

    You could still save the float array in binary format in an external file. If that file only contains raw floats, we don't need to invent a new binary sidekick to Collada.
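A minimal sketch of the external-file idea, assuming the file holds nothing but 32-bit floats in the host's byte order (a real format would have to pin the byte order down). The function names and the temporary file path are illustrative only. The element count would come from the XML side, or from the file size as done here.

```python
import array
import os
import tempfile

def write_raw_floats(path, values):
    """Dump the values as raw 32-bit floats, with no header and no terminator."""
    with open(path, "wb") as handle:
        array.array("f", values).tofile(handle)

def read_raw_floats(path):
    """Load the whole file straight into a float array; the count comes from the size."""
    data = array.array("f")
    count = os.path.getsize(path) // data.itemsize
    with open(path, "rb") as handle:
        data.fromfile(handle, count)
    return data.tolist()

with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "positions.bin")  # hypothetical file name
    write_raw_floats(sample_path, [1.0, 0.5, -2.0])
    loaded = read_raw_floats(sample_path)

print(loaded)
```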
