Difference between revisions of "ColladaDOM 3"

From COLLADA Public Wiki

Revision as of 09:19, 17 November 2020

Product information
  • Name: ColladaDOM 3 (COLLADA-DOM)
  • Purpose: XML Schema in modern C++
  • Last updated: February 16, 2019
  • Current version: 2.5.0
  • Status: non-Release/Preview-only
  • OS: Platform-Independent
  • Forum
  • Report bugs
  • Maintainer: Mick Pearson
  • Contact for technical issues:


This article is part of the COLLADA products directory

The COLLADA Document Object Model (COLLADA DOM) is an application programming interface (API) that provides a C++ object representation of a COLLADA XML instance document. ColladaDOM 3 "rebrands" this label--stylizing it--in order to draw attention to the sharp contrast between it and previous versions of the library, starting with version 2.5.0.

The DOM was created by Sony Computer Entertainment America using a code generator directly from the COLLADA schema. The DOM is open source and available for download on Sourceforge. It has since been systematically reconstituted to a degree that bears little resemblance to the original code, but retains backward compatibility with it. This work was done in the public domain, for all. It is a good question as to whether or not the original copyright claims pertain still, but it is not of importance, and the library remains inspired by the workings of the original and indebted to and continually interested in its legacy.

A cyberspace manifesto in brief

This page is a one page user guide to the new library. It's hoped that one page suffices. But before getting into code examples, a short explanation of some of the terminology and predicaments entailed:

COLLADA was a valiant effort in the middle of the first decade of the new millennium. It failed to catch on. Its afterimage is in the form of various import and export plug-in utilities for various software. Whether or not it ever achieved in-house status as a pre/post-processing communication format anywhere is difficult to say, and if so, does it matter? Import/export support is limited, and fails to realize a fraction of what the manual imagines for COLLADA, and indeed anything of great use in real world terms.

Version 2.5 (ColladaDOM 3) tries to rectify this by placing an emphasis on lossless transformation and ease of use from the user's position, in order to better encourage and facilitate meaningful applications of COLLADA. It further expands the mission of COLLADA by imagining it as a future standard 3-D model format--a role that many take for granted, being unaware of subtle distinctions in terms of the designers' original intent. It puts forward that this role, as a storage format, is more important in the coming years, especially as it concerns noncommercial applications.

The original choice to apply XML Schema to COLLADA may have felt at the time as if it was an easing of a burden. Yet in retrospect, it actually establishes a very high bar for software development. Indeed, one that it is impractical to take lightly. For this reason, it seems as if developers could use a good amount of assistance. The new library approaches this in two ways: The first is simply to offer features that had been missing, which no XML document is complete without. This includes supporting externally defined XML schemas, as prescribed by the COLLADA manual and organization itself. The second is to use modern C++ idioms to form a maximally expressive grammar for writing algorithms against the "DOM." The remainder of this section focuses on this.

The C++ Standard Library has grown both in terms of concreteness and acceptance since the advent of the original COLLADA-DOM library. But a "DOM" is also a creature originally of HTML, incubated by JavaScript. To this end, the library leverages the "operator overloading" feature of C++ to produce a language-neutral syntax for manipulating the "DOM" in code. This syntax is significantly more convenient than the C++ Standard Library, and even JavaScript itself, and has more in common with high-level interpreted programming languages than with the rigid strictures of the C++ Standard Library. In addition, there is an even greater need for symbolic expressiveness, which owes much to the need to reserve the use of language (identifiers) to the schemas themselves.

This quality of the new library is called "DAEP" or: Digital Artifact Excavation Protocol. It can alternatively use the less-neutral "digital asset exchange" (dae) language as originally established by COLLADA. It aims to establish an abstract/portable API, of which COLLADA-DOM is but one implementation. There also exists a twin API known as DAPP that is not part of COLLADA-DOM, where the P stands for Preservation. DAPP's role is to convert old-world "digital artifacts" into capital D, capital A, "Digital Artifacts." DAEP is not a layer on top of COLLADA so much as it merely brings the interface closer to working natively with an XML document in a "WYSIWYG" ethos. It trades the language of methods (function identifiers) for C++ operators, and informs free-form code--as one expects from operators.

Large schemas like COLLADA are hard to make sense of. The DAEP philosophy is to leverage code-completion and comments alongside the strong-type correctness nurtured by C++ in order to bring work with such schemas into reach. The reason this is required now is to establish a long sought-after, all-encompassing lingua franca for 3-D model data (digital artifacts) in order to prepare a path to a modern 3-D era at home in the noncommercial sphere. This basic notion it refers to as "cyberspace": a term that's long been misapplied only to defy the ability to formulate a working model, or to relabel things which already exist and as such have no import. COLLADA is the most obvious--but not the only--vector for realizing this space. To this end it must be a platonic space of works of art, and not a "place" as often conceived, like the WWW or Internet, nor a landscape in new clothes, but rather a platonic space of digital artifacts.

This is a trajectory for COLLADA but by no means its only modern application. At this moment there is no noncommercial storage format if one is required. That is less grandiose than "cyberspace" but no less important, and in fact it is a natural predecessor to a cyberspace century. In order to speak meaningfully in this language there must be an ecosystem of applications wherein each must be able to faithfully transform the language, even if it cannot or does not need to comprehend it. If this basic end is not met then it is as if there is a legendary Tower of Babel where all is lost in translation. This is the first order of business. Secondary is supporting a feature-rich, multi-representational conception of COLLADA documents, that is more like the WWW than a traditional import/export framework; only that 3-D is a much taller order than the 2-D printed page. We must develop a more complete understanding of COLLADA and its analogues, so that we may appreciate the challenges we face ahead.

Only once all of this is accomplished is there reason at all to hammer out a new COLLADA-like standard. Without the software means, then what good is another standard? This is the more important work. This is the work of our day.

Quick Start Guide

The first thing that is different about version 2.5 is the addition of a C++ namespace that is called COLLADA but is actually an alias to the ColladaDOM_3 namespace used by the linker. The "dae" prefix is retained by low-level APIs. The new generator offers two flavors: 2 and 3. 2 is backward compatible, while 3 does not use the low-level APIs--as they have the effect of polluting the namespace.

There are some reserved second-level namespaces; however, when an XML Schema is added by a #include directive, a second-level namespace is created for it. C++ is not designed to do this, so it is necessary to define a special C Preprocessor macro called COLLADA__x__namespace, where the x is a C identifier that is generated based on the targetNamespace attribute of the active <xs:schema> element:

#include <ColladaDOM.inl>
#define COLLADA__inline__
#define COLLADA__http_www_collada_org_2008_03_COLLADASchema__namespace \
#include <COLLADA 1.5.0/COLLADA.h>

In the above example an additional COLLADA_DOM_NICKNAME macro assigns a short name (Collada08) to be used by C++. This makes the identifiers shorter, which can make printouts and pop-up titles easier to read. The macro adds the longer name as an alias. (The generator cannot know the short name, and so depends on the alias.)
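
The aliasing described above rests on ordinary C++ namespace aliases. The following is a minimal sketch of that idea only. It does not use the generated headers, every name in it (ColladaDOM_3_sketch, Short08, http_www_example_org_schema) is a hypothetical stand-in, and the real COLLADA_DOM_NICKNAME macro may well expand differently:

```cpp
#include <string>

// Stand-in for the namespace seen by the linker.
namespace ColladaDOM_3_sketch
{
    // Hypothetical short nickname chosen by the user for one schema.
    namespace Short08
    {
        struct COLLADA { std::string version = "1.5.0"; };
    }
    // The long, generated name is added as an alias of the short one, so
    // code that only knows the long name still resolves to the same types.
    namespace http_www_example_org_schema = Short08;
}
// The public name is itself only an alias.
namespace COLLADA_sketch = ColladaDOM_3_sketch;

// Both spellings name the same type, so a reference binds directly.
inline bool same_type_via_alias()
{
    COLLADA_sketch::Short08::COLLADA a;
    COLLADA_sketch::http_www_example_org_schema::COLLADA &b = a;
    return &a == &b && b.version == "1.5.0";
}
```

Because an alias introduces no new entity, diagnostics and debugger titles would show the short name, which matches the stated motivation for nicknames.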

#include <ColladaDOM.inl> includes the entire library. Just include the whole thing, it's not much of a thing. COLLADA__inline__ must also be defined. It controls the linkage of the metadata registration routines. If blank the routines will have external linkage, or default linkage. If it is inline then inline linkage is used. This can affect the compiler's performance and the sizes of the outputted object-files.

When external linkage is used, it is necessary to control which translation-unit contains the definitions. This is done with the COLLADA_DOM_LITE macro. If it is defined, the definitions are not included in the current translation unit. ("Lite" is a U.S. corruption of "light.")
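
As a rough illustration of how a linkage macro and a "lite" macro can cooperate, here is a self-contained sketch. SKETCH_LINKAGE and SKETCH_LITE are hypothetical stand-ins for COLLADA__inline__ and COLLADA_DOM_LITE, and the registration routine is invented for the example, so treat this as the general technique and not as the library's headers:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for COLLADA__inline__ and COLLADA_DOM_LITE.
#define SKETCH_LINKAGE inline  // leave blank to get external linkage
// #define SKETCH_LITE         // define to omit the definitions below

// A toy registry that the registration routine fills in.
inline std::vector<std::string> &sketch_registry()
{
    static std::vector<std::string> names;
    return names;
}

// The declaration is visible in every translation unit.
SKETCH_LINKAGE void sketch_register_metadata();

#ifndef SKETCH_LITE
// The definition is compiled only where SKETCH_LITE is not defined.
// With external linkage, exactly one translation unit should do so.
SKETCH_LINKAGE void sketch_register_metadata()
{
    sketch_registry().push_back("COLLADA");
}
#endif
```

With the macro blank, defining SKETCH_LITE in all but one translation unit avoids duplicate definitions at link time.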

#include <COLLADA 1.5.0/COLLADA.h>[1] is the generated header for the root node of the schema. Its inclusion has the side-effect of recursively including the rest of the schema. It is important to understand that this file is not part of the COLLADA-DOM package/library. It is user-generated/installed--although pregenerated files[2] are provided as supplemental material, not packaged with the library source code. The mechanism provided to mask parts of the schema, in order to improve compilation wait times, is to combine COLLADA_DOM_LITE with include guards, like so:

#define __profile_bridge_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_gles2_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_glsl_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_cg_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_gles_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__

The above example prevents inclusion of the FX profiles, which are quite large chunks of the COLLADA schemas. The class definitions do not depend on the definitions of their children, however the metadata registration routines do so, presently, and this is why COLLADA_DOM_LITE is needed to not include those. The include-guards should just be copied from the generator output. This is not how include-guards are traditionally used. In fact, by defining them ahead of the recursive inclusion operation, the effect is to exclude.
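
The exclusion effect can be reproduced in a single file. In this sketch the guard name and the header contents are made up (SKETCH_PROFILE_TYPE_H is not a generator output), and the "header" is pasted inline so that the example stands alone:

```cpp
// Step 1: the user defines the guard before the header is ever seen.
#define SKETCH_PROFILE_TYPE_H

// ---- begin: what a generated header would contain ----
#ifndef SKETCH_PROFILE_TYPE_H
#define SKETCH_PROFILE_TYPE_H
#define SKETCH_PROFILE_WAS_COMPILED
struct sketch_profile_type { int large_payload[1024]; };
#endif
// ---- end of header contents ----

// Reports whether the guarded body above was actually compiled.
constexpr bool sketch_profile_compiled()
{
#ifdef SKETCH_PROFILE_WAS_COMPILED
    return true;
#else
    return false;
#endif
}
```

Since the guard is already defined, the preprocessor skips the whole body, and sketch_profile_type is never declared in this translation unit.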

Note that this page is a Quick Start Guide only. After a project is up and running, it's possible to begin exploring the further ins-and-outs of the core library directly and by example. Working with user-generated schemas across static libraries can present unique challenges. The Viewer project packaged with the core library coordinates this by locating config.h files organized in a central xmlns folder wherein each C "mangled" targetNamespace is assigned a folder. Their config.h files define the above namespaces and another macro that is used to locate the user's preferred install-base (in order to #include its headers.) (This somewhat unorthodox approach is justified by not defining and redefining long, multi-line build-parameters for a tree of static-libraries at every level; and by not artificially imposing strictures on where user-generated headers are located, what they contain, nor how they are used.)

Visual Studio 2019, C++17, and /bigobj

Currently only VS2019 can build the library with the P0127R1 C++17 auto-template feature. The /bigobj flag is still recommended for the metadata registration object. Advice for old revisions and VS versions is here[3]. Note, VS2017 has the necessary features but its compiler behaves randomly, indicating programmer error.


The new library's design lends itself to data structures that do not lend themselves to memory inspection. There is a NatvisFile.natvis file located somewhere within the core library files that, if added to newer Visual Studio projects, will enable a rich memory inspection experience when debugging against many of the data structures of the core library, including the procedurally generated classes representing XML-like content. For example, it will iterate over the contents-arrays, displaying a view of the container that visually approximates an XML file.

GCC and Clang since C++17, rev. 940

GCC 7 could build the library. At some point before GCC 10, casting a pointer-to-member to a different class became disallowed; Clang never supported such casting in any form, and so never worked. Adopting P0127R1 was the forward-looking solution to this breaking change, and should enable Clang, although the library has never been built with Clang. GCC 10, on the other hand, uses too much memory and takes 30 to 50 times longer to build than Visual Studio. When debug building COLLADA's XML Schema, GCC 10 fails to build even the basic precompiled header, which doesn't include the metadata registration routines, before dying from memory exhaustion on a system with 8GB of RAM. With optimization enabled it can build the 3D app and reference-implementation, but the result fails to execute. It probably has a small bug, owing to one of a few changes since the previous revision or to changes in GCC itself, that would be trivial to debug with a debug build.

All of this paints a picture of GCC not quite being up to the task. The two COLLADA schemata are less than 1MB combined. There's no reason a compiler should struggle to build C++ files generated from a comparatively small schema. It seems that C++ templates could be built a couple orders of magnitude more efficiently, at least in this case. Visual Studio could do the work 10 times quicker, and GCC should be able to do it 100 times quicker and with much less memory bloat. The good news is there's much room for improvement.

The full build time is on the order of 5 to 10 minutes. Many projects have longer build times, so it seems the problem is the ballooning memory. The schema files seem to take longer to compile than the 3D processing component. That suggests it shouldn't take so long, with so much more happening in the processing code.

Portable DAEP APIs versus Low-Level "dae" Interfaces

The portable/abstract DAEP interfaces do not extend beyond user defined document transformation routines. In order to instantiate the central daeDOM class (previously called DAE; that name is now deprecated) it is now necessary to define a daePlatform interface. The new library is platform agnostic, and does not provide defaults for any operating systems or use cases. Neither is a daeIO implementation provided.

Some skeletal example implementations of those user provided interfaces will surely accrue in time. These classes and more are laboriously documented within the source code in a Doxygen style. In theory it is possible to generate documentation, but of utmost importance is how comments appear inside of pop-up titles in the code editors.

Harder to document are overloaded C++ operators. DAEP defines a number of them, and they can be used with backward compatible class headers even though previously unavailable. In the old-style headers it's now possible to access the attributes and children directly via their data-members, albeit with camel-case prefixing. In addition to operators, methods that ape the C++ Standard Library are also part of DAEP. There are macros for excluding non-DAEP methods. The library depends on many of those methods, but the goal is to gradually exclude more and more in order to not pollute the namespaces, though the methods are often useful too. It might be nice to figure out a middle ground. In general, at this early stage, DAEP is still very experimental for a "portable/abstract" API.


operator()

If XML children and values share names, there is a name-clash. Two built-in names, value and content, can also clash. Clashing data-members have suffixes added to their names. An overloaded method is added to the class that permits access to the data-members via the original name, albeit with () instead. The overload selectors are DAEP::ATTRIBUTE, DAEP::ELEMENT, and DAEP::CONTENT.

Data-members implement a pass-through parenthetical operator() for use in generic-programming. Because fixes (suffix/prefixes) are an inescapable fact of mapping schemata to C++'s identifier-space it's difficult to make a form-over-function argument in favor of () where a suffix is sufficient.
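
A toy model of suffixed data-members with tag-selected overloads might look as follows. The class, the suffix spelling (type__ATTRIBUTE, type__ELEMENT), and the tag objects are all assumptions made for this sketch; only the idea of selecting an overload with an ATTRIBUTE or ELEMENT tag comes from the text above:

```cpp
#include <string>

namespace sketch_clash
{
    // Hypothetical selector tags, modeled on DAEP::ATTRIBUTE and DAEP::ELEMENT.
    struct ATTRIBUTE_t {};
    struct ELEMENT_t {};
    constexpr ATTRIBUTE_t ATTRIBUTE{};
    constexpr ELEMENT_t ELEMENT{};

    struct param
    {
        // In this made-up schema an attribute and a child are both named
        // "type", so each data-member receives a distinguishing suffix.
        std::string type__ATTRIBUTE;
        std::string type__ELEMENT;

        // Overloads restore access through the original name, using ()
        // and a tag to choose which clashing member is meant.
        std::string &type(ATTRIBUTE_t) { return type__ATTRIBUTE; }
        std::string &type(ELEMENT_t) { return type__ELEMENT; }
    };
}
```

Generic code can then write p.type(sketch_clash::ATTRIBUTE) without knowing which suffix the generator picked.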


operator->

This operator denotes a pointer. Children enjoy C array like semantics, which means the first child that shares a name with its siblings can be accessed as if it is a pointer. The converse also applies vis-a-vis the zeroth subscript of operator[].

If a logical child is not const then accessing it via this operator creates an instance of the corresponding element and assigns it to that child. This is an automatic process, such that the old add method that receives a space separated chain of child names can be replaced with a chain of -> operators, except for the last child in the chain, which requires operator+. (The + in that case takes the place of the word "add" in the older construction.)

For attributes and simple-type values, this operator depends on if change-notices are enabled for the value. If so it always returns a const pointer to ensure code does not inadvertently circumvent the notification framework.
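
The create-on-access behavior for non-const children can be imitated with a small pointer-like wrapper. This is a sketch of the technique only: the child template and the element structs are hypothetical, and the real library's ownership and metadata handling are certainly more involved:

```cpp
#include <memory>
#include <string>

namespace sketch_arrow
{
    // A child slot with pointer-like semantics.
    template<class T> struct child
    {
        std::unique_ptr<T> ptr;

        // Non-const -> creates the element on demand, then yields it.
        T *operator->()
        {
            if(!ptr) ptr.reset(new T());
            return ptr.get();
        }
        // Plain inspection never creates anything.
        const T *get() const { return ptr.get(); }
    };

    // Made-up element types, loosely named after COLLADA elements.
    struct up_axis_t { std::string value; };
    struct asset_t { child<up_axis_t> up_axis; };
    struct document_t { child<asset_t> asset; };
}
```

Writing doc.asset->up_axis->value = "Z_UP" on an empty document_t creates both intermediate elements, which is the effect the chain of -> operators is described as having.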

Deferred dereferencing of const operator-> when chained

When an element or document is accessed via a const view, it seems like a good idea, for better or worse, for this library to help facilitate better code by not dereferencing const -> chains. This feature would not be complete without the const pointer conversion operator--logically the chain must end, and the first link in the chain is also a chain nevertheless.

No other facilities work in this way by design; not even operator* (not described here.)

Deferred dereferencing of const operator->* values & attributes

Yet another special operator is ->*. It came as a surprise. It was invented to simulate the C++ x?y:z so-called "ternary" operator in order to work around the DAEP::Value template which in extreme cases can preclude deduction of the underlying type. (This can be compiler implementation dependent.)

It turned out--however--to have a much more practical use, because it is able to extract the XML attribute and value data from a nonviable (nullptr) object, by providing a default value as its second operand. In this way it's possible to query any non-subscript value within the const document if it exists or not.

->* is best thought of as ending a -> chain with a value or attribute. It is part of a family of operators that does not include operator[]. To use [] it is necessary to dereference the contents-array.

//Example. If any successive node does not exist, then use Y_UP.
RT::Main.UpAxis = COLLADA->asset->up_axis->value->*RT::Up::Y_UP;
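
One way to obtain the same observable behavior in plain C++ is sketched below. It is not the library's mechanism: const_child, value_slot, and the element structs are hypothetical, and the trick of substituting a shared empty instance for a null link is an assumption chosen to keep the example short:

```cpp
namespace sketch_chain
{
    enum class Up { X_UP, Y_UP, Z_UP };

    // A simple-type value that may be absent.
    template<class T> struct value_slot
    {
        bool present = false;
        T data{};
        // ->* yields the stored value, or the right operand if absent.
        T operator->*(const T &fallback) const
        {
            return present ? data : fallback;
        }
    };

    // A const child link. Its -> never dereferences a null pointer; it
    // substitutes a shared empty instance whose own links are all null.
    template<class T> struct const_child
    {
        const T *ptr = nullptr;
        const T *operator->() const
        {
            static const T empty{};
            return ptr ? ptr : &empty;
        }
    };

    struct up_axis_t { value_slot<Up> value; };
    struct asset_t { const_child<up_axis_t> up_axis; };
    struct document_t { const_child<asset_t> asset; };

    // Mirrors the shape of the example above: a -> chain ended by ->*.
    inline Up up_axis_or(const document_t &doc, Up fallback)
    {
        return doc.asset->up_axis->value->*fallback;
    }
}
```

Because -> binds more tightly than ->*, the whole chain is evaluated first and the default is applied once, at the end.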


operator[]

Accesses children in an array, and can be used with children with names that can only appear once as well. The subscript can be any number, however presently use of operator+ and operator-> is constrained to the subscripts of existing children, or one more, such that the subscript would create a child on the back of the array if not const.

Note that this is of special interest because the new library introduces a contents-array model that is universal. This model replaces the old practice of storing children in individual arrays in favor of having a linear array that is sorted according to the parent element's content and can also include text-content and other kinds of nodes, which the new library considers to be "text" and which the old library had not facilitated.

For attributes and simple-type values, this operator uses the underlying value's own [] operator. Like -> if the value is subject to change-notices, the subscript is returned by const reference. Per-subscript change notices were tried, but were ultimately deemed inelegant and not worthwhile. In the future [] may be generalized to all data-types, but it is not the same as <xs:list> items; for example, a string uses [] to access its individual codepoints. [] is closer to C++ than XML Schema.
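
The contents-array idea, a single linear array that keeps elements and "text" in document order, can be pictured with the following sketch. The node and element structs and the child() lookup are hypothetical simplifications and say nothing about how the library orders or stores its content internally:

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_contents
{
    // One entry of the linear contents-array: either an element, or
    // "text" (which here also stands in for comments and similar nodes).
    struct node
    {
        std::string name; // empty for text
        std::string text;
        bool is_text() const { return name.empty(); }
    };

    struct element
    {
        std::vector<node> contents; // document order is preserved

        // Returns the nth child with the given name, or nullptr if there
        // are not that many. Text entries are skipped when counting.
        const node *child(const std::string &name, std::size_t n = 0) const
        {
            for(const node &c : contents)
                if(!c.is_text() && c.name == name && n-- == 0)
                    return &c;
            return nullptr;
        }
    };
}
```

A comment between two like-named children survives in place, which is what makes a lossless round trip possible with this kind of layout.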


operator+

This operator in its prefix form replaces the older add method. It does not do arithmetic. It adds children to the content of a parent element. Note that for attributes and simple-type values arithmetic operators should work as expected owing to implicit conversion operators, which are not highlighted in this wiki page.

It's important to note that for names of children which may appear more than once, + only operates on the first so named child, and if the first--or only--child already exists, then it will not--it cannot--be added.


operator++

This operator in its prefix form is only available for child names for which there may be more than one child so named. It is functionally equivalent to accessing the "Nth" subscript and using operator+ on it, where N is the current count of the children with the name in question.
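
The difference between the two prefix operators can be sketched with a toy container. The children struct is hypothetical, and returning the existing first child from + when nothing is added is an assumption of this sketch, since the text only says that no child is added in that case:

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_add
{
    struct item { std::string text; };

    // Children that share one name, with C-array-like semantics.
    struct children
    {
        std::vector<item> items;

        // Prefix +: add the first child only if none exists yet.
        item &operator+()
        {
            if(items.empty()) items.emplace_back();
            return items.front();
        }
        // Prefix ++: always append a new child at the back.
        item &operator++()
        {
            items.emplace_back();
            return items.back();
        }
        std::size_t size() const { return items.size(); }
    };
}
```

Applying + twice leaves one child, while each ++ grows the array by one.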

operator=("") or symbolic interpretation of the "" string-literal

The "" string-literal (technically const daeStringCP[1]) has special symbolic meaning when assigned to classes that implement DAEP.

It removes elements by converting them into an empty text node. This works differently from assigning another element or nullptr in that no element-or-placeholder takes the place of the assigned/removed element.

It removes simple-type values trivially, since the absence of a value is unambiguous. The xs::anySimpleType value takes on the VOID representation that is synonymous with absence.

XML attributes are a complicating factor in that XML allows string and list type data to be represented by an empty value, which in XML is written identically to "". In order to accomplish the XML code my-attribute="" applicable datatypes possess a clear() method. C-pointer and non-const, non-size-1 C-array are able to represent the empty string. xs::string has C-pointer datatype. xs::string_view is a C-pointer, size tuple, which if assigned the special nullptr value changes the xs::anySimpleType representation to VOID. The next/final paragraph speaks on VOID attributes:

"" assignment reverts to the schemata provided default value. If no value is provided, a default value is equivalent to zero-initialization or default-initialization. Removing attributes with "" unsets write-masks. Accessing or assigning attributes sets write-masks. Plugins consult write-masks in writing documents. Schema-backed attributes have write-masks. The rest use variant representation, wherein VOID is semantically identical to unsetting the attribute's write-mask. write_mask() is able to retrieve an attribute's write-mask object.

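The special treatment of the "" literal relies on it having the distinct type const char[1], which overload resolution can single out. The sketch below shows that technique on a hypothetical attribute class; the members, the bool write_mask flag, and the choice that clear() sets the write-mask are assumptions, not the library's definitions:

```cpp
#include <cstddef>
#include <string>

namespace sketch_empty
{
    struct attribute
    {
        std::string value;
        std::string schema_default;
        bool write_mask = false; // whether a writer should emit the attribute

        explicit attribute(const std::string &def)
        : value(def), schema_default(def) {}

        // Matches only a char array of size 1, i.e. the "" literal:
        // revert to the default value and unset the write-mask.
        attribute &operator=(const char (&)[1])
        {
            value = schema_default;
            write_mask = false;
            return *this;
        }
        // Any longer literal is an ordinary assignment; it sets the mask.
        template<std::size_t N> attribute &operator=(const char (&s)[N])
        {
            value = s;
            write_mask = true;
            return *this;
        }
        // Writing an explicitly empty value uses a separate method.
        void clear() { value.clear(); write_mask = true; }
    };
}
```

The non-template overload wins for "" because it is an equally good match and non-templates are preferred, so a = "" and a = "Z_UP" take different paths at compile time.
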
Elements from thin air

The daeSmartRef class has been extended to act as a "factory" for the type of its stored pointer. This use case is invoked by constructing it with a C++ reference to an instance of daeDOM. This new technique makes it possible to return a free element, or element and content, to be added to a document or other container by the receiver of the returned element.

Note that daeSmartRef has many guises. (It is often a typedef or base class.)
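
The factory use of a smart reference can be pictured with this sketch. The dom, node_t, and smart_ref types here are hypothetical stand-ins (std::shared_ptr is swapped in for the library's own reference counting), so it only illustrates the calling pattern of constructing a reference from a DOM object and handing back a free element:

```cpp
#include <memory>
#include <string>

namespace sketch_factory
{
    // Stand-in for daeDOM; it only counts what it has created.
    struct dom { int elements_created = 0; };

    struct node_t { std::string name = "node"; };

    // A smart reference that doubles as a factory when given a dom.
    template<class T> struct smart_ref
    {
        std::shared_ptr<T> ptr;

        smart_ref() = default; // an empty reference
        explicit smart_ref(dom &d) : ptr(std::make_shared<T>())
        {
            ++d.elements_created;
        }
        T *operator->() const { return ptr.get(); }
        explicit operator bool() const { return bool(ptr); }
    };

    // Returns a free element; the caller decides where to insert it.
    inline smart_ref<node_t> make_named_node(dom &d, const std::string &name)
    {
        smart_ref<node_t> out(d);
        out->name = name;
        return out;
    }
}
```

The receiver of the returned reference is then free to add the element to a document or to any other container.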

QName & NCName assignment with operator->*

daeSmartRef (and related types) use operator->* to obtain an xs::complexType reference. ->* is activated by the appearance of a C-string object on the right side. The value of the string is not meaningful and looks like ref->*xs::QName(). Because of name-clash considerations, there is not an actual data-member for the names of elements. It's seldom necessary to assign names to elements, since named children assume correct names upon insertion into their parent element's contents-array. Although operator precedence-and-associativity differ for ->* this notation is meant to be suggestive of a "getter" accessor with an asterisk (*) departing from the attributes and children identifier-space.

An xs::complexType can convert to an NCName string (xs::string_view) or to an xmlns datatype, which is an xmlns namespace, prefix tuple. The xmlns::tuple datatype combines xmlns with an NCName string-view. These types can be assigned to xs::complexType to change an element's name and/or prefix. Tuples can be constructed with xmlns::operator() and one-off prefix assignment can be accomplished with C++11's std::initializer_list which can also include an NCName. If doing multiple assignments it's more efficient to prepare an xmlns object.

Other than this xs::complexType can access low-level non-DAEP APIs via its own operator-> overload and generally behaves like an xs::any smart-pointer; except that xs::any is not guaranteed to be a pointer to the low-level (non-portable) element object, whereas xs::complexType does guarantee this. (getNCName() is an example of a low-level API offered by COLLADA-DOM's daeElement object.)

ref->*xs::complexType() is a valid construction because it is convertible to a C-string (the element's NCName C-string.) xs::QName() works because it is no more than a C-string itself; as all string types are defined to be. (String values use a daeStringRef storage type that is binary compatible with a C-string, but possesses a header.) Of course xs::NCName() works as well. The choice is a form of programmer's notation: meaning that assigning a string to "xs::QName()" assigns an NCName! And vice versa (xmlns to "xs::NCName()".) These are offered as portable DAEP solutions to the name-clash problem, by repurposing the xs namespace put forth by XML Schema.

Template Specialization

In addition to operators and Standard Library like methods, COLLADA::DAEP offers some "template metaprogramming" like facilities that can permit users to drop in custom type definitions. The extension process is portable, but the class definitions themselves are not.

The change-notification framework is controlled in the same way. Change notices are enabled for all "id" and "sid" attributes by default.
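
Controlling a feature by template specialization generally follows the traits pattern sketched here. The notice_traits template, the tag types, and the attribute class are all hypothetical; the sketch only shows how a specialization could switch change-notices on for "id" and "sid" while leaving other attributes alone:

```cpp
#include <string>

namespace sketch_traits
{
    // Primary template: no change-notices unless specialized.
    template<class Tag> struct notice_traits
    {
        static constexpr bool enabled = false;
    };

    // Hypothetical tags identifying attributes.
    struct id_tag {};
    struct sid_tag {};
    struct name_tag {};

    // Specializations switch notices on for "id" and "sid".
    template<> struct notice_traits<id_tag>
    {
        static constexpr bool enabled = true;
    };
    template<> struct notice_traits<sid_tag>
    {
        static constexpr bool enabled = true;
    };

    template<class Tag> struct attribute
    {
        std::string value;
        int notices = 0; // how many change-notices were issued

        void set(const std::string &v)
        {
            if(notice_traits<Tag>::enabled) ++notices; // compile-time constant
            value = v;
        }
    };
}
```

A user would add one more specialization to opt another attribute in, without touching the attribute class itself.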

Legacy Features

The new library takes a hard line on legacy features. You can say that it knows what it is, and it knows what it's not, and it only supports legacy features because they had previously existed for the most part. Often the features do offer some instruction in terms of how the library may be used.

Where it is necessary these features are controlled by the getLegacy method of daePlatform.

Future Work

  • ☑The library very much requires a facility that matches <xs:anyAttribute>. This would greatly simplify implementation of the old domAny class as it relates to all other elements, which at present must bend somewhat to make domAny work. This facility will not be limited to schemas that use this element, so that users are not forced to shed data that conflicts with the active schema for whatever purpose. Completed.
  • ☑There is a plan to introduce qualified names (QName) via a tag field that all smart-ref addressable objects have. The prefix itself is not to be stored in the element data. Completed.
  • ☑The xmlns pseudo-attribute needs to be supported in the form of multiple qualified instances of itself. Presently the library can only support naked instances of the attribute. It seems apparent the library had never implemented COLLADA's <technique> elements prior to 2.5. (Their sole purpose is to introduce one or more qualified xmlns declarations.) Completed.
  • There are open questions around simple-type content models that also appear to have text punctuated by content such as comments and processing-instructions. At present this can appear ambiguous due to the handling of default values, and there is no recourse if a text is not convertible to the underlying binary data-type of the simple-type value.
  • ☑Furthermore, the default values of simple-type content are difficult to attribute due to how XML Schema assigns the values to the child, and so the default for the type of value (unlike with attributes) is dependent on the child, and not the type of the value, nor the type of the element itself. Functional.
  • Change-notices are always issued by changes that go through the metadata back-end. Changes that move/create element-based content always go through the back-end, because they are exported at least until the contents-array insertion routines can leverage compile-time-constants to insert directly into the content-model via the ordinal framework. (This is itself a future objective.)
To remedy this it is necessary to add some bits to either the metadata or element data or both that the schema-unaware routines are able to work with. Update: Element insertion/migration change-notices are always issued by the new namespace architecture. In order to not issue a notice would require a check to ensure the namespace relationship is trivial first. As such the question of selective change-notices is narrowed to the value spaces of attributes and simple-type content. Version 2.5 won't be released absent this feature.
  • Currently the type of an element is determined by a pointer associated with its metadata. This is limited to a single module unless special care is taken to export the definitions. One future approach is to assign a global identifier to binary-compatible, lexically appropriate metadata.
