ColladaDOM 3

From COLLADA Public Wiki

This page is in the process of being written over a two or three day period.

Product information
  • Name: ColladaDOM 3 (COLLADA-DOM)
  • Purpose: XML Schema in modern C++
  • Last updated: January 8, 2017
  • Current version: 2.5.0
  • Status: non-Release/Preview-only
  • OS: Platform-Independent
  • Forum
  • Report bugs
  • Maintainer: Mick Pearson
  • Contact for technical issues:



The COLLADA Document Object Model (COLLADA DOM) is an application programming interface (API) that provides a C++ object representation of a COLLADA XML instance document. Starting with version 2.5.0, the library "rebrands" this label as ColladaDOM 3--stylizing it--in order to draw attention to the sharp contrast between the new library and previous versions.

The DOM was created by Sony Computer Entertainment America using a code generator working directly from the COLLADA schema. The DOM is open source and available for download on SourceForge. It has since been systematically reconstituted to such a degree that it bears little resemblance to the original code, while retaining backward compatibility with it. This work was done in the public domain, for all. Whether the original copyright claims still pertain is a good question, but not an important one; the library remains inspired by the workings of the original, indebted to it, and continually interested in its legacy.

A cyberspace manifesto in brief

This page is a one page user guide to the new library. It's hoped that one page suffices. But before getting into code examples, a short explanation of some of the terminology and predicaments entailed:

COLLADA was a valiant effort in the middle of the first decade of the new millennium. It failed to catch on. Its afterimage is in the form of various import and export plug-in utilities for various software. Whether or not it ever achieved in-house status as a pre/post-processing communication format anywhere is difficult to say, and if so, does it matter? Import/export support is limited, and fails to realize a fraction of what the manual imagines for COLLADA, and indeed anything of great use in real world terms.

Version 2.5 (ColladaDOM 3) tries to rectify this by placing an emphasis on lossless transformation and ease of use from the user's position, in order to better encourage and facilitate meaningful applications of COLLADA. It further expands the mission of COLLADA by imagining it as a future standard 3-D model format--a role that many take for granted, being unaware of subtle distinctions in terms of the designers' original intent. It puts forward that this role, as a storage format, is more important in the coming years, especially as it concerns noncommercial applications.

The original choice to apply XML Schema to COLLADA may have felt at the time like the easing of a burden. Yet in retrospect, it establishes a very high bar for software development, one that is impractical to take lightly. For this reason, developers could use a good amount of assistance. The new library approaches this in two ways. The first is simply to offer features that had been missing, which no XML document is complete without. This includes supporting externally defined XML schemas, as prescribed by the COLLADA manual and organization itself. The second is to use modern C++ idioms to form a maximally expressive grammar for writing algorithms against the "DOM." The remainder of this section focuses on the second.

The C++ Standard Library has grown in both concreteness and acceptance since the advent of the original COLLADA-DOM library. A "DOM," however, is originally a creature of HTML, incubated by JavaScript. To this end, the library leverages the operator overloading feature of C++ to produce a language-neutral syntax for manipulating the "DOM" in code. The result is intended to be significantly more convenient than the C++ Standard Library, and even JavaScript itself, and to have more in common with high-level interpreted programming languages than with the rigid strictures of the C++ Standard Library. In addition, there is an even greater need for symbolic expressiveness, which owes much to the need to reserve the use of language (identifiers) for the schemas themselves.

This quality of the new library is called "DAEP," or Digital Artifact Excavation Protocol. It can alternatively use the less neutral "digital asset exchange" (dae) language originally established by COLLADA. It aims to establish an abstract/portable API, of which COLLADA-DOM is but one implementation. There also exists a twin API known as DAPP that is not part of COLLADA-DOM, where the P stands for Preservation. DAPP's role is to convert old-world "digital artifacts" into capital D, capital A, "Digital Artifacts." DAEP is not a layer on top of COLLADA so much as it brings the interface closer to working natively with an XML document in a "WYSIWYG" ethos. It trades the language of methods (function identifiers) for C++ operators, and informs free-form code--as one expects from operators.

Large schemas like COLLADA are hard to make sense of. The DAEP philosophy is to leverage code-completion and comments alongside the strong-type correctness nurtured by C++ in order to bring work with such schemas within reach. The reason this is required now is to establish a long sought after, all-encompassing lingua franca for 3-D model data (digital artifacts), in order to prepare a path to a modern 3-D era situated foremost in noncommercial spheres. This basic notion it refers to as "cyberspace": a term that has long been misapplied, only to defy the ability to formulate a working model, or to relabel things which already exist and as such have no import. COLLADA is the most obvious--but not the only--vector for realizing this space. To this end it must be a platonic space of works of art: not a "place" as often conceived, like the WWW or Internet, nor a landscape in new clothes, but rather a platonic space of digital artifacts.

This is a trajectory for COLLADA but by no means its only modern application. At this moment there is no noncommercial storage format if one is required. That is less grandiose than "cyberspace" but no less important, and in fact it is a natural predecessor to a cyberspace century. In order to speak meaningfully in this language there must be an ecosystem of applications wherein each must be able to faithfully transform the language, even if it cannot or does not need to comprehend it. If this basic end is not met then it is as if there is a legendary Tower of Babel where all is lost in translation. This is the first order of business. Secondary is supporting a feature-rich, multi-representational conception of COLLADA documents that is more like the WWW than a traditional import/export framework; only that 3-D is a much taller order than the 2-D printed page. We must develop a more complete understanding of COLLADA and its analogues, so that we may appreciate the challenges we face ahead.

Only once all of this is accomplished is there reason at all to hammer out a new COLLADA-like standard. Without the software means, then what good is another standard? This is the more important work. This is the work of our day.

Quick Start Guide

The first thing that is different about version 2.5 is the addition of a C++ namespace that is called COLLADA but is actually an alias to the ColladaDOM_3 namespace used by the linker. The "dae" prefix is retained by low-level APIs. The new generator offers two flavors: 2 and 3. 2 is backward compatible, while 3 does not use the low-level APIs--as they have the effect of polluting the namespace.
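The relationship between the two namespace names can be pictured with an ordinary C++ namespace alias. This is only a minimal sketch, not the library's own headers: the version_tag function is hypothetical and exists just to give the alias something to refer to. Only the names COLLADA and ColladaDOM_3 come from the text above.

```cpp
#include <cassert>
#include <string>

// Stand-in for the namespace name the linker sees.
namespace ColladaDOM_3
{
    // Hypothetical function, present only so the alias has something to name.
    inline std::string version_tag() { return "2.5"; }
}

// A namespace alias: both spellings refer to the same entities.
namespace COLLADA = ColladaDOM_3;

// Usage: code written against either spelling reaches the same function.
inline void alias_demo()
{
    assert(&COLLADA::version_tag == &ColladaDOM_3::version_tag);
    assert(COLLADA::version_tag() == "2.5");
}
```

Because an alias introduces no new entities, symbols keep the ColladaDOM_3 name in object files while user code can use the shorter, friendlier spelling.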

There are some reserved second-level namespaces; however, when an XML Schema is added by a #include directive, a second-level namespace is created for it. C++ is not designed to do this, so it is necessary to define a special C Preprocessor macro called COLLADA__x__namespace, where the x is a C identifier that is generated based on the targetNamespace attribute of the active <xs:schema> element:

#include <ColladaDOM.inl>
#define COLLADA__inline__
#define COLLADA__http_www_collada_org_2008_03_COLLADASchema__namespace \
#include <COLLADA 1.5.0/COLLADA.h>

In the above example an additional COLLADA_DOM_NICKNAME macro assigns a short name (Collada08) to be used by C++. This makes the identifiers shorter, which can make printouts and pop-up titles easier to read. The macro adds the longer name as an alias. (The generator cannot know the short name, and so depends on the alias.)

#include <ColladaDOM.inl> includes the entire library. Just include the whole thing; it's not much of a thing. COLLADA__inline__ must also be defined. It controls the linkage of the metadata registration routines. If it is blank, the routines will have external (default) linkage. If it is defined as inline, then inline linkage is used. This can affect the compiler's performance and the sizes of the output object-files.

When external linkage is used, it is necessary to control which translation-unit contains the definitions. This is done with the COLLADA_DOM_LITE macro. If it is defined, the definitions are not included in the current translation unit. ("Lite" is a U.S. corruption of "light.")

#include <COLLADA 1.5.0/COLLADA.h> is the generated header for the root node of the schema. Its inclusion has the side-effect of recursively including the rest of the schema. It is important to understand that this file is not part of the COLLADA-DOM package/library. It is user-generated/installed--although pregenerated files are provided as supplemental material, not packaged with the library source code. The mechanism provided to mask parts of the schema, in order to improve compilation wait times, is to combine COLLADA_DOM_LITE with include guards, like so:

#define __profile_bridge_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_gles2_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_glsl_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_cg_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__
#define __profile_gles_type_h__http_www_collada_org_2008_03_COLLADASchema__ColladaDOM_g1__

The above example prevents inclusion of the FX profiles, which are quite large chunks of the COLLADA schemas. The class definitions do not depend on the definitions of their children, however the metadata registration routines do so, presently, and this is why COLLADA_DOM_LITE is needed to not include those. The include-guards should just be copied from the generator output. This is not how include-guards are traditionally used. In fact, by defining them ahead of the recursive inclusion operation, the effect is to exclude.
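The exclusion trick can be reproduced in a single file. The following is a minimal sketch under stated assumptions: the guard names (DEMO_PROFILE_CG_H and so on) and the two tiny "headers" are hypothetical stand-ins, pasted inline so the example is self-contained. The real guards should be copied from the generator output as described above.

```cpp
#include <cassert>

// Defined ahead of time, exactly as the guard inside the "header" spells it.
// (DEMO_PROFILE_CG_H is a hypothetical guard name.)
#define DEMO_PROFILE_CG_H

// ---- contents of a hypothetical profile_cg header ----
#ifndef DEMO_PROFILE_CG_H
#define DEMO_PROFILE_CG_H
#define DEMO_HAS_PROFILE_CG
struct profile_cg { int large_definition; };
#endif

// ---- contents of a hypothetical profile_common header ----
#ifndef DEMO_PROFILE_COMMON_H
#define DEMO_PROFILE_COMMON_H
#define DEMO_HAS_PROFILE_COMMON
struct profile_common { int small_definition; };
#endif

#ifdef DEMO_HAS_PROFILE_CG
const bool cg_included = true;
#else
const bool cg_included = false;   // the pre-defined guard skipped the body
#endif

#ifdef DEMO_HAS_PROFILE_COMMON
const bool common_included = true;
#else
const bool common_included = false;
#endif

// Usage: only the header whose guard was left undefined contributed anything.
inline void guard_demo() { assert(!cg_included && common_included); }
```

The preprocessor cannot tell a guard defined by an earlier inclusion from one defined by hand, which is what makes the masking work.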

Note that this page is a Quick Start Guide only. After a project is up and running, it's possible to begin exploring the further ins-and-outs of the core library directly and by example. Working with user-generated schemas across static libraries can present unique challenges. The Viewer project packaged with the core library coordinates this by locating config.h files organized in a central xmlns folder wherein each C "mangled" targetNamespace is assigned a folder. Their config.h files define the above namespaces and another macro that is used to locate the user's preferred install-base (in order to #include its headers.) (This somewhat unorthodox approach is justified by not defining and redefining long, multi-line build-parameters for a tree of static-libraries at every level; and by not artificially imposing strictures on where user-generated headers are located, what they contain, nor how they are used.)

Visual Studio

Microsoft's Visual Studio compilers prior to the 2015 edition have very long wait times, owing to the use of templates to greatly simplify the generator output. If older compilers are used to iterate (and not simply to do a one-off compile of a package/release) then these long waits can be infeasible. The newer compiler is very quick, and the waits may not be a problem even if inline linkage is used. Still, understanding the C++ inclusion model can affect "compile-times" considerably, whether on the order of minutes or seconds, depending on need.

2013 and older compilers (/bigobj & /Zm)

The Visual Studio compilers will also likely require the /bigobj flag, whose documentation describes the COLLADA-DOM library precisely; the older versions will probably also require the /ZmN option, where N is a larger number than the default. The compile operation will hint that these options are appropriate.

The /bigobj option can be enabled for just one translation unit; a precompiled-header for example (of which more than one may be required in order to best work with the procedurally generated schemas). The /Zm option may need to be consistent project-wide. If files are larger than 500MB then older versions of Visual Studio may generate them and then fail to map them into 32-bit memory. This is observed with the 2010 edition.


The new library's design lends itself to data structures that do not lend themselves to memory inspection. There is a NatvisFile.natvis file located somewhere within the core library files, that if added to newer Visual Studio projects will enable a rich memory inspection experience when debugging against many of the data structures of the core library, including the procedurally generated classes representing XML like content. For example, it will iterate over the contents-arrays displaying a view of the container that is visually approximate to an XML file.


#define nullptr 0 must be done if the nullptr keyword is unavailable. It is used as a compromise between the extremes of using 0 or NULL. The C++ Standard neglected to provide a way to test that it exists. It may be necessary to #undef any macros that impede the library or included schemas, and/or to "push/pop" the C Preprocessor state.
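Since there is no standard test for the keyword, any automatic fallback has to guess. The following is a minimal sketch of one such heuristic, not something the library provides: it assumes nullptr exists when __cplusplus reports C++11 or later, or when the compiler is Visual Studio 2010 (_MSC_VER 1600) or newer. Some compilers support nullptr without satisfying either test, so a project may prefer to make the decision by hand.

```cpp
#include <cassert>

// Heuristic only (an assumption of this sketch): fall back to 0 when neither
// the language version nor the Visual Studio version suggests nullptr exists.
#if !(defined(__cplusplus) && __cplusplus >= 201103L) \
 && !(defined(_MSC_VER) && _MSC_VER >= 1600)
#define nullptr 0
#endif

// Works the same whether nullptr is the keyword or the macro above.
inline bool is_unset(const int *p) { return p == nullptr; }

inline void nullptr_demo()
{
    int x = 0;
    assert(is_unset(nullptr));
    assert(!is_unset(&x));
}
```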

Portable DAEP APIs versus Low-Level "dae" Interfaces

The portable/abstract DAEP interfaces do not extend beyond user-defined document transformation routines. In order to instantiate the central daeDOM class (previously called DAE, a name that is now deprecated) it is now necessary to define a daePlatform interface. The new library is platform agnostic, and does not provide defaults for any operating systems or use cases. Neither is a daeIO implementation provided.

Some skeletal example implementations of those user provided interfaces will surely accrue in time. These classes and more are laboriously documented within the source code in a Doxygen style. In theory it is possible to generate documentation, but of utmost importance is how comments appear inside of pop-up titles in the code editors.

Harder to document are overloaded C++ operators. DAEP defines a number of them, and they can be used with backward compatible class headers even though previously unavailable. In the old-style headers it's now possible to access the attributes and children directly via their data-members, albeit with camel-case prefixing. In addition to operators, methods that ape the C++ Standard Library are also part of DAEP. There are macros for excluding non-DAEP methods. The library itself depends on many of those methods, but the goal is to gradually exclude more and more in order to not pollute the namespaces, though the methods are often useful too. It might be nice to figure out a middle ground. In general, at this early stage, DAEP is still very experimental for a "portable/abstract" API.


operator->

This operator denotes a pointer. Children enjoy C-array-like semantics, which means the first child that shares a name with its siblings can be accessed as if it is a pointer. The converse also applies vis-a-vis the zeroth subscript of operator[].

If a logical child is not const then accessing it via this operator creates an instance of the corresponding element and assigns it to that child. This is an automatic process, such that the old add method that receives a space separated chain of child names can be replaced with a chain of -> operators, except for the last child in the chain, which requires operator+. (The + in that case takes the place of the word "add" in the older construction.)

For attributes and simple-type values, the behavior of this operator depends on whether change-notices are enabled for the value. If so it always returns a const pointer to ensure code does not inadvertently circumvent the notification framework.
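The create-on-access behavior for children can be imitated in a few lines. This is a minimal sketch using hypothetical stand-in types (Child, Root, Asset and Contributor are not library classes), and it simplifies matters by letting every link in the chain create its element, ignoring the rule that the last child needs operator+.

```cpp
#include <cassert>
#include <memory>
#include <string>

// A child slot that creates its element the first time it is dereferenced
// through a non-const path. (Hypothetical stand-in, not a library class.)
template<class T> class Child
{
    std::unique_ptr<T> element;
public:
    // Non-const access creates the element on demand.
    T *operator->() { if(!element) element.reset(new T()); return element.get(); }
    // Const access never creates, so it may return a null pointer.
    const T *operator->() const { return element.get(); }
    bool exists() const { return element != nullptr; }
};

struct Contributor { std::string author; };
struct Asset { Child<Contributor> contributor; };
struct Root { Child<Asset> asset; };

// Usage: one chained expression builds the whole path.
inline void arrow_demo()
{
    Root doc;
    assert(!doc.asset.exists());
    doc.asset->contributor->author = "Jane Doe";
    assert(doc.asset.exists());

    const Root &view = doc;               // read-only view, creates nothing
    assert(view.asset->contributor->author == "Jane Doe");
}
```

Overloading on constness is what lets the same spelling mean "create if missing" for writable documents and "just look" for read-only ones.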


operator[]

Accesses children in an array, and can be used with children with names that can only appear once as well. The subscript can be any number; however, presently use of operator+ and operator-> is constrained to the subscripts of existing children, or one more, such that the subscript would create a child on the back of the array if not const.

Note that this is of special interest because the new library introduces a contents-array model that is universal. This model replaces the old practice of storing children in individual arrays with a single linear array that is sorted according to the parent element's content model and can also include text-content and other kinds of nodes, which the new library considers to be "text" and which the old library had not facilitated.
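A linear, ordered contents-array can be pictured as follows. This is a minimal sketch with hypothetical types (Node and Contents are not library classes), and the integer ordinal stands in for whatever position a content model would assign. It shows ordered insertion into one array that mixes elements with "text," plus a lookup of the i-th child of a given name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One entry of the linear array: an element child, or "text" (which in this
// sketch also stands for comments and similar non-element nodes).
struct Node
{
    enum Kind { Element, Text } kind;
    int ordinal;        // position assigned by a hypothetical content model
    std::string name;   // element name, empty for text
    std::string value;  // text content, empty for elements here
};

class Contents
{
    std::vector<Node> nodes;
public:
    // Insert after every node whose ordinal is less than or equal, so the
    // array stays sorted and equal ordinals keep their insertion order.
    void insert(const Node &n)
    {
        std::vector<Node>::iterator it = std::upper_bound(
            nodes.begin(), nodes.end(), n,
            [](const Node &a, const Node &b) { return a.ordinal < b.ordinal; });
        nodes.insert(it, n);
    }
    // The i-th element child named `name`, or a null pointer if none exists.
    const Node *child(const std::string &name, std::size_t i) const
    {
        for(const Node &n : nodes)
            if(n.kind == Node::Element && n.name == name)
            {
                if(i == 0) return &n;
                --i;
            }
        return nullptr;
    }
    std::size_t size() const { return nodes.size(); }
    const Node &at(std::size_t i) const { return nodes[i]; }
};

inline void contents_demo()
{
    Contents c;
    c.insert(Node{Node::Element, 2, "extra", ""});
    c.insert(Node{Node::Element, 1, "node", ""});
    c.insert(Node{Node::Text, 1, "", "a comment"});
    assert(c.size() == 3 && c.at(2).name == "extra");
}
```

Keeping everything in one array preserves the interleaving of elements and text, which separate per-name arrays cannot express.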

For attributes and simple-type values, this operator may return a change-agent that facilitates per-item change-notices, and so, like ->, produces a const value if necessary. If future plans are enacted, it will more likely not facilitate per-item notifications, and will simply prevent changes if necessary.


operator+

This operator in its prefix form replaces the older add method. It does not do arithmetic. It adds children to the content of a parent element. Note that for attributes and simple-type values arithmetic operators should work as expected owing to implicit conversion operators, which are not highlighted in this wiki page.


operator++

This operator in its prefix form is only available for child names for which there may be more than one child so named. It is functionally equivalent to accessing the "Nth" subscript and using operator+ on it, where N is the current count of the children with the name in question.
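The equivalence between appending and "add at subscript N" can be sketched with stand-in types. Children, Slot and Param below are hypothetical, not library classes, and spelling the append operation as prefix ++ is an assumption of this sketch. Subscripts further than one past the end are refused, mirroring the constraint described for operator[].

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Param { int value = 0; };   // hypothetical element type

template<class T> class Children
{
    std::vector<std::unique_ptr<T>> items;
public:
    // Names one subscript of the array without creating anything.
    struct Slot { Children *owner; std::size_t index; };

    Slot operator[](std::size_t i) { return Slot{this, i}; }
    std::size_t size() const { return items.size(); }

    // Prefix +slot: return the child, creating it when the subscript is
    // exactly one past the end; null when the subscript is further out.
    friend T *operator+(Slot s)
    {
        std::vector<std::unique_ptr<T>> &v = s.owner->items;
        if(s.index == v.size()) v.push_back(std::unique_ptr<T>(new T()));
        return s.index < v.size() ? v[s.index].get() : nullptr;
    }
    // Prefix ++children: the same as +children[children.size()].
    friend T *operator++(Children &c) { return +c[c.size()]; }
};

inline void add_demo()
{
    Children<Param> params;
    Param *first = +params[0];            // creates the first child
    assert(first && params.size() == 1);
    assert(+params[0] == first);          // existing child, nothing created
    assert(+params[5] == nullptr);        // too far past the end
    Param *second = ++params;             // appends at subscript 1
    assert(second && +params[1] == second);
}
```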

Elements from thin air

The daeSmartRef class has been extended to act as a "factory" for the type of its stored pointer. This use case is invoked by constructing it with a C++ reference to an instance of daeDOM. This new technique makes it possible to return a free element, or element and content, to be added to a document or other container by the receiver of the returned element.
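The factory use case can be pictured with stand-in types. This is a minimal sketch, not the daeSmartRef implementation: SmartRef, Dom and Geometry are hypothetical, and std::shared_ptr substitutes for whatever reference counting the library actually uses. The point is only that constructing the reference from the DOM object yields a new free element that a function can return for the caller to place.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the central DOM object; it only counts creations.
struct Dom { int created = 0; };

// Hypothetical element type that remembers which DOM made it.
struct Geometry { std::string id; Dom *dom = nullptr; };

template<class T> class SmartRef
{
    std::shared_ptr<T> p;
public:
    SmartRef() {}                                   // empty reference
    explicit SmartRef(Dom &dom)                     // factory use case
    : p(std::make_shared<T>()) { p->dom = &dom; ++dom.created; }
    T *operator->() const { return p.get(); }
    explicit operator bool() const { return bool(p); }
};

// Returns a free element, not yet attached to any document.
inline SmartRef<Geometry> make_geometry(Dom &dom, const std::string &id)
{
    SmartRef<Geometry> g(dom);
    g->id = id;
    return g;
}

inline void factory_demo()
{
    Dom dom;
    SmartRef<Geometry> none;
    assert(!none);
    SmartRef<Geometry> box = make_geometry(dom, "box");
    assert(box && box->id == "box" && dom.created == 1);
}
```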

Note that daeSmartRef has many guises. (It is often a typedef or base class.)

Template Specialization

In addition to operators and Standard Library like methods, COLLADA::DAEP offers some "template metaprogramming" like facilities that can permit users to drop in custom type definitions. The extension process is portable, but the class definitions themselves are not.

The change-notification framework is controlled in the same way. Change notices are enabled for all "id" and "sid" attributes by default.
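Control by template specialization generally looks like a traits class with a default and a few overrides. The following is a minimal sketch of that pattern only: NoticeTraits, Attribute and the tag types are hypothetical names, not the DAEP templates, and the default of "off except for id and sid" simply mirrors the sentence above.

```cpp
#include <cassert>
#include <string>

// Hypothetical tags naming attributes.
struct id_tag {};
struct sid_tag {};
struct name_tag {};

// Primary template: notices are off unless a specialization says otherwise.
template<class Tag> struct NoticeTraits { static const bool enabled = false; };
template<> struct NoticeTraits<id_tag>  { static const bool enabled = true; };
template<> struct NoticeTraits<sid_tag> { static const bool enabled = true; };

// An attribute that counts the notices it would have issued.
template<class Tag> class Attribute
{
    std::string value;
    int notices;
public:
    Attribute() : notices(0) {}
    void set(const std::string &v)
    {
        if(NoticeTraits<Tag>::enabled) ++notices;   // decided at compile time
        value = v;
    }
    const std::string &get() const { return value; }
    int notice_count() const { return notices; }
};

inline void traits_demo()
{
    Attribute<id_tag> id;
    Attribute<name_tag> name;
    id.set("node1");
    name.set("Node One");
    assert(id.notice_count() == 1);     // specialized: notices enabled
    assert(name.notice_count() == 0);   // primary template: disabled
}
```

A user would opt another attribute in by adding one more specialization before the attribute type is first used.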

Legacy Features

The new library takes a hard line on legacy features. You can say that it knows what it is, and it knows what it's not, and it only supports legacy features because they had previously existed for the most part. Often the features do offer some instruction in terms of how the library may be used.

Where it is necessary these features are controlled by the getLegacy method of daePlatform.

Future Work

  • The library very much requires a facility that matches <xs:anyAttribute>. This would greatly simplify implementation of the old domAny class as it relates to all other elements, which at present must bend somewhat to make domAny work. This facility will not be limited to schemas that use this element, so as to ensure that users are not forced to shed data that conflicts with the active schema for whatever purpose.
  • There is a plan to introduce qualified names (QName) via a tag field that all smart-ref addressable objects have. The prefix itself is not to be stored in the element data.
  • There are open questions around simple-type content models that also appear to have text punctuated by content such as comments and processing-instructions. At present this can appear ambiguous due to the handling of default values, and there is no recourse if a text is not convertible to the underlying binary data-type of the simple-type value.
  • Furthermore, the default values of simple-type content are difficult to attribute due to how XML Schema assigns the values to the child, and so the default for the type of value (unlike with attributes) is dependent on the child, and not the type of the value, nor the type of the element itself.
  • Currently the type of an element is determined by a pointer associated with its metadata. This is limited to a single module unless special care is taken to export the definitions. One future approach is to assign a global identifier to binary-compatible, lexically appropriate metadata.
  • Change-notices are always issued by changes that go through the metadata back-end. Changes that move/create element-based content always go through the back-end, because they are exported at least until the contents-array insertion routines can leverage compile-time-constants to insert directly into the content-model via the ordinal framework. (This is itself a future objective.)
To remedy this it is necessary to add some bits to either the metadata or element data or both that the schema-unaware routines are able to work with.
