Khronos Blog

Deliver Interactive Experiences with glTF: A Node Graph-Based Approach

Note: this blog summarizes the September 27, 2022 webinar, “Delivering Interactive Experiences with glTF.” Access the full webinar recording here.

The Khronos 3D Formats Working Group is constantly assessing emerging requirements of the glTF ecosystem and asking how the group can make the most impactful progress. Over the past 18 months, one issue has consistently bubbled to the top of these discussions: interactivity.

The urgency of developing new interactivity and behaviors capabilities for glTF has been fueled in part by the evolution towards the open metaverse. It’s clear that glTF can and should have an important role to play in this ecosystem, but we have some important functionality gaps to close first. We’ve spent the past few months cooperatively refining proposals for how we might build interactivity into glTF 3D assets. This blog will outline our current approach and reasoning, as well as invite the community to weigh in.

The Need for a New Approach

First, let’s acknowledge that developers have successfully been using glTF assets on interactive webpages for years. Generally, this is accomplished by adding the asset to a webpage and writing JavaScript to define any associated interactions. So why isn’t this enough? That approach breaks down when:

  • You’re creating an asset that will be used on a 3rd party page where you can’t run your own JavaScript,
  • Your target is something other than a browser, like a headset or a mobile app,
  • You want an asset to have the same interactivity features everywhere: across web, mobile, headsets, and more.

As creators and developers have found more use cases for glTF, we’ve encountered a need to expand our thinking about how behaviors and interactivity should work for glTF assets. Now, we’re endeavoring to create controls that will allow you to deploy interactive glTF content to 3rd party viewers.

The Continuum of Interactive Applications

The glTF community works on a wide range of applications: at one end, simulations and immersive experiences with very high expressive power; at the other, applications with strict security and control requirements, where interactivity and behaviors must be constrained to ensure that assets perform predictably. Our goal is to create a system that works for these different layers of capability, bridging the whole continuum.

Starting at the top

At the highest level of the continuum, “Layer N,” we’re trying to achieve a level of interactivity and expressive power on par with the top gaming engines available. So naturally, we looked at how they do it.

High-end gaming and metaverse engines use behavior graphs to represent behaviors. Behavior graphs are a set of interlinked behavior nodes that form a directed acyclic graph (DAG). The graph is made up of nodes, sockets (behavior node inputs and outputs), and links between sockets.
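To make the structure concrete, here is a minimal sketch of that data model in Python. The class and field names are our own invention for illustration; they are not taken from any glTF specification or shipping engine.

```python
# Hypothetical sketch of a behavior graph's core data model: nodes,
# sockets (node inputs/outputs), and links between sockets.
from dataclasses import dataclass, field


@dataclass
class Socket:
    name: str   # e.g. "flow", "duration", "translation" (invented names)
    kind: str   # "flow" for execution order, or a value type like "float3"


@dataclass
class Node:
    node_type: str                 # e.g. "event/onSelect", "action/translate"
    inputs: list[Socket] = field(default_factory=list)
    outputs: list[Socket] = field(default_factory=list)


@dataclass
class Link:
    from_node: int                 # index into the graph's node list
    from_socket: str
    to_node: int
    to_socket: str


@dataclass
class BehaviorGraph:
    nodes: list[Node] = field(default_factory=list)
    links: list[Link] = field(default_factory=list)
```

A serialized graph in this shape maps naturally onto JSON, which is one reason the model fits glTF's existing container format.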

Unreal Engine’s Blueprint Visual Scripting system was the first example of this approach. First released in 2013, Blueprint allows artists to create content for game engines without writing any C++ code. Unity’s Visual Scripting and NVIDIA’s OmniGraph followed suit. This node-based, low-code approach supports rapid iteration.

There are several different categories of nodes available, including:

  • Events
  • Actions
  • Variables
  • Queries
  • Logic
  • Flow

The execution of behaviors is driven by flow-type sockets and links.
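A hedged sketch of what flow-driven execution might look like: starting from an event node, the interpreter repeatedly follows whichever link leaves the current node’s "flow" output socket. The dict-based graph format and node type names here are assumptions made for the example, not a real engine’s API.

```python
# Execution driven by flow-type sockets and links: follow the "flow"
# link out of each node until no outgoing flow link remains.
def run_flow(graph: dict, start_index: int) -> list[str]:
    """Return the node types visited by following 'flow' links in order."""
    visited = []
    current = start_index
    while current is not None:
        visited.append(graph["nodes"][current]["type"])
        # Find the flow link leaving this node, if any.
        next_node = None
        for link in graph["links"]:
            if link["from_node"] == current and link["from_socket"] == "flow":
                next_node = link["to_node"]
                break
        current = next_node
    return visited
```

Value-type sockets (parameters, query results) would be resolved separately; only the flow sockets determine the order in which nodes fire.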

Each node is intentionally simple – you create complex interactions by linking them together. Behavior graphs avoid compound nodes (which combine an action, event, and/or condition in a single node) in favor of composability. The simpler each individual node is, the more ability you have to combine and remix them in myriad ways to define new behaviors. You don’t have to pre-suppose what the user will do with the content. Instead, you provide them with all the options, linked together so that each action triggers the right reaction.

Are behavior graphs secure?

The security model of a behavior graph system is based on the finite set of nodes available to the user. Behavior graph nodes are pre-defined. A user can’t add nodes that don’t exist in the system already; they can only call the existing nodes that you’ve provided.

Because the nodes are explicitly defined, the behavior graph model allows for load-time validation. Before executing a behavior graph, the system can check that all nodes are valid, properly defined, and contain valid values. This prevents corrupt, improperly structured, or unsupported node types from running.
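Load-time validation might look something like the following sketch, which checks a graph against a registry of pre-defined node types before anything executes. The registry contents and node type names are invented for illustration.

```python
# Illustrative load-time validator: every node type and parameter must
# match an entry in a fixed registry of pre-defined nodes.
NODE_REGISTRY = {
    "event/onSelect": {"params": set()},                 # invented entries
    "action/translate": {"params": {"translation"}},
}


def validate(graph: dict) -> list[str]:
    """Return a list of errors; an empty list means the graph may run."""
    errors = []
    for i, node in enumerate(graph["nodes"]):
        spec = NODE_REGISTRY.get(node["type"])
        if spec is None:
            errors.append(f"node {i}: unknown type {node['type']!r}")
            continue
        for param in node.get("params", {}):
            if param not in spec["params"]:
                errors.append(f"node {i}: unknown parameter {param!r}")
    return errors
```

Because validation happens before execution, a graph that references anything outside the registry is rejected outright rather than failing mid-run.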

You can also cap how many nodes are processed in each time slice. This bounds execution time even if an asset somehow gets into an infinite loop, preventing denial-of-service (DoS) attacks.
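A per-slice node budget can be sketched in a few lines. The budget value and the callback-based stepping interface are assumptions for the example, not part of any specification.

```python
# Sketch of a per-time-slice node budget: even if the links form an
# infinite loop, execution yields control after `budget` nodes.
def run_slice(next_node_fn, start, budget=1000):
    """Execute up to `budget` nodes; return (executed_count, resume_point).

    `next_node_fn(node)` returns the next node to run, or None when done.
    `resume_point` is None if the graph finished within the budget.
    """
    current, executed = start, 0
    while current is not None and executed < budget:
        executed += 1
        current = next_node_fn(current)
    return executed, current
```

The host application calls `run_slice` once per frame (or per scheduling quantum), so a runaway graph can never starve the rest of the system.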

The high degree of flexibility inherent in behavior graphs gives some people pause from a security perspective. Behavior graphs are Turing-complete systems (i.e., they can solve any problem that can be expressed in code and has a computable answer). In theory, an attacker can run any algorithm in a Turing-complete system.

However, practically all behavior systems are Turing-complete. Instead of trying to prevent Turing-completeness, this model mitigates its risks. Because the graph can execute only pre-defined nodes, it effectively runs in a sandbox, even within a C++ context that isn’t secure by default. As long as the nodes are implemented correctly and the graph only calls your pre-defined nodes, the system should be reasonably secure.

Are behavior graphs prohibitively complex?

If you’re thinking “It sounds like these graphs could get extremely complex,” you’re not wrong. When you leverage the system fully, you can end up with some very complex behavior graphs – but the degree of interactivity and complexity is ultimately up to you.

We don’t want the potential complexity of behavior graphs to slow or limit the adoption of glTF’s emerging interactivity features – but behavior graphs are more approachable than you might think. In order to assist in the adoption of behavior graph-based glTF interactivity, we’ve created and open-sourced a behavior graph library.

Can behavior graphs work for more constricted applications?

The flexibility of the behavior graph model allows you to create really powerful interactive assets—but what about when you’re specifically trying to create something that’s not powerful?

So far, we’ve been talking about how we might achieve the very highest levels of interactivity with glTF assets. Now, let’s consider the other end of our continuum: the applications that require the tightest control.

There are scenarios where it’s desirable to place strict limits on asset behaviors and interactivity, because you need assets to be highly predictable. Consider, for example, an asset that will be displayed in a digital art gallery or virtual store run by a 3rd party. The 3rd party system owners can’t allow this asset to run amok, pushing aside or interacting with other assets in unpredictable ways – but the expressive power of behavior graphs could make it very difficult for 3rd party systems to enforce rules on content added to their platforms. Sure, they can have the system simply “undo” any prohibited behaviors at runtime, but that makes for a frustrating and unpredictable user experience.

Our question is, can we use the same framework for everything on the glTF interactivity continuum, from the highly controlled scenarios at Layer 0 to the highly expressive assets at Layer N? That is, can we create a system of node graphs, with named inputs, named outputs, and connections between them, that applies even at the most controlled levels?

Yes, we think so – as long as certain limitations are implemented. In behavior node graphs for highly constrained applications:

  • All parameter values must be constants
  • “Next” is the only output
  • No interaction can occur without an initiating user action
  • The overall number of actions is kept small, so that the graph is easy to implement, optimize, and review

Even with these constraints, we can still create some quite useful interactive assets.
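The four constraints above lend themselves to a mechanical check. The sketch below uses an invented dict representation, an invented node-count cap, and invented event type names; it only illustrates how a viewer might reject graphs that exceed the constrained profile.

```python
# Hedged check for the constrained-profile rules: constant parameters
# only, "next" as the sole output, user-initiated events, few nodes.
MAX_NODES = 16                                        # assumed "small" cap
USER_EVENTS = {"event/onSelect", "event/onHover"}     # invented event types


def is_constrained_profile(graph: dict) -> bool:
    if not graph["nodes"] or len(graph["nodes"]) > MAX_NODES:
        return False
    for node in graph["nodes"]:
        # All parameter values must be literal constants, not references
        # (a dict here would stand for a reference to another socket).
        if any(isinstance(v, dict) for v in node.get("params", {}).values()):
            return False
        # "next" is the only permitted output socket.
        if any(out != "next" for out in node.get("outputs", [])):
            return False
    # Every event node must be a user-initiated action.
    starts = [n for n in graph["nodes"] if n["type"].startswith("event/")]
    return bool(starts) and all(n["type"] in USER_EVENTS for n in starts)
```

A 3rd party platform could run such a check at upload time, accepting only assets that stay within the profile it is prepared to host.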

Interactive bicycle configuration demo adhering to the requirements for a “Layer 0” node graph system

Advantages of a unified approach

We believe the glTF community will need the flexibility of the node graph model at all levels because, in truth, the interactivity continuum isn’t layered like an onion. It’s more like a sprouted potato, with some interactions working very well at all levels given the proper context, while others should be permitted only in a narrow slice of applications.

The flexibility of a node-based system, combined with the security model of explicitly defined nodes, will allow developers and content creators to build glTF assets with both the functionality and constraints required for a wide array of applications.

Next Steps for glTF Interactivity

The 3D Formats Working Group has launched a task sub-group (TSG) to advance glTF interactivity. The TSG is actively seeking input from the glTF community on the node-based graph model, and all glTF stakeholders are invited to weigh in.

We want to hear your feedback on existing node-graph systems, concerns about the node-graph security model, feedback on our glTF behavior graph reference implementation, and more. Join the discussion and help us define the next generation of glTF capabilities.