NNEF and ONNX: Similarities and Differences

NNEF and ONNX are two similar open formats for representing and interchanging neural networks among deep learning frameworks and inference engines. At their core, both formats are based on a collection of commonly used operations from which networks can be built. Because ONNX and NNEF have similar goals, we are often asked how the two differ. Although Khronos has not been involved in the detailed design of ONNX, in this post we explain how we see the differences, based on our understanding of the two projects. We welcome constructive discussion as the industry explores the need for neural network exchange, and we hope this post can be a useful start to that conversation.

Here are the main differences between NNEF and ONNX:

NNEF: A stable open specification authored by a non-profit consortium with well-defined multi-company governance, which will also have open-source implementations.
ONNX: An open-source project.

NNEF: The structure description is a text-based, procedural format.
ONNX: Uses protobuf, which is data-structure oriented.

NNEF: Describes networks in a flat format, and is also capable of describing compound operations.
ONNX: Describes networks in a flat manner.

NNEF: Could describe dynamic graphs in the future via a familiar procedural syntax.
ONNX: May describe dynamic graphs via control-flow operations.

NNEF: Avoids references to machine representation and approaches quantization on a conceptual level, enabling flexible optimizations for accelerating inference.
ONNX: Uses concrete data types for tensors.

NNEF was developed by a working group within a non-profit standards organization whose members include many hardware and software developers, and which any company or university is welcome to join under a well-proven multi-company governance model. The NNEF working group was officially created in mid-2016, when a public call for participation was issued that also invited framework developers, with the goal of creating a stable standard for the industry. The first preliminary version of NNEF was ready by the end of 2017.

The primary goal of NNEF is to enable exporting networks from deep learning frameworks and importing them into hardware vendors' inference engines. NNEF aims to remain independent of the implementation details of inference engines while enabling as much optimization for inference as possible. We also expect that the format could be used for interchange between training frameworks in the future. ONNX, on the other hand, started out as an internal effort at Facebook for interoperation between two research groups, which used PyTorch and Caffe2. After ONNX was open-sourced on GitHub in mid-2017, additional companies joined and are taking part in its development.

The Khronos OpenVX API standard has an existing NN extension that was developed prior to NNEF. The NNEF functionality is not constrained in any way by the set of features and functions in the OpenVX specification and the NN extension. The relationship between OpenVX and NNEF is a subject for a future blog post.

One of the main design principles of NNEF is to enable the description of networks at multiple levels of granularity. NNEF aims to represent networks at a higher level, from which multiple ‘lowered’ representations can be derived. To achieve this, NNEF offers compound operations, which describe how higher-level operations are built from lower-level ones. This has several advantages. First, accelerator chips have different capabilities: they target different primitives and can optimize operations at different levels. Second, a newly emerging operation can be described as a sequence of existing primitives, while hardware that supports it natively can still recognize the sequence as a single operation. For example, an LSTM cell can be described as a sub-graph built from matrix multiplication and element-wise arithmetic operations. Some hardware would execute it at the low level as matrix multiplications and element-wise arithmetic, while other hardware or libraries would execute it as a single higher-level operation.
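
To make the LSTM example concrete, here is a minimal sketch in plain Python of an LSTM cell expressed purely as a composition of low-level primitives. The primitive names (`matmul`, `add`, `mul`, `sigmoid`, `tanh`), the use of Python lists in place of tensors, and the gate ordering (input, forget, cell, output) are illustrative assumptions, not NNEF's standard operation set or its fragment definitions.

```python
import math


# Hypothetical low-level primitives; plain Python lists stand in for tensors.
def matmul(x, w):
    """Row vector x (length n) times matrix w (n rows, m columns) -> length m."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def add(a, b):
    return [p + q for p, q in zip(a, b)]


def mul(a, b):
    return [p * q for p, q in zip(a, b)]


def sigmoid(a):
    return [1.0 / (1.0 + math.exp(-v)) for v in a]


def tanh(a):
    return [math.tanh(v) for v in a]


def lstm_cell(x, h, c, w_x, w_h, b):
    """One LSTM step written only in terms of the primitives above.

    w_x, w_h and b hold the four gates side by side (assumed order:
    input, forget, cell, output), so each has 4 * hidden columns.
    """
    n = len(h)
    z = add(add(matmul(x, w_x), matmul(h, w_h)), b)
    i = sigmoid(z[0:n])          # input gate
    f = sigmoid(z[n:2 * n])      # forget gate
    g = tanh(z[2 * n:3 * n])     # candidate cell value
    o = sigmoid(z[3 * n:4 * n])  # output gate
    c_next = add(mul(f, c), mul(i, g))
    h_next = mul(o, tanh(c_next))
    return h_next, c_next


# Usage: input width 2, hidden size 1, all weights zero (illustrative values).
x, h, c = [1.0, 2.0], [0.0], [1.0]
w_x = [[0.0] * 4, [0.0] * 4]
w_h = [[0.0] * 4]
b = [0.0] * 4
h1, c1 = lstm_cell(x, h, c, w_x, w_h, b)
# With zero weights every sigmoid gate is 0.5 and g = tanh(0) = 0, so c1 == [0.5].
```

An importer targeting hardware with a native LSTM unit could map the whole `lstm_cell` sub-graph to that unit, while one without it would simply execute the individual primitives.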

ONNX, on the other hand, supports a flat graph description, meaning that operations are described at a single level of granularity. The same network can in principle be described at different levels, but each level requires a separate export, whereas a single NNEF description can be interpreted at different levels by the importer.
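
The following sketch illustrates, in simplified form, how a single multi-level description can be interpreted at different levels by an importer: compound operations are kept intact when the target supports them and expanded into their definitions otherwise. The `COMPOUNDS` table, the operation names, and the representation of a graph as a flat list of operation names are hypothetical simplifications; a real importer works on a graph of tensors and follows the compound definitions contained in the description itself.

```python
# Hypothetical compound definitions: each compound op lists the lower-level ops
# it is built from. Entries may themselves refer to other compound ops.
COMPOUNDS = {
    "linear": ["matmul", "add"],
    "lstm_cell": ["linear", "linear", "add", "sigmoid", "sigmoid", "tanh",
                  "sigmoid", "mul", "mul", "add", "tanh", "mul"],
}


def lower(ops, supported):
    """Recursively expand compound ops that the target cannot run natively.

    Ops that are not in COMPOUNDS are treated as primitives and kept as-is.
    """
    result = []
    for op in ops:
        if op in COMPOUNDS and op not in supported:
            result.extend(lower(COMPOUNDS[op], supported))
        else:
            result.append(op)
    return result


graph = ["linear", "lstm_cell"]

# A target with a native LSTM unit keeps the high-level operation.
high_level = lower(graph, supported={"lstm_cell", "matmul", "add"})
# A target that only offers primitives receives the fully expanded sequence.
low_level = lower(graph, supported=set())
```

Both results come from the same source description; with a purely flat format, producing the two variants would instead require two separate exports.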

On the technical side, NNEF uses a text-based, human-readable format to describe the network structure, which provides an intuitive way to describe compound operations procedurally. ONNX relies on protobuf, which leads to an approach that describes the graph as a data structure. Of course, the flexibility of a text-based format has a cost: parsing NNEF syntax can be more complex than parsing protobuf, for which many parsing tools readily exist. To ease usage, NNEF syntax is divided into two parts: basic syntax elements required for a flat description, and more advanced syntax for compositional descriptions. Read more on this in an earlier blog post. The flat syntax is designed to be easy to parse, with expressive power similar to that of data description languages such as protobuf.
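
As an illustration of why a flat, one-assignment-per-line syntax is straightforward to parse, here is a minimal parser for a simplified NNEF-like subset. It only handles statements of the form `output = operation(arguments);` with identifiers, bracketed lists, and `key = value` arguments. This is an assumption-laden sketch, not the official NNEF grammar or parser: it ignores the graph header, comments, and tuple outputs, and it would mis-split string literals that contain commas or brackets.

```python
import re

# One flat statement: output = operation(arguments);
STATEMENT = re.compile(r"^\s*(\w+)\s*=\s*(\w+)\s*\((.*)\)\s*;\s*$")


def split_args(text):
    """Split an argument list on top-level commas (commas inside [...] are kept)."""
    args, depth, current = [], 0, ""
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return args


def parse_flat(body):
    """Parse one assignment per line into (output, op, positional, named) tuples."""
    ops = []
    for line in body.strip().splitlines():
        if not line.strip():
            continue
        m = STATEMENT.match(line)
        if m is None:
            raise ValueError(f"cannot parse: {line!r}")
        out, op, arg_text = m.groups()
        positional, named = [], {}
        for arg in split_args(arg_text):
            if "=" in arg:
                key, value = arg.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(arg)
        ops.append((out, op, positional, named))
    return ops


# Usage on a small NNEF-like graph body (illustrative, not a validated NNEF file).
GRAPH_BODY = """
input = external(shape = [1, 3, 224, 224]);
filter = variable(shape = [32, 3, 5, 5], label = 'conv1/filter');
conv1 = conv(input, filter);
output = relu(conv1);
"""
ops = parse_flat(GRAPH_BODY)
```

A regular expression and a bracket counter suffice here because each flat statement is self-contained. Compositional descriptions, with definitions and nested expressions, would call for a proper recursive-descent parser.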

One interesting direction in the evolution of recent deep learning frameworks is the ability to describe dynamic graphs intuitively, which requires control flow. Control flow operations can be added to a data-structure-oriented description, but the result is somewhat unintuitive and cumbersome. For example, PyTorch's clean dynamic graph concept has been compared favorably with TensorFlow's more complicated approach of making an essentially static graph concept somewhat dynamic. ONNX's protobuf representation is more tailored to the static graph approach, while NNEF's procedural syntax is better suited to an intuitive dynamic graph approach in the future, although such elements are not currently included.
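
To illustrate the contrast, the sketch below writes the same small loop twice in Python: once as a loop node encoded as data, which needs an interpreter that understands the node's schema (trip count, loop-carried values, body), and once procedurally. The node schema is hypothetical and is not ONNX's actual `Loop` operator. NNEF does not currently define control flow, so the procedural version is plain Python, not NNEF syntax.

```python
# Hypothetical data-structure encoding of "repeat 3 times: x = x * 2".
LOOP_NODE = {
    "op": "loop",
    "trip_count": 3,
    "carried": {"x": 1.0},       # values updated on every iteration
    "constants": {"two": 2.0},
    "body": [{"op": "mul", "inputs": ["x", "two"], "output": "x"}],
}


def run_loop(node):
    """Interpret the loop node; executing it requires knowing its schema."""
    env = {**node["carried"], **node["constants"]}
    for _ in range(node["trip_count"]):
        for step in node["body"]:
            a, b = (env[name] for name in step["inputs"])
            if step["op"] == "mul":
                env[step["output"]] = a * b
            elif step["op"] == "add":
                env[step["output"]] = a + b
            else:
                raise ValueError(f"unsupported op: {step['op']}")
    return {name: env[name] for name in node["carried"]}


def procedural(x=1.0):
    """The same computation written procedurally."""
    for _ in range(3):
        x = x * 2.0
    return x
```

Both versions compute the same value, but the data-structure form must spell out the loop-carried state and the body explicitly, and every consumer needs an interpreter for that schema.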

Another difference between the two formats is their approach to representing quantized networks. ONNX includes various data types for activation tensors, such as integers and floats of various bit widths. Because NNEF aims to be independent of machine representation, it deliberately avoids using concrete data types for activation tensors. Instead, it describes quantization algorithms at a conceptual level (via real arithmetic) and lets the inference engine choose the appropriate representation for optimal execution. When storing parameter data, however, NNEF does enable flexible bit widths for float and integer data. NNEF does not support the complex data type that is available in ONNX.
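
The following is a minimal sketch of what describing quantization "via real arithmetic" can look like: a linear quantization step that clamps a value to a range, snaps it to one of 2^bits evenly spaced levels, and returns a real number rather than an integer of a concrete type. The function name, its parameters, and the min/max linear scheme are illustrative assumptions, not a quotation of any operation in the NNEF specification. Note also that Python's `round` rounds halves to even.

```python
def clamp(x, lo, hi):
    """Limit x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)


def linear_quantize(x, min_val, max_val, bits):
    """Snap x to one of 2**bits evenly spaced levels in [min_val, max_val].

    The result is still a real number: the description states which values are
    representable, not how they are stored. An engine could, for example, keep
    the integer level q together with (min_val, max_val) in an 8-bit buffer.
    """
    levels = 2 ** bits - 1
    z = clamp(x, min_val, max_val)
    q = round((z - min_val) / (max_val - min_val) * levels)  # integer level index
    return q / levels * (max_val - min_val) + min_val


# Usage: 1001 evenly spaced inputs collapse onto at most 256 distinct 8-bit levels.
samples = [i / 1000 for i in range(1001)]
quantized = [linear_quantize(s, 0.0, 1.0, 8) for s in samples]
```

Because the description only fixes the set of representable values, the engine remains free to pick the storage type and arithmetic that execute fastest on its hardware.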

Additionally, ONNX represents a network (structure and data) as a single protobuf file. NNEF separates the structure from the data: each data parameter is exported as a separate binary file. To get a single-file package, NNEF suggests wrapping the resulting files in a general-purpose, compressible container such as tar or zip. This separation may be advantageous for tools that need to process only the structure or only the data.
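
The separation of structure and data, and the optional single-file wrapping, can be sketched with Python's standard `tarfile` module. The file names (`graph.nnef` for the structure and one `.dat` file per parameter label) are illustrative. The parameter encoding used here (raw little-endian float32 with no header) is a simplifying assumption; the NNEF specification defines its own binary tensor format, which this sketch does not implement.

```python
import io
import struct
import tarfile


def pack(structure_text, params):
    """Bundle the text structure and one binary file per parameter into a .tar.gz."""
    files = {"graph.nnef": structure_text.encode("utf-8")}
    for label, values in params.items():
        # Simplified encoding: raw little-endian float32 values, no header.
        files[label + ".dat"] = struct.pack(f"<{len(values)}f", *values)

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_structure(archive):
    """Read only the structure; no parameter file is decoded."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        return tar.extractfile("graph.nnef").read().decode("utf-8")


def read_parameter(archive, label):
    """Read a single parameter without parsing the structure or other data."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        data = tar.extractfile(label + ".dat").read()
    return list(struct.unpack(f"<{len(data) // 4}f", data))


# Usage with illustrative contents.
archive = pack(
    "conv1 = conv(input, filter, bias);\n",
    {"conv1/filter": [0.25, 0.5], "conv1/bias": [-1.0]},
)
```

A tool that only inspects or rewrites the network structure can call `read_structure` and never decode the (typically much larger) parameter files.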

Khronos’ primary goal is always to help the industry move forward to the benefit of all. In that spirit, we warmly welcome comments, feedback, and discussion in this active field. Learn more or join the discussion here.
