Previous blog posts "NNEF from Khronos will enable universal interoperability for machine learning developers and implementers" and "Machine learning’s fragmentation problem — and the solution from Khronos" have stressed that the deployment process of neural networks to inference engines is becoming fragmented. An accepted standard can facilitate the industrial use of artificial intelligence by creating mutual compatibility between deep-learning frameworks and inference engines. The Neural Network Exchange Format (NNEF) is the Khronos Group’s solution to this problem.
The goal of NNEF is to enable data scientists to easily move networks between frameworks and inference engines. The primary focus is on transfer from frameworks to inference engines, but transfer between frameworks is also supported. NNEF does this by providing a distilled description of neural network structure and accompanying data. To create a unified description, we examined the similarities and differences among deep learning frameworks (including Torch, Caffe, TensorFlow, Theano, Chainer, Caffe2, PyTorch, MXNet). Unsurprisingly, these frameworks have a lot in common. Much of this common ground lies in the underlying operations used to build networks, partly because those operations are often backed by the same GPU implementation. However, there are slight differences in the formulas used for similar operations. This is one particular problem standardization can solve.
More notably, frameworks differ in how they describe network structures. Some use a flat description composed of layers of medium complexity. Others use scripting languages to assemble compound operations from primitive ones. These build on various higher-level libraries that make common sub-structures easier to describe. Scripting has the advantage of describing new compound operations with existing primitives, which may be crucial to future extensibility. Furthermore, it provides hardware implementers information on the relation of operations and the importance of certain primitives. At the same time, describing neural network computation as a hierarchy of operations, on multiple levels of granularity, provides an opportunity for implementations to target different levels and optimize execution. However, the compositional approach is both conceptually and technically much more complex than a flat description. Which of the two approaches will be more beneficial in the future is yet to be seen.
To support both scenarios, NNEF offers two options: a flat and a compositional description. Both utilize syntactic elements borrowed from Python and the compositional description is an extension of the flat approach. A flat description is simple and easy to interpret and is sufficient for many use cases. These include transferring networks with only standardized operations from frameworks to inference engines. Ease of use may be a crucial feature in the case of libraries or drivers written in low-level languages. On the other hand, a compositional description allows for complex cases other than simple exports-imports, such as those that require the definition of custom compound operations.
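The relationship between the two descriptions can be sketched in a few lines of Python. This is an illustration of the idea only, not NNEF syntax or any NNEF tool: the operation names, the `COMPOUNDS` table, and the `flatten` helper are all hypothetical. A compositional description defines a compound operation from primitives, and a flat description is what remains once every compound operation has been expanded.

```python
# Hypothetical primitive operations that a flat description may use directly.
PRIMITIVES = {"conv", "add", "relu", "max_pool"}

# Hypothetical compound operations: name -> (formal inputs, body, result name).
# Each body step is a tuple of (output tensor, operation, input tensors).
COMPOUNDS = {
    "conv_bias_relu": (
        ["x", "w", "b"],
        [
            ("c", "conv", ["x", "w"]),
            ("s", "add", ["c", "b"]),
            ("y", "relu", ["s"]),
        ],
        "y",
    ),
}


def flatten(graph):
    """Expand compound operations until only primitive operations remain."""
    flat = []
    for out, op, inputs in graph:
        if op in PRIMITIVES:
            flat.append((out, op, list(inputs)))
            continue
        formals, body, result = COMPOUNDS[op]
        # Bind formal names to the actual tensors of this call site.
        names = dict(zip(formals, inputs))
        names[result] = out
        # Tensors local to the compound get a name scoped by the call's output.
        renamed = [
            (
                names.get(o, f"{out}.{o}"),
                body_op,
                [names.get(i, f"{out}.{i}") for i in ins],
            )
            for o, body_op, ins in body
        ]
        # Recurse so that compounds built from other compounds also expand.
        flat.extend(flatten(renamed))
    return flat


compositional = [
    ("h", "conv_bias_relu", ["input", "w1", "b1"]),
    ("p", "max_pool", ["h"]),
]

for step in flatten(compositional):
    print(step)
```

A consumer that only understands primitives can work with the flattened list, while one that recognizes `conv_bias_relu` as a unit could choose to execute it as a single fused operation.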
The goal of NNEF is to transfer all relevant information from deep learning frameworks to inference engines. To remain independent of the framework used to produce a network, and the inference engines used to execute it, the description deliberately avoids referencing machine representation of data. NNEF contains algorithmic information related to quantization that may occur during training and inference. Quantization is an important area of optimization, which the standard supports without imposing restrictions on hardware. For this, NNEF relies on conceptual quantization. This means that quantization algorithms are expressed as compound operations on real-valued data, built from readily available primitives. The description is accompanied by a binary data-storage format for saving network parameters in simple and general formats, including quantized ones.
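To make the idea of conceptual quantization concrete, here is a minimal Python sketch of a linear quantizer written purely as arithmetic on real values. The function names, the parameters (`lo`, `hi`, `bits`), and the choice of rounding rule are assumptions made for illustration; the sketch is not taken from the NNEF specification.

```python
def clamp(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(x, hi))


def linear_quantize(x, lo, hi, bits):
    """Snap a real value onto one of 2**bits evenly spaced levels in [lo, hi].

    The input and the result are both real numbers. Nothing here says how
    the levels are stored in memory; that is left to the implementation.
    """
    levels = 2 ** bits - 1
    z = clamp(x, lo, hi)
    # Python's round() rounds exact ties to the nearest even integer;
    # the rounding rule is an assumption of this sketch.
    q = round((z - lo) / (hi - lo) * levels)
    return q / levels * (hi - lo) + lo


print(linear_quantize(0.3, 0.0, 1.0, 2))
print(linear_quantize(5.0, 0.0, 1.0, 8))
```

Because the quantizer is built only from clamping, scaling, and rounding, an inference engine is free to realize it with 8-bit integers, fixed-point arithmetic, or any other representation that reproduces the same real-valued result.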
NNEF covers a wide range of use cases and network types. The primary use cases are image processing (classification, segmentation, object detection), language processing, and audio and video processing. NNEF contains the operations required for classical fully connected and modern convolutional networks, including feedforward, encoder-decoder, and recurrent architectures. For example, linear and convolutional operations, activation functions, and element-wise, pooling, and normalization operations are all supported.
The NNEF Khronos working group provides open source sample tools to interested parties through the NNEF repositories on GitHub. These can be used to parse and validate NNEF documents for correct syntax, and also to validate other execution-independent aspects of the documents; for example, to decide whether the computational graph is properly connected. Network structure and data are stored separately. This allows utility tools to process network structure without loading network data and easily answer questions such as whether an inference engine can execute a given network. Separating structure and data is also useful for network transformation tools that do not alter the data itself.
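The kind of structure-only checks described above can be sketched as follows. This is a hypothetical illustration in Python, not the API of the NNEF sample tools: the graph representation and the helpers `check_connected` and `unsupported_ops` are assumptions. Note that neither function touches any parameter data; both work from tensor and operation names alone.

```python
def check_connected(inputs, parameters, operations, outputs):
    """Return a list of problems; an empty list means the graph is connected.

    operations is an ordered list of (output tensor, operation, input tensors).
    """
    defined = set(inputs) | set(parameters)
    problems = []
    for out, op, args in operations:
        for arg in args:
            if arg not in defined:
                problems.append(f"{op}: input '{arg}' is not defined before use")
        if out in defined:
            problems.append(f"{op}: output '{out}' is defined more than once")
        defined.add(out)
    for name in outputs:
        if name not in defined:
            problems.append(f"graph output '{name}' is never produced")
    return problems


def unsupported_ops(operations, supported):
    """List the operations in the graph that an engine does not implement."""
    return sorted({op for _, op, _ in operations} - set(supported))


# A small example graph; only names are needed, not the weight values.
operations = [
    ("c", "conv", ["input", "w"]),
    ("r", "relu", ["c"]),
    ("p", "max_pool", ["r"]),
]

print(check_connected(["input"], ["w"], operations, ["p"]))
print(unsupported_ops(operations, supported={"conv", "relu"}))
```

With the structure kept separate from the data, checks like these can run on a description file alone, before any potentially large weight files are read.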
Designed with a wide range of aspects in mind, NNEF will serve as a standard in a currently fragmented field. Standardization, achieved through the cooperation of industry stakeholders, supports the diverse use of neural networks without leading to a large number of proprietary solutions.