OpenVX Neural Network Extension  02b8d012
Neural Network Extension

# Acknowledgements

This specification would not have been possible without contributions from the following individuals from the Khronos Working Group, listed with the companies they represented at the time. The list is partial:

• Frank Brill - Cadence Design Systems
• Kari Pulli - Intel
• Tomer Schwartz - Intel
• Mostafa Hagog - Intel
• Chuck Pilkington - Synopsys
• Thierry Lepley - Cadence Design Systems
• Jesse Villarreal - TI
• Victor Eruhimov - Itseez3D
• Xin Wang - VeriSilicon

# Background and Terminology

Deep learning with neural networks is increasingly used for vision classification and recognition tasks. Deep Neural Networks have significantly improved image recognition capabilities over previous technologies. The Neural Network extension for OpenVX is intended to enable the implementation of Deep Neural Networks in the OpenVX framework. Deep learning for vision has two fundamental stages. First, the network topology is designed and trained on a collection of labelled data. The network topology is represented as a graph of nodes, each of which is a Neural Network building block. The training data represents the problem to be addressed. During the training phase, the parameters (also referred to as weights/biases or coefficients) are determined for the given network topology. The trained network can then be deployed.

In deployment, the network topology and the parameters are fixed, which allows optimization in hardware and software. In certain scenarios an additional intermediate step is performed to optimize the parameters for a particular target hardware, for example by converting them for fixed-point calculations. Once deployed, the Neural Network is used for inference on input data. The main objective of the Neural Network Extension for OpenVX is to enable the deployment phase (in other words, inference).
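The fixed-point optimization step mentioned above can be illustrated with a small conversion routine. The following is a minimal sketch and not part of the extension: the function names `quantize_q16` and `dequantize_q16` are hypothetical, and the choices of a signed 16-bit container, a caller-supplied number of fractional bits, round-to-nearest, and saturation are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a trained floating-point parameter to a signed 16-bit
 * fixed-point value with `frac_bits` fractional bits.
 * Assumed policy: round to nearest, saturate at the int16_t limits,
 * and map NaN to 0. */
static int16_t quantize_q16(float value, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 16);
    float scaled = value * (float)(1 << frac_bits);
    if (scaled != scaled) return 0;                 /* NaN */
    scaled += (scaled >= 0.0f) ? 0.5f : -0.5f;      /* round half away from zero */
    if (scaled >= 32767.0f) return INT16_MAX;       /* saturate high */
    if (scaled <= -32768.0f) return INT16_MIN;      /* saturate low */
    return (int16_t)scaled;                         /* truncation completes the rounding */
}

/* Recover the real value that a fixed-point number represents. */
static float dequantize_q16(int16_t q, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 16);
    return (float)q / (float)(1 << frac_bits);
}
```

With 8 fractional bits, for example, the value 1.5 is stored as the integer 384, and values outside roughly ±128 saturate.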

This section defines the basic terminology used throughout the document, to reconcile the differing usage and naming found in academia and industry. These names refer to the same fundamental concepts of Deep Neural Networks in the deep learning domain. We use the term Deep Neural Network for a deep learning network topology that is composed of multiple layers, one of the main layer types being Convolution. Other names used in academia and industry for the same type of network topology are CNN (Convolutional Neural Network) and ConvNet. Throughout this document, Deep Neural Network is used to refer to Neural Network, CNN, and ConvNet alike.

Weights - The parameters or coefficients that result from training the Deep Neural Network. Weights can be shared or non-shared, or have local connectivity.
Biases - The per-output parameters or coefficients that result from training the Deep Neural Network.
Convolution Layer - A type of layer in the Deep Neural Network that has local connectivity and shared weights; also known as a locally connected layer with shared weights.
Fully Connected Layer - A layer in which every input affects every output; in other words, there is a connection from every input element to every output element.
Activation Layer - A layer that operates on each input element individually. It is inspired by the neuron activation function and is usually approximated with non-linear functions.
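The layer definitions above can be made concrete with a reference-style sketch in plain C. This is an illustration only, not how any OpenVX implementation computes these layers: the function names, the `float` data type, the row-major weight layout, the one-dimensional convolution, and the choice of ReLU as the activation are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Fully connected layer: every input contributes to every output.
 * `weights` is assumed row-major, [num_out][num_in]; `biases` holds
 * one value per output. */
static void fully_connected(const float *in, size_t num_in,
                            const float *weights, const float *biases,
                            float *out, size_t num_out)
{
    assert(in && weights && biases && out);
    for (size_t o = 0; o < num_out; ++o) {
        float acc = biases[o];
        for (size_t i = 0; i < num_in; ++i)
            acc += weights[o * num_in + i] * in[i];
        out[o] = acc;
    }
}

/* 1-D convolution layer (no padding, stride 1): each output sees only
 * a local window of k inputs, and the same k weights are shared by all
 * output positions. As is common in neural networks, the kernel is
 * applied without flipping. Writes n - k + 1 outputs. */
static void conv1d(const float *in, size_t n,
                   const float *kernel, size_t k, float bias, float *out)
{
    assert(in && kernel && out && k > 0 && k <= n);
    for (size_t o = 0; o + k <= n; ++o) {
        float acc = bias;
        for (size_t j = 0; j < k; ++j)
            acc += kernel[j] * in[o + j];
        out[o] = acc;
    }
}

/* Activation layer: ReLU applied independently to every element. */
static void relu(float *data, size_t count)
{
    assert(data);
    for (size_t i = 0; i < count; ++i)
        if (data[i] < 0.0f) data[i] = 0.0f;
}
```

Note the difference in parameter count: the fully connected layer needs `num_in * num_out` weights, while the convolution needs only `k` regardless of input size.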

The documentation below uses the abbreviations IFM and OFM, which stand for “Input Feature Maps” and “Output Feature Maps,” respectively. Each feature map is a two-dimensional image. A CNN input or output tensor will typically have three dimensions, where the first two are the width and height of the images, and the third is the number of feature maps. For inputs, the third dimension is the number of IFMs, and for outputs, the third dimension is the number of OFMs.
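The dimension ordering described above can be sketched as an index calculation over an application-side buffer. This is a minimal illustration under stated assumptions: the `fm_dims` type and the helper names are hypothetical, and the buffer is assumed to be densely packed with width as the fastest-varying dimension. The internal memory layout of a tensor object inside an OpenVX implementation is not implied by this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Dimensions of a set of feature maps, in the order used by this
 * document: width, height, number of feature maps. */
typedef struct {
    size_t width;   /* dimension 0 */
    size_t height;  /* dimension 1 */
    size_t maps;    /* dimension 2: number of IFMs or OFMs */
} fm_dims;

/* Total number of elements across all feature maps. */
static size_t fm_count(fm_dims d)
{
    return d.width * d.height * d.maps;
}

/* Linear index of element (x, y) in feature map `map`, assuming a
 * densely packed buffer in which x varies fastest, then y, then map. */
static size_t fm_index(fm_dims d, size_t x, size_t y, size_t map)
{
    assert(x < d.width && y < d.height && map < d.maps);
    return (map * d.height + y) * d.width + x;
}
```

For a 4 x 3 input with 2 feature maps, the buffer holds 24 elements and the last element of the second map sits at index 23.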

# Introduction

The Neural Network extension enables execution and integration of Deep Neural Networks in OpenVX processing graphs. The extension depends on the vx_tensor object, which is introduced in OpenVX 1.2. This extension therefore extends OpenVX 1.2 and not earlier OpenVX specifications. The vx_tensor object is a multidimensional array with an arbitrary number of dimensions. It can represent all varieties of data typically used in a Deep Neural Network: 2-dimensional images, 3-dimensional sequences of images (usually the inputs and outputs of a Deep Neural Network), and 4-dimensional weights.

An application can build an OpenVX graph that represents a Deep Neural Network topology, in which the layers are represented as OpenVX nodes (vx_node) and vx_tensor objects are the data objects connecting the nodes (layers) of the OpenVX graph (Deep Neural Network). The application can also build an OpenVX graph that mixes Deep Neural Network layers and Vision nodes. All graphs (including Deep Neural Networks) are treated as any other OpenVX graph and must comply with the graph concepts specified in section 2.8 of OpenVX 1.1, especially, but not limited to, the graph formalisms in section 2.8.6.

Additionally, this extension defines several auxiliary functions to create, release, and copy vx_tensor objects. It also introduces the concept of a “view” for vx_tensor objects, which is similar to the ROI of a vx_image. Views enable splitting and merging of vx_tensor objects, which are common operations in Convolutional Networks. The layers of the Deep Neural Network (represented by vx_node objects) perform computations on the tensor data objects and form a dataflow graph of computations. The extension defines the following layer types: convolution, activation, pooling, fully-connected, and softmax.

# Weights/Biases Setting

It is assumed that Deep Neural Networks are trained in a framework external to OpenVX and then imported. This requires the application to allocate a memory area for the weights/biases, read the weight values from a file into this memory area, and then use the vxCopyTensorPatch API to copy the weights/biases from the memory area into the appropriate OpenVX tensor object. The vxCopyTensorPatch function converts the data in application memory to the implementation-specific format before storing it in the tensor object. While effective, this method has the drawback that an intermediate memory area must be allocated and a copy and conversion must be performed.
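The first two steps of this procedure (allocating the intermediate memory area and reading the weight file into it) can be sketched in standard C. This is a minimal sketch under stated assumptions: the helper name `load_weights` is hypothetical, and the file is assumed to contain raw host-endian `float` values with no header, which is not a format defined by this extension. The final step, passing the buffer to vxCopyTensorPatch, is deliberately left out because its argument list depends on the OpenVX headers in use.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read exactly `count` raw float values from the binary file at `path`
 * into a newly allocated buffer (the "intermediate memory area").
 * Returns NULL if the file cannot be opened, memory cannot be
 * allocated, or the file holds fewer than `count` values.
 * The caller owns the buffer and must free() it after the data has
 * been copied into the tensor object. */
static float *load_weights(const char *path, size_t count)
{
    assert(path != NULL && count > 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    float *buf = malloc(count * sizeof *buf);
    if (buf != NULL && fread(buf, sizeof *buf, count, fp) != count) {
        free(buf);          /* short read: treat as failure */
        buf = NULL;
    }

    fclose(fp);
    return buf;             /* next step (not shown): vxCopyTensorPatch */
}
```

The buffer returned here is exactly the intermediate copy whose cost is noted above as the drawback of this method.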

A separate “import/export” extension defines a vxImportBinary function that can be implemented more efficiently. Implementations of vxImportBinary could read a weight file, or perhaps an entire graph description, directly without the need for an intermediate copy. The format of this binary is implementation-dependent. OpenVX implementations that support both the Neural Network extension and the binary import/export extension can use this more efficient method to set the Deep Neural Network weights/biases. The vxImportBinary function returns a handle to an object that can be queried, via the vxGetImportReferenceByName or vxGetImportReferenceByIndex functions, to get handles for the individual objects within it. Further details and alternate usages of the vxImportBinary function are provided in the specification of the “import/export” extension.

OpenVX objects (tensors, scalars, enums) that hold weights, biases, and other static parameters of CNN layers must have actual data loaded into them before vxVerifyGraph() is called, so that the implementation may cache them prior to execution or use them for other optimizations. Optionally, an implementation may explicitly define support for changing weights after vxVerifyGraph() has been called or between vxProcessGraph() calls. For convenience, the parameters that must have actual data loaded into them before vxVerifyGraph() are tagged [static].

# Kernel names

The following strings specify the Neural Network extension kernel names for use with vxGetKernelByName:

org.khronos.nn_extension.convolution_layer

org.khronos.nn_extension.fully_connected_layer

org.khronos.nn_extension.pooling_layer

org.khronos.nn_extension.softmax_layer

org.khronos.nn_extension.normalization_layer

org.khronos.nn_extension.activation_layer

org.khronos.nn_extension.roi_pooling_layer

org.khronos.nn_extension.deconvolution_layer
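An application that wants to probe for these kernels could keep the names in a table. The following is a minimal sketch: the names are copied from the list above, while the identifiers `nn_kernel_names`, `NN_KERNEL_COUNT`, and `is_nn_kernel_name` are hypothetical helpers that are not part of OpenVX. Each string would be passed to vxGetKernelByName together with the context; that call is omitted so the example compiles without the OpenVX headers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kernel names of the Neural Network extension, as listed above. */
static const char *const nn_kernel_names[] = {
    "org.khronos.nn_extension.convolution_layer",
    "org.khronos.nn_extension.fully_connected_layer",
    "org.khronos.nn_extension.pooling_layer",
    "org.khronos.nn_extension.softmax_layer",
    "org.khronos.nn_extension.normalization_layer",
    "org.khronos.nn_extension.activation_layer",
    "org.khronos.nn_extension.roi_pooling_layer",
    "org.khronos.nn_extension.deconvolution_layer",
};

#define NN_KERNEL_COUNT (sizeof nn_kernel_names / sizeof nn_kernel_names[0])

/* Returns 1 if `name` exactly matches one of the names in the table,
 * 0 otherwise. */
static int is_nn_kernel_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < NN_KERNEL_COUNT; ++i)
        if (strcmp(name, nn_kernel_names[i]) == 0)
            return 1;
    return 0;
}
```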

# 8-bit extension and 16-bit extension

The Neural Network Extension is actually two different extensions: the Neural Network 8-bit extension and the Neural Network 16-bit extension. The 8-bit extension is required; the 16-bit extension is optional. For the 8-bit extension, VX_TYPE_UINT8 and VX_TYPE_INT8, with fixed_point_position 0, must be supported for all functions. For the 16-bit extension, VX_TYPE_INT16, with fixed_point_position 8, must be supported for all functions. Users can query VX_CONTEXT_EXTENSIONS; the returned extension strings identify the two extensions. Implementations must return the 8-bit extension string and may return the 16-bit extension string. If an implementation returns the 16-bit extension string, it must return the 8-bit extension string as well. The 8-bit extension string is "KHR_NN_8" and the 16-bit extension string is "KHR_NN_16". The legal string combinations are "KHR_NN_8" or "KHR_NN_8 KHR_NN_16".
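Once the extensions string has been obtained by querying VX_CONTEXT_EXTENSIONS, checking for the two identifiers is a matter of token matching. The following is a minimal sketch: the helper `has_extension` is hypothetical, and it assumes the string is a list of names separated by single spaces, as in the legal combinations above. The query itself is omitted so the example compiles without the OpenVX headers.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `name` appears in `extensions` as a complete
 * space-separated token, 0 otherwise. Matching whole tokens avoids
 * false positives when one extension name is a prefix of another. */
static int has_extension(const char *extensions, const char *name)
{
    assert(extensions != NULL && name != NULL);

    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = extensions;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == extensions) || (p[-1] == ' ');
        int ends_token = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += 1;   /* keep searching after this partial match */
    }
    return 0;
}
```

An application would typically require `has_extension(ext, "KHR_NN_8")` and treat `has_extension(ext, "KHR_NN_16")` as an optional capability.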