Arm NN is an open-source inference engine for CPUs, GPUs, and NPUs that bridges the gap between existing neural network frameworks and the underlying hardware IP. It is built on top of the Arm Compute Library (ACL), a collection of highly optimized low-level functions that accelerate inference on the Arm Cortex-A family of CPUs and the Arm Mali family of GPUs. For GPUs, ACL uses OpenCL as its compute API. Because the OpenCL memory model maps closely onto the GPU architecture, it is possible to implement optimizations that significantly reduce accesses to global memory. Read on to learn how.