Apple debuted the M4 chip with the iPad Pro on May 7. While Apple stated significant improvements in the chip’s overall performance, the most prominent change in the M4 is the 16-core Neural Engine, Apple’s term for a Neural Processing Unit (NPU). Apple called the M4 an “outrageously powerful chip for AI.” It is the improvements in the NPU that back Apple’s claim that the M4 is significantly more powerful than any other chip powering an AI PC. So, what exactly is an NPU, and why is it gaining prominence in the chip industry? Let us find out:
What is an NPU (Neural Processing Unit)
An NPU, or Neural Processing Unit, is a dedicated processor designed specifically to accelerate neural network operations. A neural network is essentially a type of machine learning algorithm that loosely mimics the way the human brain processes data. This makes the NPU well suited to the machine learning operations that underpin AI-related tasks, such as speech recognition, natural language processing, photo or video editing features like object detection, and more.
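Under the hood, the neural network processing an NPU accelerates largely boils down to big batches of multiply-and-add operations. Here is a minimal Python sketch of a single network layer, with arbitrary sizes and random values used purely for illustration, to show the kind of arithmetic involved:

```python
import numpy as np

# A single neural-network layer: multiply inputs by learned weights,
# add a bias, then apply a non-linear "activation". Real networks stack
# many such layers; an NPU is built to speed up exactly this arithmetic.
x = np.random.rand(128)       # input features (arbitrary size)
W = np.random.rand(64, 128)   # learned weights
b = np.random.rand(64)        # learned biases

layer_output = np.maximum(0, W @ x + b)   # ReLU activation
print(layer_output.shape)                  # (64,)
```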
In most consumer-facing gadgets such as smartphones, laptops and tablets, the NPU is integrated into the main processor in a system-on-chip (SoC) configuration. In data centres, however, the NPU may be an entirely discrete processor, separate from the central processing unit (CPU) and the graphics processing unit (GPU).
How is an NPU different from a CPU and a GPU
CPUs largely employ sequential computing, issuing one instruction at a time, with subsequent instructions waiting for their predecessors to complete. An NPU, in contrast, harnesses parallel computing to execute numerous calculations simultaneously, which makes processing swifter and more power efficient. Sequential execution works well for general-purpose tasks, but AI workloads require the processor to carry out huge numbers of calculations at the same time.
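As a loose analogy in code (the array size and timings are illustrative only and say nothing about any particular chip), the difference is between handling one value at a time and handling a whole batch in one go:

```python
import time
import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# "Sequential": one multiply at a time, each waiting for the previous one.
start = time.perf_counter()
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = a[i] * b[i]
print(f"one-at-a-time loop: {time.perf_counter() - start:.3f}s")

# "Parallel-style": the whole array handled as one batched operation,
# which the hardware can spread across many execution units.
start = time.perf_counter()
out_vec = a * b
print(f"batched operation:  {time.perf_counter() - start:.3f}s")
```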
This is where graphics processing units, or GPUs, come in. They have parallel computing capabilities and circuitry capable of carrying out AI workloads, but they are primarily designed for other jobs, such as graphics rendering and resolution upscaling. That makes the case for NPUs, which dedicate similar parallel circuitry solely to machine learning operations, making AI workload processing more efficient and less power hungry.
GPUs are still used for the initial development and training of AI models, while NPUs later take up the mantle, running those refined models on the consumer’s device.
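On Apple platforms, for example, an already-trained model can be asked to prefer the Neural Engine at inference time. The sketch below uses the coremltools Python package; the model file name and input shape are placeholders, not anything Apple has published for the M4:

```python
import numpy as np
import coremltools as ct

# "model.mlpackage" is a placeholder path to an already-trained model
# (training typically happens on GPUs) packaged in Core ML format.
model = ct.models.MLModel(
    "model.mlpackage",
    # Ask the runtime to schedule work on the CPU and the Neural Engine
    # (Apple's NPU) rather than the GPU.
    compute_units=ct.ComputeUnit.CPU_AND_NE,
)

# Input name and shape depend on the specific model; these are placeholders.
example_input = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}
prediction = model.predict(example_input)
```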
NPU and on-device AI
Large language models (LLMs) are typically too big to run on-device, so service providers generally move the processing to the cloud to offer AI features built on their models. Recently, however, big technology companies have released small language models such as Google’s Gemma, Microsoft’s Phi-3 and Apple’s OpenELM, indicating a trend towards smaller AI models capable of running entirely on-device. As on-device AI models gain prominence, the role of the NPU becomes even more crucial, since it is the processor that actually runs these AI-powered applications on the hardware.
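A rough back-of-the-envelope calculation helps explain the shift. Assuming 16-bit weights (2 bytes per parameter) and ignoring quantisation and runtime overhead, the memory needed just to hold a model’s weights scales directly with its parameter count; the figures below are approximate public parameter counts, used only for illustration:

```python
# Rule of thumb: a parameter stored at 16-bit precision takes 2 bytes,
# so a model's weights alone need roughly (parameters * 2) bytes of memory.
def weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    return params_billions * 1e9 * bytes_per_param / 1e9

# Approximate, publicly stated parameter counts (illustrative only).
models = {
    "Typical cloud-scale LLM (~70B)": 70.0,
    "Google Gemma (2B)": 2.0,
    "Microsoft Phi-3-mini (3.8B)": 3.8,
    "Apple OpenELM (1.1B)": 1.1,
}

for name, billions in models.items():
    print(f"{name}: ~{weight_memory_gb(billions):.1f} GB of weights at 16-bit")
```

The smaller models land in a range that fits alongside everything else in a phone or laptop’s memory, which is what makes fully on-device AI, accelerated by the NPU, practical.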