Arm has introduced the Automotive Enhanced Arm Cortex-A76AE, the first high-performance Cortex-A CPU with Split-Lock capability.
The appetite for further autonomy in our vehicles is spurring advances in the underlying technology. As the number of sensors in and around the vehicle increases, so must the capability of CPUs for large-scale data processing.
Traditional system designs with separate Electronic Control Units (ECUs) for the gateway, infotainment and Advanced Driver-Assistance Systems (ADAS) are making way for innovative approaches in fully integrated systems. These systems will consist of multiple applications running at different Automotive Safety Integrity Levels (ASILs). To realize fully autonomous vehicles in mass production, scalable, high-performance computing with inherent safety is required.
The Cortex-A line of products has seen some very exciting innovation in microarchitecture design over the last decade. Some of my personal highlights include the introduction of coherent heterogeneous processing with big.LITTLE, native 64-bit support and a brand-new memory system with DynamIQ technology.
As the name suggests, the Cortex-A76AE is based on the recently announced Cortex-A76 design. It is a superscalar, out-of-order processor that delivers similar levels of performance to the Cortex-A76 across integer, floating point, memory and machine learning, and achieves similar levels of energy efficiency. Where the Cortex-A76AE differs is in its microarchitectural upgrades for functional safety and added application flexibility.
The Cortex-A76AE is purpose-built for functional safety applications such as ADAS and autonomous vehicles. Its three main benefits are described below.
1. Safety for autonomous systems
Where the Cortex-A76AE really stands out is in its ability to deliver the performance described above at high safety integrity. It achieves this through a significant redesign of the Cortex-A76, becoming the first high-performance Cortex-A CPU to include the Dual Core Lock-Step (DCLS) and Split-Lock features. Configuring two CPU cores in ‘Lock-Step’ is a traditional way of achieving high levels of diagnostic coverage, that is, the ability to detect the occurrence of an error condition.
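To make the lock-step idea concrete, here is a minimal software sketch of the principle: two redundant "cores" receive the same inputs, and a comparator flags the first cycle on which their outputs differ. This is an illustration only, not Arm's implementation. The names `run_lock_step`, `healthy_core` and `faulty_core` are hypothetical, and real DCLS compares hardware signals cycle by cycle rather than Python function results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LockStepResult:
    outputs: List[int]          # outputs accepted before any divergence
    diverged_at: Optional[int]  # index of the first mismatch, or None


def run_lock_step(
    primary: Callable[[int], int],
    redundant: Callable[[int], int],
    inputs: Iterable[int],
) -> LockStepResult:
    """Feed identical inputs to both cores and compare every output."""
    accepted: List[int] = []
    for cycle, value in enumerate(inputs):
        a = primary(value)
        b = redundant(value)
        if a != b:
            # The comparator only knows the cores disagree; it cannot
            # tell which of the two is wrong.
            return LockStepResult(accepted, cycle)
        accepted.append(a)
    return LockStepResult(accepted, None)


def healthy_core(x: int) -> int:
    """Hypothetical workload standing in for one core's computation."""
    return (x * 3 + 1) & 0xFFFF


def faulty_core(x: int) -> int:
    """Same workload, with an injected single-bit error on input 3."""
    result = healthy_core(x)
    return result ^ 0x4 if x == 3 else result


if __name__ == "__main__":
    print(run_lock_step(healthy_core, healthy_core, range(5)))
    print(run_lock_step(healthy_core, faulty_core, range(5)))
```

The sketch also shows the cost of the approach: two cores do the work of one, which is the trade-off that a Split-Lock capable design lets the integrator choose per deployment.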
The flexibility offered through Split-Lock also has a safety benefit. It can be extended to support potential fail-operational modes, in which the system continues to operate in a degraded mode rather than shutting down completely. For example, when running in lock mode, if one core starts to exhibit a failure condition, the system could be quiesced and the faulty core taken offline (split), allowing operation to continue in a degraded mode. This ‘split available’ capability is critical for any autonomous system. The following are the main microarchitectural highlights of the Cortex-A76AE for safety:
- Dual Core Lock-Step (DCLS): The Cortex-A76AE can run in DCLS mode and can therefore contribute towards a system’s ASIL D hardware diagnostic coverage requirements.
- Memory protection: The Cortex-A76AE comes with memory protection as standard. It supports Single Error Correction, Double Error Detection (SECDED) ECC and Parity protection in the L1 cache, and SECDED ECC protection with the ability to correct in-line, on the L2 and L3 caches.
- RAS features: As part of the Armv8.2 architecture extension, the Cortex-A76AE has RAS features built in. These include standardized error reporting across the core and the DSU, error injection as a means of testing fault management, and data poisoning as a way of deferring error aborts until the point of execution.
- Integrated comparators: The Cortex-A76AE includes comparators that are integrated into the design. These blocks compare outputs from the logical and redundant processing elements to detect divergence. They follow the error reporting scheme defined in the Armv8.2 RAS architecture.
Apart from the hardware features above, the Cortex-A76AE has been developed using an advanced process for the avoidance of systematic faults. This enables it to meet the ASIL D systematic requirements as standard.
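The fail-operational flow described earlier in this section (run locked, quiesce on a fault, take the faulty core offline, continue degraded) can be pictured as a small state machine. The sketch below is a hypothetical model, not a description of the Cortex-A76AE hardware or any Arm software interface. The class `LockStepPair`, its states, and the core names are all invented for illustration, and it assumes some separate diagnostic has already identified which core is faulty.

```python
from enum import Enum
from typing import List, Set, Tuple


class Mode(Enum):
    LOCKED = "locked"                  # both cores running in lock-step
    QUIESCED = "quiesced"              # work halted while the fault is handled
    DEGRADED_SPLIT = "degraded-split"  # one healthy core continues alone


class LockStepPair:
    """Hypothetical model of a core pair with a fail-operational path."""

    def __init__(self, cores: Tuple[str, str] = ("core0", "core1")) -> None:
        self.online: Set[str] = set(cores)
        self.mode = Mode.LOCKED
        self.history: List[Mode] = [Mode.LOCKED]

    def _enter(self, mode: Mode) -> None:
        self.mode = mode
        self.history.append(mode)

    def handle_fault(self, faulty_core: str) -> Mode:
        """Quiesce, drop the faulty core, and continue in degraded mode."""
        if self.mode is not Mode.LOCKED:
            raise RuntimeError("fault handling is only modelled from LOCKED")
        if faulty_core not in self.online:
            raise ValueError(f"unknown or offline core: {faulty_core}")
        self._enter(Mode.QUIESCED)
        self.online.remove(faulty_core)   # assumption: diagnosis already done
        self._enter(Mode.DEGRADED_SPLIT)
        return self.mode


if __name__ == "__main__":
    pair = LockStepPair()
    pair.handle_fault("core1")
    print(pair.mode, sorted(pair.online), [m.value for m in pair.history])
```

Note that the surviving core no longer has a redundant partner, so the degraded mode trades diagnostic coverage for availability.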
2. Performance for ADAS and Autonomous Driving
Cortex-A76AE has been designed to act as the decision engine in next generation ADAS and Autonomous Vehicle systems. It delivers a 30% uplift in performance over its predecessor, the Cortex-A75, and a whopping 60% increase in performance over Cortex-A72. This massive boost in performance meets the emerging CPU requirements for autonomous driving of more than 250K DMIPS at less than 15 Watts for the compute cluster. This fits well within an SoC power budget of 30 Watts.
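As a quick back-of-the-envelope check on the figures above, the short sketch below derives the efficiency floor implied by "more than 250K DMIPS at less than 15 Watts" and the share of the 30 Watt SoC budget taken by the compute cluster. The function names are hypothetical, and the numbers are only those quoted in the paragraph.

```python
def min_efficiency_dmips_per_watt(min_dmips: float, max_watts: float) -> float:
    """Lower bound on DMIPS per watt when DMIPS > min_dmips and power < max_watts."""
    return min_dmips / max_watts


def budget_share(cluster_watts: float, soc_watts: float) -> float:
    """Fraction of the SoC power budget consumed by the compute cluster."""
    return cluster_watts / soc_watts


if __name__ == "__main__":
    floor = min_efficiency_dmips_per_watt(250_000, 15)
    share = budget_share(15, 30)
    print(f"efficiency floor: {floor:.0f} DMIPS/W")
    print(f"cluster share of SoC budget: {share:.0%}")
```

In other words, the stated requirement implies roughly 16,667 DMIPS per watt or better, with the compute cluster taking at most half of the SoC power budget.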
The microarchitecture of Cortex-A76AE is largely based on Cortex-A76, with the following highlights that deliver on performance:
- Decoupled branch prediction and instruction fetch: Built to hide latency at high bandwidth, the in-order Cortex-A76AE front-end is able to fetch 4 to 8 instructions per cycle, using multi-level branch target caches and a hybrid indirect predictor to sustain the maximum throughput.
- A wider machine: First 4-wide decode core, increasing the maximum instructions-per-cycle capability. Up to 8 operations per cycle can then be dispatched to the out-of-order core, supporting a wider area-/power-optimized instruction window.
- More integer and vector execution throughput: Quad-issue integer units are integrated in the core including 3x simple ALU and 1x multi-cycle integer. Moreover, Cortex-A76AE supports dual-issue native 16B (128-bit) vector and floating-point units, twice the throughput of any previous Arm CPU and 4x ML uplift over Cortex-A75.
- Enhanced memory system: The full cache hierarchy is co-optimized for latency and bandwidth, with a sophisticated fourth-generation prefetcher and deep memory-level parallelism.
3. Flexibility in mixed criticality systems
As mentioned in the introduction, next-generation ADAS and autonomous driving systems will consist of multiple applications running at different levels of safety criticality (i.e. different ASILs). This presents a challenge to silicon providers when scoping the compute complex. How do you size up the performance requirements five to six years ahead of knowing the exact mix of safety-critical applications in a vehicle?
Cortex-A76AE solves the challenge of mixed criticality applications through its ability to operate in two modes: performance mode and safety mode. In performance mode, all cores within a cluster operate as symmetric multiprocessors (SMP). In other words, a user is able to utilize all the compute resources within a cluster, coherently. In safety mode, pairs of Cortex-A76AE cores in a cluster are configured to run in Lock-Step.
A functionally safe coherent interconnect, such as the Arm CoreLink CMN-600AE, can support multiple clusters of Cortex-A76AE. In such a system, any mix of clusters can be run in performance mode and safety mode, to achieve a fine-grained balance that matches the mix of safety-critical applications. The mode of operation can be changed at any time through reset. This means that a Tier 1 supplier or car manufacturer is able to tune the platform to fit any mix of safety-critical applications, post-production.
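The effect of mixing modes across clusters can be sketched with a small model. It assumes, as described above, that a performance-mode cluster exposes every physical core and that a safety-mode cluster pairs its cores so each pair acts as one lock-step unit. The `Cluster` class, the mode strings, and the four-core cluster sizes are hypothetical choices for illustration, not a real configuration interface.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Cluster:
    name: str
    physical_cores: int
    mode: str  # "performance" or "safety" (hypothetical labels)

    def usable_units(self) -> int:
        """SMP cores in performance mode, lock-step pairs in safety mode."""
        if self.mode == "performance":
            return self.physical_cores
        if self.mode == "safety":
            if self.physical_cores % 2 != 0:
                raise ValueError("safety mode needs an even number of cores")
            return self.physical_cores // 2
        raise ValueError(f"unknown mode: {self.mode}")


def summarize(clusters: Iterable[Cluster]) -> Dict[str, int]:
    """Total SMP cores and lock-step pairs across a mixed system."""
    totals = {"smp_cores": 0, "lock_step_pairs": 0}
    for cluster in clusters:
        key = "smp_cores" if cluster.mode == "performance" else "lock_step_pairs"
        totals[key] += cluster.usable_units()
    return totals


if __name__ == "__main__":
    system = [
        Cluster("c0", 4, "performance"),
        Cluster("c1", 4, "performance"),
        Cluster("c2", 4, "safety"),
        Cluster("c3", 4, "safety"),
    ]
    print(summarize(system))
```

Re-running the same model with a different mode per cluster shows how one piece of silicon could be rebalanced between raw throughput and lock-step capacity; in the real system, that change takes effect through reset.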
This flexibility drastically improves the usability of platforms based on Cortex-A76AE, across multiple generations and market segments. To further aid configurability, the Cortex-A76AE is based on Arm DynamIQ technology, meaning that it is also extremely scalable in terms of performance, power and area.
Source: Arm Blog