Why high compute, low latency and low power are key to full autonomy
The autonomous driving journey is well underway, and continues to be an area of focus for major OEMs as well as new EV entrants.
There have already been substantial advances. Notably, Tesla offers Level 2 autonomous driving, which allows drivers to take their hands off the wheel while keeping their eyes on the road, but still requires significant driver attention. The target, however, is higher levels of autonomy, ultimately full autonomy, in which the vehicle can operate safely and efficiently in any environment without driver intervention.
If this is the ultimate destination of the journey, how do we get there? Compute efficiency may not be the first technology that comes to mind when considering autonomous vehicles (AVs), but it plays a crucial role – one that will determine how quickly automakers advance in the race to autonomy.
The requirement for “near-perfect” perception
To understand the role of computing, we must take a step back and examine what a vehicle fundamentally needs in order to navigate a three-dimensional environment: a multitude of environmental sensors, plus the processing to turn their output into close-to-perfect perception.
What does “close to perfect” mean exactly? First and foremost, it entails the capacity to perceive in all directions around the vehicle, including the front, rear, and sides.
This multidirectional perception is crucial for a variety of reasons. Take the example of a truck driving along the highway. This truck could easily collide with a subcompact car in its blind spot while changing lanes if it cannot see accurately behind and to the sides.
Similarly, if a station wagon driving through the suburbs encounters a four-way stop, it must be able to see at least a hundred meters to the left and right to detect a reckless driver approaching the intersection at 50 miles per hour with no intention of stopping; at that speed a vehicle covers roughly 22 meters every second, leaving only a few seconds to react. It would be equally vital to determine immediately whether a young child has emerged from between a row of trees and run into the street to retrieve a lost soccer ball.
The consequences of not capturing the scene in the first place, or of misinterpreting the situation by failing to see "the big picture," could be severe. It is therefore essential to be able to see in multiple directions, both close and wide, as with a wide-angle lens, and far and narrow, as with a telephoto lens.
In addition to seeing in multiple directions, AV perception must be nearly flawless in all weather conditions. We cannot assume that autonomous vehicles will only operate during the day or on dry roads; rain, sleet, snow, and other elements affect driving conditions and variables such as stopping distance.
Stopping distance grows sharply as traction drops, from a dry road to a wet road to a snowy road to an icy one. And in low-visibility conditions such as fog and dust, vehicles have even less time to react than usual, so any misperception could lead to an accident.
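As a rough illustration of why traction matters so much, here is a minimal sketch using a constant-deceleration braking model, d = v²/(2μg), with assumed friction coefficients for each surface; real stopping distances also depend on reaction time, tires, and braking systems.

```python
# Illustrative only: a constant-deceleration braking model, d = v^2 / (2 * mu * g),
# with assumed tire-road friction coefficients for each surface.
G = 9.81           # gravitational acceleration, m/s^2
SPEED_MPS = 22.4   # roughly 50 mph, expressed in meters per second

SURFACES = {"dry": 0.7, "wet": 0.4, "snow": 0.2, "ice": 0.1}  # assumed values

def braking_distance_m(speed_mps: float, mu: float) -> float:
    """Distance in meters needed to stop from speed_mps on a surface with friction mu."""
    return speed_mps ** 2 / (2 * mu * G)

for surface, mu in SURFACES.items():
    print(f"{surface:>4}: {braking_distance_m(SPEED_MPS, mu):6.1f} m to stop")
```

Under these assumptions the same vehicle that stops in under 40 meters on dry asphalt needs well over 200 meters on ice, which is exactly why perception margins shrink as conditions worsen.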
Problem #1: Compute Capacity for a Deluge of Data
To provide vehicles with the "near-perfect" perception they require, automakers have equipped them with a variety of sensors that replicate some human sensory functions.
Vehicles may have radar components with varying degrees of range, LiDAR capabilities, and multiple camera systems. Typically, the radar or LiDAR component serves as an additional source of information, augmenting the images captured by the cameras. However, the higher the resolution of the video captured by the cameras, the more objects can be detected, particularly at far distances, reducing the need for multiple sensory components.
As with humans, it is not sufficient for vehicles to simply "see" their surroundings; they must also interpret what they see. Here, perception processing comes into play: the vehicle deciphers the incoming visual data through AI-based perception processing. For instance, there may be a tree limb on the road 50 meters ahead, or a vehicle a few car lengths ahead on the highway. The camera captures the image, and artificial intelligence processing determines, "That is a tree branch" or "That is a car."
Once the perception processing is complete, higher-level AV systems such as motion control are activated – turning the steering wheel to steer you away from the fallen branch or applying the brakes to slow you down so you don’t hit the car in front of you.
By comprehending this sequence of events – collection of visual data, perception processing, and then motion control – we can begin to appreciate why deterministic high-performance computing is so crucial for autonomy.
For instance, eight 8-megapixel cameras positioned on all sides of a vehicle, capturing the scene at 30 frames per second, generate on the order of 6.9 terabytes of raw image data per hour. (Any radar or LiDAR component, meanwhile, produces its own data feed to add to the pile.) The data is relentless and arrives continuously.
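As a back-of-the-envelope sketch of where a figure like that comes from, the calculation below assumes one byte of raw data per pixel; actual rates vary with bit depth, color format, and any compression applied.

```python
# Back-of-the-envelope camera throughput, assuming one byte of raw data per pixel;
# actual rates depend on bit depth, color format, and any compression applied.
NUM_CAMERAS = 8
PIXELS_PER_FRAME = 8_000_000   # 8 megapixels
FPS = 30
BYTES_PER_PIXEL = 1            # assumption

bytes_per_second = NUM_CAMERAS * PIXELS_PER_FRAME * FPS * BYTES_PER_PIXEL
bytes_per_hour = bytes_per_second * 3600
print(f"{bytes_per_second / 1e9:.2f} GB/s, about {bytes_per_hour / 1e12:.1f} TB per hour")
# -> 1.92 GB/s, about 6.9 TB per hour, before radar or LiDAR data is added
```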
It is impractical to send such a volume of data to the cloud, have it processed, and get results back to the vehicle in time to take the necessary action; the round trip would introduce so much latency that the vehicle could not react in time. This computation must take place locally so that the vehicle can comprehend its environment and take any necessary evasive action almost instantly.
However, successful execution of local compute is not as simple as placing a processor in the car. There are several crucial factors that computing must overcome, with latency being the first.
Problem #2: Latency
As previously mentioned, after acquiring visual data, the AV must interpret it, which is the most computationally intensive aspect of the entire autonomous vehicle compute platform.
By comprehending all the visual information in a given camera frame, along with the information seen in the previous frame and the frame before that, the AV can perform trajectory planning and determine what it wants to do in the next few milliseconds.
For instance, if a car sees a person walking from left to right in a crosswalk in frames 1, 2, and 3, it can predict that the pedestrian will be directly in front of it in the next frame, and then through the crosswalk and onto the sidewalk in the following frames. This enables the vehicle to plan its path accordingly.
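As a simplified illustration of this kind of prediction, the toy sketch below extrapolates the pedestrian's next position assuming a constant walking speed and made-up observations; production planners use far richer motion models.

```python
# Toy constant-velocity prediction of a pedestrian's lateral position relative to
# the vehicle's path. The sample interval and positions are made-up illustration values.
SAMPLE_PERIOD_S = 0.5                 # assume a position estimate every half second
observed_m = [-1.5, -1.0, -0.5]       # meters left of the vehicle's path, oldest first

step_m = (observed_m[-1] - observed_m[0]) / (len(observed_m) - 1)   # per-sample motion
predicted_m = observed_m[-1] + step_m                               # next expected position

print(f"Pedestrian moving about {step_m / SAMPLE_PERIOD_S:.1f} m/s to the right")
print(f"Expected position at the next sample: {predicted_m:+.1f} m (directly ahead)")
```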
In this situation, latency must be extremely low because other AV systems, such as motion control, require time to carry out their duties. The quicker the perception processing can decipher what it sees, the sooner the brakes or steering can engage, and the less likely a vehicle is to be involved in an accident.
Problem #3: Jitter
Next, perception processing must be able to deal effectively with jitter. For example, the latency for frame 1 from the cameras could be 20 milliseconds, for frame 92 it could be 250 milliseconds, and for frame 100 it could be 5 milliseconds.
The problem with jitter is that higher-level systems, such as the motion control system, cannot immediately determine what is occurring, causing their performance to become "jerky." Before moving, the motion control system is essentially waiting on a response from the perception processing engine, and it needs a deterministic response at consistent intervals.
Answers regarding what is being perceived must be delivered predictably, within a fixed time frame, repeatedly – i.e., without jitter. With this consistent input, the vehicle’s control system is smooth; without it, the vehicle’s performance will be jerky because the motion control systems will receive input at highly variable intervals of time.
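To make the contrast concrete, here is a small sketch comparing a steady perception feed with an erratic one, using illustrative latency values that echo the example above.

```python
import statistics

# Illustrative per-frame perception latencies in milliseconds (made-up values).
steady_ms  = [20, 21, 19, 20, 20, 21]    # low jitter: results arrive predictably
erratic_ms = [20, 250, 5, 180, 12, 240]  # high jitter: arrival times swing wildly

def describe(name: str, latencies_ms: list) -> None:
    mean = statistics.mean(latencies_ms)
    jitter = statistics.pstdev(latencies_ms)   # spread around the mean
    print(f"{name}: mean {mean:6.1f} ms, jitter (std dev) {jitter:6.1f} ms")

describe("steady ", steady_ms)
describe("erratic", erratic_ms)
# Motion control can plan around a predictable ~20 ms response; it cannot brake or
# steer smoothly when the response time swings by hundreds of milliseconds.
```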
Problem #4: Power Consumption
Power consumption is the final crucial factor that computing must address. This is a crucial area, particularly as electric vehicles (EVs) gain market share against conventional internal combustion engine vehicles.
Consider an electric vehicle with a 60 kilowatt-hour battery that is driven six hours per day. If compute draws 1,000 watts, then six hours of driving consumes 6 kilowatt-hours of energy. In other words, compute will consume 10% of the car's battery simply by driving around. The car's range is diminished accordingly, contributing to range anxiety, and the driver will have to make a charging stop sooner than she would have otherwise.
Now consider a computing platform that consumes only 200 watts. In this scenario, six hours of driving consumes just 1.2 kilowatt-hours, or 2% of the battery. This is far more operationally reasonable, preserving driving range and overall energy efficiency.
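The arithmetic behind both scenarios is simple; the sketch below uses the same assumptions as above: a 60 kWh battery and six hours of driving per day.

```python
# The arithmetic behind both scenarios: a 60 kWh battery, six hours of driving per day.
BATTERY_KWH = 60
DRIVE_HOURS = 6

def battery_share(compute_watts: float) -> float:
    """Fraction of the battery consumed by the compute platform alone."""
    energy_kwh = compute_watts * DRIVE_HOURS / 1000   # watts * hours -> kilowatt-hours
    return energy_kwh / BATTERY_KWH

for watts in (1000, 200):
    print(f"{watts:4d} W compute -> {battery_share(watts):.0%} of the {BATTERY_KWH} kWh battery")
# -> 1000 W compute -> 10% of the 60 kWh battery
# ->  200 W compute ->  2% of the 60 kWh battery
```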
While we need to determine "what needs to be seen," we also need to consider "how frequently it needs to be seen" and "at what distances." Frame rate and resolution directly affect the required compute capacity and, in turn, compute power consumption, which again affects driving range.
The legacy approach is ineffective
So, what is the solution for AVs to have the necessary compute power they need to continue the autonomy journey?
It is becoming increasingly apparent that legacy technology will not suffice. Historically, autonomous vehicle developers have taken CPUs and GPUs designed for other purposes, such as powering a graphics display or a server in a data center, and installed them in cars.
This off-the-shelf, "use whatever technology is available" strategy has either not worked or has led to expensive, power-hungry systems.
What is required is compute technology designed specifically for AV applications, balancing compute capacity that stands the test of time with low power consumption, low latency, and, preferably, zero jitter.
What would a purpose-built chip look like, and how would it need to operate? It must be able to process multiple streams from ultra-high-resolution, very-high-frame-rate cameras, and detect objects up to one kilometer away in real time under varying road and environmental conditions.
At 30 frames per second, new camera data arrives every 33 milliseconds; at 60 frames per second, every 16 milliseconds; and at 120 frames per second, every 8 milliseconds. From last pixel out to perception results, processing should take minimal time, on the order of a few milliseconds. That affords the vehicle ample reaction time for safe navigation, but it requires Peta-Op-class inference.
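The frame periods themselves are simple arithmetic, as this short sketch shows; the takeaway is that perception has only a handful of milliseconds to work with at high frame rates.

```python
# Frame periods at common camera rates. Perception must finish well within each
# period so downstream planning and motion control still have time to act.
def frame_period_ms(fps: int) -> float:
    """Time between successive camera frames, in milliseconds."""
    return 1000.0 / fps

for fps in (30, 60, 120):
    print(f"{fps:3d} fps -> a new frame every {frame_period_ms(fps):4.1f} ms")
# -> 33.3 ms, 16.7 ms, and 8.3 ms respectively
```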
Low power consumption, meanwhile, can only be achieved through a combination of factors: novel computational methods, use of on-chip memory wherever possible to avoid power-hungry trips to external memory, and a highly optimized convolution acceleration engine. These capabilities are not available in legacy chips.
We’re at a turning point
Full autonomy will be attained when the vehicle's computation system can perceive the environment with near-perfection, allowing the higher-level driving functions to operate with fundamentally robust intelligence. Without that robustness, high-level decisions are susceptible to error, hindering the ability to achieve full autonomy.
The industry has made considerable progress on the autonomy front to date, but we are now at an inflection point: we must solve some of the fundamental constraints around compute if we want autonomous driving to take the next step forward – and that requires moving away from legacy approaches.
When concerns about latency, jitter, and power consumption are addressed, the autonomy journey can truly accelerate. With the introduction of new, innovative solutions, the road ahead promises to be an exciting one.
About the Author
Mansour Behrooz is the Vice President of Marketing and Business Development at Recogni, Inc. With more than 35 years in the technology industry, Behrooz has a proven track record in revenue generation, customer management, marketing, and product planning. He has global expertise in semiconductors, systems, voice and video over IP, broadband, and AI, as well as in-depth knowledge of various markets including edge AI, industrial, surveillance and monitoring, consumer, and automotive.