The Eyes Behind Self-Driving Cars

By Vaughn Luthringer

Photo credit: Medium - Albert Lai

Computer vision and image recognition are pretty common terms nowadays, but their uses go far beyond Snapchat filters. Computer vision is, by definition, “how computers see and understand digital images and videos.” Yes, that can refer to how that dog filter gets put on your face. But it can also refer to much bigger things, like, say, self-driving cars!

We’ve all heard of self-driving cars, autonomous cars, whatever you want to call them. We’ve heard a lot about the dilemmas that come with them, and the controversy surrounding the “futuristic” devices. What we’ve gotten less insight into is exactly how they work. So, let’s dive in!

Object detection is at the heart of how self-driving cars work. It’s broken up into two parts: object classification and object localization. In simple terms, what is the object, and exactly where is it?

Object classification is done by what is called a “convolutional neural network,” or CNN. CNNs assign various levels of “importance” to different aspects of an input image, and are then able to differentiate objects from one another. The use of “sliding windows” allows the CNN to detect more than just single objects that take up most of the input image. Sliding windows are “boxes” that move across an image, essentially creating smaller images for the CNN to analyze. Check out the header image of this article to see an example of sliding windows!
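To make the sliding-window idea concrete, here is a minimal sketch in Python. It only generates the box coordinates; the function name, the window size, and the stride (how far the box moves each step) are assumptions chosen for illustration, and a real system would hand each cropped box to a CNN for classification.

```python
def sliding_windows(image_width, image_height, window_size, stride):
    """Yield (left, top, right, bottom) boxes that slide across the image."""
    # Move the box down the image one stride at a time...
    for top in range(0, image_height - window_size + 1, stride):
        # ...and across each row one stride at a time.
        for left in range(0, image_width - window_size + 1, stride):
            yield (left, top, left + window_size, top + window_size)


# Hypothetical tiny image: 8 pixels wide, 6 pixels tall.
windows = list(sliding_windows(image_width=8, image_height=6,
                               window_size=4, stride=2))
for box in windows:
    print(box)
```

Each printed box is one “smaller image” the classifier would look at. A smaller stride means more boxes and more work, which is part of why this approach can get slow.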

What about objects bigger or smaller than our boxes? This is where YOLO (“you only look once”) comes into play. YOLO is another algorithm, and it’s used to create a predictive grid, a “probability map,” out of an image. YOLO makes predictions about what each cell of the grid is, using probabilities. These probabilities are then used in creating larger predictions of what the objects in the image are.
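The “probability map” can be pictured with a small sketch. This is not the real YOLO algorithm, which also predicts bounding boxes for each cell; it only shows the idea of picking the most likely class per grid cell. The function name, the class names, the probabilities, and the threshold are all made up for illustration.

```python
def label_grid(cell_probabilities, threshold):
    """Return the most likely class per grid cell, or None if it is too unlikely."""
    labels = []
    for row in cell_probabilities:
        label_row = []
        for cell in row:
            # Pick the class with the highest probability in this cell.
            best_class = max(cell, key=cell.get)
            # Only keep the guess if the model is confident enough.
            label_row.append(best_class if cell[best_class] >= threshold else None)
        labels.append(label_row)
    return labels


# Hypothetical 2x2 grid of per-cell class probabilities.
grid = [
    [{"car": 0.1, "pedestrian": 0.05}, {"car": 0.8, "pedestrian": 0.1}],
    [{"car": 0.2, "pedestrian": 0.7}, {"car": 0.3, "pedestrian": 0.3}],
]
labels = label_grid(grid, threshold=0.5)
print(labels)  # [[None, 'car'], ['pedestrian', None]]
```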

Now for object localization. Because an object may span more than one grid cell, the same object can end up being predicted several times. Another algorithm, non-max suppression, takes this into account: predictions with probabilities below a certain threshold are discarded, and among the overlapping predictions that remain, the ones with the greatest probabilities are kept.
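Here is a rough sketch of non-max suppression. It assumes predictions come as bounding boxes with a probability score, and that “overlap” is measured by intersection over union (IoU), which is a common way to do it but is not spelled out above. The function names, boxes, scores, and thresholds are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(detections, score_threshold, iou_threshold):
    """Keep the highest-scoring box for each group of overlapping boxes."""
    # Step 1: discard predictions below the probability threshold.
    confident = [d for d in detections if d[1] >= score_threshold]
    # Step 2: look at the most confident predictions first.
    confident.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in confident:
        # Step 3: keep a box only if it does not overlap a box we already kept.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical detections: (box, probability).
detections = [
    ((10, 10, 50, 50), 0.9),
    ((12, 12, 52, 52), 0.8),      # nearly the same object as the first box
    ((100, 100, 140, 140), 0.7),  # a separate object
    ((200, 200, 220, 220), 0.2),  # too unlikely, gets discarded
]
kept = non_max_suppression(detections, score_threshold=0.5, iou_threshold=0.5)
print(kept)
```

In this example the second box is dropped because it overlaps heavily with the more confident first box, and the last box is dropped for falling below the threshold, leaving two detections.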

There’s obviously much more to learn about CNNs, YOLO, and non-max suppression. This is just a basic overview, but it does break down the way self-driving cars are able to “see” their surroundings. Using these algorithms, the cars can identify and locate pedestrians, traffic lights, other vehicles, and more.

All of this tech has to come together and function properly in order for an autonomous car to work correctly and safely. Object detection needs to be both fast and highly accurate. In the future, speed and accuracy can hopefully be improved so that self-driving cars can get out and on the road!

Learn More
“A Comprehensive Guide to Convolutional Neural Networks - the ELI5 Way” (https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
“How Do Self-Driving Cars See? (And How Do They See Me?)” (https://www.wired.com/story/the-know-it-alls-how-do-self-driving-cars-see/)

Sources
Medium - Sumit Saha (https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53)
Medium - Albert Lai (https://towardsdatascience.com/how-do-self-driving-cars-see-13054aee2503)