Guessing where a photo was taken can be a fun game. In fact, it is a game. But computers have historically had it easy, with the ability to quickly examine the metadata and place a picture on a map. A new machine learning program from Google, however, can place pictures that have no location metadata, using machine learning on the image itself instead.
PlaNet, created by Google computer vision specialist Tobias Weyand, can pinpoint 3.6 percent of photos down to a particular street, according to a new paper submitted to arXiv. Sure, that might not seem like a lot, but it fared much better than people at placing images on a map.
When going up against human opponents in the online game GeoGuessr, which asks contestants to look at a Street View image and pick out where it is on a map, PlaNet won more than half the matches. Not only did it beat humans more than half the time, but its median error distance was nearly 1,200 km smaller than the humans'.
“In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” Weyand wrote. “[This] small-scale experiment shows that PlaNet reaches superhuman performance at the task of geolocating Street View scenes.”
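For readers curious how a "median localization error" like the ones Weyand quotes could be computed, here is a minimal Python sketch. It assumes each guess and each true location is a (latitude, longitude) pair in degrees and measures the gap with the haversine great-circle formula. The function names and the sample coordinates are made up for illustration; this is not the paper's evaluation code.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_error_km(guesses, truths):
    """Median distance between paired guesses and true locations."""
    return median(
        haversine_km(g[0], g[1], t[0], t[1]) for g, t in zip(guesses, truths)
    )


# Made-up rounds: (lat, lon) guesses and the matching true locations.
guesses = [(0.0, 0.0), (0.0, 0.0), (10.0, 20.0)]
truths = [(0.0, 90.0), (0.0, 180.0), (10.0, 20.0)]
print(round(median_error_km(guesses, truths), 1))
```

The median is used rather than the mean because a single guess on the wrong continent would otherwise dominate the score.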
But how did the team teach a robot brain to place images? By showing it a ton of them.
Weyand’s team fed their machine 91 million images that had geolocation data, but the machine didn’t actually try to memorize exact coordinates for each image. That would be too much data to sift through when searching later. Instead, it placed each image on a grid and looked at the visual cues in each one. That grid had more squares in denser urban areas (where more pictures were likely to be taken) and fewer out in the boonies.
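To make the idea of a grid with more squares where photos are dense concrete, here is a minimal Python sketch. It assumes a plain latitude/longitude box that is split into four whenever it holds more than a chosen number of photos, which may well differ from the partitioning PlaNet actually uses. The names build_cells and cell_index, the max_points and max_depth parameters, and the sample coordinates are all hypothetical.

```python
def build_cells(points, bounds=(-90.0, 90.0, -180.0, 180.0),
                max_points=2, max_depth=10):
    """Split a lat/lon box into quarters until each cell is sparse enough.

    Returns a list of (lat_lo, lat_hi, lon_lo, lon_hi) cells. Intervals are
    half-open, so a point exactly at lat 90 or lon 180 is not covered.
    """
    lat_lo, lat_hi, lon_lo, lon_hi = bounds
    inside = [(lat, lon) for lat, lon in points
              if lat_lo <= lat < lat_hi and lon_lo <= lon < lon_hi]
    if not inside:
        return []  # this sketch simply drops cells with no photos
    if len(inside) <= max_points or max_depth == 0:
        return [bounds]
    lat_mid = (lat_lo + lat_hi) / 2
    lon_mid = (lon_lo + lon_hi) / 2
    quarters = (
        (lat_lo, lat_mid, lon_lo, lon_mid),
        (lat_lo, lat_mid, lon_mid, lon_hi),
        (lat_mid, lat_hi, lon_lo, lon_mid),
        (lat_mid, lat_hi, lon_mid, lon_hi),
    )
    cells = []
    for quarter in quarters:
        cells.extend(build_cells(inside, quarter, max_points, max_depth - 1))
    return cells


def cell_index(cells, lat, lon):
    """Return the index of the cell containing the point, or None."""
    for i, (lat_lo, lat_hi, lon_lo, lon_hi) in enumerate(cells):
        if lat_lo <= lat < lat_hi and lon_lo <= lon < lon_hi:
            return i
    return None


# Made-up data: a tight urban cluster plus one remote photo.
city = [(40.70, -74.00), (40.71, -74.01), (40.72, -74.02),
        (40.73, -74.03), (40.74, -74.04)]
remote = [(-33.9, 151.2)]
cells = build_cells(city + remote)
print(len(cells), cell_index(cells, 40.70, -74.00), cell_index(cells, -33.9, 151.2))
```

With cells like these, the learning problem becomes "which cell does this photo belong to?" rather than "what are its exact coordinates?", which matches the article's point that the machine does not memorize coordinates.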
The team then validated the neural network with another 34 million images. After that, it was time to test.
Before going up against humans, Weyand’s team tested the machine with 2.3 million Flickr images. That test found that the machine was able to place 10.1 percent of pictures at city-level accuracy and 28.4 percent in the right country. Just under half were placed in the right continent.
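Percentages like these come from counting how many photos land within some distance of their true location. The Python sketch below shows that bookkeeping. The article does not say what distance counts as "city" or "country" level, so the radii in THRESHOLDS_KM are assumptions for illustration, and the error values are made up.

```python
def accuracy_at(errors_km, thresholds_km):
    """Fraction of photos whose error is within each named distance."""
    n = len(errors_km)
    return {
        name: sum(e <= limit for e in errors_km) / n
        for name, limit in thresholds_km.items()
    }


# Assumed radii in km; the article does not state the actual cutoffs.
THRESHOLDS_KM = {"street": 1, "city": 25, "country": 750, "continent": 2500}

# Made-up localization errors in km, one per photo.
errors = [0.4, 12.0, 300.0, 1800.0, 6000.0]
print(accuracy_at(errors, THRESHOLDS_KM))
```

Because each level includes every photo counted at the tighter levels, the fractions can only grow as the radius widens, just as the reported figures rise from city to country to continent.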
Again, those numbers may not sound great, but you may want to test your own skill before bad-mouthing this machine. And locating photos isn’t necessarily the end goal of this project; instead, it shows the power machines can apply to visual problems and highlights a new way of organizing data for machine consumption.