Under the simple skin of Uber lies complexity you may not have considered: predicting how long rides (or meals) will take to arrive, setting prices and even suggesting where to wait so a driver has the best odds of finding you at pick-up.
Underlying those decisions is machine learning: using computers to find patterns and make predictions without explicitly programming them to do so.
And that very important job falls to Danny Lange, a Danish-born researcher who joined the ride-hailing giant 11 months ago after a nearly two-year stint leading machine learning efforts for Amazon Web Services. Before that, Lange wrangled big data for Microsoft and even launched a Silicon Valley startup.
He heads a growing team of researchers in San Francisco and Seattle. Uber opened an engineering office in Seattle last year, and that office now houses about 150 people working on machine learning, as well as product engineering and operations.
GeekWire caught up with Lange recently to talk about how Uber uses machine learning. Here are edited excerpts from the conversation.
Q: Thanks for chatting with us. Can you start by saying a bit about how Uber is using machine learning?
Lange: My team and I are making machine learning available as a service to everyone in the company. Traditionally, a company would hire Ph.D.’s and data scientists, and each team would have to figure out its own algorithms. We’re creating a way to access machine learning just using web interfaces, APIs and SDKs (software development kits). Our customers inside the company can expect this service to run 24/7 for them, with no need to become specialists in machine learning themselves.
Q: How widely is machine learning being used within Uber?
Lange: Many of our program teams are using it, including Uber Eats (a food delivery service), UberX (the basic ride-hailing service), UberPOOL (a ride-sharing variant on UberX), and Uber Maps (an as-yet unreleased service). For example, Uber Eats uses machine learning to estimate the delivery time for your meals. We went from a finite approach (where you compute the time using the distance between you and the restaurant, the average speed and the time to prepare the meal) to taking the delivery times for thousands and thousands of meals and basing the prediction on that. Overnight, that improved our estimates by 26 percent.
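To make the contrast Lange describes concrete, here is a minimal sketch in Python. It is not Uber's code: the function names, the 30 km/h speed, the 10-minute preparation time and the five historical deliveries are all made-up assumptions, and a one-variable least-squares line stands in for whatever model Uber actually trains. It compares a fixed formula with an estimate fitted to observed delivery times.

```python
from statistics import mean


def formula_eta(distance_km, avg_speed_kmh, prep_min):
    """Fixed-formula estimate in minutes: travel time plus preparation time."""
    return distance_km / avg_speed_kmh * 60 + prep_min


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - slope * mx, slope


def mean_abs_error(predict, rows):
    """Average absolute gap between predicted and observed minutes."""
    return mean(abs(predict(d) - minutes) for d, minutes in rows)


# Hypothetical history: (distance in km, observed delivery time in minutes).
history = [(1.0, 21.0), (2.0, 28.0), (3.0, 31.0), (4.0, 38.0), (5.0, 42.0)]

# Data-driven estimate: learn intercept and slope from past deliveries.
intercept, slope = fit_line([d for d, _ in history], [m for _, m in history])


def learned_eta(distance_km):
    return intercept + slope * distance_km


# Assumed constants for the formula; real values would vary by city and hour.
formula_mae = mean_abs_error(lambda d: formula_eta(d, 30.0, 10.0), history)
learned_mae = mean_abs_error(learned_eta, history)
print(f"formula error: {formula_mae:.1f} min, learned error: {learned_mae:.1f} min")
```

In this toy data the formula is consistently too optimistic because its assumed constants do not match what actually happened, while the fitted line absorbs those effects from the history. Scoring both on the same rows used for fitting flatters the learned model; a fair comparison would use deliveries the model has not seen.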
Q: What are some other examples of machine learning underlying Uber’s services?
Lange: The ETA for UberX rides. There’s no one sitting there saying: ‘There’s 7 miles between you and the car and it will take the car 14 minutes to get to you.’ It’s based on data from millions of trips, which lets us take into consideration normal patterns that occur day after day. Also, sometimes the Uber app will advise customers to move a few yards or around a corner for pick-up. That’s done by an algorithm that detects patterns of successful pick-ups, versus those where there have been challenges. We use the experience of millions and millions of pick-ups. It’s something the system learns. And we’re now using machine learning to detect fraud, such as account take-overs and stolen credit cards.
Q: What technologies underlie Uber’s machine learning?
Lange: We offer about 10 different algorithms, including boosted trees, linear learners and neural networks. We give our internal customers the benefit of taking the algorithm that suits their problems best.
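One common way to choose "the algorithm that suits the problem best" is to train each candidate on the same data and keep the one with the lowest error on held-out examples. The Python sketch below illustrates that idea only. The interview does not say how Uber's service makes the choice, the two candidates are deliberately trivial stand-ins for boosted trees or neural networks, and every name and number in it is hypothetical.

```python
from statistics import mean


def fit_mean(train):
    """Baseline candidate: always predict the average training target."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def fit_linear(train):
    """Linear candidate: least-squares line through the training points."""
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in train) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def holdout_error(model, rows):
    """Mean absolute error of a fitted model on unseen rows."""
    return mean(abs(model(x) - y) for x, y in rows)


def pick_best(candidates, train, holdout):
    """Fit every candidate on train; return the lowest-error name and all scores."""
    scores = {
        name: holdout_error(fit(train), holdout)
        for name, fit in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical data: (feature, target) pairs split into training and hold-out sets.
train = [(1.0, 12.0), (2.0, 14.5), (3.0, 16.0), (4.0, 18.5)]
holdout = [(5.0, 20.0), (6.0, 22.5)]

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
best_name, scores = pick_best(candidates, train, holdout)
print(best_name, scores)
```

Because each candidate is just a function that takes training rows and returns a predictor, adding another algorithm means adding one entry to `candidates`; the selection logic stays the same.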
Q: How would Uber be different if it had no machine learning? Would it be able to function as a company?
Lange: I actually don’t think so. The concept of machine learning is that we have this constant feedback loop and we learn every day from it.
Q: Why don’t you just use the machine-learning services you built for Amazon Web Services?
Lange: We have lots of special needs. We have scale issues and challenges around being a global service. We run our own data centers and built a lot of the apps ourselves.
Q: Machine learning is very compute- and data-intensive. Does Uber use any public-cloud services at all, or any external data?
Lange: To some extent (we use cloud computing), but most of what we’re doing is in-house. All the data we use is internal.
Q: Are you building on what you created at AWS, or have you gone in a different direction?
Lange: We are pursuing an open-source path. We’re using Hadoop (massively distributed computing technology), Spark (a large-scale data processing engine from Apache) and MLlib (Spark’s scalable machine-learning library). We have created a centralized repository of data, so if you’re launching a new service within Uber, you can use this data that has been gathered from other services and bootstrap yours.