Salesforce is open-sourcing the method it has developed for using machine-learning techniques at scale — without mixing valuable customer data — in hopes other companies struggling with data science problems can benefit from its work.
The company plans to announce Thursday that TransmogrifAI, a key part of the Einstein machine-learning services that it believes are the future of its flagship Sales Cloud and related services, will be available for anyone to use in their software-as-a-service applications. Consisting of fewer than 10 lines of code written on top of the widely used Apache Spark open-source project, it is the result of years of work on training machine-learning models to predict customer behavior without dumping all of its customers' data into a common training ground, said Shubha Nabar, senior director of data science for Salesforce Einstein.
“Data scientists are focused on most customer-facing issues, but there are other issues in the business where machine-learning could transform how a business operates,” Nabar said. The problem is that data scientists are expensive, she said. Developers aren’t exactly cheap either, but companies need to hire them anyway, and they can implement something like TransmogrifAI without having to learn how to do machine learning at scale within tricky constraints.
A classic running gag in the beloved Calvin and Hobbes comic strip (a series of drawings printed in things called newspapers once upon a time) was the transmogrifier, a cardboard box that could transform a boy and his stuffed tiger into anything they wanted to be. That’s a bit of the idea behind TransmogrifAI, which will allow developers to bring machine-learning insights to applications that might not be top of mind for their data scientists.
“There are too few data scientists, and they are working on the most important problems,” Nabar said.
In most cases, when you’re looking to train a machine-learning model, you throw all the data at your disposal at it in hopes of detecting patterns. But Salesforce’s customers are pretty paranoid about having their data sets mixed together, Nabar said, and so the company had to figure out a way to train its models with more limited data sets.
As befits a company like Salesforce, the project is primarily concerned with helping companies predict outcomes in their sales pipelines by automating a lot of the work that is usually done by an expensive data scientist. Figuring out why a customer bailed at a certain stage of the process allows companies to refine their sales tactics over time, and the companies that figure this out faster than their rivals can start to get an edge in their markets.
[Editor’s Note: Salesforce is a GeekWire annual sponsor. This post was updated to clarify the size of the codebase for the project.]