There are not enough data scientists to go around for companies hoping to capitalize on their growing streams of structured and unstructured data. Academic programs designed to produce more of them are growing like crazy, and entry-level pay for data science gigs runs as high as $200,000 a year, according to a recent Bloomberg Businessweek article.
Given that, it is "absolutely crazy" that many data scientists spend up to three-quarters of their time cleaning up data, said Lukas Biewald, founder and CEO of CrowdFlower, a startup that provides a data enrichment platform for data scientists. Biewald is himself a former data scientist who worked for companies including Yahoo and Powerset, a search company now owned by Microsoft.
Algorithms are useless without what data professionals call "training data," Biewald said. "The training data is the most important piece to making data models work. Many people focus on algorithms, but your models cannot do anything if the training data is not good."
Data science was not a common term in 2009 when Biewald founded his company, then called Dolores Labs. Originally it provided data enrichment as a managed service. About three years ago Biewald decided to market the technology used internally by his employees via a software-as-a-service (SaaS) platform.
He changed the name to incorporate the crowdsourcing aspect of the platform, which handles tasks that include data collection, sentiment analysis, improving machine learning models and tweaking internal search relevance. For the human element, CrowdFlower partners with companies such as CrowdGuru, ClixSense, Listia, Daproim, IndiVillage and iMerit.
Data Science and Artificial Intelligence
Earlier this year CrowdFlower added artificial intelligence (AI) to its platform, which will allow customers to automate more tasks. While the aim is to automate tasks as much as possible, Biewald wants to "combine the best of humans and the best of machines," he said. "Your AI tool will solve 80 percent of your problems and you'll have people to solve the last 20 percent of your problems."
Early adopters of CrowdFlower include companies in the retail, financial services and technology sectors. The platform's usage offers "a window into what data science cares about," Biewald said.
Marketing teams, for example, use it to understand how their brands are perceived on social media and to craft product and campaign strategies.
"It's not just about sentiment analysis but asking the next set of questions," Biewald said. "You can see consumers like Samsung, but is it because the new phone they launched is super cool, or do they actually like their carrier, or is a particular ad campaign resonating with them?"
CrowdFlower licenses its platform to customers for a fee, typically six figures, though the cost varies based on volume of tasks. Customers can use their own employees or workers from any of CrowdFlower's partners, some of whom can provide different language capabilities or other areas of expertise. Payment can take place directly through the partners or through CrowdFlower, Biewald said.
While "now is the time to make AI a reality," he said, his company is "not anywhere near being able to automate everything. We are just trying to make sure that data scientists don't spend 80 percent of their time cleaning data."
Fast Facts about CrowdFlower
Founders: Lukas Biewald and Chris Van Pelt
HQ: San Francisco
Product: Data enrichment platform designed to help companies better leverage their data science resources
Customers: eBay, Delectable and Skout, among others
Funding: $28 million, with investors including Canvas Venture Fund, Harmony Partners, Bessemer Venture Partners, Quest Venture Partners and K9 Ventures
Ann All is the editor of Enterprise Apps Today and eSecurity Planet. She has covered business and technology for more than a decade, writing about everything from business intelligence to virtualization.