While it's easy to find marketing hype about Big Data, it's hard to find professionals who understand what it is and know what to do with it.
EMC, the enterprise storage company that has been staking a claim in the burgeoning Big Data market since its 2010 acquisition of Greenplum, is trying an unusual crowdsourced approach to address the dearth of Big Data pros.
Earlier this week, EMC/Greenplum announced a partnership with Kaggle, an online community that hosts data modeling competitions. Under the partnership, companies can use Greenplum's Chorus technology to tap into the expertise of the 55,000-plus data scientists using Kaggle.
EMC/Greenplum integrated Kaggle into Greenplum Chorus, a collaboration platform based on the technology of Pivotal Labs, a company it acquired last March. Josh Klahr, Greenplum's vice president of product management, described Chorus as "a Facebook-like social collaboration tool for data science teams to iterate on the development of datasets and ensure that useful insights are delivered to the business quickly."
Big Data Meetup
According to EMC, Chorus users can search, browse and drill into the profiles of Kaggle community members who have opted to receive consulting opportunities through the Greenplum Chorus platform. Integration between the Chorus and Kaggle APIs lets Chorus users send messages to those members. Kaggle certifies Chorus as the source of each message and forwards it to the intended recipient. Kaggle members then review the messages and respond directly to Chorus users to discuss details further.
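For readers who want a concrete picture of that message flow, here is a minimal toy model in Python. It is only a sketch of the steps EMC describes (opt-in search, source certification, forwarding), not the actual Chorus or Kaggle API, which the announcement does not detail. Every name in it (`Member`, `Message`, `KaggleRelay`, `TRUSTED_SOURCES`) is hypothetical, and the allow-list check is just one assumed way that "certifying the source" could work.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A hypothetical message sent from a Chorus user to a Kaggle member."""
    sender: str              # the Chorus user who wrote the message
    recipient: str           # name of the Kaggle member being contacted
    body: str
    source: str              # platform the message claims to come from
    certified: bool = False  # set once the relay has vouched for the source


@dataclass
class Member:
    """A hypothetical Kaggle community member profile."""
    name: str
    skills: set
    opted_in: bool           # has agreed to receive consulting offers
    inbox: list = field(default_factory=list)


class KaggleRelay:
    """Toy stand-in for the Kaggle side of the integration (an assumption)."""

    # Assumption: certification is modeled as a simple allow-list of sources.
    TRUSTED_SOURCES = {"chorus"}

    def __init__(self, members):
        self.members = {m.name: m for m in members}

    def search(self, skill):
        # Only members who opted in are visible to Chorus users.
        return [m for m in self.members.values()
                if m.opted_in and skill in m.skills]

    def forward(self, msg):
        # Certify the source, then deliver to the recipient's inbox.
        if msg.source not in self.TRUSTED_SOURCES:
            return False
        recipient = self.members.get(msg.recipient)
        if recipient is None or not recipient.opted_in:
            return False
        msg.certified = True
        recipient.inbox.append(msg)
        return True
```

In this sketch, a member who has not opted in never appears in search results and cannot be messaged, which mirrors the opt-in condition EMC describes. The final step, where the member replies directly to the Chorus user, is left out because the announcement gives no detail on how replies are routed.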
Anthony Goldbloom, founder and CEO of Kaggle, said the partnership creates an opportunity for Kaggle's community of data scientists to parlay their skills into contract work for companies seeking help with Big Data projects. "Teaming with EMC Greenplum opens up new and exciting opportunities to existing and future Kaggle community members. The partnership also helps to solve the acute shortage of elite data scientists, which prevents companies from taking full advantage of their data," he said.
Goldbloom said Kaggle members are "able to tackle difficult problems, precisely the kind of problems that you expect Greenplum customers to be dealing with — unstructured text data, graph data, data sets missing values, for example."
EMC/Greenplum expects the Chorus and Kaggle integration to be available in November.
EMC is also releasing the Greenplum Chorus source code under the Apache 2.0 open source license. The OpenChorus Project "will speed innovation and adoption of collaborative data science practices, helping organizations to drive greater business insight and economic value from Big Data," according to EMC.
Ann All is the editor of Enterprise Apps Today.