Big Data Pros Are Made, Not Born

by Ann All

IBM, Hortonworks and Cloudera are among the companies providing training to help people hone their Big Data skills.

Companies are understandably excited about Hadoop's potential to help them derive business benefits from their swelling volumes of data. To do that, however, they'll need database administrators, developers and other professionals familiar with the open source project for storing and processing huge amounts of data. Unfortunately, folks with Hadoop skills are in short supply.

A growing number of technology vendors are addressing this dearth of Hadoop professionals by providing training in Hadoop and other Big Data technologies.  Here is information on a few of the offerings:


Hortonworks

Hortonworks, a company founded by a team of former Yahoo software engineers who contributed some 80 percent of code to the Hadoop project, last month launched a Hadoop training program called Hortonworks University. Bob Mahan, the company's global head of Field Services, said Hortonworks and other companies offering Hadoop services have been "overwhelmed" by customers seeking Hadoop training for their developers and database administrators. "It's amazing to see the demand grow as quickly as it has," he said.

Hortonworks currently offers a four-day course for Java developers who want to better understand how to develop Apache Hadoop solutions and a two-day course for database administrators who want to learn how to deploy and manage Hadoop clusters. The DBA course will likely expand soon and become a three-day course, Mahan said.

Hortonworks, which has partnered with Microsoft to enable Hadoop to run on Windows Server and to create a connector to link Excel with Hadoop, also plans to add a class on developing Hadoop solutions on Microsoft Windows, probably in the third quarter. Thanks to inquiries from customers, Hortonworks also intends to add a course for data analysts.

"We can create all these MapReduce programs and enormous Hadoop environments, but the bigger question for customers is what do they do with all that data," Mahan said, noting the new course should help answer that question.

Hortonworks offers certifications for those who complete its training courses, with certification tests administered by Pearson VUE. The certifications can bolster a resume, Mahan said. "One of our students added the certification to his LinkedIn profile and started getting calls."

In addition to these public training courses, Hortonworks offers on-site Hadoop training for clients that want to tailor courses to their specific development and admin needs and the products and services they already use.

Thanks to a Hortonworks executive's relationship with the University of Georgia's Terry College of Business, Mahan said Hortonworks will likely make Hadoop training available through the college and will hopefully make similar arrangements with other universities. "We’d like to enable folks coming out of college to use this technology and to improve adoption of Hadoop," he said.


Cloudera

Cloudera, which also employs some of the creators of Hadoop, has offered Hadoop training since early 2009. Sarah Sproehnle, the company's director of Educational Services, said more than 10,000 people have taken instructor-led courses, with "many more" viewing free instructional videos on the company's website.

Cloudera has awarded "several thousand" certifications for developers and DBAs. Beginning in May, its certification exams will be administered by Pearson VUE. Like Hortonworks, Cloudera offers public training courses as well as on-site courses geared toward specific client needs. Cloudera has offered training in 20 countries to date, Sproehnle said.

In addition to courses for developers and admins, Cloudera offers an "Essentials" class for executives, managers and architects who want to get up to speed on Hadoop basics and a course for data analysts. The wide variety of courses is intended to give companies "the right knowledge about Hadoop across the organization," Sproehnle said. 

"Executives can't make the right decisions without knowing how Hadoop fits into a Big Data infrastructure or what resources they'll need to hire and purchase," she said. "Developers need to know how to develop Hadoop jobs, including optimizing and debugging in a complex distributed environment, by using a variety of tools like Java MapReduce, Hadoop Streaming, Hive, Pig and HBase.  Administrators must be trained to properly install, configure and monitor a Hadoop cluster.  And many organizations have business or data analysts that use SQL, R or other tools and need to re-purpose their understanding of data analysis using Hadoop."


IBM

IBM is offering a broader and not exclusively Hadoop-based approach to Big Data with its training efforts. With the aim of exposing college students to Big Data concepts, IBM created a virtual lecture series called Tech Talks, which gives students at schools including the University of Arizona, University of Michigan and Santa Clara University an opportunity to learn from IBM's Big Data experts.

In October IBM launched a Center for Digital Transformation at Fordham University to bring together industry and academic experts to conduct research and develop curricula about business analytics. Fordham is also collaborating with IBM on a Business Analytics degree program. "There's no disputing that analytics skills are in high demand, and with this new program, we are taking the next step in ensuring students are prepared to use technology to solve complex business challenges," said Mark Hanny, vice president, IBM Academic Initiative.

IBM has partnerships with some 200 other universities around the world, including Yale, Harvard, DePaul and Northwestern, to promote the inclusion of business analytics in both undergraduate and graduate courses.

Big Blue also offers training in Hadoop, stream computing and Big Data analytics through its BigDataUniversity.com, which it launched at its Information On Demand Conference in October 2011. The number of participants enrolled has doubled over the past five months to almost 14,000 – with no real marketing campaign promoting it. Online resources direct students to free downloadable books, courses, communities for sharing information and methods for receiving certification.

In addition to its online programs, IBM in 2011 conducted 1,200 Big Data skills bootcamps at client, partner and university sites. Some 2,400 college students, graduate students and IT professionals were trained in data management techniques during the bootcamps. Interest in this type of training is booming, said Anjul Bhambhri, IBM's vice president of Big Data Products.

"The amount of data is growing tremendously, but CIOs and decision makers tell us they are still making decisions on the relatively small percentage of data captured in repositories. They are becoming more aware they are not leveraging all of the data that is being collected," she said. "They know they must get their arms around Big Data if they want competitive advantage. It's not just about the technology and use cases. To gain real competitive advantage, you have to have the right talent and the right teams."

Users of BigDataUniversity.com receive a certificate of completion once they finish a course. The certificate indicates they have satisfied the course requirements and lists the course name, the instructor and the date. IBM also offers a Mastery Exam, which is administered at events like the Information On Demand conference and at Prometric testing centers.

While IBM's various educational efforts naturally include information on IBM's Big Data product offerings, Bhambhri said other technologies and approaches are covered as well. "There are certain capabilities only IBM is providing, which we explain, but we want people to more broadly understand the need for Big Data analytics capabilities and what is out there. People need to know how to evaluate vendor solutions and what to look for; that doesn’t mean they will go with IBM every time."

Think Big Analytics

Think Big Analytics, a consulting company that recently joined the Amazon Web Services (AWS) Solutions Providers Program, is offering courses geared toward Big Data development with Amazon Elastic MapReduce and Amazon Web Services.

The Think Big Academy offers courses ranging from three to five days covering all facets of building Big Data solutions, including training in the principles of the Hadoop Distributed File System (HDFS), how to use Amazon Elastic MapReduce, and how to effectively configure and utilize the Hadoop ecosystem on Amazon Web Services.

Rick Farnell, co-founder and president of Think Big Analytics, said the focus is on helping students understand the patterns of using the right architecture and tools for the right analytic workload and is based on his company's experiences training its own new hires.

Farnell said building advanced analytic solutions on Amazon makes sense due to that platform's speed to deployment, scalability and flexibility. It facilitates a "best-of-breed approach that allows for any mix of scenarios of on premise, cloud and hybrid data center environment integration," he added.

Revolution Analytics

Revolution, a provider of analytics solutions based on the open source R programming language, offers several courses that can be presented at client sites as well as several online courses. Richard Kittler, the company's vice president of Services, said that although Revolution does not yet offer a regular series of public courses, it has worked with "R gurus" such as Hadley Wickham, an assistant professor of statistics at Rice University, to present some and hopes to expand its public course offerings.

"Our topics cover how R is used to solve business problems in industries such as banking, insurance and in areas such as marketing optimization," Kittler said.

  This article was originally published on Thursday Mar 29th 2012