In today’s data-rich business environment, Hadoop gives enterprises a much-needed alternative to traditional relational databases.
Hadoop is an open source Apache project for storing and processing large amounts of data. Technically, it’s a stack of solutions, including MapReduce, Hive and other open source tools. As with Linux, there are commercial enterprise distributions supported by vendors.
What makes Hadoop so powerful is its ability to distribute the data and the workload across nodes. It’s open source, which decreases licensing costs. More importantly, it uses industry standards for hardware, which means you can use commodity servers to run it.
Regular servers of that kind cost about $4,000 per node, according to Cloudera, which offers an enterprise distribution of Hadoop. That’s a yard sale price when you consider the $10,000 to $12,000 you’ll spend per terabyte for relational databases.
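To make the idea of distributing data and workload across nodes more concrete, here is a minimal sketch of the map, shuffle and reduce pattern behind MapReduce. It is plain Python that simulates the steps inside a single process; it does not use Hadoop's actual API, and the function names, the round-robin split and the sample log lines are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def split_into_blocks(records, num_nodes):
    """Deal records out round-robin, one block per simulated node (hypothetical scheme)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def map_phase(block):
    """Runs independently on each node: emit a (word, 1) pair for every word."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group values by key so that each key is handled by a single reducer."""
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(mapped_blocks):
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into one total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records, num_nodes=3):
    blocks = split_into_blocks(records, num_nodes)
    # On a real cluster these map calls would run in parallel on separate machines.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Made-up sample data, not taken from any real system.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

Because each block is mapped on its own, the totals come out the same whether the records are spread over one simulated node or many. That independence is the property that lets this kind of job scale out by adding more commodity servers.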
Enterprises and vendors are finding business-changing ways to put this power to use. Here are eight innovative ways companies are using Hadoop and Big Data to reduce risks, better serve customers and even change the Internet.
1. Build your own cloud for enterprise applications. Enterprises are exploring how they can use some of the money-saving principles of the public cloud in their internal infrastructures to cut costs on enterprise applications and data. Doug Cutting, one of the co-creators of Hadoop, now works at Cloudera, which offers an enterprise-ready distribution of Hadoop. He pointed out that Amazon runs its own version of Hadoop as a cloud service, and companies are catching on.
“There’s another sense of cloud, which is just the general notion of having a service out there that you run things on rather than running them on your own machine,” Cutting said. “They build a cluster internally, but it’s run as a service for their company. So they put all of their different data sets in there and then different groups have different needs, but they all share this. That's a private cloud model.”
2. Keep the lights on by finding points of failure in the power grid. Sensors generate a huge amount of data, but until now, it’s been difficult to put all that data to use. For instance, the Tennessee Valley Authority has sensors all over the country monitoring its power transmission lines and generation facilities. Each sensor monitors electricity usage at its location, generating data 50 to 100 times a second, Cutting said. Hadoop allows the TVA to save all of this data and run test patterns against it to find points of failure before there’s a power outage.
3. Troubleshoot in real time to keep systems up and running. From power surges to switch problems, telecommunications companies constantly collect data about what’s happening within their networks. That data hasn’t always been useful, however, because they weren’t able to process it quickly enough to make it meaningful, explained Lance Speck, general manager for integration products at Pervasive Software.
“If things are happening with the switches country-wide and they can't narrow down to what's trending bad or what needs attention, if they can't do that in minutes, well then it doesn't really help them much,” Speck said, adding that telecoms use Pervasive’s DataRush to process Hadoop-stored data in minutes instead of hours. “That’s a real game changer for them.”
4. Make smarter decisions about credit risks. One of the key questions about the subprime mortgage crisis came down to whether financial institutions made poor credit risk decisions. Hadoop is helping banks more accurately determine someone’s creditworthiness by allowing them to integrate different sorts of internal data. “If they more accurately say yes to people who will pay their loans and no to people who won’t, then that’s a huge benefit to a bank,” Cutting said.
5. Build the Internet of Things. Hadoop and Big Data solutions in general make sensor data more usable, which will pave the way for bringing all kinds of useful gadgets online. A great example is Twine, a tiny Wi-Fi gadget invented by two MIT Media Lab graduates. Twine is armed with sensors to detect temperature and vibrations, with plans to add a moisture sensor. If there’s a change - say, you’ve placed Twine on the washer and the vibrations have stopped - Twine can send a wireless message telling you it’s time to switch the laundry. GigaOm calls this “The Internet of Things” and estimates that by the end of this decade, there will be more than 50 billion sensor-based gadgets online.
6. Make a better match for customers. Being a matchmaker is no easy task - particularly when you’re sifting through 20 million users. The online dating service eHarmony moved from a relational database and traditional batch jobs to running Hadoop and MapReduce, a companion open source processing software, on top of Amazon Web Services. This shift cut the time and costs of running its algorithms. “They run a Hadoop job every night to tell you who they think you should date today,” Cutting said.
7. Do a better job of reaching customers. One of the more widely publicized uses for Hadoop is analyzing online user data. Web advertisers use Hadoop to more accurately target readers with promotions, Cutting explained. Hadoop is also used by Yahoo! and Facebook to send targeted information to their users.
Once the domain of Hadoop programmers, these types of Hadoop queries are being “democratized” as more business intelligence and data integration vendors such as Pentaho and Informatica offer Hadoop connectors.
8. Revolutionize enterprise applications. While there are already plenty of business use cases for Hadoop and Big Data, we still haven’t seen the full implications for these technologies. That will change soon as enterprise applications evolve to make better use of Hadoop data stores, according to Nucleus Research’s “Big Data - Beyond the Buzzwords.”
There’s a large amount of untapped data sitting in CRM, ERP and other enterprise systems, ripe with possibilities. Big Data solutions like Hadoop will allow businesses and software vendors to put that data to use. Nucleus Research predicts we’ll soon see enterprise applications with embedded analytics, integration of role-based interfaces, and a push model for information, which is already available with some collaborative applications such as Salesforce.com’s Chatter.