As demand for Big Data technologies grows, one technology has stood head and shoulders above all others – Apache Hadoop. Adding enterprise-class management and packaging up all the disparate parts of Hadoop is where Cloudera comes into play.
This week, Cloudera updated its core open source CDH 4 (Cloudera Distribution of Hadoop) and Cloudera Enterprise platforms. The new releases provide more security and scalability than prior releases, according to Cloudera.
Mission-critical Big Data
CDH 4 is an open source collection of the components required for a Hadoop Big Data deployment. Omer Trajman, vice president of Technology Solutions at Cloudera, explained that with new high-availability features, Hadoop can now be used for mission-critical deployments.
One of the new high-availability features is a hot standby for the NameNode in HDFS (Hadoop Distributed File System). There is now also support for heterogeneous clusters, so enterprises can mix and match Hadoop versions as they scale up.
The HBase database component now has column- and table-level permissions, which improves security. The scheduling system also gets a security boost. Overall, multi-tenant access to Big Data has been improved in CDH 4, according to Trajman.
"The Fair Scheduler which is responsible for figuring out who gets to run jobs on the cluster and where the jobs run now can be grouped such that only users or groups can submit to certain resource pools," Trajman said. "So you can carve up your cluster and allocate only a certain amount to a userbase."
Sitting on top of CDH is the Cloudera Enterprise 4 release, which includes management capabilities for Hadoop. One of the new capabilities comes in the form of visualization with heatmaps that provide a view of health and status across Hadoop cluster metrics.
"It gives you a high-level view from which you can drill down to solve any operational issues within the system," Trajman said.
Another innovation is support for federated NameNode management for Hadoop. With that capability, Trajman said, an enterprise can grow a CDH cluster to tens of thousands of nodes storing billions of files.
Oracle is among Cloudera's many partners. The two companies work together on the Oracle Big Data Appliance, which officially debuted in January of this year. The appliance is an engineered system that combines Cloudera's Hadoop software with hardware that can scale as high as 216 CPU cores, 864 GB of RAM and 648 TB of storage.
Though Oracle's box has significant performance capabilities, Trajman noted that CDH 4 and Cloudera Enterprise 4 were not unduly influenced by Oracle.
"There are definitely a lot of things that Oracle is very interested in, but there is nothing from a Cloudera perspective that is only available in the Big Data appliance," Trajman said.