Cloudera, one of the leading commercial sponsors of Hadoop, is aiming to speed up Big Data queries with a new technology codenamed Impala. The goal is to make queries against data stored in Hadoop fast enough to be interactive.
Cloudera CEO Mike Olson said MapReduce was originally designed by consumer Internet companies to process large-scale, batch data workloads.
"If you want to get at your data for interactive queries, you just can't get there with MapReduce," he said. "That means that Hadoop just doesn't get deployed for a whole bunch of workloads."
How to Speed Up Hadoop
Impala doesn't replace MapReduce, Olson noted. "What we have done is added another execution framework – another way to get at the identical data in a Hadoop cluster. Customers can transform and analyze data with MapReduce and they can query the results using Impala."
Impala is also complementary to SQOOP, the tool for moving data between SQL databases and Hadoop that Cloudera first released in 2009.
"SQOOP is a way to move data between a relational database and Hadoop," Olson said. "With Impala, you can now get the same interactive query speeds that you would expect with a relational database."
Impala is available today as a public beta under the open-source Apache license. The plan is for Impala to be part of the Cloudera Distribution of Hadoop (CDH) version 4.5 in the first quarter of 2013.
"We're not politically committed to open source," Olson said. "We just believe that open source is a better way to develop platform software, and it's the way customers of ours want to consume the platform."
The last major release of CDH, CDH 4.0, debuted in June of this year, providing enterprise-grade stability features. Olson explained that Cloudera issues quarterly point releases to update the platform. The next major release, CDH 5.0, is currently scheduled for the middle of 2013.
"We're not yet announcing the key features of CDH 5, but it's mostly about more enterprise-grade features for our installed customer base," Olson said. "Impala as an addition to the platform is non-disruptive, so we can roll it into one of our point releases."