Open source business intelligence provider JasperSoft is among the many software vendors looking to help companies derive value from Big Data. The new JasperSoft 4.5 release isn't just about connecting to Big Data; it's about analyzing the data quickly.
Mike Boyarski, director of product marketing at JasperSoft, explained to InternetNews.com that analytics solutions typically connect to Big Data framework Apache Hadoop via the Hive interface. Hive provides a SQL layer for Hadoop, using a syntax familiar to enterprises. A problem with Hive is that it can add latency, slowing the analytics process.
JasperSoft 4.5 takes a new approach to connecting to Hadoop data. Instead of getting at the data via Hive, JasperSoft uses the HBase layer. The end result, according to Boyarski, is that users no longer need to wait long periods of time during data analysis.
"HBase does not have a SQL interface, so it's different but we're able to pass query parameters and filters to the storage engine and get results," Boyarski said. "Based on those results we can then format the data so that someone can make sense of it."
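The pattern Boyarski describes, passing parameters and filters down to the storage engine and then formatting what comes back, can be illustrated with a small sketch. This is not JasperSoft's code and it does not use the HBase client API; `ToyTable`, `scan`, `format_report` and the sample row keys are all hypothetical stand-ins for a sorted key-value store, chosen only to show the idea of a row-key range plus a column filter being applied inside the store.

```python
from bisect import bisect_left


class ToyTable:
    """Hypothetical stand-in for a sorted key-value table (not the HBase API)."""

    def __init__(self, rows):
        # rows maps a row key to a dict of column name -> value
        self._rows = rows
        self._keys = sorted(rows)

    def scan(self, start_key, stop_key, column_filter=None):
        """Yield (key, columns) for start_key <= key < stop_key.

        The filter runs inside the store, so only matching rows
        are handed back to the caller.
        """
        lo = bisect_left(self._keys, start_key)
        hi = bisect_left(self._keys, stop_key)
        for key in self._keys[lo:hi]:
            columns = self._rows[key]
            if column_filter is None or column_filter(columns):
                yield key, columns


def format_report(results, columns):
    """Turn raw scan results into tab-separated report lines."""
    lines = ["\t".join(["row"] + columns)]
    for key, cols in results:
        lines.append("\t".join([key] + [str(cols.get(c, "")) for c in columns]))
    return "\n".join(lines)


# Made-up sample data, keyed by date and region
table = ToyTable({
    "2011-10-01#east": {"region": "east", "sales": 120},
    "2011-10-01#west": {"region": "west", "sales": 80},
    "2011-10-02#east": {"region": "east", "sales": 200},
    "2011-11-01#east": {"region": "east", "sales": 50},
})

# Query parameters: a row-key range (October) and a filter (sales >= 100)
october_big = list(table.scan("2011-10", "2011-11", lambda c: c["sales"] >= 100))
report = format_report(october_big, ["region", "sales"])
print(report)
```

Because the range and the filter are evaluated where the data lives, only two of the four rows ever reach the reporting step in this toy example.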
JasperSoft's BI solution has included an in-memory analysis capability since at least the 3.7 release in January of 2010. In-memory analysis can help speed up queries against traditional data stores and, with the 4.5 release, might be of some benefit with Big Data as well.
Boyarski noted that since JasperSoft 4.5 is able to pass parameter filters to the Hadoop query, a manageable amount of data could potentially be handled in-memory.
"It's a combination of being able to control the results set query, and our engine is intelligent to know if the result set is too large to take advantage of in-memory processing," Boyarski said.
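Boyarski did not say how the engine decides that a result set is too large. One plausible approach, shown here purely as an assumption, is to buffer rows up to a limit and fall back to streaming once the limit is exceeded. The names `choose_strategy` and `IN_MEMORY_ROW_LIMIT` and the threshold value are hypothetical.

```python
from itertools import islice

# Hypothetical threshold; a real engine would more likely size this by memory
IN_MEMORY_ROW_LIMIT = 3


def choose_strategy(results, limit=IN_MEMORY_ROW_LIMIT):
    """Return ("in-memory", rows) if the results fit within the limit,
    otherwise ("streaming", iterator) without loading everything."""
    it = iter(results)
    # Read one row past the limit to learn whether more data remains
    buffered = list(islice(it, limit + 1))
    if len(buffered) <= limit:
        return "in-memory", buffered

    def stream():
        # Replay the rows already read, then continue with the rest
        yield from buffered
        yield from it

    return "streaming", stream()


small_mode, small_rows = choose_strategy(range(3))
large_mode, large_rows = choose_strategy(range(10))
print(small_mode, large_mode)
```

The check reads at most one row beyond the limit, so a large result set is never fully materialized just to measure it.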
JasperSoft isn't the only vendor that uses in-memory analysis to speed up analytics. Fast in-memory analysis is also a key feature of the recently announced Exalytics appliance from Oracle.
By going the HBase route, JasperSoft loses the power of SQL. Boyarski noted that the HBase interface is designed more for short, iterative queries that, unlike Hive queries, don't require the MapReduce infrastructure. As such, HBase has a simpler structure and lacks the strength of SQL. It is faster, however, since it has less overhead.
"The downside to the simplicity is that there is only so much you can pass through the query filter," Boyarski said. "It's evolving, and as HBase improves we will be able to pass more to the environment to take advantage of more sophistication."
While the process of analyzing Hadoop data using HBase is faster from a processing perspective, writing queries with the HBase approach might involve more time up front.
"We haven't got to the point where it's a wizard-driven query design environment," Boyarski said. "I would expect that is something that we'll look at in the future as we do have a metadata layer solution for relational data, and that is a wizard-driven query designer."