Big Data Buyer’s Guide, Part Two: IBM, SAS, Pentaho and More

by Drew Robb

Oracle and SAP have commanded much of the buzz around Big Data with their respective Exalytics and HANA products. But plenty of other vendors, including IBM and SAS, offer alternative approaches to dealing with Big Data.

In Part One of this series on Big Data solutions, we compared Oracle Exalytics and SAP HANA. But what about the alternatives? SAP and Oracle aren’t the only games in town. IBM, SAS, Pentaho, Birst, Terracotta and others are moving forward with their own approaches to Big Data.

“Users face the challenges of analyzing zettabytes of information and scaling existing data warehouses and analytics to match new data types within Big Data,” said Wayne Kernochan, an analyst at Infostructure Associates.

A torrential downpour of data has businesses worried. A recent IBM/MIT Sloan Management Review survey of 3,000 executives found that 60 percent felt they had more data than they could effectively use. Further, 71 percent of marketing officers admitted their employers were unprepared to deal with the explosion of Big Data.

Here’s what a few companies are doing about it:


SAS was doing analytics long before companies like Oracle and SAP cottoned on to the term. Now it is taking that savvy into the Big Data space. Its most recent product in this arena is SAS High-Performance Analytics, in-memory analytic software that runs on a Teradata or Greenplum appliance.

“Big Data analytics is an extension of what we have been doing for a while,” said Keith Collins, chief technical officer at SAS. “Big Data in and of itself is not the biggest issue, but good information management practices are crucial to managing and finding value in Big Data.”

Other vendors tend to consider Big Data as part of a technology discussion related to Hadoop, NoSQL and other data processing methods, Collins said. SAS, on the other hand, focuses on the information management side by providing a strategy and supporting solutions that allow Big Data to be analyzed, regardless of the storage mechanism or technology employed.

“It’s not just Big Data, it’s what you do with the data to improve decision making that will result in business gain,” Collins said. “SAS High-Performance Analytics helps organizations find value in Big Data, solving difficult problems.”

Collins said SAP HANA uses an in-memory database to process high volumes of transactional data and queries and reports. Oracle Exalytics’ in-memory hardware and software system augments Oracle business intelligence software with data discovery capability. He characterized both as employing reactive query and reporting and providing descriptive statistics rather than proactive advanced analytics and optimization.

“It’s the difference between a rear view mirror and knowing what’s down the road,” he said. “Unlike other offerings, SAS HPA can perform analyses that range from descriptive statistics and data summarizations to model building and scoring new data at breakthrough speeds.”


IBM offers a wide array of Big Data analytics products, including InfoSphere Streams, Cognos 10 Business Analytics, Cognos Consumer Insight, Netezza MPP Data Warehousing, IBM Smart Analytics System and IBM SPSS. IBM considers its Big Data strategy more client-focused than those of SAP, Oracle and other competitors.

“IBM deals with all aspects of Big Data including data in motion as well as data at rest, structured and unstructured data,” said Leon Katsnelson, IBM director of Big Data Cloud Initiatives. “The IBM platform does not treat Big Data as an island and instead allows our clients to integrate it with their existing business processes and to augment their existing data warehousing solutions.”

In addition, IBM has a Hadoop-based analytics product known as IBM InfoSphere BigInsights. BigInsights software is the result of a four-year effort by more than 200 IBM Research scientists and provides a framework for large-scale parallel processing and scalable storage for terabyte- to petabyte-level data. It incorporates unstructured text analytics and indexing that allows users to analyze rapidly changing data formats and types on the fly.
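The large-scale parallel processing BigInsights builds on is Hadoop's MapReduce model: a map step emits key/value pairs from raw records, and a reduce step aggregates them per key. A minimal single-machine sketch of that model (not IBM's API, just the general pattern) looks like this:

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["big data needs big tools",
        "data at rest and data in motion"]
word_counts = reduce_phase(map_phase(docs))
print(word_counts["data"])  # 3
```

In a real Hadoop cluster, the map and reduce functions run in parallel across many nodes and the framework handles shuffling pairs to the right reducer, which is what lets the same pattern scale from terabytes to petabytes.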

BigInsights can be used with software called InfoSphere Streams that analyzes data entering an organization and monitors it for changes that may show new patterns or trends. In May IBM announced enhancements to Streams that make it possible to analyze Big Data such as tweets, video frames, GPS, and sensor and stock market data up to 350 percent faster than before. BigInsights complements Streams by applying analytics to an organization's historical data as well as data flowing through Streams.


Speaking of Hadoop, Pentaho was one of the early players in the Hadoop/analytics space. In a short time, Hadoop has become something of a Big Data darling, having gained fame as an open source implementation of the technologies underlying Google's massive analytics systems. Everyone, except perhaps SAS, is jumping on the Hadoop bandwagon.

“Most of our conversations about Big Data are with organizations using technologies such as Hadoop and NoSQL databases that are alternatives to Oracle/SAP offerings,” said James Dixon, who holds the unconventional title of Lord of the 1s and 0s at Pentaho. “Oracle Exalytics and SAP HANA products seem to appeal to locked-and-loaded Oracle and SAP customers.”

Noting that SAP claims HANA is 3,600 times faster than regular SAP ERP queries, Dixon’s take is that HANA is therefore a fix for an SAP performance problem experienced by its customers. “I’m not saying that HANA is not good technology, just that it wasn’t necessarily invented to solve general Big Data problems,” he said.

As for Oracle, he said he had talked to various organizations that already have all the Oracle database capacity they can afford, yet can only store part of their data there and find it expensive to scale. “These companies go for Hadoop and the NoSQL databases because they provide scale-out solutions that are more flexible/granular,” he said.

Pentaho is not tied to a specific database or application, but covers a range of data types, formats and sizes. As such, it encompasses data access and integration, data discovery and analysis, as well as data visualization, reporting and dashboards on Big Data sources.

In November Pentaho released Business Analytics 4.1, software that provides advanced in-memory features that enable enterprises to leverage the benefits of in-memory as well as disk-based analysis. In contrast to a pure in-memory model, Pentaho’s approach is not limited by scalability issues. To optimize performance, companies can choose what data gets loaded into memory and manage the data cache.
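The hybrid approach Pentaho describes, where the application decides which data is loaded into memory while the rest stays on disk, can be sketched generically. This is an illustration of the pattern, not Pentaho's actual API; the class and key names are hypothetical:

```python
class SelectiveCache:
    """Generic sketch: the application pins chosen data sets in memory;
    everything else falls back to a slower, disk-backed lookup."""

    def __init__(self, backing_store, pinned_keys):
        self.backing_store = backing_store   # e.g. a disk/database lookup function
        self.memory = {}                     # in-memory layer
        self.pinned = set(pinned_keys)       # keys the application chose to keep hot

    def get(self, key):
        if key in self.memory:
            return self.memory[key]          # served from memory
        value = self.backing_store(key)      # slower, disk-based path
        if key in self.pinned:
            self.memory[key] = value         # load only the chosen data into memory
        return value

# Hypothetical backing store standing in for disk-based query results
store = {"sales_2011": 1200, "clicks_2011": 98000}
cache = SelectiveCache(store.get, pinned_keys={"sales_2011"})
cache.get("sales_2011")   # first access hits disk, then stays resident in memory
cache.get("clicks_2011")  # not pinned: always served from the backing store
```

Because only selected data is held in memory, the working set is not limited by RAM the way a pure in-memory model is, which is the scalability point Pentaho is making.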

Like most Pentaho applications, Business Analytics 4.1 is available in both open source community and commercial editions. A key differentiator is the advanced in-memory integration offered with the commercial release.


Birst, a company not as well known as SAS, IBM, SAP or Oracle, just released version 5 of its business intelligence software, which it has positioned to compete with Oracle Exalytics and Big Data appliances, as well as HANA.

“Exalytics and HANA are being optimized for analytics by Oracle and SAP to run their business intelligence solutions on top of,” said Wynn White, vice president at Birst. “These guys for years have been piece-mealing the overall solution together from mostly acquired, and in some rarer cases, built parts.” 

He believes this assemble-the-parts approach translates into poor integration, which means costly implementations and an uneven experience for users. By building business intelligence from the ground up, he said, overall deployments can be made faster and cheaper.  

The company calls Birst 5 the first in-memory database optimized exclusively for business analytics.  It uses a SQL-based, columnar database.

 “Data discovery tools are limited by the data sets they can handle and cannot deliver analytics that scale,” said White. “High-volume, high-speed analytical queries can be accomplished using in-memory technology which is the future of business analytics.”

The Birst in-memory analytics database builds on Birst’s data warehouse automation technology, which provides for data integration across data sources such as SAP, Salesforce, operational and financial systems. The company gives users the option of deploying its tools as a self-contained appliance, in the cloud or on-premises.


The little guys really do like to pick on SAP and Oracle. Mike Allen, vice president of Product Management at Terracotta, decries both HANA and Exalytics as high-cost products hoping to lock customers into vendor solutions by speeding up long-running (and non-real-time) analytic processes. He said both use in-memory processing within the database but don’t create a bridge between the application and the data.

“At the end of the day SAP HANA and Oracle's solutions are still databases - running on dedicated hardware that has to be sized to the problem in hand - that are accessed across a network,” said Allen. “Terracotta BigMemory stores data right where it’s used, in the memory where the application runs. This makes it much faster.”

BigMemory is a software solution that enables users to process terabytes of data in memory. It is being used in areas such as fraud detection, credit card authorizations, trade order processing, billing, call center/e-service, database/mainframe offload, and high volume customer portals/online services.

“BigMemory excels at servicing both transactional and analytical workloads from within a single unified in-memory store,” said Allen. 

Winners and Losers

Clearly, there are many opinions on which Big Data approach is superior. Barry Cousins, senior research analyst at Info-Tech Research Group, agrees with many of the views above that HANA is an in-memory analytics layer for all things SAP. He’s less kind with regard to Exalytics.

“Oracle Exalytics still looks like a competitive reaction based on a bundle of legacy technologies,” he said.   

Cousins, though, is largely positive about IBM InfoSphere BigInsights, noting that it had a long and strong development path resulting in a reasonably mature product release. Therefore, he feels IBM is well-positioned in the overall data/business intelligence/analytics space. He holds a similar opinion of SAS.

“SAS had the most natural evolution to in-memory analytics of the major players and are best positioned in terms of expertise, partners, and platform openness for flanking technologies,” Cousins said. “IBM and SAS stand out as the vendors with the most interesting blend of technology, expertise and partners for clients with complex analytics needs.”

His thoughts on some others: Birst is aimed at mid-sized companies with a data warehouse/business intelligence solution that integrates well with existing data technologies. Pentaho’s commercial open-source model and subscription pricing have driven faster adoption in non-U.S. markets than its larger competitors have seen.

And don’t count out HP, he said, which acquired Vertica last year and aims to make it central to its Big Data analytics strategy. In November HP announced the Vertica Analytics Platform, software designed to analyze data at massive scale and improve real-time performance for demanding business intelligence and analytic workloads.

Drew Robb is a freelance writer specializing in technology and engineering. Currently living in California, he is originally from Scotland, where he received a degree in geology and geography from the University of Strathclyde. He is the author of Server Disk Management in a Windows Environment (CRC Press).

This article was originally published on Monday Feb 27th 2012