Open Source R Language Could Revolutionize Business Intelligence

by Herman Mehling

The open source R programming language promises big advances in analytics and business intelligence, and IBM and SAS are among the companies getting on board.

The R programming language could be coming to a workplace near you — if it hasn't arrived already. The big deal about R is that it can analyze Big Data, those exploding data sets that have traditionally defied analysis.

R is the brainchild of Ross Ihaka and Robert Gentleman (known as "R & R"), academics at the Department of Statistics at the University of Auckland, New Zealand. Since Ihaka and Gentleman wrote the original R paper in 1993, R has become the lingua franca of analytic statistics among students, scientists, programmers and data managers.

R is a GNU project similar to the S statistical programming language and environment, which is often the vehicle of choice for analytic statistics. R provides an open source route to S and adds some unique capabilities. One of R's greatest strengths is the ease with which it can create well-designed publication-quality plots with mathematical symbols and formulae.

After long use in academia, R only recently began to appear in the business world. Among the vendors bringing R into the commercial realm are SAS, Netezza (NYSE: NZ), Revolution Analytics and IBM (NYSE: IBM), which acquired SPSS.

While IBM and SAS are the 800-pound gorillas in the business analytics market, they have been slow to evangelize R. Instead, the boosterism is flowing from Revolution Analytics, a startup founded by Norman Nie.

More than 30 years ago, Nie co-invented the Statistical Package for the Social Sciences (SPSS), which marked the beginning of analytic and predictive statistical software. Now Nie is championing R.

In just two years, Nie's new company has won blue-chip customers such as Bank of America, Motorola and Pfizer.

Earlier this month, Revolution Analytics introduced 'Big Data' analysis to its Revolution R Enterprise software, taking R to what it claims are unprecedented levels of capacity and performance for analyzing very large data sets.

The company says R users will be able to process, visualize and model terabyte-class data sets in a fraction of the time of legacy products, without the need for expensive or specialized hardware.

This Big Data scalability will help R transition from a research and prototyping tool to a production-ready platform for enterprise applications such as quantitative finance and risk management, social media, bioinformatics and telecommunications data analysis, said Nie.

On its website, Revolution Analytics has published performance and scalability benchmarks for Revolution R Enterprise analyzing a 13.2 gigabyte data set of commercial airline information containing more than 123 million rows and 29 columns.

The new version of Revolution R Enterprise introduces an add-on package called RevoScaleR that provides a new framework for fast and efficient multi-core processing of large data sets, a capability that Revolution Analytics says sets it apart from other vendors.
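
One general way to analyze a data set too large to load at once is to summarize it chunk by chunk and then combine the summaries. Because the chunks are independent, they could be handed to separate cores. The Python sketch below illustrates that idea for a simple one-predictor linear regression. It is only an illustration under stated assumptions: the article does not describe how RevoScaleR works internally, and the names Partial, summarize, fit and chunks are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # one observation: (x, y)


@dataclass
class Partial:
    """Running sums for one chunk (hypothetical helper, not a vendor API)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def merge(self, other: "Partial") -> "Partial":
        # Sums from independent chunks simply add together.
        return Partial(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )


def summarize(chunk: Iterable[Row]) -> Partial:
    """Reduce one chunk of rows to five numbers."""
    p = Partial()
    for x, y in chunk:
        p.n += 1
        p.sx += x
        p.sy += y
        p.sxx += x * x
        p.sxy += x * y
    return p


def fit(partials: Iterable[Partial]) -> Tuple[float, float]:
    """Combine chunk summaries and return (slope, intercept)."""
    total = Partial()
    for p in partials:
        total = total.merge(p)
    denom = total.n * total.sxx - total.sx ** 2
    if denom == 0:
        raise ValueError("no data, or x has no variance; slope is undefined")
    slope = (total.n * total.sxy - total.sx * total.sy) / denom
    intercept = (total.sy - slope * total.sx) / total.n
    return slope, intercept


def chunks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield fixed-size slices; a real system would read these from disk."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]


if __name__ == "__main__":
    # Synthetic data lying on the line y = 2x + 1.
    rows = [(float(x), 2.0 * x + 1.0) for x in range(1000)]
    slope, intercept = fit(summarize(c) for c in chunks(rows, 100))
    print(slope, intercept)
```

Only five numbers per chunk are kept, so memory use does not grow with the number of rows. The plain sum-of-products formula used here can lose precision on badly scaled data, so a production system would likely use a more numerically stable method.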


Taking Hadoop and Other Big Data Sources to a New Level

Revolution R Enterprise works with Hadoop, NoSQL databases, relational databases and data warehouses, all products used to store and perform basic manipulation of very large data sets.

"Together, Hadoop and R can store and analyze massive, complex data," said Saptarshi Guha, developer of the RHIPE R package that integrates the Hadoop framework with R in an automatically distributed computing environment. "Employing the new capabilities of Revolution R Enterprise, we will be able to go even further and compute Big Data regressions."

Netezza, a maker of data warehouse, analytic and monitoring appliances, recently announced its TwinFin data warehousing appliance, which integrates with R. The new appliance includes Netezza's i-Class analytics capabilities and a new release of Netezza Performance Software.

The vendor says its i-Class technology provides extensions for the development and execution of advanced analytics, including support for Java, C/C++, Fortran, Python, MapReduce, Hadoop, SAS and R. According to Netezza, the technology eliminates the need to move data into specialized systems for advanced analytics, which accelerates application performance and simplifies deployment.

Recently, Kelley Blue Book selected the Netezza warehouse appliance to gain deeper insights into advertising performance and site traffic, and to increase customer satisfaction and advertising revenue.

kbb.com, with more than 18 million monthly visits, dramatically reduced the amount of time it took to process data for a variety of purposes, said Dan Ingle, vice president of analytic insights at Kelley Blue Book.

"With Netezza, kbb.com can leverage analytics to provide clients with more accurate user information within minutes, instead of days," said Ingle.

For instance, the company can process vehicle valuations in a day, producing values that can be delivered to the marketplace in near real time.

Kelley Blue Book also uses the TwinFin appliance to forecast advertising 12 to 24 months ahead, enabling it to increase revenue, gain insight into ad performance, and provide accurate ad forecasts and analysis for its advertisers.


  This article was originally published on Monday Aug 30th 2010