PMML Makes Predictive Analytics and Data Mining Easier

by Herman Mehling

PMML has become the de facto standard for predictive analytics by making it easier to deploy and share analytics models.

Predictive analytics — the art of mining and analyzing historical data patterns to predict the future — is not a common term among IT types, let alone consumers. Yet predictive analytics is practiced widely, and its use affects millions of consumers and businesses every day.

"Every time you swipe your credit card or use it online, a predictive analytic model checks the probability of that transaction being fraudulent," said Alex Guazzelli, vice president of analytics at Zementis, a developer of predictive analytics software.

"If you rent DVDs online, chances are a predictive analytic model recommends a particular movie to you," he said. "Predictive analysis is already an integral part of your life and its application is bound to assist you even more in the future."

Predictive analytics is also applied to data from sensors in bridges, buildings, industrial processes and machinery, where models generate predictions that alert people to potential faults and problems before they occur. Other uses include healthcare, financial services and insurance.


PMML: The Predictive Analytics Standard

Predictive analytics leverages techniques from statistics, data mining and game theory to help individuals analyze current and historical facts so they can make predictions about future events.

The Predictive Model Markup Language (PMML) is the de facto standard to represent predictive analytic models and is currently supported by all of the top commercial and open source statistical tools.

The language is supported by top business intelligence and analytics vendors such as IBM, SAS, MicroStrategy, Oracle and SAP. NASA and Visa can also be found on the member list of the Data Mining Group, the committee behind PMML.

PMML enables the instant deployment of predictive solutions. Within a company, PMML can be used as the lingua franca not only between applications, but also between divisions, service providers and external vendors. In this scenario, it becomes the standard that defines a single clear process for the exchange of predictive solutions.

PMML represents a myriad of predictive modeling techniques, such as Association Rules, Cluster Models, Neural Networks and Decision Trees. These techniques empower users around the globe to extract hidden patterns from data and use them to forecast behavior.

The beauty of the XML-based language is that it allows people to easily share predictive analytic models between different applications, said Guazzelli.

"Therefore, you can train a model in one system, express it in PMML, and move it to another system, where you can use it to predict, for example, the likelihood of machine failure," he said.
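To make the train-here, score-there idea concrete, the following Python sketch reads a small hand-written PMML decision tree and scores a record against it. The document, the field names (temperature, vibration) and the thresholds are invented for illustration and do not come from any vendor tool. The reader handles only the True and SimplePredicate elements, so treat it as a minimal sketch under those assumptions rather than a full PMML consumer.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document (not exported by any real tool).
PMML_DOC = """
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header description="Hypothetical machine-failure tree"/>
  <DataDictionary numberOfFields="3">
    <DataField name="temperature" optype="continuous" dataType="double"/>
    <DataField name="vibration" optype="continuous" dataType="double"/>
    <DataField name="status" optype="categorical" dataType="string"/>
  </DataDictionary>
  <TreeModel modelName="failure_tree" functionName="classification"
             noTrueChildStrategy="returnLastPrediction">
    <MiningSchema>
      <MiningField name="temperature"/>
      <MiningField name="vibration"/>
      <MiningField name="status" usageType="predicted"/>
    </MiningSchema>
    <Node score="ok">
      <True/>
      <Node score="failure">
        <SimplePredicate field="temperature" operator="greaterThan" value="90"/>
      </Node>
      <Node score="failure">
        <SimplePredicate field="vibration" operator="greaterThan" value="0.8"/>
      </Node>
    </Node>
  </TreeModel>
</PMML>
"""

NS = {"p": "http://www.dmg.org/PMML-4_0"}

# Subset of PMML SimplePredicate operators supported by this sketch.
OPS = {
    "equal": operator.eq,
    "notEqual": operator.ne,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def matches(node, record):
    """Return True if the node's predicate holds for the record."""
    if node.find("p:True", NS) is not None:
        return True
    pred = node.find("p:SimplePredicate", NS)
    if pred is None:
        return False  # predicate type not handled by this sketch
    compare = OPS[pred.get("operator")]
    return compare(float(record[pred.get("field")]), float(pred.get("value")))


def score(node, record):
    """Descend into the first matching child; otherwise keep this node's score."""
    for child in node.findall("p:Node", NS):
        if matches(child, record):
            return score(child, record)
    return node.get("score")


def predict(pmml_text, record):
    """Parse the PMML text and score one record with its tree model."""
    root = ET.fromstring(pmml_text)
    tree_root = root.find("p:TreeModel/p:Node", NS)
    if not matches(tree_root, record):
        return None
    return score(tree_root, record)
```

Calling predict(PMML_DOC, {"temperature": 95, "vibration": 0.1}) walks the tree in plain Python, with no dependency on whichever system produced the document, which is the portability Guazzelli describes.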

PMML was created by the Data Mining Group, a vendor-led committee composed of commercial and open source analytics companies as well as government and academic users. As a result, most of the leading data mining tools today can export or import PMML. A mature standard that has evolved over the last 10 years, PMML can represent not only the statistical techniques used to learn patterns from data, such as artificial neural networks and decision trees, but also the pre-processing of raw input data and the post-processing of model output.
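The three stages mentioned here (pre-processing, model, post-processing) can be pictured as a small pipeline. The Python sketch below is an illustration only: the piecewise-linear scaling is modeled loosely on PMML's NormContinuous transformation, while the logistic weights, the field names and the 0.5 alert threshold are assumptions made up for this example.

```python
import math


def norm_continuous(x, points):
    """Pre-processing: piecewise-linear map through (orig, norm) pairs.

    Values outside the listed range are extrapolated along the nearest segment.
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two (orig, norm) pairs")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x <= x1:
            break  # x falls in (or before) this segment
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def failure_probability(temperature, vibration):
    """Model: a hypothetical logistic regression with made-up weights."""
    t = norm_continuous(temperature, [(0.0, 0.0), (100.0, 1.0)])
    z = -3.0 + 4.0 * t + 2.0 * vibration
    return 1.0 / (1.0 + math.exp(-z))


def decide(probability, threshold=0.5):
    """Post-processing: turn the raw model output into a business decision."""
    return "alert" if probability >= threshold else "ok"


if __name__ == "__main__":
    p = failure_probability(temperature=95.0, vibration=0.1)
    print(decide(p))
```

Keeping all three stages in one portable description matters because a model moved without its scaling or decision rules would produce different answers in the receiving system.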


Building Analytics Models That Can Be Shared

Sharing models between applications is key to the success of predictive analytics. But to be able to share a model, you first need to build it. Model building is composed of several phases, including an exhaustive data analysis phase.

"In this phase, you slice and dice raw data and select the most important pieces of information for model building," said Guazzelli.

Raw and derived fields are then used for model training. Typically, only a fraction of the data fields looked at during the analysis phase are used to build the final model.

Once the model is complete, the next task is to test its performance against a test data set. This may last several weeks, depending on the complexity of the problem people are trying to solve.
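The build-then-test cycle described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the data is synthetic, the "model" is a single temperature threshold, and the 70/30 split is just a common choice, not something the article prescribes.

```python
import random


def make_records(n=200, seed=0):
    """Synthetic history: (temperature, failed) pairs with a made-up failure rule."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        temperature = rng.uniform(20.0, 120.0)
        records.append((temperature, temperature > 90.0))
    return records


def split(records, test_fraction=0.3, seed=1):
    """Shuffle a copy of the records and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def train_threshold(train):
    """'Train' by placing a cut-off midway between the two classes."""
    hottest_ok = max(t for t, failed in train if not failed)
    coolest_failure = min(t for t, failed in train if failed)
    return (hottest_ok + coolest_failure) / 2


def accuracy(threshold, test):
    """Share of held-out records the threshold classifies correctly."""
    hits = sum((t > threshold) == failed for t, failed in test)
    return hits / len(test)


if __name__ == "__main__":
    train, test = split(make_records())
    threshold = train_threshold(train)
    print(f"threshold={threshold:.1f} accuracy={accuracy(threshold, test):.2f}")
```

The point of holding records back is that the accuracy is measured on data the model never saw during training, which is what testing against a test data set means in practice.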

"When you put a predictive analytic model to work, you usually expect it to do its job for months or years until it needs to be refreshed, most probably because of performance deterioration," said Guazzelli. Then another model is built and deployed in place of the older one.

Without a language such as PMML, deploying predictive solutions would be difficult and cumbersome, as different systems represent their computations in different ways.

"Every time you move a model from one system to another, you go through a lengthy translation process which is prone to errors and misrepresentations," said Guazzelli.

With PMML, the process is straightforward. From application A to B to C, PMML allows predictive solutions to be easily shared and put to work as soon as the model building phase is completed.
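On the producing side, expressing a trained model in PMML amounts to writing its parameters into the agreed XML vocabulary. The Python sketch below does this for a simple linear regression using only the standard library. The function name, the field names and the coefficients are hypothetical, and the output has not been validated against the PMML schema, so read it as a rough illustration of the idea rather than a conforming exporter.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-4_0"


def linear_model_to_pmml(intercept, coefficients, target):
    """Serialize a linear model (intercept plus per-field coefficients) as PMML text."""
    root = ET.Element("PMML", {"version": "4.0", "xmlns": PMML_NS})
    ET.SubElement(root, "Header", {"description": "Hypothetical linear model"})

    # Declare every input field and the target in the data dictionary.
    fields = list(coefficients) + [target]
    dictionary = ET.SubElement(
        root, "DataDictionary", {"numberOfFields": str(len(fields))}
    )
    for name in fields:
        ET.SubElement(
            dictionary,
            "DataField",
            {"name": name, "optype": "continuous", "dataType": "double"},
        )

    model = ET.SubElement(root, "RegressionModel", {"functionName": "regression"})
    schema = ET.SubElement(model, "MiningSchema")
    for name in coefficients:
        ET.SubElement(schema, "MiningField", {"name": name})
    ET.SubElement(schema, "MiningField", {"name": target, "usageType": "predicted"})

    # The learned parameters live in the regression table.
    table = ET.SubElement(model, "RegressionTable", {"intercept": str(intercept)})
    for name, weight in coefficients.items():
        ET.SubElement(
            table, "NumericPredictor", {"name": name, "coefficient": str(weight)}
        )

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(linear_model_to_pmml(1.5, {"temperature": 0.02, "vibration": 3.0}, "wear"))
```

Any application that understands the same vocabulary can read the resulting text back, so the hand translation Guazzelli warns about is replaced by a file exchange.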

"For example, you might build a model in IBM SPSS Statistics and instantly benefit from cloud computing where you can deploy it in ADAPA, the Zementis predictive decisioning platform," said Guazzelli.

Or you can move it to IBM InfoSphere, where it will reside close to the data warehouse, or you can move it to KNIME, an open-source tool for building and visualizing data flows from the University of Konstanz in Germany, said Guazzelli.

This is the power of PMML: enabling true interoperability of models and solutions between applications. PMML also allows IT folks to shield end-users from the complexity associated with statistical tools and models.


  This article was originally published on Monday Nov 29th 2010