Open-source data processing engine Apache Spark continues to win favor in the enterprise, as evidenced by its growing usage among developers who find it easier to use than Hadoop. According to an October survey, standalone deployments of Spark, including those running in the public cloud, now outnumber deployments that also contain Hadoop components.
Some big vendors are throwing support behind Apache Spark, notably IBM, which last year put 3,500 researchers and developers to work on Spark-related projects and opened a Spark Technology Center in San Francisco, following up this year with a second center in India. "These centers serve as hubs for the developer and the data science communities to collaborate and accelerate development on the framework," said Kathryn Guarini, VP, Offering Management, z Systems and LinuxONE, IBM, in an email.
IBM in October also introduced an analytics-as-a-service offering based on Spark.
"Our clients are attracted to Spark's ability to perform federated analytics over heterogeneous data sources, and are adopting the framework in large numbers," Guarini said.
Spark Meets Big Iron
Today IBM deepened its commitment to Spark, rolling out a platform called IBM z/OS Platform for Apache Spark that enables the open-source engine to run natively on z/OS, IBM's workhorse mainframe operating system, which is especially popular with financial services firms, insurers and retailers.
This means data can be analyzed in place on the mainframe with no need to extract, transform and load (ETL) data and move it elsewhere. This helps organizations perform real-time analytics on their enterprise systems of record, Guarini said, saving them time and money and reducing risk.
The solution, which is available today, supports popular languages including Scala, Python, R and SQL. Data abstraction services let analysts work with data held in traditional mainframe data sources such as IMS, VSAM and DB2 for z/OS, using the formats they already know. The services also let z/OS analytics applications use the standard Spark APIs.
In addition, IBM's z Systems unit has established a new GitHub organization where developers can collaborate and build tools around Spark on z/OS. For example, combining Project Jupyter with any NoSQL database can provide an extensible data processing and analytics solution.
"As businesses of all sizes transform into real-time digital organizations, they must be able to get a clear picture of all their enterprise data without the excessive time and risk of ETL," said Rod Smith, IBM Fellow, Emerging Internet Technologies, in a statement. "With Apache Spark enabled natively on IBM platforms – now including z Systems – customers can perform analytics alongside the transactional systems that house key data, while drawing contextual insights from other data sources, enabling them to engage with customers and generate revenue in real time."
Spark Platform Partners
IBM also announced that three partners, DataFactZ, Rocket Software and Zementis, will create customized solutions using the new platform.
DataFactZ is working with IBM to develop Spark analytics based on Spark SQL and MLlib for data and transactions processed on the mainframe. According to DataFactZ, the initial focus will be on preventing fraud in the financial industry by isolating and investigating fraudulent transactions in real time with advanced machine learning and streaming technologies.
"We are working with IBM to provide analytics services for z Systems customers that will improve their ability to detect and prevent costly fraudulent transactions before they occur," said Krishna Kallakuri, DataFactZ's CEO, in a statement.
Zementis is complementing its in-transaction predictive analytics offering for z/OS with a standards-based execution engine for Apache Spark that facilitates the deployment of advanced predictive models.
Data virtualization specialist Rocket Software has created Rocket Launchpad, "an engagement model to help organizations develop creative solutions to solve their most challenging data problems," according to a company blog post. Rocket Launchpad will allow mainframe users to try the platform using data on z/OS.