Vendors big and small are making news at this week's Strata + Hadoop event as they try to expand their portion of the Big Data market.
Cloudera highlighted a trio of Apache Software Foundation (ASF) projects to which it contributes. Among them is Spark 2.0, which includes new machine learning libraries and a new Dataset API that promises better usability and performance.
"Cloudera was the first vendor to offer a commercially supported version of Apache Spark in our Big Data platform," Mike Olson, Cloudera's founder and chief strategy officer, said in a statement. "In the years since then, Spark has become a standard for stream processing and machine learning workloads across the industry."
Cloudera donated the Apache Kudu project to the ASF in 2015 as a columnar data store for Hadoop. The Kudu 1.0 release is now available, providing fast data scans for analytics. This week Cloudera also formally announced the launch of the Apache Spot effort, which is based on the Open Network Insights platform that enables Big Data security analytics.
Cloudera rival MapR this week announced support for event-driven microservices on the MapR Converged Data Platform.
"Microservices by themselves are great but don’t deliver on their full promise until you have a converged platform that brings the data together," said Anil Gadre, senior vice president, product management, MapR Technologies. "We’ve made it easier for developers to build innovative new converged applications that can help transform a business by providing a competitive advantage that was not possible before."
Pentaho used the Strata + Hadoop event to discuss expanded support for Spark on the Pentaho platform. Additionally, Pentaho is now integrating with Cloudera Sentry to help secure Hadoop data.
"Our latest enhancements reflect Pentaho’s continued mission to quickly make Big Data projects operational and deliver value by strengthening and supporting analytic data pipelines," Donna Prlich, Pentaho's senior vice president, Product Management, Product Marketing and Solutions, said in a statement.
Alation, which builds a data catalog for collaboration, announced the upcoming release of its Alation Data Catalog 4.0 with Alation Connect. The Alation Connect capability enables queries from popular compute engines, including the Presto SQL query engine and SparkSQL.
"With the introduction of Alation Connect, we catalog queries alongside reports, dashboards and data," Satyen Sangani, Alation's CEO, said in a statement. "Most people access data through views, queries, reports and dashboards; so it’s critical for a data catalog to move beyond an inventory of only physical data assets like tables and files."
On the standards front, the Linux Foundation's ODPi Collaborative project announced an expansion of the ODPi Runtime Specification 2.0. The ODPi Runtime Specification helps Big Data vendors guarantee a consistent set of base-level expectations across different platforms. The expanded specification adds support for Apache Hive and the Hadoop Compatible File System (HCFS).
"Through a common specification, we are enabling developers to easily write applications that sit on top of Big Data stacks, lowering the costs of interoperability across systems," John Mertic, director of ODPi, said in a statement. "These compliant applications should need little to no re-engineering to run on other ODPi Runtime Compliant platforms."
Sean Michael Kerner is a senior editor at EnterpriseAppsToday and InternetNews.com. Follow him on Twitter @TechJournalist.