It was a big week for Big Data, with multiple vendors making announcements at this week's Hadoop Summit in San Jose.
Among the biggest multi-vendor announcements in the Hadoop ecosystem was one from the Linux Foundation's ODPi (Open Data Platform Initiative) about the ODPi Runtime Specification. This open standard covers the HDFS, YARN and MapReduce components of a Hadoop deployment and specifies how those components should be configured. The idea is that when vendors use the same set of technologies, configured the same way, their products can interoperate.
Move Toward Big Data Standardization
To that end, Hadoop distributions from Altiscale, ArenaData, Hortonworks, IBM and Infosys have all now been assessed as ODPi Runtime Compliant. According to the Linux Foundation, this ensures that these Big Data vendors meet a consistent set of baseline expectations, as defined by the ODPi Runtime Specification.
Compliance with the ODPi Runtime Specification is self-assessed, and the self-certification reports are publicly available at https://github.com/odpi/self-certification-reports
"Having Altiscale, ArenaData, Hortonworks, IBM and Infosys declare compliance with the ODPi Runtime Specification is a strong step toward simplifying and standardizing the Big Data ecosystem to accelerate the delivery of business outcomes," said John Mertic, director of program management for ODPi, in a statement.
Moving forward, the ODPi project is working on the ODPi Operations Specification to further expand Big Data standardization efforts.
Filling Data Lakes
Business intelligence provider Pentaho announced its "filling the data lake" blueprint for Hadoop, which aims to define a process for ingesting data into Hadoop data repositories, often referred to as data lakes.
"A major challenge in today’s world of Big Data is filling Hadoop data lakes in a simple, automated way," Chuck Yarbrough, senior director of Solutions Marketing at Pentaho, a Hitachi Group Company, said in a statement. "Our team was passionate about identifying repeatable ways to accelerate the Big Data analytics pipeline and have developed an approach to drive more agile and automated Big Data analytics at scale."
Hortonworks also made news this week, announcing its Hortonworks Data Platform (HDP) 2.5 update. Security is a key area of enhancement in HDP 2.5: the release uses the Apache Atlas project to classify and tag data, and those metadata tags can then be enforced through the Apache Ranger security project. Data analytics also gets a boost from the Apache Zeppelin project, which provides a web-based notebook for interactive data analytics.
Sean Michael Kerner is a senior editor at Enterprise Apps Today and InternetNews.com. Follow him on Twitter @TechJournalist.