Setting up, configuring and managing a Big Data cluster of Hadoop and Spark applications is no easy task. That's where Altiscale comes into the picture with its Big Data as a Service cloud platform that enables organizations to run their analytics workloads without the need to build out on-premises infrastructure for Big Data.
Altiscale is launching its Data Cloud 4.0 platform this week. It offers updated capabilities such as support for Apache Hadoop 2.7.1, Apache Spark 1.5.0, Apache Hive 1.2.0 and Apache Pig 0.15.0.
Raymie Stata, CEO and founder of Altiscale, explained to Enterprise Apps Today that Altiscale created its own cloud from the ground up, optimized around the needs of Hadoop.
"We have our own data centers in which we put hardware that we have selected, configured and tuned the hardware, networking and software to ensure high performance and reliability for big data," Stata said. "We don't use OpenStack, but we’ve built substantial IP around what we call the 'control plan of Hadoop' – the automation that will allow us to, ultimately, run thousands of Hadoop clusters with a small, high-octane operations team."
Big Data by definition involves large amounts of data, which must be uploaded from the user to the Altiscale Data Cloud. While transferring large volumes of data might be a concern for some, Stata noted that it's actually fairly easy and straightforward for customers to get data uploaded and in place.
"While customers are always initially concerned about how to transfer their archival data, we get them to quickly realize that it's more important to focus on getting the day-to-day flow of data in place," Stata explained. "Networking has become very cheap, and most customers realize that the day-to-day flow is both simple and cost-effective to get into place."
Once the day-to-day flow of data is in place, Altiscale works with customers on the one-off transfer of archival data.
"This we typically solve either over public or private network connections, which can handle a few tens of terabytes, or shipment on secured media for larger amounts. We’ve got a service designed specifically around doing this," Stata said.
Also of note is that the Altiscale Data Cloud complies with the emerging ODPi effort, which aims to define standards for cross-Hadoop application data compatibility. Stata said that Altiscale is a founding member of the ODPi because he strongly believes that standards will accelerate the spread of Hadoop, and its business value, in the enterprise. With common standards, more apps can be built, and businesses can adopt them more quickly and see value. Because Altiscale Data Cloud 4.0 meets the specifications set by the ODPi, any application that runs on an ODPi-compliant Hadoop distribution can easily run on Altiscale.
"It also makes it easier for customers to pick a Hadoop distribution," Stata said. "It's good to have confidence that your apps will run and that you will have access to a broad array of applications."
While ODPi aims to define the standard, Stata emphasized that there has been a huge convergence within the Hadoop community since Hadoop 2.0 shipped in October 2013; the distros are more and more alike.
"But there is lots of confusion in the market about this – many customers believe the distros are very different," Stata said. "Among other things, ODPi will help reduce that confusion, as well as help deal with the remaining areas of incompatibility."
Sean Michael Kerner is a senior editor at Enterprise Apps Today and InternetNews.com. Follow him on Twitter @TechJournalist