At the core of many modern enterprise apps and services is a fundamental need for data analytics. It's a need that both Databricks and the open source Apache Spark project it leads help to fill.
It's also a need that a lot of organizations are willing to pay for. On Feb. 5, Databricks announced that it now generates over $100 million in annual revenue. Databricks wants to keep growing, and to that end the company raised a $250 million Series E funding round led by Andreessen Horowitz, with participation from Coatue Management, Microsoft, and New Enterprise Associates (NEA). Total funding to date for Databricks now stands at $498.5 million, and the company has a publicly stated valuation of $2.75 billion.
"What’s driving this incredible growth is the market’s massive appetite for Unified Analytics," Ali Ghodsi, CEO and co-founder of Databricks, wrote in a media advisory. "Organizations need to achieve success with their AI initiatives and this requires a Unified Analytics Platform that bridges the divide between big data and machine learning."
Unified Analytics Platform
Apache Spark is widely used as a foundational element in big data analytics and processing stacks. The Apache Spark project today benefits from the contributions of over 1,000 developers from more than 250 organizations. Databricks goes beyond what the core open source project provides, offering organizations what the company has branded a Unified Analytics Platform.
The core Apache Spark open source platform provides data integration, ETL (Extract, Transform, Load), analytics and real-time data processing capabilities. Databricks layers further capabilities on top through two components, Databricks Workspace and Databricks Runtime, which together make up the Unified Analytics Platform.
Databricks Runtime promises improved read and write performance for cloud workloads. The Runtime component also includes Databricks Delta, which provides advanced data pipeline capabilities including table snapshotting, schema management and indexing.
The Databricks Workspace component enables organizations to collaborate on data science with interactive notebooks and built-in data visualization capabilities. Security is another core element of the Databricks platform, which provides enterprise controls such as data encryption, audit logs and compliance features.
Apache Spark 2.4
The most recent major update to the Unified Analytics Platform came in November 2018, building on new features that landed in the Apache Spark 2.4 release. Among the key features in Spark 2.4 is Project Hydrogen, an effort to improve the fault recovery and performance of machine learning frameworks that are used together with Spark.
"Project Hydrogen is the most recent major initiative with an aim to provide first-class support for popular distributed machine learning frameworks on Apache Spark," said Reynold Xin, co-founder of Databricks, Apache Spark PMC member and the top contributor to the project.
Sean Michael Kerner is a senior editor at EnterpriseAppsToday and InternetNews.com. Follow him on Twitter @TechJournalist.