Containers and databases seem like an odd mix. Containers offer a lightweight way to run applications, but databases tend to be massive.
Yet Redis, Elasticsearch, MySQL and Postgres are among the top 10 technologies running in Docker containers, according to cloud monitoring vendor Datadog -- and the numbers are growing.
Still, just 13 percent of respondents to a recent survey by cloud host Bitnami cited Docker containers as their preferred method of running a database for cloud-based applications.
Container technology isn't new, but Docker packaged it in a way that solves the "dependency hell" problem of deploying applications, bringing containers, often considered the next evolution beyond virtual machines, into the mainstream. The Docker name has become synonymous with containers.
Major database players like Oracle, SAP, Google, Amazon and Microsoft have jumped on the container bandwagon, along with many smaller innovators, including MongoDB, MariaDB, Crunchy Data (PostgreSQL) and Joyent, which was recently purchased by Samsung.
One of the upstarts, containerization platform provider Robin Systems, takes the position that everything is better in a container. The company has focused on Oracle databases, but it is seeing growing interest in running more modern big data applications such as Cassandra, MongoDB and Hadoop in containers, according to Sushil Kumar, its chief marketing officer.
"Enterprise will not be able to benefit from all containerization has to offer until the 90 percent of traditional applications are brought onto a container platform," he said.
"If you're a business, probably your costs for hardware and software licensing are spiraling out of control. Containers provide the benefits that virtualization promised but never actually delivered, especially for data applications," he added. "Containers maximize [hardware] utilization. Companies want to release new products faster, and that's where containers provide agility. And eliminating spotty performance can increase user satisfaction."
In this article, we cover:
- Why it matters that databases are stateful applications
- Container database benefits
- Container database challenges
- How containers can improve database testing
- Why IT teams must be prepared to support databases in more environments, including containers
Stateless vs. Stateful Applications
Most of the buzz about containers has been about stateless applications, which treat each request as an independent transaction. In contrast, databases are stateful applications that rely on some stored data to perform multiple tasks. Thus, they must maintain a "state" -- which makes containerization trickier.
Containerized databases share some of the benefits and challenges of stateless applications, but they also bring challenges of their own.
Container Database Benefits
"The bigger applications are performance sensitive, particularly I/O performance-sensitive," said Kumar, noting that hypervisor-based virtualization carries a significant performance overhead.
In an IBM comparison of virtual machines vs. Linux containers running database workloads including Redis and MySQL, containers performed better in every test. The researchers also questioned the common practice of running containers inside a virtual machine for added security, noting that doing so only slows performance.
However, the authors wrote, "If containers are to be widely adopted they must provide advantages other than steady-state performance. We believe the combination of convenience, faster deployment, elasticity and performance is likely to become compelling in the near future."
Reduced infrastructure cost
Hosted Elasticsearch provider Qbox, backer of the open source container orchestration project Supergiant, found it could reduce its infrastructure costs by 50 percent through containerization. (Qbox considers Elasticsearch a database, though Elastic, the company behind Elasticsearch, says it is one in some use cases but not in others.)
It also saw a dramatic decrease in support tickets compared with its dedicated virtual machine model, and it was able to provide customers more flexibility when they needed bursts of compute resources.
Reduced VM sprawl
"With VMs, you end up with a sprawl of operating systems and software that needs to be installed, which creates more operational overhead; the more OS copies you have, the more patching you have to do," Kumar said. "You get performance and much better manageability with fewer machines and operating systems."
Yet in reducing VM sprawl, companies might face container sprawl, which requires more monitoring, according to Torry Campbell at Intel Security. The amount of data coming from those containers might easily overwhelm the security team.
Faster database deployment, easier management
"The ability to spin up an application very, very quickly can happen with databases just as they can with, say, Apache Web Server [in containers]. You can spin up that Postgres database in a container almost instantly as you deploy your app in a new environment," explained Michael Ferranti, vice president of marketing at ClusterHQ, parent company of the open source data volume manager Flocker.
It can be a one-click process with Supergiant, the open source container orchestration project, according to Qbox CEO Mark Brandon.
"You'll have as many nodes as you need, it will autoscale, it will automatically attach and reattach volumes as nodes become available, it will load balance... " he said.
A number of vendors are speeding up container database deployment, including Nimbix, a cloud platform provider for high-performance computing, which claims sub-second launch times for databases with its new PushToCompute capability.
With containers, it's easier to spin up a new container than to patch or update an existing one. And if something goes wrong, you can easily revert to the old one. Crunchy Data, which provides a containerized PostgreSQL database, also provides a "crunchy-watch container" to handle failover in a master-slave configuration: if the master cannot be reached, it triggers one of the slaves to take over.
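The failover behavior described above boils down to a polling loop: check the master, and after repeated failures promote a replica. The Python sketch below is a hypothetical illustration, not Crunchy Data's code. The `Node`, `Cluster`, `watch_once` and `run_watch` names, the three-failure threshold and the in-memory health check are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    role: str  # "master", "replica" or "failed"
    healthy: bool = True


@dataclass
class Cluster:
    nodes: List[Node]

    def master(self) -> Optional[Node]:
        return next((n for n in self.nodes if n.role == "master"), None)

    def promote_replica(self) -> Optional[Node]:
        """Mark the current master failed and promote the first healthy replica."""
        candidates = [n for n in self.nodes if n.role == "replica" and n.healthy]
        if not candidates:
            return None
        old = self.master()
        if old is not None:
            old.role = "failed"
        new_master = candidates[0]
        new_master.role = "master"
        return new_master


def watch_once(cluster: Cluster, is_reachable: Callable[[Node], bool],
               failures: int, threshold: int = 3) -> int:
    """One pass of the (hypothetical) watch loop.

    Returns the updated count of consecutive failed master checks.
    """
    master = cluster.master()
    if master is not None and is_reachable(master):
        return 0  # master is fine; reset the failure counter
    failures += 1
    if failures >= threshold:
        cluster.promote_replica()
        return 0
    return failures


def run_watch(cluster: Cluster, is_reachable: Callable[[Node], bool],
              iterations: int, threshold: int = 3, interval_s: float = 1.0) -> None:
    """Poll the master a fixed number of times, sleeping between checks."""
    failures = 0
    for _ in range(iterations):
        failures = watch_once(cluster, is_reachable, failures, threshold)
        time.sleep(interval_s)
```

Requiring several consecutive failures before promoting keeps a single network blip from triggering a failover. A production tool would also need to fence the old master so that two nodes never accept writes at once, which this sketch does not attempt.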
Container Database Challenges
Databases actually consist of two components: a data store, called a data volume, and a process for reading, writing and querying that data, Ferranti explained.
Most databases write to some type of underlying storage that isn't on the same host as the database application itself.
"That could mean you're running MongoDB or Postgres on AWS EC2 and your storage is on EBS (Elastic Block Store)," he said.
The issue is maintaining persistent storage. You can store the data on the same host as the process, but then what happens if that host fails? If you put the data on shared storage instead, you can remap the data volume when you restart the container on another host.
"Especially at scale, host failure is a problem, and you have to figure out how to get the data onto another host. Then the data that the process needs is no longer there," he said, adding that Flocker remaps those connections automatically, which doesn't happen natively.
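The remapping Ferranti describes can be modeled in a few lines of Python. This is a hypothetical sketch, not Flocker's API: `Host`, `SharedStorage`, `reschedule` and the volume ID are invented for illustration, and a dictionary stands in for a network-attached volume such as an EBS volume.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Host:
    name: str
    alive: bool = True


@dataclass
class DatabaseContainer:
    name: str
    host: Host
    volume_id: str


class SharedStorage:
    """Hypothetical stand-in for network-attached block storage.

    Volumes live here rather than on any host, so they outlive host failures.
    """

    def __init__(self) -> None:
        self.volumes: Dict[str, Dict[str, int]] = {}
        self.attachments: Dict[str, str] = {}  # volume_id -> host name

    def attach(self, volume_id: str, host: Host) -> Dict[str, int]:
        # Creates the volume on first use and returns its contents.
        self.attachments[volume_id] = host.name
        return self.volumes.setdefault(volume_id, {})

    def detach(self, volume_id: str) -> None:
        self.attachments.pop(volume_id, None)


def reschedule(container: DatabaseContainer, hosts: List[Host],
               storage: SharedStorage) -> Optional[DatabaseContainer]:
    """If the container's host has failed, restart it on a live host and
    remap its data volume there. Returns None if no live host exists."""
    if container.host.alive:
        return container
    target = next((h for h in hosts if h.alive), None)
    if target is None:
        return None
    storage.detach(container.volume_id)  # release the dead host's claim
    storage.attach(container.volume_id, target)
    return DatabaseContainer(container.name, target, container.volume_id)


if __name__ == "__main__":
    hosts = [Host("host-a"), Host("host-b")]
    storage = SharedStorage()
    db = DatabaseContainer("postgres", hosts[0], "vol-pg-data")
    storage.attach(db.volume_id, db.host)["rows"] = 1000  # write some data
    hosts[0].alive = False  # simulate a host failure
    db = reschedule(db, hosts, storage)
    print(db.host.name)  # host-b
    print(storage.volumes[db.volume_id])  # {'rows': 1000}
```

Because the data never lived on the failed host, the restarted container finds it intact. Had the data been written to host-a's local disk, rescheduling the process alone would leave it with an empty data store.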
Container database portability
Robin Systems, AWS and others maintain that it is easy to move container databases between environments, but when considering portability, it's essential to preserve the connection to the stored data. That makes moving a database trickier than moving a stateless application.
Container database security
Less than two years ago, Gartner research director Joerg Fritsch called container security "immature," adding that running a container inside a hypervisor didn't help much.
And a report from network analytics vendor BanyanOps found that more than 30 percent of container images on Docker Hub contained high-priority security vulnerabilities such as Shellshock and Heartbleed.
In the time since, Docker and others in the container ecosystem have scrambled to shore up container security.
In a July report, however, Fritsch changed his tune:
"Gartner asserts that applications deployed in containers are more secure than applications deployed on the bare OS and, arguably, on a VM. Although containers will not prevent applications from being compromised, they greatly limit the damage of a successful compromise because applications and users are isolated on a per-container basis so that they cannot compromise other containers or the host OS -- as long as a kernel privilege escalation vulnerability does not exist on the host OS."
And a report from UK-based information assurance firm NCC Group states that "from a security perspective, [containers] create a method to reduce attack surfaces and isolate applications to only the required components, interfaces, libraries and network connections."
Meanwhile, Nathan McCauley, director of security at Docker, published a blog post stressing proper configuration while also touting the company's updated security features.
The Container Database Role in Testing
"If you're using Agile development, you need to make a copy for testing of an application or a snapshot of an application, and you make a change and if it doesn't go as expected, you can quickly come back to a pristine state," said Kumar. "But that gets very challenging when you use large volumes of data. How do you take a quick snapshot of data without significantly more storage and waiting hours or days to make the copy? The agility of the application layer needs to extend to the data layer as well."
Containers improve testing of databases by allowing developers to more accurately recreate a consistent environment through development, testing and production. But there is more work to be done in this area, according to Ferranti, including a needed maturation of developer tools.
"I think the next wave of innovation for databases will be around making containers work better for CI (continuous integration) or test environments. Making it easy to take copies of production data, securely share it in a CI environment so you can test against ... a realistic copy of your database. You can detect many more bugs this way."
For now, data governance policies are among the reasons this isn't being done. Companies have to find ways to make working with data volumes easier for developers while still respecting those policies, he said.
More Database Environments
In determining whether a container database is the right choice, you have to consider your organization's own needs and capabilities.
The concerns client companies have for container databases are largely the same as for traditional databases, according to Ferranti: How do I back up my data? How do I recover from host failure? How do I get my production data into my Jenkins CI environment?
An "explosion" in the types of databases means an increase in the number of environments in which IT teams must support them, he said.
"They may have one application team using Redis, another using Mongo and a third using Postgres. So they're architecting for AWS-plus one; the 'plus-one' usually means Google or Azure. So there's a matrix of capabilities that people are thinking through."
Susan Hall has been a journalist for more than 20 years at news outlets including the Seattle Post-Intelligencer, Dallas Times Herald and MSNBC.com. She writes for The New Stack and FierceHealthIT, among other publications.