As vendors and users testified at last month's In-Memory Computing Summit, the relatively low cost of flash memory is driving databases and apps toward leveraging Fast Data – mobile and sensor cloud data – using systems whose storage is predominantly or even entirely composed of main and flash memory. One use case cited by a presenter employed one terabyte of main memory and one petabyte of flash.
What is driving this shift in databases and the applications that use them?
Increasingly, enterprises are realizing that "almost-real-time" handling of massive streams of data from cars, cell phones, GPS and the like is the new frontier – not only of analytics but also of operational systems that handle the Internet of Things (IoT). As one participant noted, this kind of real-time data-chewing allows your car not only to warn of traffic ahead, but also to detect another car parked around the corner in a dangerous position.
It gives each customer product, each "thing" in the Internet of Things, adaptability as well as intelligence. On the analytics side, real-time monitoring allows more rapid feedback and adjustment in the sales process.
Handling this sensor-driven Web quickly is a task dubbed Fast Data by summit participants. And it is a greater technical challenge than handling Big Data, mainly because of the need for "almost-real-time" operational processing.
The decreasing cost of flash memory, though, now makes it possible to handle Fast Data without breaking the bank, and new databases designed to take advantage of "flash-only" storage (such as those from Redis Labs) are arriving. Increasingly, Fast Data implementations are showing up not only at public cloud providers but also within forward-looking enterprises.
So what are the emerging best practices of these flash-memory database architectures? Space doesn’t permit a full discussion of the smart implementation techniques that are sprouting, but here are five good rules of thumb:
Treat Flash as Extension of Main Memory
This is what I call the "flat-memory" approach. Vendors must do much of the heavy lifting in ensuring that processors treat flash as just a slower version of main memory (as Intel is apparently doing) and that flash modules provide new interfaces optimized for random access rather than disk-style access. Users should look for the vendors who do this best, and design Fast-Data-using apps to view all storage as flat and equally available.
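To make the "flat" view concrete, here is a minimal Python sketch that uses the standard mmap module to address a file as if it were an ordinary byte array. The file merely stands in for a flash device, and the names open_flat_region and demo are illustrative assumptions, not any vendor's interface.

```python
import mmap
import os
import tempfile


def open_flat_region(path: str, size: int) -> mmap.mmap:
    """Map a file (a stand-in for a flash device) into the process address space."""
    # Create the backing file and extend it to the requested size.
    with open(path, "wb") as f:
        f.truncate(size)
    # Re-open for read/write and map it; mmap keeps its own handle to the file,
    # so the mapping remains usable after this file object is closed.
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), size)


def demo() -> bytes:
    """Write and read back bytes at an arbitrary offset, with no seek/read/write calls."""
    with tempfile.TemporaryDirectory() as directory:
        region = open_flat_region(os.path.join(directory, "flash.bin"), 4096)
        region[100:105] = b"hello"      # random-access write, byte-addressed
        data = bytes(region[100:105])   # random-access read
        region.close()
        return data
```

The application code in demo never distinguishes between memory and storage; it just indexes into the region, which is the programming model the flat-memory approach aims for.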
Implement Variable Tiering for Flash
That is, allow flash to be used sometimes for storage and sometimes for processing, depending on the needs of the application. Vendors such as Redis Labs are at the forefront of providing this.
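One possible reading of variable tiering is sketched below in Python. Everything here is a hypothetical illustration: the FlashRole and TieredStore names are invented, a plain dictionary stands in for the flash device, and no vendor's product is claimed to work this way. In the STORAGE role flash keeps a copy of every write; in the PROCESSING role flash holds only the entries that overflow main memory.

```python
from collections import OrderedDict
from enum import Enum


class FlashRole(Enum):
    STORAGE = "storage"        # flash keeps a copy of every write
    PROCESSING = "processing"  # flash only holds entries that overflow RAM


class TieredStore:
    """Two-tier key-value store; `flash` is a dict standing in for a flash device."""

    def __init__(self, ram_capacity: int, role: FlashRole):
        self.ram = OrderedDict()  # least recently used entry first
        self.flash = {}
        self.ram_capacity = ram_capacity
        self.role = role

    def put(self, key, value):
        self.ram[key] = value
        self.ram.move_to_end(key)          # mark as most recently used
        if self.role is FlashRole.STORAGE:
            self.flash[key] = value        # write-through copy
        self._evict()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key]
        value = self.flash[key]            # raises KeyError if the key is unknown
        self.ram[key] = value              # promote back into RAM
        if self.role is FlashRole.PROCESSING:
            del self.flash[key]            # flash holds overflow only
        self._evict()
        return value

    def _evict(self):
        # Push least recently used entries out of RAM until it fits.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            if self.role is FlashRole.PROCESSING:
                self.flash[old_key] = old_value
            # In the STORAGE role flash already holds a copy.
```

The role is fixed per store in this sketch; a real system would presumably let it vary by application or workload, which is the point of the rule of thumb.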
Understand Tradeoff between Data Correctness and Speed to Process
Specifically, vendors will vary in their ability to "write to storage" without risking data loss, and to optimize processing speed at the risk of some data incorrectness.
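The tradeoff can be illustrated with a small Python sketch of an append-only log. The EventLog class and write_and_read helper are hypothetical, and an ordinary file stands in for flash storage. With durable=True each record is forced to the device before append returns, which is slower but safer; with durable=False records may sit in buffers and could be lost if the process or machine fails before they are flushed.

```python
import os
import tempfile


class EventLog:
    """Append-only log with a switch between safety and speed."""

    def __init__(self, path: str, durable: bool):
        self._file = open(path, "ab")
        self._durable = durable

    def append(self, record: bytes) -> None:
        self._file.write(record + b"\n")
        if self._durable:
            # Flush Python's buffer, then ask the OS to push the data to the device.
            self._file.flush()
            os.fsync(self._file.fileno())
        # Otherwise the record stays in buffers until a later flush or close.

    def close(self) -> None:
        self._file.close()  # closing flushes any buffered records


def write_and_read(durable: bool) -> list:
    """Write two records, close the log cleanly, and read the records back."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "events.log")
        log = EventLog(path, durable=durable)
        log.append(b"speed=42")
        log.append(b"speed=43")
        log.close()
        with open(path, "rb") as f:
            return f.read().splitlines()
```

After a clean close both modes yield the same records; the difference only shows up on a crash, and that is exactly the risk a buyer needs to ask vendors about.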
Mesh New Database Architecture with Existing Big Data Architectures
Summit participants agreed that the new database architectures simply could not handle the full scope of present-day Big Data analytics – or, to put it another way, Big Data can’t do Fast Data’s job, and vice versa. Frameworks and brokers that parcel out application data between operational and analytical databases are today’s main candidates for good approaches to this problem.
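As a rough illustration of what such a broker does, the Python sketch below sends every incoming event to an operational store immediately and forwards events to an analytical store in batches. The Broker class, the device_id field and the batching policy are all assumptions made for the example; in-memory structures stand in for the two databases.

```python
class Broker:
    """Parcels events out between an operational store and an analytical store."""

    def __init__(self, batch_size: int):
        self.operational = {}   # latest event per device, for real-time lookups
        self.analytical = []    # list of batches, for later bulk analysis
        self._buffer = []
        self.batch_size = batch_size

    def publish(self, event: dict) -> None:
        # Operational side: update current state right away.
        self.operational[event["device_id"]] = event
        # Analytical side: accumulate and hand over in batches.
        self._buffer.append(event)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Forward any buffered events to the analytical store as one batch."""
        if self._buffer:
            self.analytical.append(list(self._buffer))
            self._buffer.clear()
```

A short usage example, again with made-up data:

```python
broker = Broker(batch_size=2)
broker.publish({"device_id": "car-1", "speed": 50})
broker.publish({"device_id": "car-2", "speed": 30})
broker.publish({"device_id": "car-1", "speed": 55})
broker.flush()
```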
Accept and Plan for Multiplication of Vendor Databases
Get familiar with columnar database technology. Real-world use cases are already adding NoSQL/Hadoop-based databases and columnar databases to handle the operational and analytical sides of Fast Data.
Bottom Line on Fast Data
Vendors are moving exceptionally rapidly to provide the basic technology for the new flash-memory Fast Data architectures. The benefits in real-time analytics and leveraging the Internet of Things are clearly strategic, even before all of the potentialities of the sensor-driven Web have been fully comprehended. So what are you waiting for?
Wayne Kernochan is the president of Infostructure Associates, an affiliate of Valley View Ventures that aims to identify ways for businesses to leverage information for innovation and competitive advantage. An IT industry analyst for 22 years, he focuses on analytics, databases, development tools and middleware, and ways to measure their effectiveness, such as TCO, ROI and agility measures. He has worked for respected firms such as Yankee Group, Aberdeen Group and Illuminata, and has helped craft marketing strategies based on competitive intelligence for vendors ranging from Progress Software to IBM.