Because data preparation has largely been the responsibility of time-crunched IT organizations, it often creates a bottleneck in data analytics. The problem has drawn considerable attention over the past few years, with vendors rolling out products that aim to help business people employ even complex technologies like Hadoop.
But it all started with Microsoft Excel, the humble office productivity tool that put basic data preparation abilities into the hands of hordes of business people, said Howard Dresner, founder and president of Dresner Advisory Services.
Use of Excel has "created a pent-up demand for something better," he said, noting that "Excel is not especially good at data preparation. It presents data in a table format, you can do find and replace, but it is a general purpose tool. It was not designed to be a data preparation tool, even though that is how many of us use it."
According to Dresner's recently published report on end-user data preparation, part of his firm's Wisdom of Crowds series of reports, 63 percent of respondents say end-user data preparation is critical or very important. Sixty-five percent say they constantly or frequently make use of it.
The market for end-user data preparation is growing, Dresner said, as "those that have exhausted the capabilities of Excel or are frustrated by Excel in terms of volume or automation or usability graduate to other tools."
Business people are keenly aware of the need for better end-user data preparation, the report found. Only 12.6 percent of respondents believe their current approach is highly effective. About half believe it is somewhat effective, and more than a third say their approach is either somewhat or totally ineffective.
More Data Sources, More Formats
Newer data preparation tools include visual highlighting, intelligence for creating new columns and the ability to combine multiple data sources, including unstructured data, Dresner said. The last of these features satisfies a growing need, he said, noting that sales and marketing professionals and others "who care about the outside world" want to combine internal and external data sources in a variety of formats in order to glean insights.
Just 11 percent of survey respondents said they never use third-party data sources for their data preparation. Thirty percent do so frequently or constantly, while 33 percent do so occasionally and 25 percent rarely do so. The heaviest users of third-party data are sales and marketing professionals, followed by IT professionals and executive managers.
While respondents are interested in a wide variety of data preparation features, an overarching theme is their desire for tools that work well with their existing business intelligence and analytics environments, Dresner said. "They want to drop their work into a format for Tableau or Qlik or Spotfire without having to go through another process."
Vendors seem aware of this, Dresner said. Some makers of data preparation tools are adding data discovery and visualization capabilities to their products, while providers of data discovery software are beefing up their data preparation capabilities. "It's an interesting time in the marketplace," he said.
In terms of desired features, vendors are improving overall usability and data integration and adding data output options, Dresner said. They are lagging, however, in offering automation capabilities, such as automatically generating data transformation code or scripts for execution, or providing automated recommendations for data relationships and keys for combining data across multiple data sets and sources.
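To illustrate what an automated key recommendation might look like, here is a minimal Python sketch. It is not drawn from the report or from any vendor's product: the function name `suggest_join_keys`, the scoring rule and the sample tables are all hypothetical. It scores each pair of columns from two tables by how much their distinct values overlap, and keeps only pairs where the column is unique in at least one table.

```python
from itertools import product


def suggest_join_keys(left, right, min_overlap=0.5):
    """Rank column pairs that look usable as join keys.

    `left` and `right` are lists of dicts (one dict per row).
    The score is the Jaccard overlap of the two columns' distinct values;
    this scoring rule is an illustrative assumption, not a vendor algorithm.
    """
    if not left or not right:
        return []
    suggestions = []
    for lcol, rcol in product(left[0].keys(), right[0].keys()):
        lvals = [row[lcol] for row in left]
        rvals = [row[rcol] for row in right]
        lset, rset = set(lvals), set(rvals)
        overlap = len(lset & rset) / len(lset | rset)
        # A useful key is unique in at least one of the two tables.
        unique_somewhere = len(lset) == len(lvals) or len(rset) == len(rvals)
        if overlap >= min_overlap and unique_somewhere:
            suggestions.append((lcol, rcol, round(overlap, 2)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical sample data: internal orders and an external customer list.
orders = [
    {"order_id": "1001", "cust": "C1", "amount": "250"},
    {"order_id": "1002", "cust": "C2", "amount": "90"},
    {"order_id": "1003", "cust": "C1", "amount": "40"},
]
customers = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": "West"},
    {"customer_id": "C3", "region": "West"},
]

print(suggest_join_keys(orders, customers))
```

On this sample data the only pair that clears the threshold is `cust` and `customer_id`. A real tool would presumably also weigh data types, column names and scale, which this sketch ignores.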
Many data preparation vendors initially emphasized Big Data before expanding their focus. "They are adding CSV and Excel support, and they will have to support relational databases too," Dresner said. "If I have an existing data warehouse, and I want to combine my data with demographic data from the Internet, for example, you'll have to be able to support all those data sources. You limit your market dramatically if you tell someone the data has to be in Hadoop first."
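The scenario Dresner describes, combining warehouse data with outside demographic data, can be sketched in a few lines of Python using only the standard library. Everything here is an assumption made for illustration: the `sales` table, the `zip` and `median_income` columns and the sample values are hypothetical, and an in-memory SQLite database stands in for an existing data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical internal source: a relational table of sales by ZIP code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (zip TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("10001", 1200.0), ("94105", 800.0), ("60601", 450.0)],
)

# Hypothetical external source: demographic data delivered as CSV text.
demographics_csv = io.StringIO(
    "zip,median_income\n"
    "10001,85000\n"
    "94105,120000\n"
)
income_by_zip = {
    row["zip"]: int(row["median_income"])
    for row in csv.DictReader(demographics_csv)
}


def combine(connection, income_lookup):
    """Left-join sales rows to the external lookup on the shared zip column."""
    combined = []
    for zip_code, revenue in connection.execute(
        "SELECT zip, revenue FROM sales ORDER BY zip"
    ):
        combined.append(
            {
                "zip": zip_code,
                "revenue": revenue,
                # None marks ZIP codes missing from the external file.
                "median_income": income_lookup.get(zip_code),
            }
        )
    return combined


rows = combine(conn, income_by_zip)
for row in rows:
    print(row)
```

Keeping unmatched rows with a `None` value, rather than dropping them, is a deliberate choice in this sketch: it makes gaps in the external data visible to whoever is preparing it.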
In addition, some cloud-only data preparation vendors have added on-premises versions of their products to support users who work with local data sources on their desktops, he said.
Vendors will continue to adapt to satisfy what he sees as "a necessary progression in the market," Dresner said. "You are more likely to develop insight if lots of people are doing data preparation, not just IT."