By Dan Sholler
Big Data represents one of the biggest opportunities and biggest challenges facing organizations today. There is a tremendous expectation that if we could only get our hands on the right data and analyze it the right way, great breakthroughs would result.
This can be true, as we have seen breakthroughs for both B-to-B and B-to-C companies, with retailers leveraging behavioral data to deliver targeted personalized shopping experiences and banks analyzing purchase activities to detect and prevent fraud. Even health care companies are realizing the benefits of Big Data through real-time analysis of data streams from patients' electronic sensors. But there are also some inherent problems with Big Data that prevent businesses from unlocking its true value.
Big Data Problems
For one, Big Data comes from a number of different sources (mobile data, cloud applications, IoT and social data) and this number will continue to expand as more business becomes digital. Complicating matters further, these new data sources are often in unstructured or semi-structured formats. Outside data sets also pose a threat to privacy and regulatory compliance.
When the differences between Big Data and traditional data are ignored, organizations end up with data that is either used inappropriately because it is inaccurate or imprecise, or that lacks context and cannot be interpreted for analysis. If anything about the data or its analysis is unclear to users, they will not trust it, especially if it suggests letting go of long-held beliefs or work practices. And since the point of Big Data analysis is to change the way people work and behave, this lack of trust means the analysis fails to affect the operation of the organization and its people, and delivers little value.
Data Governance Is the Answer
Thankfully, data governance can address these issues and help companies achieve the benefits of Big Data.
Data governance offers a sophisticated and systematic approach to managing the deluge of Big Data while ensuring its availability, usability, integrity and security. Data governance gives users of data the information they need to find, know and trust the data, and it gives organizations the capabilities to take care of the data and improve its usability, quality and value.
Most people are familiar with governance only as the enforcement of privacy or other data policy rules, but its true value to the organization goes well beyond regulatory or statutory compliance. Big Data is the perfect use case for data governance. The tremendous volume and variety of data requiring analytics, and the frequent need to combine data across organizational silos, necessitate data governance. Good data governance creates trust in the data and makes it visible and usable to the business user, so that it generates the most value.
So how does an organization get started on this path to reaping Big Data rewards through data governance? Implementing a data governance process is not as daunting as it may seem, but it does require structure and automation. Without some structure for how decisions about the data are made, it's impossible to be transparent and make those decisions visible.
Often, these processes have existed in bits and pieces, but they either were not structured and automated and therefore did not scale, or were confined to a siloed group and were of limited benefit. However, there are proven approaches to building a robust and valuable data governance process. These approaches generate the information users need to create and use Big Data analytics.
5 Steps to Implementing Data Governance
Adopt an Overall Data Governance Policy
The first step in governing Big Data is to adopt overall policies about your data. In particular, there must be clear policies around:
- data inventory
- data ownership
- critical data elements (CDE)
- critical big datasets (CBD)
- data quality
- information security
- data lineage
- data retention
Policy management helps you consult the right stakeholders, understand the impact of changes, and formulate and enforce policies that improve efficiency and reduce errors and risk. This improves control over previously hidden or siloed enterprise data and empowers everyone in the organization to go beyond just producing and consuming data to trusting and using the data to optimize value.
Define Data Standards for Critical Big Data Sets
Once the basics of policy are covered, the next step is to define data standards for those Big Data sets that are expected to drive the most value. Starting with the critical data is crucial to effective governance. Critical data elements must be identified, and their metadata and relationships documented.
This is a continual process, partly because there is just too much data to do it all at once, and partly because every time you use data in a new way, you create new meaning. Critical data often includes some structured information from the enterprise, but in today's world things such as chat logs, Facebook data, Twitter feeds and sensor data are often crucial to new, value-added processes and activities.
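As a rough illustration of what documenting a critical data element might involve, the sketch below records an element's owner, source and relationships, then reports relationships that point at elements nobody has documented yet. The class names, fields and sample entries are all hypothetical and do not come from any particular governance product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CriticalDataElement:
    """Hypothetical catalog record for one critical data element."""
    name: str
    owner: str
    source: str
    description: str
    related_to: list[str] = field(default_factory=list)


class Catalog:
    """Hypothetical in-memory catalog of documented elements."""

    def __init__(self) -> None:
        self._elements: dict[str, CriticalDataElement] = {}

    def register(self, element: CriticalDataElement) -> None:
        self._elements[element.name] = element

    def undocumented_relationships(self) -> list[tuple[str, str]]:
        """Return (element, related name) pairs whose target is not yet documented."""
        return [
            (element.name, related)
            for element in self._elements.values()
            for related in element.related_to
            if related not in self._elements
        ]


# Illustrative entries only: one structured element, one derived from chat logs.
catalog = Catalog()
catalog.register(CriticalDataElement(
    name="customer_id",
    owner="CRM team",
    source="enterprise CRM",
    description="Unique identifier for a customer",
))
catalog.register(CriticalDataElement(
    name="chat_sentiment",
    owner="Support analytics",
    source="chat logs",
    description="Sentiment score derived from a support chat",
    related_to=["customer_id", "chat_session_id"],
))

gaps = catalog.undocumented_relationships()
print(gaps)
```

Because documentation is a continual process, a report like `gaps` could serve as a running to-do list for data stewards rather than a one-time audit.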
Ensure Flexibility and Automation
Big Data requires flexibility and automation in order to handle the ever-increasing amount and variety of data. Unlike previous systems, which were governed by rigorous, centralized policies, Big Data activities and decisions are spread throughout the organization.
For instance, having automated ingestion functionality in place for the Big Data lake can ensure clean, high-quality data, while also tracking the history and showing the impact of changes to any data standards. Manual processes cannot keep pace with the variety and breadth of data, or with the new data that is brought into the lake almost daily. It's crucial to automate data governance with technology.
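To make the idea of automated ingestion concrete, here is a minimal sketch that checks incoming records against a data standard, quarantines the ones that fail, and keeps an audit trail of every check. The standard, field names and sample records are assumptions made up for illustration; a real data lake would rely on its own ingestion tooling.

```python
from datetime import datetime, timezone

# Hypothetical data standard: required fields and their expected types.
SENSOR_STANDARD = {"device_id": str, "reading": float, "recorded_at": str}


def validate(record, standard):
    """Return a list of problems found in one record (empty if it conforms)."""
    problems = []
    for field_name, expected_type in standard.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            problems.append(f"wrong type for {field_name}")
    return problems


def ingest(records, standard):
    """Split records into accepted and quarantined, logging every check."""
    accepted, quarantined, audit_log = [], [], []
    for record in records:
        problems = validate(record, standard)
        (quarantined if problems else accepted).append(record)
        audit_log.append({
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "problems": problems,
        })
    return accepted, quarantined, audit_log


# Illustrative batch: one conforming record and two that violate the standard.
batch = [
    {"device_id": "a1", "reading": 36.6, "recorded_at": "2016-01-01T00:00:00Z"},
    {"device_id": "a2", "reading": "high", "recorded_at": "2016-01-01T00:01:00Z"},
    {"device_id": "a3", "reading": 37.1},
]

accepted, quarantined, audit_log = ingest(batch, SENSOR_STANDARD)
print(len(accepted), "accepted;", len(quarantined), "quarantined")
```

Quarantining rather than discarding failed records keeps them available for review, and the audit log gives users a history they can inspect when deciding whether to trust the data.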
Build in Analytical Models
Big Data and analytics go hand in hand, so it's important for analytical models to be part of your data governance. The data governance approach, and the system that automates it, should have the flexibility to capture information about all aspects of your analytics, from MapReduce jobs to visualizations. Processes need to be simple for business users and data professionals alike. If business users want to request new analytical models, there must be a simple way to do so and to monitor the status of the request. If not, users will not trust the process.
Appoint a Dedicated Data Governance Team
To implement data governance, you'll need a data governance team. Organizations need a strong data governance council that includes the chief data officer and executive sponsors from around the organization. Data stewards and subject matter experts must be empowered to continually augment and improve the data and the information about it.
Big Data holds out the promise of competitive advantages to those companies that use it to unlock data about customer behavior and operational efficiencies. However, without data governance, enthusiasm alone for Big Data projects can unleash a big mess of trouble: misleading data, unexpected costs and risk of regulatory violations.
Data governance is a framework for setting data usage policies and implementing controls designed to ensure that information remains accurate, consistent and accessible. It enables organizations to provide a set of information to their users, making it possible for them to leverage the power of Big Data.
Dan Sholler is director of Product Marketing for data governance company Collibra.