Machine Learning – NimbleMind

It has been a while since The Economist proclaimed that “data is the new oil” following the tremendous surge in profits of FAMGA – Facebook, Apple, Google, Microsoft and Amazon. Businesses in all kinds of industries, from utilities to retail, followed the trend: they started hoarding vast amounts of data, strengthening their analytics teams and looking for use cases that would let them extract value from that data. As it turns out, however, this isn’t an easy task, especially for companies that are not typical IT companies.

It does not take long to realize that insights are never better than the underlying data. It is slowly becoming obvious how crucial it is to have sufficient control over data quality and information governance in place.

But first things first – before you can improve data quality, you need to understand what data quality means. Data quality isn’t a one-dimensional property. It is a broad term, often described by a number of dimensions, see e.g. 6 dimensions or data quality worksheet:

completeness – data must be as complete as possible (close to 100%)
consistency/integrity – there should be no differences in the dataset when comparing two different representations of the same object
uniqueness – data should not be duplicated
timeliness – information is available when it is expected and needed
validity/conformity – data is valid if it conforms to the syntax (format, type, range) of its definition
accuracy – how well the dataset represents the real world
traceability – whether it is possible to track the origin of the data and its changes

You will need to work with all of these dimensions. It isn’t enough to improve the completeness of the data if the data does not conform to the expected format.
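
Some of the dimensions above can be measured directly. The following is a minimal sketch, not a complete data quality tool, of how completeness, uniqueness and validity could be computed for a small set of records. The sample records, the field names (id, email, birth_date) and the simple email pattern are assumptions made purely for illustration.

```python
import re
from datetime import date

# Hypothetical customer records; field names and values are made up for illustration.
RECORDS = [
    {"id": 1, "email": "anna@example.com", "birth_date": "1985-04-12"},
    {"id": 2, "email": None, "birth_date": "1990-13-01"},  # missing email, invalid month
    {"id": 3, "email": "bob@example", "birth_date": "1978-11-30"},  # malformed email
    {"id": 1, "email": "anna@example.com", "birth_date": "1985-04-12"},  # duplicate id
]

# Deliberately simple pattern (an assumption); real email validation is more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _is_filled(value):
    """A value counts as present if it is neither None nor an empty string."""
    return value not in (None, "")


def completeness(records, field):
    """Share of records in which the field is present and non-empty."""
    filled = sum(1 for r in records if _is_filled(r.get(field)))
    return filled / len(records)


def uniqueness(records, key):
    """Number of distinct key values divided by the number of records."""
    return len({r[key] for r in records}) / len(records)


def validity(records, field, check):
    """Share of non-empty values of the field that pass the given syntax check."""
    values = [r[field] for r in records if _is_filled(r.get(field))]
    if not values:
        return 1.0  # nothing to validate
    return sum(1 for v in values if check(v)) / len(values)


def is_valid_email(value):
    return bool(EMAIL_PATTERN.match(value))


def is_valid_iso_date(value):
    """True if the value parses as an ISO calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


report = {
    "email completeness": completeness(RECORDS, "email"),
    "id uniqueness": uniqueness(RECORDS, "id"),
    "email validity": validity(RECORDS, "email", is_valid_email),
    "birth_date validity": validity(RECORDS, "birth_date", is_valid_iso_date),
}

for name, score in report.items():
    print(f"{name}: {score:.0%}")
```

Note that the scores are independent of each other: a field can be fully complete and still score poorly on validity, which is why a single dimension never tells the whole story. Dimensions such as accuracy and traceability cannot be computed from the data alone and need a reference source or lineage metadata.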

Moreover, there are some profound implications for your organization. Although data quality is something the whole organization should care about, it is natural to focus on the teams that actively use the data, depend on its quality and suffer most when it is poor. These usually include customer-facing channels, user-facing interfaces and data-warehouse teams, which are the first to observe and detect data quality issues. This is often the case when the information governance framework is missing or poorly implemented.

Consequently, the information governance framework becomes crucial to ensure sufficient control over data and data quality. Such a framework includes both a set of principles, i.e. information governance principles, which are established and supported by the organization, and new roles to enforce them. Moreover, there is often a need to establish or strengthen the data culture, with a focus on data quality and the right mindset, to ensure that quality issues are corrected at their origin and not where they manifest themselves.

The information governance organization itself can operate under a number of different models:

IT driven – IT takes care of everything: storage, processing and the processes that secure high quality, structuring and cataloguing of data
business driven – IT only provides the storage infrastructure; the business is in charge of the processes that secure data quality
hybrid model – IT driven in some domains, business driven where that makes most sense; probably the most pragmatic approach

The process of improving your information architecture and information governance framework isn’t particularly complicated, but it requires some effort and a huge amount of patience, as it is primarily an organizational and cultural change.
In order to improve the information governance framework, and as a result improve data quality, you will need to get through at least the following steps:

Get an overview of the information architecture and create/improve data models
You need to know the current state of affairs: which information entities are most central, and how information is modeled, used and transferred between different parts of your organization.
Get an overview of pain points in data quality
You need to know the actual data-related issues that your organization currently experiences. Without proper insight you are unable to improve the data quality. You need to talk to the business and to the people around you to get enough insight into, and understanding of, the most critical data-related issues they deal with.
Create an initial set of governance principles
Establish the initial governance framework, first of all by creating and describing a set of principles for Information Architecture and Enterprise Information Architecture, as well as principles for data analytics and advanced analytics. Get sufficient backing in the organization.
Adjust the organization
Create new roles and responsibilities, including roles like information owners, information stewards, data stewards, data scientists, and other roles (see e.g. IBM Redbook, IA governance).
Finally, consider and introduce new tools and technologies for managing the information
Depending on the results of the previous steps and the needs of your organization, you may need to consider new tools for better control of your master and reference data. The most obvious one is a Master Data Management system. A Master Data Management system makes it possible to reduce manual operations on master data, coordinate master data between different systems and keep it aligned, and detect any deviation from the data model.
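
To make the last point more concrete, here is a minimal sketch of what detecting deviations between systems that hold the same master data could look like. It only illustrates the idea and does not reflect how any particular Master Data Management product works; the record layout, the system names (crm, billing) and the find_deviations helper are all assumptions made for this example.

```python
# Hypothetical master ("golden") record and the copies held by two downstream systems.
MASTER = {"customer_id": "C-100", "name": "Anna Berg", "country": "NO"}

SYSTEM_COPIES = {
    "crm": {"customer_id": "C-100", "name": "Anna Berg", "country": "NO"},
    "billing": {"customer_id": "C-100", "name": "A. Berg", "country": "Norway"},
}


def find_deviations(master, copies):
    """Return one (system, field, master_value, system_value) tuple per mismatch.

    A field that is missing in a system copy is reported with a value of None.
    """
    deviations = []
    for system, record in copies.items():
        for field, expected in master.items():
            actual = record.get(field)
            if actual != expected:
                deviations.append((system, field, expected, actual))
    return deviations


for system, field, expected, actual in find_deviations(MASTER, SYSTEM_COPIES):
    print(f"{system}.{field}: expected {expected!r}, found {actual!r}")
```

In this sketch the master record is treated as the single source of truth, so every difference is reported as a deviation of the downstream system. Which side should be corrected is a governance decision, not a technical one.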

Although it is very tempting to jump in and start implementing new, exciting use cases for AI/Machine Learning, the actual value of this technology is completely dependent on the underlying data quality and other aspects of information architecture. Data quality and proper information governance are the crucial basics. Without them, the vast amounts of data that you spend so much effort gathering become not oil, but garbage of little value.


Capitalizing on the value of your data: get your basics in place first!