Information Architecture

For over a decade, the omnipresent SOA and ESBs were considered state of the art in integration architecture. Many organizations still have ESBs in use. If an ESB is still the main hub of your integration stack, it is probably time to start considering some newer options. The world has moved on, and “agile” has reached integration architecture as well.

But before we look at what agile integration is, we need to take a broader look at integration architecture. An example could be the reference architecture model shown below (based on the IBM Think 2018 presentation: http://ibm.biz/HybridIntRefArch).

Reference architecture for hybrid integrations

Integration architecture patterns are often divided into three main categories: synchronous, asynchronous and batch integrations. Synchronous integrations are often implemented as HTTP/HTTPS or REST interfaces. Asynchronous integrations are mostly different kinds of pub-sub or streaming integrations. Finally, batch integrations are often referred to as ETL (Extract Transform Load) or, more recently, ELT (Extract Load Transform), and are very commonly used in connection with data warehouses, various data platforms and data lakes.
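The difference between the synchronous and the pub-sub style can be sketched in a few lines of Python. This is only a toy illustration: `InMemoryBroker`, `get_customer` and the topic name `customer.updated` are hypothetical names, and the broker delivers messages in-process, whereas a real messaging service would deliver them asynchronously over the network.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy publish-subscribe broker; a stand-in for a real messaging service."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber and return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


def get_customer(customer_id: str) -> dict:
    """Synchronous style: the caller asks one known service and waits for the answer."""
    return {"id": customer_id, "name": "Example Customer"}


broker = InMemoryBroker()
billing_inbox: List[dict] = []
crm_inbox: List[dict] = []

# Pub-sub style: the publisher does not know who is listening.
broker.subscribe("customer.updated", billing_inbox.append)
broker.subscribe("customer.updated", crm_inbox.append)

customer = get_customer("42")
delivered = broker.publish("customer.updated", customer)
print(delivered)  # 2
```

The point of the sketch is the coupling: the synchronous caller depends on one specific service, while the publisher only depends on the topic, so new consumers can be added without touching it.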

With the advance of cloud technologies, integration architecture has increasingly adopted the cloud as its execution environment. Two main streams seem to be emerging for how integrations are implemented in the cloud: either on native PaaS components or on a “best of suite” iPaaS/iSaaS type of platform.

The native PaaS approach builds on the basic components of one or more of the major PaaS platforms (AWS, Azure, Google Cloud). Here we are talking about components such as AWS API Gateway, AWS Kinesis, AWS SNS/SQS, AWS Step Functions, Azure API Management, Azure Service Bus, Azure Logic Apps and so on. The “best of suite” iPaaS/iSaaS is basically a complete integration suite delivered as a SaaS service, e.g. Dell Boomi, Informatica or MuleSoft, which often provides a set of adapters for different protocols.

Integration architecture has also evolved over the last decade from the infamous centralized SOA and ESB to a more distributed architecture. This evolution has affected three different axes: people, architecture and technology.

On the architecture and technology axes, development has become more and more autonomous, driven by cloud services, big data and microservice-oriented architecture as well as new ways of running software natively in the cloud or in containers. Integration architecture has followed and developed into a more distributed variant. The centralized ESB-like platforms are disappearing, and integrations are becoming either point-to-point for synchronous integrations or pub-sub and high-performance streaming for asynchronous integrations. The integration software itself has become more distributed and in some cases also runs in containers or natively in the cloud.
Finally, as integration is more distributed and often developed by separate autonomous teams, it is natural that different integrations are implemented using different technologies and programming languages, becoming what we call polyglot integrations.

Another consequence of this evolution is a set of changes on the people axis. With autonomous teams and distributed integrations there is no longer a need for centralized integration teams, and the integration resources are now spread over different teams. This also means that integration architecture becomes more of an abstract concern that has to be taken care of in the organization, often without resources explicitly allocated to the task and often without clear ownership. This trend follows the same pattern as the other dimensions of enterprise architecture, including security and information architecture.

Integration architecture also follows another important trend, called domain-driven design (DDD). DDD is another force that pushes integration architecture from a centralized, layer-oriented architecture towards a more distributed one, with tighter integrations inside each domain and more loosely coupled integrations with other domains and external services. This makes it possible to reduce the complexity of long technical value chains with unnecessary transformations, increases the ownership of integration artifacts and reduces the amount of overlapping data that pops up everywhere. Here is an example of Domain Centric Integration Architecture at DNB (presented at IBM Think Summit Oslo 2019).

Data Centric Integration Architecture (DNB – IBM Think Summit Oslo 2019)

Process orientation is another important aspect, in particular when looking at digitalization, as process improvement and optimization are possibly the most important levers for making any business more digital. Integrations therefore need to become more process driven instead of being only technology driven. However, the traditional, centralized integration platforms leave little room for adjustment and adaptation, which makes it difficult to tailor the integrations to improvements in the processes. As the choice of platform is often purely technology driven, once the platform is selected and implemented it is usually hard to adapt it to the actual process. If you are lucky, you have a wide enough range of adapters and tools to fit your needs, but there is no guarantee of that.
Cloud-based “à la carte” integration platforms, where one can pick the most suitable integration components and pay only for the components in use and for the time they are in use, are therefore better suited to a process-driven integration approach.

Critics would however point out that with the rise of the modern, distributed, autonomous and polyglot integration platforms we have lost some of the important capabilities that e.g. SOA and ESB provided. The integrations are becoming more point-to-point, which adds complexity and increases the “spaghetti factor”. There is no longer one place, one system, which hides the complexity and where you can see how your portfolio is integrated along with all its dependencies. In practice this is not such a big issue: it can be addressed by documentation, reverse engineering or self-discovery mechanisms, and there are several tools that make this task easier. The point-to-point challenge can also be alleviated, e.g. by using data lakes and data streaming mechanisms that reduce the need for direct point-to-point integrations, such as Sesam (https://sesam.io/) or Kafka (https://www.confluent.io/).
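A rough back-of-the-envelope calculation shows why the “spaghetti factor” grows so quickly. The sketch below assumes the worst case, where every pair of systems has its own direct integration, and compares it with a setup where each system connects once to a shared stream or data hub. The function names are illustrative only.

```python
def point_to_point_links(systems: int) -> int:
    """Worst case: every pair of systems has its own direct integration."""
    return systems * (systems - 1) // 2


def hub_links(systems: int) -> int:
    """Each system connects only once, to the shared stream or data hub."""
    return systems


for n in (5, 10, 20):
    print(n, point_to_point_links(n), hub_links(n))
# 5 10 5
# 10 45 10
# 20 190 20
```

Real portfolios are rarely fully connected, so the actual numbers are lower, but the quadratic growth explains why a shared streaming layer pays off as the number of systems increases.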

On the other hand, one could point out that the new platforms no longer support several aspects of the traditional ESB VETRO pattern, which stands for Validate, Enrich, Transform, Route and Operate (https://www.oreilly.com/library/view/enterprise-service-bus/0596006756/ch11.html).
This is somewhat correct; however, with distributed, containerized and polyglot integrations it is relatively easy to implement all necessary validations, enrichments and transformations. When it comes to routing, there are several components that can provide similar functionality in Azure (APIM) or AWS (API GW), and the Operate aspect is more of a task for the autonomous DevOps team that operates the service with its integrations.
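To make the VETRO steps concrete, here is a minimal sketch of how a team could implement them as plain functions inside its own service. Everything in it is an assumption made for illustration: the message fields (`order_id`, `amount`, `currency`), the default currency, the routing threshold and the in-memory outbox that stands in for the target system.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def validate(msg: Message) -> Message:
    """Validate: reject messages that lack the required fields."""
    missing = [field for field in ("order_id", "amount") if field not in msg]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return msg


def enrich(msg: Message) -> Message:
    """Enrich: add data the sender left out (the default currency is an assumption)."""
    return {**msg, "currency": msg.get("currency", "NOK")}


def transform(msg: Message) -> Message:
    """Transform: map to the target format, with the amount in minor units."""
    return {
        "id": str(msg["order_id"]),
        "amount_minor": round(float(msg["amount"]) * 100),
        "currency": msg["currency"],
    }


def route(msg: Message) -> str:
    """Route: pick a destination based on the message content."""
    return "large-orders" if msg["amount_minor"] >= 100_000 else "standard-orders"


def operate(msg: Message, destination: str, outboxes: Dict[str, List[Message]]) -> None:
    """Operate: hand the message to the target; here, an in-memory outbox."""
    outboxes.setdefault(destination, []).append(msg)


def process(msg: Message, outboxes: Dict[str, List[Message]]) -> str:
    """Run the five steps in order and return the chosen destination."""
    prepared = transform(enrich(validate(msg)))
    destination = route(prepared)
    operate(prepared, destination, outboxes)
    return destination


outboxes: Dict[str, List[Message]] = {}
print(process({"order_id": 1, "amount": 12.5}, outboxes))  # standard-orders
print(process({"order_id": 2, "amount": 2500, "currency": "EUR"}, outboxes))  # large-orders
```

In a cloud setup the `route` step would typically be delegated to a gateway or messaging component rather than coded by hand, while the other steps live with the team that owns the integration.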

To summarize, integration architecture has undergone massive changes in several dimensions and evolved from the centralized SOA/ESB platform into a more distributed, autonomous and polyglot architecture. This development has been catalyzed by underlying trends in IT development and architecture, in particular DevOps and autonomous teams, digitalization and process orientation, cloud, microservices and containerization. The result is an integration architecture that is more flexible and more adaptable, both to business needs and to the needs of the development organization itself: what we call agile integration architecture.

This work excluding photos and pictures is licensed under a Creative Commons Attribution 4.0 International License.

It has been a while since The Economist proclaimed that “data is the new oil”, following the tremendous surge in profits of FAMGA (Facebook, Apple, Google, Microsoft and Amazon). Businesses in all kinds of industries, from utilities to retail, followed the trend and started hoarding vast amounts of data, strengthening their analytical teams and looking for use cases that make it possible to extract value from data. As it turns out, however, this isn’t an easy task, especially for companies that are not typical IT companies.

It does not take a long time to realize that the insights are never better than the underlying data. It is slowly becoming obvious how crucial it is to have in place sufficient control over data quality and information governance.

But first things first: before you can improve data quality, you need to understand what data quality means. Data quality isn’t a single-dimensional feature. It is a broad term, often described by a number of dimensions, see e.g. 6 dimensions or data quality worksheet:

completeness – data must be as complete as possible (close to 100%)
consistency/integrity – there should be no differences in the dataset when comparing two different representations of the same object
uniqueness – avoiding duplication of data
timeliness – whether information is available when it is expected and needed
validity/conformity – data is valid if it conforms to the syntax (format, type, range) of its definition
accuracy – how well the data set represents the real world
traceability – is it possible to track the data origin and its changes

You will need to work with all of these dimensions. It isn’t enough to improve the completeness of the data if the data does not conform to the expected format.
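Some of these dimensions can be measured directly. The sketch below computes simple scores for completeness, uniqueness and validity on a small, made-up customer list. The record layout, the field names and the deliberately simplistic email pattern are assumptions for the sake of the example; accuracy and traceability usually cannot be measured this way, since they require comparison with the real world or with lineage metadata.

```python
import re
from typing import Dict, List, Optional, Pattern

Record = Dict[str, Optional[str]]

# Deliberately simple pattern, good enough for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(records: List[Record], field: str) -> float:
    """Share of records where the field is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field))
    return filled / len(records)


def uniqueness(records: List[Record], key: str) -> float:
    """Share of non-empty key values that are distinct."""
    values = [record[key] for record in records if record.get(key)]
    return len(set(values)) / len(values) if values else 0.0


def validity(records: List[Record], field: str, pattern: Pattern[str]) -> float:
    """Share of non-empty values that match the expected syntax."""
    values = [record[field] for record in records if record.get(field)]
    if not values:
        return 0.0
    return sum(1 for value in values if pattern.match(value)) / len(values)


customers: List[Record] = [
    {"id": "1", "email": "anna@example.com"},
    {"id": "2", "email": "not-an-email"},
    {"id": "2", "email": None},
    {"id": "4", "email": "ola@example.com"},
]

print(completeness(customers, "email"))  # 0.75
print(uniqueness(customers, "id"))       # 0.75
print(validity(customers, "email", EMAIL_PATTERN))
```

In this sample, filling in the missing email would raise completeness to 100% while one of the existing values would still fail the format check, which is why the dimensions have to be tracked together.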

Moreover, there are some profound implications for your organization. Although data quality is something the whole organization should focus on, it is natural to start with the teams that actively use the data, depend on its quality and suffer most. These usually include the customer-facing channels, user-facing interfaces or data-warehouse teams, which are the first to observe and detect issues related to data quality. This is often the case when the information governance framework is missing or poorly implemented.

Consequently, the information governance framework becomes crucial to ensure sufficient control over data and data quality. Such a framework includes both a set of information governance principles, which are established and supported by the organization, and new roles to enforce them. Moreover, there is often a need to establish or strengthen the data culture, with a focus on data quality and the right mindset, to ensure that quality issues are corrected at the origin and not where they manifest themselves.

The information governance organization itself can operate under a number of different models, e.g.:

IT driven
IT takes care of everything: storage, processing and the processes that secure high quality, structuring and cataloguing of data
business driven
IT only provides the storage infrastructure, business is in charge of processes to secure data quality
hybrid model
IT driven in some domains, business driven where it makes most sense; probably the most pragmatic approach

The process of improving your information architecture and information governance framework isn’t so complicated, but it requires some effort and a huge amount of patience, as it is primarily an organizational and cultural change.
In order to improve the information governance framework and as a result improve data quality, you will need to get through at least the following steps:

Get an overview of the information architecture and create/improve data models
You need to know the current state of the union: what the most central information entities are, and how the information is modeled, used and transferred between different parts of your organisation.
Get an overview of pain points in data quality
You need to know the actual data-related issues that your organisation currently experiences. Without proper insight you are unable to improve the data quality. You need to talk to the business and to the people around you to gain enough insight into and understanding of the most critical data-related issues they deal with.
Create an initial set of governance principles
Establish the initial governance framework, first of all by creating and describing a set of principles for Information Architecture, Enterprise Information Architecture as well as principles for data analytics and advanced analytics. Get sufficient backing in the organization.
Adjust the organization, create new roles and responsibilities including roles like information owners, information stewards, data stewards, data scientists, and other roles (see e.g. IBM Redbook, IA governance)
Finally, consider and introduce new tools and technologies for managing the information
Depending on the results of previous steps and needs of your organisation you may need to consider new tools for better control of your master and reference data. The most obvious one is a Master Data Management system. A Master Data Management system makes it possible to reduce manual operation on master data, coordinate master data between different systems and keep it aligned as well as detect any deviation from the data model.

Although it is very tempting to jump in and start implementing new, exciting use cases for AI/Machine Learning, the actual value of this technology is completely dependent on the underlying data quality and other aspects of information architecture. Data quality and proper information governance are crucial, basic aspects. Without them the vast amounts of data that you spend lots of effort gathering become not oil, but garbage with little value.

Category: Information Architecture

The rise of agile integration architecture – from centralized SOA/ESB to distributed autonomous polyglot integration architecture

Capitalizing on the value of your data: get your basics in place first!