AdSysAr

Blog on software engineering

Saturday, November 05, 2005

X-Engineering, Zero Latency Enterprise will put the spotlight on data quality

Article published in DMReview
Introduction

According to James Champy, co-author of The New York Times bestseller “Reengineering the Corporation”, the market players that can respond to critical market events faster than their competitors will end up as winners in the emerging new economy. It is safe to assume that most of these market players have already reengineered their business processes within corporate boundaries to achieve better efficiency. In order to win the next phase of the never-ending market race, they will also need to integrate their business processes with those of their suppliers and business partners. Additionally, the ability to quickly adjust processes to better respond to one’s customers will also become a decisive factor in the new economy.

In this type of economic environment, the latency between the initial market event (any kind of significant disruption to the market status quo) and a response from the integrated process chain cannot take months or even weeks. The winners will have days and sometimes just hours to react to the changes in the supply chain or a new customer trend.

Taking this into account, successful corporations that are aspiring to become winners in the new global race should be thinking about zero latency processing. For instance, in today’s marketplace, a new financial services product usually guarantees its inventor a head start of a few months, typically resulting in substantial financial gains. However, if the other market players can respond within days instead of months, they can practically eliminate the competitor’s advantage of being the first to the market.


X-Engineering will lead to Zero Latency Enterprise
Given that modern business processes rely heavily on information systems, and as market forces keep pushing companies toward faster updates to their business processes, the information systems’ implementation and deployment cycle becomes increasingly important. One of the more popular approaches -- Zero Latency Enterprise -- encourages the creation of a feedback loop from the analytical (OLAP) side, where tactical and possibly even strategic decisioning takes place, back into the operational (OLTP) systems in order to accelerate the event-response sequence.
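
To make the idea concrete, here is a minimal sketch of such a feedback loop, assuming invented names (none of them refer to a specific product or to the systems discussed in this article): the analytical side publishes a tactical decision, and the operational side consumes it as a parameter change.

import java.time.Instant;

// Hypothetical sketch: the OLAP/DSS side publishes a tactical decision and
// the OLTP side applies it as an operational parameter change, closing the loop.
class TacticalDecision {
    final String parameter;   // e.g. "loanProcessingCostPerApplication"
    final double newValue;    // value recommended by the analytical side
    final Instant derivedAt;  // when the analysis produced the recommendation

    TacticalDecision(String parameter, double newValue, Instant derivedAt) {
        this.parameter = parameter;
        this.newValue = newValue;
        this.derivedAt = derivedAt;
    }
}

interface OperationalParameterService {
    // Invoked on the OLTP side whenever a decision arrives from the OLAP side.
    void apply(TacticalDecision decision);
}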

As the update cycles accelerate, data quality will become even more important
Traditionally, the quality of data stored in the Enterprise Data Warehouse (EDW) significantly influences the quality of the decisioning process. In turn, the quality of data housed in the EDW depends on the quality of data produced by the OLTP systems. With margins contracting year after year, the difference between success and failure of a significant undertaking may well hinge on some relatively obscure operational attribute captured by an operational system and then consumed by the EDW. Unfortunately, the more complex the business process is, the more difficult it is for the OLTP systems to produce high-quality data.

Further, in the Near Real Time (NRT) Enterprise environment, the OLTP systems should be able to accept changes to operational parameters that the OLAP / decision-support systems (DSS) produce. In order to support this fast update cycle, a rules-based or similar fast-deployment-cycle technology should be used. The existence of a Near Real Time feedback loop from the OLAP side back to the OLTP side of the enterprise supports very rapid changes to the business process, but at the same time exacerbates any inconsistencies and errors introduced when information is transformed and loaded from the OLTP systems into the EDW/OLAP side.

For example, an erroneous calculation of loan processing costs in the EDW (based on an incorrectly captured operations time) may lead to an automated decision to open this financial product (loan type) to more clients. This decision would be automatically consumed by the appropriate operational systems with their Internet-enabled front ends, and may significantly affect the business’ financial characteristics. If it turns out that the calculation was wrong, the net effect may be a substantial loss instead of a hefty profit. The elimination of time-consuming manual steps in the process, while giving a corporation the ability to respond very quickly to a market event, at the same time puts even more emphasis on decisioning, and thus on data quality.
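
Purely as an illustration of the kind of safeguard such an automated loop might carry (the guard, its threshold, and all names below are my own, not part of any system described here), a plausibility check could sit between the OLAP-derived decision and the OLTP-side update, reusing the types from the previous sketch:

import java.util.Map;

// Illustrative plausibility guard: an OLAP-derived change is applied
// automatically only if it stays within a configured band of the current
// value; otherwise it is held for manual review. Thresholds are invented.
class GuardedParameterService implements OperationalParameterService {
    private final OperationalParameterService delegate;
    private final double maxRelativeChange;        // e.g. 0.25 = reject 25%+ jumps
    private final Map<String, Double> currentValues;

    GuardedParameterService(OperationalParameterService delegate,
                            double maxRelativeChange,
                            Map<String, Double> currentValues) {
        this.delegate = delegate;
        this.maxRelativeChange = maxRelativeChange;
        this.currentValues = currentValues;
    }

    @Override
    public void apply(TacticalDecision decision) {
        Double current = currentValues.get(decision.parameter);
        if (current != null && current != 0.0) {
            double change = Math.abs(decision.newValue - current) / Math.abs(current);
            if (change > maxRelativeChange) {
                // Suspiciously large swing: route to a human instead of auto-applying.
                System.out.println("Holding " + decision.parameter + " for manual review");
                return;
            }
        }
        delegate.apply(decision);
        currentValues.put(decision.parameter, decision.newValue);
    }
}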

Traditional Approach to Data Quality needs improvement
In a classical EDW environment, Extract Transform Load (ETL) tools assume responsibility for data extraction from the source systems, as well as for transformation, cleansing, and loading of the data into the EDW/OLAP systems.
At the same time, the OLTP and OLAP developers have to address rather different sets of issues. It is not surprising that there is often a mental “impedance mismatch” between the OLTP and the OLAP staff that results in disagreements about:
• Push versus Pull (Extract in ETL)
• Data transformation responsibilities and techniques
• Data cleansing approach

Traditionally, IT departments rely on the teams responsible for the EDW/OLAP processing to address the ETL issues, following the old and unfortunately ineffective creed: “you need it -- you do it”.
This approach does not work well because the cost of loading the OLAP data stores with reliable, high-quality data, and especially of keeping them semantically synchronized with changes in the source OLTP systems, is very high. In the most common scenarios, changes introduced on the OLTP side still require weeks, and sometimes months, to be correctly reflected in the EDW/OLAP systems.
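
For concreteness, here is a minimal sketch of the transform-and-cleanse hand-off described above; the staging record layout and the cleansing rule are invented for the example and stand in for what real ETL tooling does at far greater scale:

import java.util.ArrayList;
import java.util.List;

// Minimal illustration of the extract / transform-and-cleanse / load hand-off.
// The record layout and the cleansing rule are invented for the sketch.
class LoanStagingRecord {
    String loanId;
    String borrowerName;
    double processingMinutes; // operational attribute later consumed by the EDW

    LoanStagingRecord(String loanId, String borrowerName, double processingMinutes) {
        this.loanId = loanId;
        this.borrowerName = borrowerName;
        this.processingMinutes = processingMinutes;
    }
}

class SimpleEtl {
    // Transform + cleanse: drop records that would pollute downstream decisioning.
    static List<LoanStagingRecord> cleanse(List<LoanStagingRecord> extracted) {
        List<LoanStagingRecord> clean = new ArrayList<>();
        for (LoanStagingRecord r : extracted) {
            boolean plausible = r.processingMinutes > 0 && r.processingMinutes < 8 * 60;
            if (plausible && r.loanId != null && !r.loanId.isEmpty()) {
                r.borrowerName = r.borrowerName == null ? "" : r.borrowerName.trim();
                clean.add(r);
            }
            // Implausible rows would be routed to an error/reconciliation table here.
        }
        return clean;
    }
}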

Making it happen
Two factors have come together to change the traditional approach. As I have already pointed out above, there is a demand from business leadership to shorten implementation cycles, as well as a trend toward integrating many inter-corporation business processes into one end-to-end, highly efficient process. On the technology side, the emergence of, and advancements in, Service Oriented Architecture (SOA) have created momentum in IT departments toward better understanding, and thus better modeling, of business processes.

With the SOA advancement, the OLTP side is becoming much better structured: the issues at the syntax and communication-protocol level are addressed, and boundaries are now explicit. The asynchronous nature of communication requires understanding, capturing, and transmitting time-state information. The advent of SOA is creating a foundation and an industry impetus to start viewing data issues in a new light, connecting them more closely with business process management (BPM). While the SOA way of thinking helps, it is not sufficient to address the semantic differences between the source and target systems within the scope of the SOA framework. These differences in semantics are impossible to address without capturing enough contextual information to reason about them, in particular (a sketch of a message carrying such context follows the list):
• Timing
• Relationship to the rest of the domain
• Business process-level coordination.
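
As an illustration only (the envelope and its field names are mine, not a standard), an OLTP-side business event that carries this kind of context might look like:

import java.time.Instant;

// Illustrative envelope for an OLTP-side business event that carries the
// contextual information listed above; field names are invented for the sketch.
class BusinessEventEnvelope {
    // Timing: when the business fact became true vs. when the message was sent.
    Instant effectiveTime;
    Instant transmissionTime;

    // Relationship to the rest of the domain: which entity and which version
    // of the domain model the payload refers to.
    String domainEntity;        // e.g. "Loan"
    String domainModelVersion;  // e.g. "origination-domain-v3"
    String entityKey;           // business key of the entity instance

    // Business process-level coordination: which process and step produced it.
    String businessProcessId;   // e.g. "loan-origination"
    String processStep;         // e.g. "underwriting-complete"
    String correlationId;       // ties related events across systems together

    String payload;             // the actual business data (XML in a 2005-era SOA)
}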

Meet Data in Context.
On my most recent project (for a medium-size financial services company), the project team developed an approach that addresses significant issues that until now had prevented this company, as well as other companies, from realizing the benefits of Near Real Time analytical decision-support technology. The cornerstone of this approach is the creation and rigorous maintenance of a rich contextual domain model on the OLTP side. The existence of this ontological model removes two main obstacles standing in the way of the zero latency enterprise.

First, the rich contextual OLTP-side model enables and facilitates a better understanding of information within the context of the business process, which in turn enables business process integration both within and across corporate boundaries.

Second, the OLTP side of the enterprise can now output information according to the specifications produced by the EDW/OLAP/DSS side with improved quality and efficiency. While the ETL processes still exist, the cycle of producing the information required by strategic and tactical decision makers is now significantly shorter.

Decentralized rich OLTP-side Domain Model is the key
The OLAP-side approach of capitalizing on a single Enterprise metadata repository has not yet been successfully applied on the OLTP side of an enterprise. This is not surprising, given that the business processes, and thus the various OLTP systems themselves, are much more diverse in nature than the more homogeneous OLAP-side systems. For instance, the operational processes in the acquisition department and on the trading desks differ in scope, use different terminology, and have different key performance indicators.

Data Architecture teams have realized that the diverse business process context may pose a problem on the way to creating a single OLTP-side metadata repository, and have suggested an alternative approach. This approach strips the data of most of its business process context in order to make it easier to correlate data from different processes and OLTP systems.

One example of this common technique is the data dictionary approach. Unfortunately, it does not work well in the long run: data divorced from its context rapidly becomes more or less useless as business process complexity increases. For instance, a typical data dictionary for a financial services company would have an Address structure defined. While this may be sufficient for very simple cases, as business process complexity grows, data analysts and system developers find themselves dealing with numerous variations of the Address structure: Current Client Residence Address, Property Address, Client Correspondence Address, Third Party Address, etc.

Furthermore, the Address case described above is relatively simple compared to three- or four-layered hierarchical data structures such as Credit Report and Credit Score. A Credit Score, for example, may be aggregated at different levels: a Loan, a Borrower, a Borrower Group, etc., with the Borrower-level score possibly provided by a number of credit vendors. These vendors, in turn, may use different aggregations of the scores from the three credit repositories: Experian, TransUnion, and Equifax. Considering that the repositories themselves may use different scoring models, it quickly becomes apparent how fast information complexity can grow.
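
To make the point concrete, here is a rough sketch of how these aggregation levels and sources could be modelled when the context is kept in the types themselves (all class and field names are illustrative only):

import java.util.List;

// Rough, illustrative model of a context-aware Credit Score hierarchy: the same
// "credit score" means different things at different aggregation levels and from
// different sources, so the context stays attached to the data.
enum AggregationLevel { LOAN, BORROWER, BORROWER_GROUP }

enum CreditRepository { EXPERIAN, TRANSUNION, EQUIFAX }

class RepositoryScore {
    CreditRepository repository;
    String scoringModel;          // repositories may use different scoring models
    int score;
}

class VendorCreditScore {
    String vendorName;            // vendor that aggregated the repository scores
    List<RepositoryScore> inputs; // the repository-level scores it used
    int aggregatedScore;
}

class CreditScore {
    AggregationLevel level;            // LOAN, BORROWER, or BORROWER_GROUP
    List<VendorCreditScore> byVendor;  // borrower-level scores may come from several vendors
}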

The ability to impeccably correlate data from the different OLTP systems across the contexts of their own business processes is absolutely essential to solving the data quality problem for conventional models, and even more so for the Zero Latency Enterprise. Unfortunately, quite often, due to the lack of a well-defined system development process as well as a shortage of analysts with appropriate modeling skills, this correlation analysis (sometimes called “mapping”) is left to the data analysts and developers. The skill set possessed by these groups of professionals, specifically one of physical integration of OLTP RDBMS-based systems, does not lend itself well to the domain/business process modeling problem. Add to this the rather common absence of any metadata repository for the business process models that would be available to an average Java or .NET developer, and the result is a status quo of tightly coupled physical database systems. This approach makes the entire OLTP side brittle and prone to producing unreliable, “polluted” data that is then consumed by the OLAP side. This is typically followed by a never-ending cycle of blame for the low data quality made apparent in the extraction, transformation, and cleansing steps.

In order to successfully integrate OLTP systems, two main issues need to be addressed.
First, every application, or group of applications, that will be treated as an independent processing entity with well-defined boundaries should have a rich meta-information repository. This repository unambiguously defines all the relevant data within the scope of the business processes supported by the system. For instance, I was recently part of an effort where the Domain model had three main parts: a Business class model, Business Use Case Realizations, and a System Use Case model. No data element would be added to the Domain model unless it was initially called out in the business and system use cases.
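
A minimal sketch of what such a repository entry could capture, with invented names; the point is only that every data element remains traceable to the use cases that called it out:

import java.util.List;

// Illustrative metadata repository entry: a data element is only admitted to
// the Domain model if it can point back to the business and system use cases
// that called it out. Names are invented for the sketch.
class DataElementDefinition {
    String name;                   // e.g. "ClientCorrespondenceAddress"
    String businessClass;          // owning class in the Business class model
    String definition;             // business meaning, in the department's own terms
    List<String> businessUseCases; // Business Use Case Realizations that reference it
    List<String> systemUseCases;   // System Use Cases that reference it

    boolean isAdmissible() {
        // The rule from the text: no use-case reference, no Domain model entry.
        return businessUseCases != null && !businessUseCases.isEmpty()
            && systemUseCases != null && !systemUseCases.isEmpty();
    }
}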

Second, a well-defined process should be created and rigorously maintained for correlating information between the metadata repositories of the different areas. Each department should be responsible for the creation of its own Domain model, but the correlation process and the artifacts that capture its results (in our case, we call them Overarching System Use Cases) are the joint responsibility of the departments that are integrating their business processes.
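
Again purely as an illustration (the structure and names below are my own), the correlation artifact can be thought of as an explicit mapping between elements of two departmental domain models, owned jointly by the departments involved:

import java.util.List;

// Illustrative correlation artifact between two departmental domain models,
// loosely in the spirit of an Overarching System Use Case; names are invented.
class ElementCorrelation {
    String sourceDepartment;   // e.g. "Acquisition"
    String sourceElement;      // e.g. "ProspectPropertyAddress"
    String targetDepartment;   // e.g. "Servicing"
    String targetElement;      // e.g. "CollateralAddress"
    String transformationRule; // how one maps onto the other, if not one-to-one
}

class OverarchingSystemUseCase {
    String name;                           // the cross-department scenario covered
    List<String> participatingSystems;     // OLTP systems taking part in the scenario
    List<ElementCorrelation> correlations; // jointly maintained mapping entries
}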

1 Comments:

Blogger Chaves, L. G. said...

Hi Semyon,

Congratulations for the article "Quality Data Through Enterprise Information Architecture".

I would like to suggest in the next article a discussion on a hybrid approach including the techniques of data mining for KDD.

Best Regards, Leonardo.

5:37 PM  
