Don’t Start Your Integration Project Before Reading This!

Architects of integration projects commonly consider the following when designing interfaces:

Capability: the new capabilities being created or the existing capabilities being enhanced by connecting two or more systems

Data: the key data objects that need to be transmitted across the integration channel

Method: the method employed to implement the interfaces, such as SOA via web services, point-to-point via ETL, etc.

Pattern: whether the interfaces need to be real-time or batch jobs, chatty or bulky, hub-and-spoke or one-to-one, etc.

Technology: the tools used to enable the interfaces; this includes the decision to use in-house or cloud-based technologies

Frequency: how often the data needs to flow, e.g. daily, monthly or in real time

People: the stakeholders and end-users of the end product


More often than not, the solution architects and designers will start the project once they have at least a high-level understanding of all or most of the above.

However, one of the most commonly overlooked areas I’ve observed in many integration projects is Data Quality: issues that lie dormant while the systems are used in isolation but inevitably surface once the data is already flowing from A to B, which is exactly when they are most expensive and challenging to sort out.


To put this into perspective, I’ll give you a real example from one of the recent projects I’ve been involved with. In this instance, the CRM system is integrated with the Financial system to enable a new function in the Financial system: the capability to report on income per customer by correlating the invoices raised in the Financial system with the customer data in the CRM.


The interface brings Customer data from the CRM to the Financial system, and the capability matches customers with invoices. The invoices raised in the Financial system and sent to clients have a mandatory key field, “Customer ABN”. The Customer data object in the CRM has the same attribute, “Customer ABN”, which is captured when the Customer record is created in the CRM. This field is brought across the interface from the CRM to the Financial system and is used to match customer data with the invoices.


But here’s the catch: “Customer ABN” is MANDATORY in the Financial system when invoices are raised but OPTIONAL in the CRM. This means the person creating the customer record in the CRM can leave “Customer ABN” blank. Once the interface was built, the project team realised that the “income per customer” report was skewed, for a reason that is obvious in hindsight: where there’s no “Customer ABN” against the Customer data in the CRM, the invoices cannot be matched to the customer. Fixing this requires either changing the key attribute used to match records between the systems OR making “Customer ABN” mandatory in the CRM; both options are expensive, the project is delayed and stakeholders are not happy.
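To make the failure mode concrete, here is a minimal sketch of the matching logic. The field names (`abn`, `amount`), the customer names and all sample values are invented for illustration and do not come from the actual project; the real interface will certainly look different.

```python
from collections import defaultdict

# Hypothetical sample data; names, ABNs and amounts are illustrative only.
crm_customers = [
    {"name": "Acme Pty Ltd", "abn": "11111111111"},
    {"name": "Globex Pty Ltd", "abn": ""},  # ABN left blank in the CRM (optional field)
]

invoices = [
    {"invoice_no": "INV-1", "abn": "11111111111", "amount": 1000.0},
    {"invoice_no": "INV-2", "abn": "22222222222", "amount": 2500.0},
    {"invoice_no": "INV-3", "abn": "22222222222", "amount": 500.0},
]


def income_per_customer(customers, invoices):
    """Match invoices to customers on ABN.

    Returns a tuple: (income keyed by customer name, list of unmatched invoices).
    """
    # Customers with a blank ABN cannot take part in the match at all.
    name_by_abn = {c["abn"]: c["name"] for c in customers if c["abn"]}

    income = defaultdict(float)
    unmatched = []
    for inv in invoices:
        name = name_by_abn.get(inv["abn"])
        if name is None:
            unmatched.append(inv)
        else:
            income[name] += inv["amount"]
    return dict(income), unmatched


income, unmatched = income_per_customer(crm_customers, invoices)
unmatched_total = sum(inv["amount"] for inv in unmatched)

print(income)
print([inv["invoice_no"] for inv in unmatched], unmatched_total)
```

In this toy data set, the customer whose ABN was left blank never appears in the report, and the invoices raised against that ABN end up in the unmatched pile. Tracking the unmatched total alongside the report is one simple way to see how skewed the figures are.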


This is a very simplified example of the numerous Data Quality issues I’ve seen in many projects, which could have been avoided had they been understood from the outset. Obviously, Data Quality issues are not always due to mandatory vs optional attributes. More complicated and less straightforward to capture are process-driven issues, user behaviour and lack of training, discrepancies between inter-related modules in a given system, and a lack of technical hard controls in the systems, to name a few.


Bottom line: allocate some time upfront, in the design phase of the project, to understand the underlying systems, the data, the processes and the people using the systems, in an effort to capture as many data quality issues as possible, and plan to resolve them as part of the project plan.
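As a rough illustration of what such an upfront check might look like, the sketch below compares how each system treats the fields used for matching and measures how often a field is actually populated in the source data. The schema dictionaries, the field names and the sample records are all assumptions made for this example; in practice this information would come from each system’s data dictionary and a data extract.

```python
def find_mandatory_mismatches(source_schema, target_schema, join_keys):
    """Return join keys that the target requires but the source allows to be blank."""
    return [
        key
        for key in join_keys
        if target_schema.get(key) == "mandatory"
        and source_schema.get(key) != "mandatory"
    ]


def fill_rate(records, field):
    """Fraction of records where the field is present and not blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


# Hypothetical schema descriptions for the two systems.
crm_schema = {"customer_name": "mandatory", "customer_abn": "optional"}
financial_schema = {"customer_abn": "mandatory"}

# Hypothetical extract of CRM customer records.
crm_extract = [
    {"customer_name": "Acme Pty Ltd", "customer_abn": "11111111111"},
    {"customer_name": "Globex Pty Ltd", "customer_abn": ""},
    {"customer_name": "Initech Pty Ltd", "customer_abn": "33333333333"},
    {"customer_name": "Umbrella Pty Ltd", "customer_abn": "44444444444"},
]

mismatches = find_mandatory_mismatches(crm_schema, financial_schema, ["customer_abn"])
abn_fill_rate = fill_rate(crm_extract, "customer_abn")

print("Keys needing attention:", mismatches)
print("CRM ABN fill rate:", abn_fill_rate)
```

A check like this only catches the mandatory vs optional class of problem. Process-driven and behavioural issues still need conversations with the people who use the systems.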