On their own, enterprise applications and systems are not always straightforward. Writ large, they are complex, integrated environments, full of multiple data formats and structures. You spend a great deal of effort and time to define and maintain diverse data models among these integrated components.
A Canonical Data Model helps reduce that burden significantly by promoting a standard, consistent data model between connecting components. This article describes what a CDM is, why it is useful, how it works in practice, its benefits, some industry examples and how to build one.
The Canonical Data Model (CDM) is a data model with a standard and common set of definitions, including data types, data structures, relationships and rules — all independent of any specific application.
Applications must create and consume messages in this common format when exchanging data. A canonical data model is not an amalgamation of all data models. Instead, it is a single universal data model shared between integrations.
The term ‘canonical’ refers to anything that follows a general rule or accepted procedure; in other words, it is part of the canon. So, a canonical data model is one that follows the general rules you lay out.
It is essential to know the complexities associated with integrations to understand why CDMs were introduced. Current application architectures consist of several integrations with sub-systems and applications that use different technology stacks or programming languages. Microservices, service-oriented architectures (SOA) and distributed systems are some examples of highly integrated architectures.
Each architecture has a different format, complicating data exchange, data governance and interoperability across integrated applications and systems. A CDM enables all integrations to share a common understanding of the data that passes between them. It minimizes dependencies between integrations, improving data consistency and data governance.
Suppose an online learning application integrates with several other sub-systems, like student registration, course enrollment, and a payment system. Each sub-system may maintain client data (student and instructor) in different data types, formats and structures.
For example, a student registration system written in Node.js may store its information in MongoDB, while the main learning application is written in Java and stores its data in relational databases.
A company can create a CDM with standard data types, formats and structures to integrate this client data across the above-mentioned systems. The CDM can be defined in an agreed-upon format like Plain Old XML (POX), SOAP or JSON. It can include data fields like student name, ID, email address, phone number, etc.
The systems should agree on a common name for each data field. For instance, if one system uses "Student ID" as a field name and another system uses "Student No," both can be mapped to the "Student No" field in the CDM.
The student registration system transforms the student data into the standard format of CDM before sending it to the main application. After receiving data from the registration system, the main application will transform the data into its own format.
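The flow above can be sketched in a few lines of Python. This is only an illustration: the `CanonicalStudent` class, the field names on both sides (`studentId`, `STUDENT_NO` and so on) and the rule that emails are lower-cased are assumptions made for the example, not part of any real system.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CanonicalStudent:
    """Hypothetical canonical student record that all systems agree on."""
    student_no: str
    name: str
    email: str
    phone: str


def registration_to_cdm(doc: dict) -> CanonicalStudent:
    """Translate a registration-system document (MongoDB-style) into the CDM."""
    return CanonicalStudent(
        student_no=str(doc["studentId"]),  # "Student ID" maps to "Student No"
        name=f'{doc["firstName"]} {doc["lastName"]}',
        email=doc["email"].lower(),  # assumed CDM rule: emails are lower case
        phone=doc["phone"],
    )


def cdm_to_learning_app(student: CanonicalStudent) -> dict:
    """Translate a CDM record into the main application's relational row."""
    return {
        "STUDENT_NO": student.student_no,
        "FULL_NAME": student.name,
        "EMAIL": student.email,
        "PHONE": student.phone,
    }


# Made-up record as the registration system might store it.
registration_doc = {
    "studentId": 1042,
    "firstName": "Ada",
    "lastName": "Lovelace",
    "email": "Ada@Example.com",
    "phone": "+1-555-0100",
}

canonical = registration_to_cdm(registration_doc)  # sender side
row = cdm_to_learning_app(canonical)               # receiver side
print(asdict(canonical))
print(row)
```

Neither system needs to know the other's field names; each one only knows its own format and the canonical one.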
A CDM brings many benefits for current enterprise applications integrated with different systems and third-party applications. The following are some key benefits of a CDM:
Suppose your company has three different systems (X, Y and Z) that need to connect with each other. Connecting them directly requires up to six data translations: X-Y, Y-Z, X-Z, and vice versa. If you use a CDM, the maximum number of data translations is also six, because each system translates once to the CDM and once from it.
The difference shows as the number of connected systems grows. With four systems, direct connections need up to 12 translations, while a CDM needs only eight. So, you can reduce the number of data translations and the burden of maintaining them by using a CDM.
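To see how the two approaches scale, here is a small sketch of the arithmetic. It assumes every system talks to every other system in both directions, which is the worst case; the function names are made up for the example.

```python
def point_to_point_translations(systems: int) -> int:
    """Worst case without a CDM: one translation per ordered pair of systems."""
    return systems * (systems - 1)


def cdm_translations(systems: int) -> int:
    """With a CDM: each system translates once to the CDM and once from it."""
    return 2 * systems


for n in (3, 4, 5, 10):
    print(n, point_to_point_translations(n), cdm_translations(n))
# prints:
# 3 6 6
# 4 12 8
# 5 20 10
# 10 90 20
```

The direct approach grows quadratically with the number of systems, while the CDM approach grows linearly.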
A CDM provides a standard data model across different systems, regardless of their data models. This standardization encourages organizations to maintain consistent data definitions, formats and structures.
Furthermore, it results in high-quality data, which helps them make better business decisions. Additionally, the communication between systems will be consistent regardless of the number of integrations that are added in the future.
The CDM is independent of integrated applications and systems, allowing you to implement new integrations easily. It allows organizations to expand their operations with fewer complexities and integration costs. In addition, this flexibility helps them respond to changing business needs faster. It leads to improved responsiveness, business resilience and agility of the company.
Another important benefit a CDM brings is the reduced effort required to maintain translations. Suppose you need to replace, delete or update one integrating system. Without a CDM, you will have to check the data translations of every system that connects to it, which is costly and time-consuming.
In contrast, when there is a CDM between connecting systems, you only have to check the data translations to and from the CDM. This makes integrations easier to maintain.
A CDM also makes it easier to maintain the logic between integrated systems. Without a CDM, a change to one integration means checking for dependencies between the existing data model and the logic, and then changing the logic accordingly. With a CDM, the logic is written against the canonical model, so changes to a system do not require changes in the business logic of the integration layer.
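A minimal sketch of that idea, assuming a hypothetical canonical order with `order_id` and `total_cents` fields and two made-up billing systems. The business rule only reads canonical fields, so replacing the billing system means writing a new adapter and nothing else.

```python
from decimal import Decimal


def needs_manual_review(order: dict) -> bool:
    """Integration-layer business logic, written only against CDM fields."""
    return order["total_cents"] >= 100_000


def legacy_billing_to_cdm(record: dict) -> dict:
    """Adapter for the original (hypothetical) billing system.

    It stores amounts as decimal strings such as "1250.00".
    """
    return {
        "order_id": record["OrderRef"],
        "total_cents": int(Decimal(record["Amount"]) * 100),
    }


def new_billing_to_cdm(record: dict) -> dict:
    """Adapter for the (hypothetical) replacement system.

    It already stores amounts in minor units, so only the names change.
    """
    return {
        "order_id": record["id"],
        "total_cents": record["amount_minor_units"],
    }


old_order = legacy_billing_to_cdm({"OrderRef": "A-1", "Amount": "1250.00"})
new_order = new_billing_to_cdm({"id": "A-2", "amount_minor_units": 9_999})

# The same rule works for both systems without modification.
print(needs_manual_review(old_order))
print(needs_manual_review(new_order))
```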
Different industries use CDMs to set a standard between data and communications within their diverse applications and systems. The following are some common examples of such CDMs that are in use today.
Healthcare providers, like hospitals and medical laboratories, have systems like patient registration, patient tracking, clinical histories and payment systems. Health Level Seven (HL7) is a set of standards that defines a common message format to exchange electronic health records between different healthcare applications. Its standards include HL7 V2, V3, FHIR and CDA.
Clinical Document Architecture (CDA) is one of the primary standards based on XML, specifying the encoding, structure, and semantics of clinical documents. It can include clinical information like medical history, discharge information and special medical reports of patients.
Before Microsoft introduced its Common Data Model (CDM), communication between Microsoft apps was handled app by app, and the integrations were expensive and hard to maintain. The Microsoft CDM was introduced to reduce these complexities and enable the integration of different MS apps.
For example, different versions of MS Dynamics 365 store and process the same data differently. The CDM allows these versions to map their data to a shared model and easily exchange information with each other.
The OpenTravel Alliance (OTA) defines a common message format to exchange data between travel, tourism and hospitality systems that belong to hotels, airlines, railways, cruise lines and distribution/logistics companies. These companies can use it to enhance the interoperability between their electronic systems.
For example, the industry-standard XML schema of OTA allows airlines to automatically transfer e-tickets to another airline system. Its CDM is an XML schema with a standard format for exchanging data like ticket pricing and reservations. OpenTravel 2.0, released in 2016, enables exchanging JSON messages alongside existing XML messages.
The Open Geospatial Consortium (OGC) defines standards for exchanging geospatial data between geographic information system (GIS) applications. Its CDM allows the exchange of geographic information as geometries like points, lines and polygons. Other industries, like energy and utilities, aviation, and emergency response and disaster management, use it to improve system interoperability.
Building a CDM involves several steps, from understanding your domain to implementing the CDM by mapping data. The following steps illustrate how to build a CDM for an organization from scratch, using a generic example.
Start by identifying the connections. (A CMDB might be helpful here.) For example, if your domain is a retail business and you want to build a CDM to exchange customer orders, you must know the connecting systems or applications that store and process customer data.
Additionally, identify the workflows within those systems.
You must know how each system stores the customer order data, what data types and relationships exist, and in what structure and format they store the information. This step helps you identify the common data maintained across different systems.
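One lightweight way to do this inventory is to list each system's fields and types, tag every field with the business concept it represents, and then look for the overlap. The sketch below assumes three hypothetical retail systems; the system names, field names, types and the alias table are all invented for illustration.

```python
# Hypothetical field inventories: local field name -> local data type.
SYSTEM_FIELDS = {
    "web_store": {"orderId": "string", "customerEmail": "string", "total": "float"},
    "warehouse": {"order_no": "int", "sku_list": "array", "email": "string"},
    "billing": {"OrderRef": "string", "Amount": "decimal", "Email": "string"},
}

# Assumed aliases: local field name -> shared business concept.
ALIASES = {
    "orderId": "order_id", "order_no": "order_id", "OrderRef": "order_id",
    "customerEmail": "email", "email": "email", "Email": "email",
    "total": "total", "Amount": "total",
    "sku_list": "items",
}


def concepts_for(fields: dict) -> set:
    """Return the set of business concepts a system stores."""
    return {ALIASES[name] for name in fields}


def types_for(concept: str) -> set:
    """Return every local data type used for a concept across all systems."""
    return {
        local_type
        for fields in SYSTEM_FIELDS.values()
        for name, local_type in fields.items()
        if ALIASES[name] == concept
    }


concepts = {system: concepts_for(fields) for system, fields in SYSTEM_FIELDS.items()}
common = set.intersection(*concepts.values())  # concepts every system stores

for concept in sorted(common):
    print(concept, sorted(types_for(concept)))
```

Concepts that appear everywhere are natural candidates for CDM fields, and concepts with more than one local type (such as `order_id` here) show where a standard type has to be chosen.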
Once you have completed the second step, the next step is defining your CDM by introducing the standard data types, structures and relationships that will serve all the connecting systems.
Next, map all the data of the connecting systems and their relationships to the CDM.
Finally, build the CDM and data translators that help translate the data model of each system into CDM and vice versa.
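The last two steps can be sketched as a declarative field mapping plus a pair of generic translators. This is a simplified illustration: the system names and fields are hypothetical, and the translators only rename fields, whereas real translators usually also convert types and validate values.

```python
# Hypothetical mapping per system: canonical field name -> local field name.
FIELD_MAPS = {
    "web_store": {"order_id": "orderId", "email": "customerEmail", "total": "total"},
    "billing": {"order_id": "OrderRef", "email": "Email", "total": "Amount"},
}


def to_cdm(system: str, record: dict) -> dict:
    """Translate a system-specific record into the canonical model."""
    mapping = FIELD_MAPS[system]
    return {canonical: record[local] for canonical, local in mapping.items()}


def from_cdm(system: str, canonical_record: dict) -> dict:
    """Translate a canonical record into a system-specific record."""
    mapping = FIELD_MAPS[system]
    return {local: canonical_record[canonical] for canonical, local in mapping.items()}


def translate(source: str, target: str, record: dict) -> dict:
    """Move a record between two systems, always passing through the CDM."""
    return from_cdm(target, to_cdm(source, record))


web_order = {"orderId": "A-1001", "customerEmail": "sam@example.com", "total": 59.90}
billing_order = translate("web_store", "billing", web_order)
print(billing_order)
```

Adding another system under this design means adding one entry to `FIELD_MAPS`; the existing entries and the translator functions stay untouched.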
A Canonical Data Model defines a data model with a standard set of data types, structures, relationships, and rules independent of any specific application. It enables easy data exchange between integrating applications and systems, allowing interoperability between them regardless of their technological differences.
Integrating components must create messages in this common format, and translate the messages they receive into their own format. A CDM brings many advantages for enterprises, such as fewer data translations, improved data consistency, integration flexibility and business agility. Ultimately, it reduces the cost of maintaining data translations and business logic.
This posting does not necessarily represent Splunk's position, strategies or opinion.