Open Standards for Data Mesh
Current activities and approaches for Data Mesh, Data Products and Data Contracts
Since Data Mesh came to life in 2019 with the great initial article “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh”, the data world has accelerated toward federated data management, and the concept has clearly met a need.
When searching for ideas on how to bridge the gap between theoretical concepts and practical implementation, I found many “open” standards in this realm. I will try to sort them in the following.
Quantyca started in 2022 with a now extensive Data Product Descriptor Specification, currently in draft version 1.0.0, expanded it with a Data Store API Specification, and is building an Open Data Mesh Platform on top. The specification builds on the following ideas:
We think that the work made by OpenAPI Initiative and AsyncAPI Initiative is great :)
We want to make the learning curve for the Data Product Descriptor Specification as smooth as possible, aligning its definition to the one of other two popular specifications in the software and data engineers community
We think that OpenAPI and AsyncAPI are natural specifications for defining the interface of data product's ports that expose an API endpoint. This specification does not impose the use of any specific standard for the port's interface definition but these two are highly recommended.
“The Data Product Descriptor Specification (DPDS) defines a declarative and technology-independent standard to describe a data product in all its components.”
The goal is described as: “The formalization of a standard data product descriptor document through an open specification is useful to enable the implementation of an ecosystem of interoperable data mesh tools.”
It seems Quantyca also uses the specification for their Data Governance and Compliance platform Blindata, but in general everything is under the Apache 2.0 licence and open to a community of contributors through their Open Data Mesh Initiative.
Fig. 1: Quantyca - Open Data Mesh Initiative
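To make the idea of "interoperable tools" a bit more tangible, here is a minimal sketch of how a tool could read a declarative descriptor and discover which interface standard each port uses. Please note that the field names (name, owner, outputPorts, interfaceType, definition) and the function list_port_interfaces are my own illustrative assumptions and are not taken from the DPDS; the real structure is defined in the specification itself.

```python
import json

# Hypothetical descriptor document. All field names are illustrative
# assumptions and are NOT taken from the Data Product Descriptor Specification.
DESCRIPTOR_JSON = """
{
  "name": "orders",
  "owner": "sales-team",
  "outputPorts": [
    {"name": "orders-rest", "interfaceType": "openapi",
     "definition": "orders-openapi.yaml"},
    {"name": "orders-events", "interfaceType": "asyncapi",
     "definition": "orders-asyncapi.yaml"}
  ]
}
"""


def list_port_interfaces(descriptor_text: str) -> dict:
    """Map each output port name to the interface standard it declares."""
    descriptor = json.loads(descriptor_text)
    return {
        port["name"]: port["interfaceType"]
        for port in descriptor.get("outputPorts", [])
    }


if __name__ == "__main__":
    for port_name, interface in list_port_interfaces(DESCRIPTOR_JSON).items():
        print(f"{port_name}: {interface}")
```

The point of the sketch is only that any tool which understands the agreed document structure can work with any data product, regardless of the technology behind the port.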
Since the Data Product is certainly a core element of building a Data Mesh, there are some further initiatives for a Data Product specification.
Similar to Quantyca, the Italian consultancy Agile Lab promotes “An open specification for data products in Data Mesh” on GitHub. The specification is based on the following design principles:
Data Product as an independent unit of deployment
Technology independence
Extensibility
They describe a Data Product Specification and give examples in YAML and CUE. In the structure definition they refer several times to the schemas defined as a metadata standard by OpenMetadata. They use and develop the specification in the context of their solution Witboost, describe their motivation in this article, and show the usage in the documentation.
Fig. 2: Agile Lab - Data Product Specification
Open Data Product Specification (ODPS) is a further initiative, currently with a release candidate of version 3.0.0 on GitHub. It is described as “a vendor-neutral, open-source machine-readable data product metadata model. It defines the objects and attributes of data products, making data more understandable and accessible.”
The goals for an open specification are seen as follows:
enable interoperability between organizations, data platforms, marketplaces, and tools.
reduce data product metadata conversions and errors between systems and organizations.
increase the speed of designing, testing, and implementing data products.
speed up tools development around data product design, development and management.
enable creation of automated data product deployment with standard methods (DataOps).
There is a commercial-looking website, https://www.dataproductbusiness.com/, which includes a nice toolset but does not seem to be up to date.
Fig. 3: Open Data Product Specification (ODPS)
I also see vendors, e.g. SAP, opening their own standards like Open Resource Discovery (ORD), which is intended as a more agnostic approach but can be used for Data Products.
Next, I found at least three independent initiatives working on Data Contract standards.
Let’s start with the most extensive ecosystem I can see here, by the consultancy INNOQ. I assume they are well known in the Data Mesh world, not just for Dr. Simon Harrer’s German translation of Zhamak Dehghani’s Data Mesh book. They also provide and maintain websites like https://www.datamesh-architecture.com/ and https://www.datamesh-governance.com/.
They also support a Data Product Specification on its own website, currently at version 0.0.1, but the Data Contract part shows more progress and is connected with offerings. The Data Contract Specification, currently at version 0.9.3, follows these design principles:
A free, open, and open-sourced standard
Follow OpenAPI and AsyncAPI conventions so that it feels immediately familiar
Support contract-first approaches
Support code-first approaches
Support tooling by being machine-readable
Fig. 4: INNOQ - Data Contract Specification
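The principle "support tooling by being machine-readable" is easiest to appreciate with a small example. The following is a minimal sketch of a tool that checks a record against a contract. The contract layout (id, fields, type, required) and the helper violations are hypothetical and only for illustration; they are not the schema of the Data Contract Specification.

```python
# Hypothetical contract structure for illustration only. It does NOT follow
# the schema of the Data Contract Specification.
CONTRACT = {
    "id": "orders-contract",
    "fields": {
        "order_id": {"type": "string", "required": True},
        "amount": {"type": "number", "required": True},
        "coupon": {"type": "string", "required": False},
    },
}

# Mapping from the (assumed) contract type names to Python types.
PYTHON_TYPES = {"string": str, "number": (int, float)}


def violations(contract: dict, record: dict) -> list:
    """Return human-readable problems found when checking record against contract."""
    problems = []
    for name, rule in contract["fields"].items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required field: {name}")
            continue
        if not isinstance(record[name], PYTHON_TYPES[rule["type"]]):
            problems.append(f"wrong type for {name}: expected {rule['type']}")
    return problems


if __name__ == "__main__":
    print(violations(CONTRACT, {"order_id": "A1", "amount": 9.5}))
    print(violations(CONTRACT, {"order_id": 7}))
```

Because the contract is plain data, the same document can serve a contract-first workflow (write the contract, then build against it) as well as a code-first one (derive the contract from an existing dataset).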
The next finding is rather an informative website, which I assume is related to The Modern Data Company. At least there is a website for Open Data Contract that describes nicely and comprehensively what the topic is about.
They do not really give templates and examples, but they go end-to-end through the idea of a Data Contract, seeing the following goals:
Manageable Data Ecosystem
Concrete and Adaptable Data Pipelines
Bridge between Business Logic and Physical data
Optimizing Data Modeling
Happy Data Engineers
High Data Quality
Fig. 5: The Modern Data Company? - Open Data Contract
Bitol is a Linux Foundation sandbox project. They describe their purpose as “Data contracts should follow an open and extensible standard like the Open Data Contract Standard (ODCS), adopted by many organizations. ODCS leverages YAML and can, therefore, easily be versioned and governed. As a consequence, a data contract is enforceable and actionable by tools and services that follow the standard.”
The GitHub repository has its origin in PayPal’s Data Contract Template, which inspired other initiatives, too. Several blog posts are referenced to describe what it is about, such as “Getting started with ODCS”.
The approach is described as “A set of data contracts governs each data quantum: the primary data contract defines the relationship between the data quantum and its users. It also describes the interoperable model and SLA (service-level agreement) details. This consumer-oriented data contract can also be called output or user data contract.”
Fig. 6: Bitol - Open Data Contract Standard (ODCS)
One topic where I have seen some aspects, but no comparable open standard, is “Data Sharing Agreements” (DSAs). I currently see them discussed in the context of Data Contracts and Data Mesh implementations.
Microsoft also sees DSAs in this context for Data Mesh.
That concludes my exploration of open standards in the context of Data Mesh so far. There is certainly more to explore and to dive deeper into in the future.
What is your experience with open standards for implementing Data Mesh or Data Products? Are you working on your own, or do you participate in the community of one of the described standards? Are there additional standards you know of or work with? I would be happy to hear about your experience!