As long as we are still discussing a concept like Data Mesh, it is neither obsolete, as Gartner claims (meaning it will dissolve into separate concepts over time), nor dead, as some article titles like to put it. But where are we now? With Zhamak Dehghani’s launch of Nextdata OS, I see an evolution, perhaps one in which concept and reality are moving closer together.
Data Mesh is a socio-technical approach to decentralized data architecture, coined by Zhamak Dehghani in 2019 and based on the following four pillars:
Domain Ownership
Data as a Product
Self-Serve Data Platform
Federated Computational Governance
Based on several Reddit discussions, articles and YouTube talks, the following wraps up my current understanding of the state of Data Mesh, as discussed in the field.
Pro Data Mesh (Positive Aspects and Potentials)
Decentralized Domain Responsibility and Data Ownership: A modified approach, in which teams own their datasets and a central team combines them into core tables, has been reported as successful. The idea that each department owns its dataset in the data warehouse is highlighted as the "mesh part": most organizations don't implement it, but it is a positive distinction. This shifts responsibility to those who understand the data best.
Data as Products/Services: Treating data as products is viewed positively, including applying software engineering (SWE) best practices to data products. Treating each dataset as a service with its own endpoint is also considered an advantage.
Organizational Alignment and Empowerment: There has been a slight positive shift towards federating data and analytics workloads to business/product teams in large enterprises. Data Mesh can empower teams to build their own data use cases and allow more independence and end-to-end product ownership.
Speed and Scalability (under certain conditions): Data Mesh can work well for companies of a certain size that need high velocity to achieve their goals. It can help make data platforms more scalable and self-serve. By breaking the landscape down into smaller units (data products/domains), pipelines can be built in smaller chunks, which can be easier to manage than a monolithic approach.
Data Maturity and Governance Foundations: Data Mesh is seen as a data maturity journey. The early phase of implementation can lead to increased awareness of the importance of metadata and data discoverability. Tools enabling data lineage (like dbt or Unity Catalog) are seen as important for sharing data between domain teams.
Suitability for Specific Scenarios: When done right, Data Mesh is described as the least painful and least error-prone method for very large, siloed companies. It can also work in smaller organizations with around 20 data- and analytics-focused people by minimizing platform team roles. It can work for large, well-known companies that genuinely need a less monolithic approach, and it can support certain data strategies.
Tooling Advancements: There are advancements in supporting tools, such as the dlt library, which aims to automate normalization, schema management, data contracts, and declarative loading.
Flexibility and Adaptability: The concept is viewed as "aspirational," meaning organizations can adopt the useful parts without implementing the entire framework.
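The lineage point above can be illustrated with a minimal Python sketch. It assumes each data product simply declares which products or raw sources it reads from; the product names and the INPUTS dictionary are hypothetical, and tools like dbt or Unity Catalog derive lineage from code and query metadata rather than from a hand-maintained map.

```python
from collections import deque

# Hypothetical catalog: each data product lists the inputs it reads from.
# Names prefixed with "src." are raw source systems; the rest are data products.
INPUTS: dict[str, list[str]] = {
    "sales.orders": ["src.erp_orders"],
    "sales.customers": ["src.crm_accounts"],
    "finance.revenue": ["sales.orders", "sales.customers"],
    "marketing.campaign_roi": ["finance.revenue", "src.ad_platform"],
}


def upstream(product: str, inputs: dict[str, list[str]]) -> set[str]:
    """Return every product or source that `product` depends on, directly or transitively."""
    seen: set[str] = set()
    queue = deque(inputs.get(product, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited; also guards against cycles
        seen.add(node)
        queue.extend(inputs.get(node, []))
    return seen


def downstream(product: str, inputs: dict[str, list[str]]) -> set[str]:
    """Return every data product that would be affected by a change to `product`."""
    return {p for p in inputs if product in upstream(p, inputs)}


if __name__ == "__main__":
    print(sorted(upstream("marketing.campaign_roi", INPUTS)))
    print(sorted(downstream("sales.orders", INPUTS)))  # ['finance.revenue', 'marketing.campaign_roi']
```

The downstream query is the one a domain team typically needs before changing a shared product, because it lists every consumer that could be affected.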
Autonomous Data Products (according to Zhamak Dehghani & Nextdata OS)
A self-governing, long-running service for a specific business domain's data. It autonomously manages all aspects of its data, acting as its own product and factory, without direct human control. It is the core unit of execution, governance, and interaction, with capabilities such as self-provisioning, self-orchestrating, and self-governing.
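To make "self-governing" a little more tangible, here is a hypothetical Python sketch of a data product that carries its own access policy and enforces it on every request, masking PII for roles that are not allowed to see it and logging each access. The class, its fields, and the roles are invented for illustration; this is not the Nextdata OS API.

```python
from dataclasses import dataclass, field


@dataclass
class AutonomousDataProduct:
    """Hypothetical data product that carries and enforces its own governance policy.

    Illustrates the "self-governing" idea only; not the Nextdata OS API.
    """

    name: str
    rows: list[dict]
    pii_columns: set[str]
    # Roles allowed to see PII in clear text; every other role gets masked values.
    pii_readers: set[str] = field(default_factory=set)
    # Every access is recorded by the product itself as (consumer, role).
    access_log: list[tuple[str, str]] = field(default_factory=list)

    def serve(self, consumer: str, role: str) -> list[dict]:
        """Serve rows, applying the product's own masking policy and logging the access."""
        self.access_log.append((consumer, role))
        if role in self.pii_readers:
            return [dict(row) for row in self.rows]
        return [
            {k: ("***" if k in self.pii_columns else v) for k, v in row.items()}
            for row in self.rows
        ]


if __name__ == "__main__":
    customers = AutonomousDataProduct(
        name="sales.customers",
        rows=[{"id": 1, "email": "a@example.com", "country": "DE"}],
        pii_columns={"email"},
        pii_readers={"support"},
    )
    print(customers.serve("bi-dashboard", role="analyst"))  # email is masked
    print(customers.serve("helpdesk", role="support"))      # email in clear text
```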
Contra Data Mesh (Negative Aspects and Challenges)
Perception as Hype and Academic Concept: Data Mesh is often seen as overhyped, a marketing gimmick, or a buzzword. It is assessed as purely academic or utopian and not practical for most businesses. The hype around it seems to have died down quickly. Gartner is reported to have marked it as obsolete.
High Personnel and Cost Requirements: A major criticism is the need for technically competent personnel in every domain or department, which is seen as a luxury very few, very large companies can afford. Implementation can lead to significantly increased costs, e.g., due to redundant data processing across different teams. It is seen as expensive to set up and operate.
Risk of Data Silos and Redundancy: Despite the goal of interoperability, Data Mesh, especially if poorly implemented or lacking coordination, can lead to data silos. Teams may build similar data products or transform data multiple times, leading to redundancy.
Issues with Data Quality, Accuracy, and Ad-hoc Querying: In one reported implementation, there were issues with poor data quality, inaccuracy, and limited ad-hoc query capability. Without proper governance, it becomes a "data mess".
Organizational Challenges and Lack of Leadership: Implementation requires significant organizational and cultural change. Lack of project coordination, absence of strong technical leads, or weak owners can hinder the enforcement of standards and lead to frustration. Silos, lack of buy-in, and egos can block implementations. Defining domain boundaries is difficult. There can be career issues for decentralized data professionals.
Difficulty with Data Ownership (despite the principle): Although data ownership is a core principle, defining and enforcing it for shared data is difficult in practice. Ownership can be lost when people leave teams.
Not Suitable for All Companies: Data Mesh is seen as unsuitable for small and medium businesses (SMBs). It is often viewed as a solution only for very large, complex, and mature organizations with specific problems. Implementation can be a nightmare if data needs to be siloed for legal reasons.
Tooling Gaps and Immaturity: There are mentions of missing or inadequate tools for certain aspects (e.g., data ingestion). The author believes the necessary technology partly does not exist yet. Data Catalogs to support it require many separate features to be effective.
Comparison to Old Concepts: It is seen as a renaming of older concepts like Data Marts, Hub-and-Spoke, or Microservices.
Domain Ownership
In Data Mesh, Data Ownership is a core principle where responsibility and accountability for data management, modeling, and governance are decentralized to business-aligned domain teams. These teams own their specific datasets and are accountable for their logic and glossary. This approach empowers the teams closest to the data, shifting away from centralized control, but it relies heavily on robust governance and technical capabilities within those domains to be implemented effectively.
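One practical safeguard against ownership getting lost when people leave is to assign each data product to a team rather than to an individual, and to check regularly that the team still has members. The registry below is hypothetical; in practice, team membership would come from an identity or HR system.

```python
# Hypothetical ownership registry: data product -> owning domain team.
PRODUCT_OWNERS = {
    "sales.orders": "team-sales",
    "finance.revenue": "team-finance",
    "hr.headcount": "team-people-analytics",
}
# Hypothetical team membership, e.g. synced from an identity system.
TEAM_MEMBERS = {
    "team-sales": {"alice", "bob"},
    "team-finance": {"carol"},
    "team-people-analytics": set(),  # everyone has left the team
}


def orphaned_products(owners: dict[str, str], members: dict[str, set[str]]) -> list[str]:
    """Return data products whose owning team no longer exists or has no members."""
    # members.get(team) is None for unknown teams and an empty set for empty teams;
    # both are falsy, so both count as orphaned.
    return sorted(p for p, team in owners.items() if not members.get(team))


if __name__ == "__main__":
    print(orphaned_products(PRODUCT_OWNERS, TEAM_MEMBERS))  # ['hr.headcount']
```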
What Data Mesh Governance means
One of the pillars of Data Mesh is “Federated Computational Governance”, which many people struggle to make work in practice. What does governance in Data Mesh even mean?
The fundamental concept of Data Mesh governance is the application of data governance principles and practices within a Data Mesh architecture. It represents a shift away from traditional, centralized governance towards a more distributed approach.
Federated Governance is the central model for governance in the Data Mesh:
Ownership lies within each individual domain.
Domains jointly decide on matters that need to be governed centrally, while domain-specific issues are handled independently.
The goal is to keep centralized governance to a minimum and delegate as much as possible to the domains.
A key challenge is reaching a consensus on what “federated” actually means.
A central entity is still needed, but its role shifts; it acts more as a guide and facilitator rather than executing the main tasks. The “extended arms” of governance reside within the domains.
Computational Governance is seen as crucial for making governance scalable within the Data Mesh.
It involves creating automation or mechanisms to carry out data management at scale—developing “products for data management.”
The best current example of this is data contracts.
Data contracts are executable and automate tasks such as enforcing access controls at the data source or stopping pipelines when data quality checks fail.
Moving toward data contracts requires appropriate automation, and data quality tooling is considered essential.
Automation is seen as an opportunity to enhance compliance, security, and ethical behavior.
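A minimal Python sketch of the "stop the pipeline when data quality checks fail" idea. The check helpers (not_null, unique), the QualityGateError exception, and publish are hypothetical; dedicated data quality tools provide far richer checks, but the control flow is the same: run the checks, and halt before anything is published if one of them fails.

```python
from typing import Callable

Row = dict[str, object]
Check = Callable[[list[Row]], bool]


class QualityGateError(RuntimeError):
    """Raised to stop the pipeline when a quality check fails."""


def not_null(column: str) -> Check:
    # Passes only if every row has a non-None value in `column`.
    return lambda rows: all(row.get(column) is not None for row in rows)


def unique(column: str) -> Check:
    # Passes only if no value in `column` appears more than once.
    return lambda rows: len({row.get(column) for row in rows}) == len(rows)


def quality_gate(rows: list[Row], checks: dict[str, Check]) -> list[Row]:
    """Run all checks; pass the data through only if every check succeeds."""
    failed = [name for name, check in checks.items() if not check(rows)]
    if failed:
        raise QualityGateError(f"pipeline stopped, failed checks: {failed}")
    return rows


def publish(rows: list[Row]) -> int:
    # Placeholder for writing to the data product's output port.
    return len(rows)


if __name__ == "__main__":
    checks = {
        "order_id not null": not_null("order_id"),
        "order_id unique": unique("order_id"),
    }
    print(publish(quality_gate([{"order_id": 1}, {"order_id": 2}], checks)))
    try:
        publish(quality_gate([{"order_id": 1}, {"order_id": 1}], checks))
    except QualityGateError as err:
        print(err)  # nothing was published
```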
What is a Data Contract?
It is a formal, codified, and machine-readable agreement between data producers and data consumers. It defines the structure (schema), quality, behavior, and service level objectives (SLOs) of the data being exchanged. Think of it as a Service Level Agreement (SLA) for data.
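As an illustration, the sketch below encodes a contract for a hypothetical sales.orders product as plain Python data (schema, a simple quality rule, and a freshness SLO) and validates a batch against it. The field names and the 24-hour staleness target are assumptions; real contracts are usually kept in a versioned YAML or JSON file that follows a published specification.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an "orders" data product. In practice this would
# usually live in a versioned YAML/JSON file next to the product's code.
CONTRACT = {
    "product": "sales.orders",
    "schema": {"order_id": int, "customer_id": int, "amount_eur": float},
    "quality": {"required": ["order_id", "customer_id"]},
    "slo": {"max_staleness": timedelta(hours=24)},
}


def validate(rows: list[dict], last_updated: datetime, contract: dict, now: datetime) -> list[str]:
    """Return a list of contract violations (an empty list means the data conforms)."""
    violations = []
    for i, row in enumerate(rows):
        for column, expected in contract["schema"].items():
            value = row.get(column)
            if value is None:
                # Missing values only violate the contract for required columns.
                if column in contract["quality"]["required"]:
                    violations.append(f"row {i}: required column '{column}' is missing")
            elif not isinstance(value, expected):
                violations.append(f"row {i}: '{column}' should be {expected.__name__}")
    # Freshness SLO: the producer promises data no older than max_staleness.
    if now - last_updated > contract["slo"]["max_staleness"]:
        violations.append("SLO breached: data is older than the agreed staleness")
    return violations


if __name__ == "__main__":
    now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
    batch = [
        {"order_id": 1, "customer_id": 7, "amount_eur": 19.9},
        {"order_id": 2, "customer_id": None, "amount_eur": "12"},
    ]
    for problem in validate(batch, now - timedelta(hours=30), CONTRACT, now):
        print(problem)
```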
Data Mesh brought an essential shift in approach:
From “Police” to Collaboration: Governance is no longer seen as the “police” but rather as a collaborative effort with the domains, empowering them.
From Central Knowledge Distribution to Co-Creation: The model shifts from a central body distributing knowledge to a process of co-creation and collaboration with the domains.
Thin Layer of Standardization: Central governance should provide a “thin standardization layer” that works enterprise-wide, while leaving space for domains to handle specific use cases. Security and compliance are areas where a clear, centralized “hardline” is necessary.
Governance as a Product: Governance should be easy to use and well documented. Domains should be treated like customers for whom governance products are developed.
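The "thin standardization layer" can be pictured as two rule sets: a small set of enterprise-wide rules owned centrally, and rules that each domain adds for its own use cases. In this hypothetical Python sketch, a domain cannot override a global rule, reflecting the security and compliance "hardline"; all rule names and descriptor fields are invented for illustration.

```python
from typing import Callable

Descriptor = dict[str, object]
Rule = Callable[[Descriptor], bool]

# Hypothetical enterprise-wide "hardline" rules owned by central governance.
GLOBAL_RULES: dict[str, Rule] = {
    "has_owner": lambda d: bool(d.get("owner")),
    "pii_classified": lambda d: "contains_pii" in d,
    "pii_encrypted": lambda d: not d.get("contains_pii") or d.get("encrypted") is True,
}

# Hypothetical domain-specific rules, defined and maintained by the domains themselves.
DOMAIN_RULES: dict[str, dict[str, Rule]] = {
    "finance": {"has_currency": lambda d: d.get("currency") in {"EUR", "USD"}},
    "marketing": {},
}


def policy_for(domain: str) -> dict[str, Rule]:
    """Combine domain and global rules; global rules win on name clashes."""
    return {**DOMAIN_RULES.get(domain, {}), **GLOBAL_RULES}


def evaluate(descriptor: Descriptor) -> list[str]:
    """Return the names of all rules the data product descriptor violates."""
    rules = policy_for(str(descriptor["domain"]))
    return [name for name, rule in rules.items() if not rule(descriptor)]


if __name__ == "__main__":
    print(evaluate({"domain": "finance", "owner": "team-fin", "contains_pii": True,
                    "encrypted": False, "currency": "EUR"}))  # ['pii_encrypted']
```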
But it also comes with challenges and considerations for implementation:
Definition and Alignment: Agreeing on what “federated” means is an initial challenge.
Varying Maturity Levels: Domains vary in maturity, requiring a tailored approach.
Balancing Autonomy and Standards: Finding the right balance between domain autonomy and consistent standards is difficult and requires an understanding of business cases and co-creation (e.g., in working groups).
Cultural Shift and Change Management: Organizational culture and change management are absolutely critical. It’s about bringing people on board, explaining the “why,” and empowering the domains (literacy, budget, tooling). This is 90% relationship-building, communication, and change management.
Pragmatic Start: When building a new governance function, start by identifying the problem/pain point to solve. Don’t begin with a generic standard roadmap—instead, align governance efforts with business goals and use cases. Start with a specific use case, demonstrate its value (“governance by stealth”), and expand incrementally.
Cross-Domain Visibility: Visibility is key. Tools (e.g., for data quality) can help automate visibility.
Incentivization: It’s important to create incentives for domains to do the right things (e.g., share data, ensure quality). Visibility (e.g., of data quality checks) can foster healthy internal competition (“carrots”).
New Capabilities: Governance teams need new skills, such as building tools for others or systems thinking.
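Cross-domain visibility can start as simply as publishing the pass rate of each domain's automated quality checks. The sketch below assumes check results arrive as (domain, check, passed) tuples; the domains and check names are hypothetical.

```python
from collections import defaultdict

# Hypothetical results of automated data quality checks: (domain, check name, passed?).
CHECK_RESULTS = [
    ("sales", "orders_not_null", True),
    ("sales", "orders_unique", True),
    ("finance", "revenue_reconciles", False),
    ("finance", "revenue_fresh", True),
    ("marketing", "campaign_ids_valid", True),
]


def scoreboard(results: list[tuple[str, str, bool]]) -> list[tuple[str, float]]:
    """Return (domain, pass rate) pairs, best first, for publishing across domains."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for domain, _check, ok in results:
        total[domain] += 1
        passed[domain] += int(ok)
    rates = [(d, passed[d] / total[d]) for d in total]
    # Highest pass rate first; ties broken alphabetically for a stable order.
    return sorted(rates, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for domain, rate in scoreboard(CHECK_RESULTS):
        print(f"{domain:<10} {rate:.0%}")
```

Sorting by pass rate, and then by name for ties, keeps the list stable between runs, which matters if it is shared as a recurring "carrot" rather than a one-off report.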
Some mistakes and lessons learned from the field:
Don’t move too fast: Avoid rushing—it’s a change management process.
Start small: Take an incremental approach and don’t ask for too much at the beginning.
Focus on value: Initiatives shouldn’t take too long—focus on small iterations that quickly deliver value.
Start simple: When beginning to apply governance to data products, start with simple ones and initially avoid sensitive data/PII to reduce complexity.
Recommendations for Successful Data Mesh
Partial or Modified Implementation: Do not try to implement the entire framework from scratch, but adapt the concepts to the specific needs and maturity of the organization. A hybrid approach is often more realistic.
Strong Central Governance and Platform: Build and maintain a strong central platform that provides the necessary tools, standards, and governance rules. Governance, metadata management, and data quality monitoring are critical for success.
Focus on Culture and Organization: Recognize that Data Mesh is primarily an organizational and cultural shift, not just a technical implementation. Invest in change management, foster data literacy, and address silo thinking, lack of buy-in, and egos. Strong leaders are essential to enforce standards and coordinate teams.
Careful Domain Definition: Take the time to carefully define the boundaries of data domains based on business needs and capabilities.
Data as Product/Service with SWE Practices: Focus on the core concept of treating data as products/services, applying established SWE practices (like CI/CD, versioning, common tools and frameworks).
Assess Prerequisites and Need: Only implement Data Mesh if there is a clear need (e.g., in very large, complex organizations with many use cases and a domain-oriented team structure). For smaller organizations or simpler needs, traditional approaches are often better suited.
Investment in Personnel and Skills: Ensure sufficient competent personnel are available, both in the domain teams and the central platform team. Plan career paths for data professionals outside the central team.
Process and Architecture Improvements: Improve data management practices regardless of the chosen architecture. Address potential redundancies and high costs through better platform design and the use of (governed) intermediate or core tables between domains, rather than every team reading directly from sources.
Metadata and Discoverability: Invest in tools and processes for comprehensive metadata management, data lineage, and documenting transformations to facilitate self-service and understanding across domain boundaries.
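Applying SWE practices such as versioning to data products can be automated, for example as a CI step that rejects a release whose version bump is smaller than its schema change requires. This Python sketch assumes semantic versioning (major.minor.patch) and treats removed or retyped columns as breaking changes; the schemas and function names are hypothetical.

```python
# Hypothetical schema representation: column name -> type name.
Schema = dict[str, str]


def required_bump(old: Schema, new: Schema) -> str:
    """Classify a schema change using semantic-versioning conventions."""
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    if removed or retyped:
        return "major"  # existing consumers may break
    if new.keys() - old.keys():
        return "minor"  # additive change, backward compatible
    return "patch"


def check_version(old_version: str, new_version: str, old: Schema, new: Schema) -> bool:
    """Return True if the declared version bump is at least as large as required."""
    o_major, o_minor, o_patch = (int(x) for x in old_version.split("."))
    n_major, n_minor, n_patch = (int(x) for x in new_version.split("."))
    bump = required_bump(old, new)
    if bump == "major":
        return n_major > o_major
    if bump == "minor":
        return (n_major, n_minor) > (o_major, o_minor)
    return (n_major, n_minor, n_patch) > (o_major, o_minor, o_patch)


if __name__ == "__main__":
    v1 = {"order_id": "int", "amount": "float"}
    v2 = {"order_id": "int"}  # "amount" dropped: a breaking change
    print(required_bump(v1, v2))                    # major
    print(check_version("1.4.2", "1.5.0", v1, v2))  # False: needs 2.0.0
```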
Domain
A domain in Data Mesh is a specific organizational unit or area within a company that is assigned ownership of its data. These domains consist of local teams responsible for building and managing data products with significant autonomy. They serve as the primary implementers of federated data governance, balancing central standards with domain-specific needs while taking on new responsibilities for data stewardship and product development.
Reference: https://www.datamesh-architecture.com/ (DDD = Domain-Driven Design)
In summary, Data Mesh is a visionary concept with potential benefits such as organizational flexibility, speed, and data maturity, but in practice, it encounters significant difficulties, particularly regarding cost, personnel requirements, organizational change, and maintaining governance and data quality in a decentralized environment. Successful implementation requires careful planning, strong leadership, a robust central platform, and a clear focus on culture and governance rather than just technology. Often, a hybrid or modified approach is more realistic and beneficial than a full implementation.