The Data Fabric ensures free access to distributed data sets. It thus enables various roles such as data analysts, data scientists, developers and end users to find the right data more quickly and drive innovation with data. The data catalog can be considered the starting point and an integral component of a Data Fabric. It uses metadata to control the semantics, transparency and findability of the data.
Knowledge graphs, operational metadata about how data is used, and machine learning make it possible to achieve an optimal degree of automation, orchestration and uniform semantics in distributed and heterogeneous data landscapes.
Depending on the functional scope, the data catalog can play a central role in making metadata usable.
Fig. 1: Basic pattern in the context of data fabric architectures
Data governance should play an important role in a Data Fabric, because the decentralized approach can spread responsibilities widely. Finding, accessing and understanding data must be linked to roles, processes and responsibilities in order to comply with legal and organizational guidelines and to protect sensitive data from unauthorized access. The data catalog is well suited to assigning roles to data sets and objects and linking them to processes such as change requests or access approvals. These processes can be supported by internal data marketplaces, which further improve findability and access for users.
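To make the link between roles and processes more concrete, here is a minimal Python sketch of an access approval workflow attached to a catalog entry. It is an illustration only: the names CatalogEntry, AccessRequest, request_access and decide, the split into owner and steward, and the rule that non-sensitive data is approved automatically are all assumptions and do not come from any particular catalog product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class CatalogEntry:
    """A data set registered in the catalog, with governance roles attached."""
    name: str
    owner: str               # accountable for the data set (hypothetical role)
    steward: str             # decides on access requests (hypothetical role)
    sensitive: bool = False


@dataclass
class AccessRequest:
    entry: CatalogEntry
    requester: str
    purpose: str
    status: Status = Status.PENDING
    decided_by: Optional[str] = None


def request_access(entry: CatalogEntry, requester: str, purpose: str) -> AccessRequest:
    """Open a request; assumed policy: non-sensitive data is approved automatically."""
    req = AccessRequest(entry, requester, purpose)
    if not entry.sensitive:
        req.status = Status.APPROVED
        req.decided_by = "auto"
    return req


def decide(req: AccessRequest, actor: str, approve: bool) -> AccessRequest:
    """Only the steward of the data set may decide a pending request."""
    if req.status is not Status.PENDING:
        raise ValueError("request already decided")
    if actor != req.entry.steward:
        raise PermissionError(f"{actor} is not the steward of {req.entry.name}")
    req.status = Status.APPROVED if approve else Status.REJECTED
    req.decided_by = actor
    return req


# Usage: a sensitive data set needs an explicit decision by its steward.
orders = CatalogEntry("sales.orders", owner="alice", steward="bob", sensitive=True)
req = request_access(orders, requester="carol", purpose="churn analysis")
decide(req, actor="bob", approve=True)
print(req.status.value, req.decided_by)
```

The point of the sketch is that the decision is tied to a role recorded on the catalog entry itself, so the same metadata that makes a data set findable also determines who may approve access to it.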
The data catalog as a metadata hub
In practice, it is challenging to find a single solution, or even a single provider, that offers all the capabilities a Data Fabric requires. A possible scenario with an integrated solution could look like this:
Fig. 2: Scenario of a Data Fabric approach
To meet the complex requirements of a Data Fabric, a data catalog must offer a corresponding level of openness. This allows it to develop into a metadata hub that collects and provides metadata for optimizing the data landscape. The Data Fabric can thus serve as a bridge that enables centralized control of decentralized, often historically grown, analytical data management.
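The idea of a metadata hub that collects metadata from several systems and provides it again can be sketched in a few lines of Python. This is a simplified in-memory illustration under stated assumptions: the class MetadataHub, its methods ingest, get and search, and the source names used below are hypothetical and do not represent the interface of any real catalog.

```python
from collections import defaultdict


class MetadataHub:
    """Collects metadata records from several sources and serves them back."""

    def __init__(self):
        # data set name -> {metadata key -> {"value": ..., "source": ...}}
        self._records = defaultdict(dict)

    def ingest(self, source, dataset, metadata):
        """Merge metadata from one source and remember which source supplied each key."""
        record = self._records[dataset]
        for key, value in metadata.items():
            record[key] = {"value": value, "source": source}

    def get(self, dataset):
        """Return the merged metadata for a data set as a plain dict."""
        return {k: v["value"] for k, v in self._records.get(dataset, {}).items()}

    def search(self, term):
        """Return the sorted names of data sets whose name or metadata values mention the term."""
        term = term.lower()
        hits = []
        for dataset, record in self._records.items():
            haystack = [dataset] + [str(v["value"]) for v in record.values()]
            if any(term in text.lower() for text in haystack):
                hits.append(dataset)
        return sorted(hits)


# Usage: two systems describe the same data set, a third describes another one.
hub = MetadataHub()
hub.ingest("warehouse", "sales.orders", {"description": "Customer orders", "rows": 120000})
hub.ingest("ml_platform", "sales.orders", {"used_by": "churn model"})
hub.ingest("crm", "crm.contacts", {"description": "Customer contact data"})
print(hub.search("customer"))
print(hub.get("sales.orders"))
```

Keeping the supplying source next to each value is a deliberate choice in this sketch: in a decentralized landscape it stays visible which system is responsible for which piece of metadata, even though the hub presents one merged view.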
This blog post is part of the blog series Data Catalogs in different Data Architectures.