Data Architecture Review - Are we right?
Orientation when the possibilities are unlimited
Doing Data Architecture is a hard job. How do you know which systems, components and capabilities are the right ones for your current and future data & analytics needs? What if we decide wrong and don’t get what we imagined, the costs are too high or the result is too complex to handle?
“Data architecture is the overall structure of an organization’s data and data-related resources, describing how data is collected, stored, managed, integrated, and used to support business processes and decision-making.”
In Data Architecture we can work with data architecture patterns or paradigms as a general approach. Some typical ones (not exhaustive) are shown in the following overview:
Fig. 1: Overview of different data architecture patterns
Depending on the use cases, different patterns or combinations are necessary. Some typical indicators are:
Building a Data Warehouse is typically an approach for structured data and classical BI, while a Data Lake is for operational cases and unstructured data. Data Lakes are the first choice for data scientists, as every kind of data can be stored, the degree of freedom is high and getting data via APIs is standard.
The Data Lakehouse is going to be the leading storage pattern, as BI cases and unstructured data can be managed within a single technical layer.
Combinations make sense if teams need to work closely together. This leads to Multi-Data Warehouse environments or Modern Data Warehouses (using a Data Lake/object store and a Data Warehouse/relational database as one integrated approach).
Data Mesh and Data Fabric are seen more and more often, building on top of data platforms or data storage systems. Both are different ways of handling distributed/decentralized activities within an organization.
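The indicators above can be written down as a rough rule of thumb. The following is only a minimal sketch of that reasoning, not a decision tool: the class UseCaseProfile, its fields and the function suggest_patterns are hypothetical names introduced for illustration, and a real assessment weighs many more factors.

```python
from dataclasses import dataclass


@dataclass
class UseCaseProfile:
    """Hypothetical, simplified description of what a team needs."""
    structured_bi: bool = False        # classical BI on structured data
    unstructured_data: bool = False    # documents, images, logs, ...
    data_science: bool = False         # exploratory work, API-based access
    decentralized_teams: bool = False  # distributed ownership of data


def suggest_patterns(profile: UseCaseProfile) -> list[str]:
    """Map the indicators from the text to candidate patterns."""
    suggestions = []
    needs_warehouse = profile.structured_bi
    needs_lake = profile.unstructured_data or profile.data_science

    if needs_warehouse and needs_lake:
        # BI and unstructured data together point to a combined approach.
        suggestions.append("Data Lakehouse or Modern Data Warehouse")
    elif needs_warehouse:
        suggestions.append("Data Warehouse")
    elif needs_lake:
        suggestions.append("Data Lake")

    if profile.decentralized_teams:
        # Mesh and Fabric sit on top of the storage pattern chosen above.
        suggestions.append("Data Mesh or Data Fabric on top of the platform")
    return suggestions


if __name__ == "__main__":
    print(suggest_patterns(UseCaseProfile(structured_bi=True, data_science=True)))
```

The storage pattern and the organizational pattern are treated as two separate decisions here, which mirrors the point that Data Mesh and Data Fabric build on top of a platform rather than replace it.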
Real-life Use Case
When I get a request to review a data architecture, my customer typically wants an outside-in perspective. This means I can only build on what I learn from the customer, what I know about the technology and what, in my experience, works for other customers. So first I have to understand the customer’s needs, general aspects, how they arrived at this data architecture and sometimes also technology- and data-specific aspects for a common understanding. It always helps to understand the use cases they want to implement in the future - but surprisingly this is not always given.
Doing a review is different from crafting a data architecture. Typically time is short and the focus is on certain aspects or on two or three options. Furthermore, there are typically aspects which are fixed and cannot be changed, whatever I tell them.
The customer initially had three patterns and decided on the following data architecture to be evaluated:
Fig. 2: Data architecture to be reviewed
So the goal is rather to validate the decision and look for optimizations than to create something totally new.
How to Create Value From the Review
Asking the right questions is key. A direct conversation is the best way to not just get the answers but also understand what is important to the customer. Where is the pain? What do they want to hear? Where are the limits? Based on this I can describe the data architecture, point out aspects to consider, recommend changes and so on. Some typical data architecture questions I start with are (excerpt):
Which architectural aspects are already fixed and will not change as a result of our recommendations?
To what extent has thought been given to the topic of an Enterprise Data Catalog? Databricks Unity and Datasphere Catalog cannot fulfill this task at the moment.
Separate data management concepts (SAP/Databricks) and organization - What about overarching information/analysis requirements?
Users prefer certain frontends - How do you deal with this when overarching requests come in? (e.g. a Power BI user wants to access SAP Datasphere data)
Should access from SAP Datasphere to Databricks take place via data federation at runtime or should data be kept redundant on request?
How should requirements in the area of machine learning be handled if overarching data (SAP/non-SAP) is required for this?
How far does the self-service concept go? What are the business departments themselves doing in terms of data management on the platforms, and also with Power BI and SAC?
Results and Challenges
In a review it is often important to highlight the strong points, especially if there is only one option:
Fig. 3: Highlighting the strong points
But there are also important challenges as takeaways. Just an excerpt:
Multiple data platforms require advanced metadata management, such as the planned Data Catalog solution DataHub, in order to enable end-to-end data governance. This is associated with various challenges:
A data governance approach must be accompanied by clearly defined roles and responsibilities.
The data catalog must be an integral part of the workflows in the data environment.
The technological integration of different metadata sources is often a challenge.
Multiple platforms frequently lead to competing approaches in the front end and back end, often addressed through redundant data storage or undesired cross-connections between systems.
The interfaces between data platforms create additional complexity, which typically results in extra effort and requires a corresponding level of organizational maturity.
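To make the point about redundant storage across platforms a bit more concrete, here is a minimal sketch of how catalog metadata from several platforms could be checked for datasets that exist more than once. It assumes the metadata has already been exported into plain dictionaries with the keys "name" and "platform"; these keys, the function find_redundant_datasets and the sample entries are hypothetical and do not reflect the API of DataHub or any other catalog.

```python
from collections import defaultdict


def find_redundant_datasets(entries):
    """Return datasets that appear on more than one platform.

    Each entry is assumed to be a dict with the hypothetical keys
    "name" and "platform". Names are compared case-insensitively and
    without surrounding whitespace, which is a deliberately naive match.
    """
    platforms_by_name = defaultdict(set)
    for entry in entries:
        key = entry["name"].strip().lower()
        platforms_by_name[key].add(entry["platform"])

    return {
        name: sorted(platforms)
        for name, platforms in platforms_by_name.items()
        if len(platforms) > 1
    }


if __name__ == "__main__":
    # Illustrative sample data only, not taken from a real system.
    sample = [
        {"name": "Sales_Orders", "platform": "SAP Datasphere"},
        {"name": "sales_orders", "platform": "Databricks"},
        {"name": "web_logs", "platform": "Databricks"},
    ]
    print(find_redundant_datasets(sample))
```

Matching on names alone is exactly where the integration challenge starts: in practice the same dataset often carries different names, owners and definitions on each platform, which is why roles and responsibilities have to be clear before such a check becomes useful.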
Outlook
If there is already a specific variant of the data architecture, people may just want confirmation that they did their job right. Typically there is no bad architecture in such situations; it is rather an optimization problem. If you do not have very specific challenges, today’s typical components should do the job, and the interplay between the components can be subject to change. Don’t forget, the typical customer is not Google, Spotify, Uber or AWS. But including an outside-in perspective in such a process can help to avoid some future problems. This saves time and money for the customer.
Furthermore, we have to understand that crafting a data architecture is not based on hard facts alone. What we often see is captured in the following:
Fig. 4: There are not only objective factors to consider
Taking a look at a data architecture approach and discussing it at this level, especially before a final decision, is something special. You take a step back and challenge the general idea instead of talking about implementation details. The better you understand the objectives of the different stakeholders, the better the result.
This article is a relocated and extended version of a post from my experimental Substack.





