I recently watched a talk between Martin Kleppmann and Jesse Anderson on “Designing A Data-Intensive Future” from the GOTO 2023 event. In it, Martin Kleppmann said: “When you get to large data workload, if you have a single tool that can do everything, the only way that can do that is by doing everything badly.”
I think this is the basic idea from which the Modern Data Stack (MDS) evolved. Over time, many characteristics were added that an MDS must achieve:
Fig. 1: Characteristics of the Modern Data Stack
The idea was to create a Best-of-Breed software architecture for data that brings together the best-fitting data capabilities needed to create maximum value from data for the enterprise. But is this idea of always having the best tool for each job still valid?
While Best-of-Breed is not a new idea, we have seen the pendulum swing from Best-of-Breed to Best-of-Suite in the data world and back again. Recently, even the end of the MDS has been proclaimed. Why? Because several developments have happened over time:
Customers became more price-sensitive, and data tools became standard in companies. Standards are a target for optimization: managing licences, support, etc. through one vendor, or at least only a few, reduces complexity and costs.
Differentiation between providers is increasingly disappearing. Cloud services evolve fast. From a capability perspective, Best-of-Suite vendors like SAP and Oracle have caught up. It is now more about speed of evolution, partnerships and details.
The MDS is converging toward Best-of-Suite. I see that especially for Google Cloud, with BigQuery at the center of the stack, where Google is able to offer an end-to-end approach, and similarly for Databricks, except for the BI part. Snowflake is possibly still the most representative MDS data platform and really celebrates the ecosystem idea with strong partnerships. But they are also growing their own stack with innovations like Streamlit, Snowpipe and Snowpark.
Customers coming from one side shift over time to a mixed scenario to gain the advantages of the other side. For example, a customer starting with Databricks on Microsoft Azure adopts more and more Azure services over time, because the suite approach makes things easier (integration, fewer vendors, good-enough services, …). The same applies if you start with a suite but extend it with new BI tools, ETL/ELT and other components to complete it or compensate for gaps.
Best-of-Suite vendors like SAP are embracing data ecosystems with partners like Collibra, DataRobot, Confluent and Databricks to close gaps and offer alternatives for better adaptation of the data stack.
Maybe a short-term trend, but something I see is the shift in focus from data to AI with the ChatGPT/LLM hype. Maybe this pendulum will swing back when people understand that data is the base for AI, and even more so for data-centric AI (compared to discriminative/generative AI).
I once read that the biggest lie of a data architecture picture is that there is only one data platform in the middle. There is at least one old one, one new one, one from an acquisition, and, in Germany at least, you will find an extra one for HR. Above a certain size or complexity, companies today have several data platforms and redundant tools. It is somehow the new normal.
The last aspect brings us to a new perspective. Maybe we need the best of both worlds, because each world (Best-of-Suite / Best-of-Breed) has its advantages (and disadvantages).
Fig. 2: Combining characteristics of Best-of-Breed (MDS) and Best-of-Suite (here: SAP)1
If we think a little further, these worlds are converging into an open ecosystem that not only adapts to the best fit for creating value from data, but also adapts optimally to the operational ecosystem, such as that of SAP-driven enterprises, as a next level of ecosystems.
Fig. 3: Convergence of MDS and Best-of-Suite vendors into a data ecosystem
We thought deeply about this development in our latest whitepaper about SAP & Modern Data Stack, available here2 (in German only at the moment).
An extended excerpt can be found here with Snowflake and here with Databricks.
What do you think, or what is your experience, of where data stacks are evolving? Is one vendor delivering everything for you and doing it well? Do you build an ecosystem of tools to solve complex problems, or is the MDS approach still the one that helps you best in your daily data business?