I recently followed a discussion about the aspects that make data teams fail. During that discussion I was asked “…which points would you see as universal and which ones depend?” So let’s try:
Unclear KPI ownership - In my experience this is very typical in many organizations. Usually the project owner or the business department that originally ran the BI project is somehow responsible for the data and KPIs. But people and processes change, and then ownership becomes very unclear. Furthermore, a department is not the same as a process, and ownership of the data rarely sits in a single department. Establishing data governance to clarify ownership and processes, however, usually already requires a higher level of maturity.
Data products not mapped to business goals - The whole point of data products is that they are business owned and create value by being aligned with business goals. But the concept of a data product is blurred. I recommend first clarifying what a data product means, both from the technical side and in terms of how it is embedded in the organizational context.
No data dictionary - Similar to the first point, “Unclear KPI ownership”, there is often no overview of your data and what it means. A data catalog could often help, but I tend to see them in organizations with a higher maturity. There are simpler ways, like using Confluence, a wiki or even Excel, to get at least a basic overview of your data assets and what is important to know about them.
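To make the "simple ways" concrete, here is a minimal sketch of a data dictionary kept as plain records and exported to CSV, which opens in Excel. The field names (`name`, `description`, `owner`, `source_system`) and the example entries are assumptions for illustration, not a standard.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class DictionaryEntry:
    # Hypothetical minimal set of fields; adapt to what your teams need to know.
    name: str
    description: str
    owner: str
    source_system: str


def to_csv(entries):
    """Render the dictionary as CSV text so it can be opened in Excel."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(DictionaryEntry)]
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


def missing_owner(entries):
    """Return the names of all entries that have no owner assigned."""
    return [entry.name for entry in entries if not entry.owner.strip()]


# Illustrative entries only; names and systems are made up.
entries = [
    DictionaryEntry("order_count", "Number of confirmed orders per day", "Sales Ops", "ERP"),
    DictionaryEntry("net_revenue", "Revenue after discounts and returns", "", "ERP"),
]

print(to_csv(entries))
print("Entries without owner:", missing_owner(entries))
```

Even a list this small connects the dictionary back to the ownership question: the `missing_owner` check shows which KPIs nobody is responsible for yet.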
Unmaintainable data pipelines - There can be many reasons for that. It is typical for data teams to build up technical debt over time, which makes changes and maintenance slow. Agile modeling and clear platform ownership can help, which is again typically a sign of a more mature data organization.
Lack of automated testing - I assume something like unit tests or CI/CD is meant here. What is possible depends a little on the technology you use and on how you deploy. Some organizations need manual processes, e.g. for compliance reasons. Nevertheless, a high degree of automation is very helpful and a goal worth reaching for in order to be efficient.
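As a rough illustration of what such a unit test can look like, here is a minimal sketch. The transformation `latest_per_order` and its row format are hypothetical; the test functions use plain `assert` statements, so they can be collected by a test runner such as pytest or simply called directly.

```python
def latest_per_order(rows):
    """Keep only the most recent row per order_id (hypothetical pipeline step).

    Assumes 'updated_at' is an ISO date string, so string comparison
    matches chronological order.
    """
    latest = {}
    for row in rows:
        current = latest.get(row["order_id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            latest[row["order_id"]] = row
    return sorted(latest.values(), key=lambda r: r["order_id"])


def test_keeps_most_recent_row():
    rows = [
        {"order_id": 1, "updated_at": "2024-01-01", "amount": 10},
        {"order_id": 1, "updated_at": "2024-01-03", "amount": 12},
        {"order_id": 2, "updated_at": "2024-01-02", "amount": 7},
    ]
    result = latest_per_order(rows)
    assert [r["amount"] for r in result] == [12, 7]


def test_empty_input_gives_empty_output():
    assert latest_per_order([]) == []


if __name__ == "__main__":
    test_keeps_most_recent_row()
    test_empty_input_gives_empty_output()
    print("all tests passed")
```

Running tests like these on every change, for example in a CI job, is one way to catch regressions before they reach a report.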
No version control and environment separation - Lately I have seen this especially in organizations taking their first steps in ML/AI. They start with just a server, a database and a Jupyter notebook, trying to create value from that. It is not uncommon to first have to show the value of the approach before getting investment for the right infrastructure. Cloud, if allowed, makes things easier here.
Lack of data modeling best practices - Indeed, although data modeling is not new, I have just been involved in a discussion of One Big Table vs. modeling approaches like Kimball, Inmon or Data Vault. In a fast data world we may sometimes have to make compromises. For a sustainable approach, however, data modeling is still what keeps your data pipelines maintainable and flexible enough, while giving you a robust data model for performance and consistency.
No single source of truth - Today a single source of truth is perhaps not always necessary, or even desired. I recently discussed this here. Still, it remains a sensible general recommendation if you have no other way to handle your data with the right awareness.
Ill-defined roles and responsibilities - Be it for your data organization or for data governance, roles in organizations are often not defined, badly defined, outdated or politically defined. This is certainly a problem for every organization, in many areas. How well it works depends largely on your company culture and data culture.
No “super users” - Not exactly sure what is meant here ;-)
Such discussions are very popular. I recently wrote a blog post about anti-patterns for data. The following picture, described earlier in a blog post about versions of truth, is the result of a single workshop in which we tried to understand the challenges in the data organization:
This is very typical. Not everything the employees mention about how they work with data and what challenges they face is equally important. Some things can be accepted. Here, for example, inconsistent figures were accepted in exchange for being flexible and fast and for supporting decentralized business activities. It is important to understand what can be changed and what creates value if you are going to change things.
These are some of my experiences with why data teams and data strategies fail. What do you think, and what is your experience?