Understanding dbt

What I learned from Reddit's "I'm not getting it...what's the point of DBT?"

Peter Baumann

Mar 17, 2023

From the Seattle Data Guy I learned that dbt is “literally just a Jinja templater that runs your SQL in the right order”.
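To make that one-liner concrete for myself, here is a minimal Python sketch of the idea. It is not how dbt is implemented: the model names, the `analytics` schema, and the regex that stands in for Jinja are all assumptions for illustration. It only shows the two jobs named in the quote, namely filling in `ref()` placeholders and working out a run order from them.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical models: name -> SQL containing dbt-style ref() placeholders.
MODELS = {
    "revenue": "select sum(amount) as total from {{ ref('orders_enriched') }}",
    "orders_enriched": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_customers') }} using (customer_id)"
    ),
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
}

# A regex stand-in for real Jinja rendering (an assumption of this sketch).
REF = re.compile(r"\{\{\s*ref\('(\w+)'\)\s*\}\}")


def dependencies(sql):
    """Return the set of model names that a SQL string refers to."""
    return set(REF.findall(sql))


def run_order(models):
    """Order the models so that each one comes after everything it refs."""
    graph = {name: dependencies(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_sql(sql, schema="analytics"):
    """Replace each ref() placeholder with a schema-qualified table name."""
    return REF.sub(lambda m: f"{schema}.{m.group(1)}", sql)


order = run_order(MODELS)
for name in order:
    print(name, "->", compile_sql(MODELS[name]))
```

The staging models come out first, then `orders_enriched`, then `revenue`, even though `revenue` is listed first in the dictionary. Nobody has to maintain that order by hand, which seems to be the point of the quote.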

The Seattle Data Guy also listed some pros and cons and referenced this tutorial.

The pros:

It’s Open Source
Comes with version control
Doesn’t require specific skills
Testing built in
Well documented

The cons:

SQL based
Doesn’t have debugging functionality
dbt is just the T (the transform step of ELT)

There is a nice Reddit post about dbt and what the point of it is. As I have similar questions and no practical experience, I try to summarize what is discussed:

The author starts with this summary but nevertheless understands the need for a tool like dbt:

Modularity and Reusability: DBT allows for the creation of reusable and modular data models, which can be easily shared across different projects and teams. This makes it easier to maintain consistency across the organization and reduces the duplication of effort.
Version Control: DBT integrates with version control systems like Git, allowing data engineers to track changes to the data pipeline over time. This feature is crucial for maintaining data integrity and auditability.
Testing and Validation: DBT includes built-in testing and validation features, which enable data engineers to test data pipelines and ensure that the output is accurate and consistent with the input data.
Collaboration: DBT's collaborative features allow multiple team members to work on the same project simultaneously, which increases productivity and reduces the likelihood of errors.
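The testing point became clearer to me once I read that a dbt test is essentially a query that selects the rows violating a rule, and the test passes when that query returns nothing. The sketch below imitates that idea in Python with an in-memory SQLite database. The table, the data, and the simplified SQL are my own assumptions; `not_null` and `unique` are named after dbt's generic tests, but the queries dbt actually generates may look different.

```python
import sqlite3


def not_null(table, column):
    """SQL that selects the rows where the column is missing."""
    return f"select * from {table} where {column} is null"


def unique(table, column):
    """SQL that selects the values occurring more than once."""
    return (
        f"select {column}, count(*) as n from {table} "
        f"where {column} is not null "
        f"group by {column} having count(*) > 1"
    )


def run_test(conn, sql):
    """A test passes when its query returns no failing rows."""
    return len(conn.execute(sql).fetchall()) == 0


# Hypothetical sample data with a duplicate id and a missing email.
conn = sqlite3.connect(":memory:")
conn.execute("create table customers (id integer, email text)")
conn.executemany(
    "insert into customers values (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (2, None)],
)

results = {
    "not_null(id)": run_test(conn, not_null("customers", "id")),
    "unique(id)": run_test(conn, unique("customers", "id")),
    "not_null(email)": run_test(conn, not_null("customers", "email")),
}
for name, passed in results.items():
    print(name, "PASS" if passed else "FAIL")
```

Because the checks are plain SQL, they run in the warehouse next to the data, and anyone who can write a query can add one.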

The discussion then produces some common insights:

dbt helps to automate a lot of things
It enables good software engineering principles for data pipelining
Documentation is written during development, which helps to create data lineage
You still need other tools, for example:
- Orchestration, e.g. Dagster, Airflow
- CI, e.g. Jenkins
- Extraction/Load, e.g. Meltano
- Data Quality, e.g. Interzoid, Great Expectations
Jinja templates and macros make everything more flexible than plain SQL
It handles relationships and dependencies between models
Data analysts and data scientists can automate things without depending on data engineering and without having to learn Git/version control on the command line
Focus on business logic instead of DDL
“Dbt makes sense when you are small-scale and want to write sql transformations in a warehouse. It may not make sense for places that have big data and use spark to transform on a data lake.”
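The "business logic instead of DDL" insight is easier to see with a small example. In dbt, a model is just a SELECT statement, and the tool wraps it in the statements needed to persist it as a view or a table. The Python sketch below imitates that wrapping against SQLite. The function name `materialize`, the table names, and the drop-and-recreate approach are assumptions of this sketch, not dbt's actual behaviour.

```python
import sqlite3


def materialize(name, select_sql, kind="view"):
    """Wrap a SELECT in the DDL needed to persist it as a view or table."""
    if kind not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {kind}")
    return [
        f"drop {kind} if exists {name}",
        f"create {kind} {name} as {select_sql}",
    ]


# Hypothetical raw data that would already sit in the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("create table raw_orders (id integer, amount real)")
conn.executemany("insert into raw_orders values (?, ?)", [(1, 10.0), (2, 5.5)])

# The only part the analyst writes: the business logic as a SELECT.
model_sql = "select count(*) as orders, sum(amount) as revenue from raw_orders"

for statement in materialize("order_summary", model_sql, kind="table"):
    conn.execute(statement)

summary = conn.execute("select orders, revenue from order_summary").fetchone()
print(summary)
```

Switching `kind` from `"table"` to `"view"` changes how the result is stored without touching the SELECT, which is roughly what the commenters mean by not having to care about DDL.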

Finally, there is the insight that the question of what dbt is used for comes up again and again. One last insight, which is very important: whether dbt is useful depends on your use case.
