Matthias Niehoff
Enjoying SQL data pipelines with dbt
#1 about 1 minute
The challenge of managing traditional SQL data pipelines
Traditional data pipelines often rely on unstructured Python glue code and notebooks, making them difficult to maintain and extend.
#2 about 4 minutes
Introducing dbt for structured SQL data transformations
dbt is a command-line tool that brings software engineering principles to the transformation layer of ELT, allowing you to build data pipelines with just SQL.
#3 about 1 minute
Setting up a dbt project and defining data sources
A walkthrough of a dbt project structure shows how to define raw data sources and their associated tests using a `sources.yml` file.
#4 about 2 minutes
Tracking data changes over time with dbt snapshots
The `dbt snapshot` command provides a simple way to capture historical changes in your source data by creating slowly changing dimension tables.
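The bookkeeping behind a slowly changing dimension table can be illustrated with a small Python sketch. This is not dbt's implementation: the function `apply_snapshot` and the column names `valid_from` and `valid_to` are hypothetical, and rows deleted from the source are simply left untouched here.

```python
from datetime import date


def apply_snapshot(history, current_rows, key, today):
    """Return a new history list with one snapshot run applied.

    history: list of dicts holding the source columns plus
             valid_from / valid_to (hypothetical column names).
    current_rows: list of dicts, the source table as it looks right now.
    """
    current_by_key = {row[key]: row for row in current_rows}
    result = []
    open_keys = set()  # keys that still have a valid open version

    for record in history:
        record = dict(record)  # copy so the input is not mutated
        if record["valid_to"] is None:
            source = current_by_key.get(record[key])
            tracked = {
                k: v for k, v in record.items()
                if k not in ("valid_from", "valid_to")
            }
            if source is not None and source != tracked:
                record["valid_to"] = today  # row changed: close old version
            else:
                open_keys.add(record[key])  # unchanged (or gone): keep open
        result.append(record)

    # Open a new version for every new or changed source row.
    for k, row in current_by_key.items():
        if k not in open_keys:
            result.append({**row, "valid_from": today, "valid_to": None})
    return result


# Two runs: the order is first "new", then "shipped".
history = apply_snapshot([], [{"id": 1, "status": "new"}], "id", date(2024, 1, 1))
history = apply_snapshot(
    history, [{"id": 1, "status": "shipped"}], "id", date(2024, 1, 2)
)
for record in history:
    print(record)
```

After the second run the history holds both versions of the row: the old one closed with an end date, the new one still open.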
#5 about 3 minutes
Using seeds for static data and running models
Use `dbt seed` to load small, static datasets like country codes and `dbt run` to execute SQL models, which can be modularized with Jinja macros.
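As a rough illustration of what these two steps amount to, the following Python sketch loads a small CSV into a table and then materializes a model as a view, using an in-memory SQLite database as a stand-in for the warehouse. The helpers `load_seed` and `run_model`, the table names, and the sample data are all made up for this example; in dbt the seed would be a CSV file in the project and the model a `.sql` file.

```python
import csv
import io
import sqlite3

# Hypothetical seed content; in a dbt project this would be a CSV file.
SEED_CSV = """code,name
DE,Germany
FR,France
"""


def load_seed(conn, table, csv_text):
    """Create a table from CSV text, roughly what a seed step does."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)


def run_model(conn, name, select_sql):
    """Wrap a SELECT statement in a view named after the model."""
    conn.execute(f"CREATE VIEW {name} AS {select_sql}")


conn = sqlite3.connect(":memory:")
load_seed(conn, "country_codes", SEED_CSV)

# Stand-in for raw data that an earlier load step put into the warehouse.
conn.execute("CREATE TABLE raw_orders (id INTEGER, country_code TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)", [(1, "DE"), (2, "FR"), (3, "DE")]
)

run_model(
    conn,
    "orders_by_country",
    """
    SELECT c.name AS country, COUNT(*) AS order_count
    FROM raw_orders o
    JOIN country_codes c ON c.code = o.country_code
    GROUP BY c.name
    """,
)

rows = conn.execute(
    "SELECT country, order_count FROM orders_by_country ORDER BY country"
).fetchall()
print(rows)
```

The model author only writes the SELECT statement; wrapping it in the DDL that creates the view or table is the part the tool takes over.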
#6 about 3 minutes
Generating documentation and visualizing data lineage
dbt automatically generates a web-based documentation site from your project's metadata, including a complete, interactive data lineage graph.
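A lineage graph of this kind can be derived from the `{{ ref('...') }}` calls inside each model. The sketch below is a simplified illustration, not dbt's own parser: it assumes three invented models, extracts their references with a regular expression, and uses the standard library's `graphlib` to compute a valid build order.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"](\w+)['\"]\s*\)\s*\}\}")


def build_lineage(models):
    """Map each model name to the set of models it references."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


# Hypothetical models; the SQL is only scanned, never executed.
models = {
    "stg_orders": "select * from raw.orders",
    "stg_customers": "select * from raw.customers",
    "customer_orders": (
        "select * from {{ ref('stg_orders') }} "
        "join {{ ref('stg_customers') }} using (customer_id)"
    ),
}

lineage = build_lineage(models)

# TopologicalSorter expects node -> predecessors, which is what lineage holds.
run_order = list(TopologicalSorter(lineage).static_order())
print(lineage)
print(run_order)
```

The same dependency information serves two purposes: it determines the order in which models have to be built, and it is the data behind the lineage view in the documentation site.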
#7 about 1 minute
Implementing and running data quality tests
The `dbt test` command executes predefined and custom SQL-based tests to ensure data integrity and quality throughout your pipeline.
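A SQL-based test of this kind is a query that selects the rows violating an expectation, and it passes when the query returns nothing. The following Python sketch shows that idea for a not-null and a uniqueness check against an in-memory SQLite table. The helper functions, the `customers` table, and its data are assumptions made for the example, not dbt internals.

```python
import sqlite3


def failing_rows_not_null(conn, table, column):
    """Rows where the column is NULL; an empty result means the test passes."""
    return conn.execute(
        f"SELECT * FROM {table} WHERE {column} IS NULL"
    ).fetchall()


def failing_rows_unique(conn, table, column):
    """Values that occur more than once, with their counts."""
    return conn.execute(
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"WHERE {column} IS NOT NULL "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (2, "b@example.com")],
)

results = {
    "not_null_customers_email": failing_rows_not_null(conn, "customers", "email"),
    "unique_customers_id": failing_rows_unique(conn, "customers", "id"),
}

for name, rows in results.items():
    status = "PASS" if not rows else f"FAIL ({len(rows)} failing rows)"
    print(name, status)
```

Because a test is just a query, a custom check for a business rule takes the same shape: select the rows that should not exist.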
#8 about 4 minutes
Applying software engineering practices to data pipelines
dbt integrates with standard developer tools like pre-commit hooks for linting, CI/CD for automated testing, and profiles for managing environments.
#9 about 4 minutes
Exploring the dbt ecosystem and key integrations
The dbt ecosystem includes packages for extended testing, orchestration tools like Airflow, visualization layers like Lightdash, and integrations with analytical databases like DuckDB.
#10 about 1 minute
Addressing the extraction and loading phases of ELT
While dbt focuses on transformation, tools like Airbyte, Fivetran, or custom scripts are used to handle the initial extraction and loading of data into the warehouse.
#11 about 2 minutes
Understanding dbt's core benefits and limitations
dbt excels at simplifying data transformation with code-based practices but is not a tool for data ingestion, a full data catalog, or a no-code solution.
#12 about 1 minute
Q&A: Raw data formats and comparing dbt to Spark
The audience Q&A covers the strategy of loading raw data as-is and positions dbt as a simpler, SQL-focused alternative to more complex systems such as Apache Spark.