Guides

dbt Interview Questions and Answers (Models, Materializations, Tests)

dbt interview questions and answers grouped by level. Cover models, materializations, ref(), tests, snapshots, and project structure with sample answers.

Practical guideInformational10 min read
dbt Interview Questions and Answers (Models, Materializations, Tests)

Put this into action

Turn this guide into better conversations with Articuler

Use this guide as the research layer, then turn the next step into a live networking workflow: search by intent, prep for the conversation, and send outreach that is built for replies.

Try the Articuler workflow

dbt (data build tool) shows up in almost every modern data engineering and analytics engineering interview. If a team uses Snowflake, BigQuery, Databricks, or Redshift, odds are they transform data with dbt. So they want to know you understand models, materializations, ref(), tests, and how to structure a project that scales.

This guide groups the most common dbt model interview questions by level: junior, mid, and senior. Each comes with a concise, technically correct sample answer you can adapt. Read the answers, then say them out loud in your own words. Memorized lines sound memorized.

Here is what tends to get tested, and roughly when.

Topic areaWhat it testsTypical level
What dbt is, the T in ELTYou understand where dbt sits in the stackJunior
Materializations (view, table, incremental, ephemeral)Tradeoffs between cost, freshness, and speedJunior to mid
ref(), source(), the DAGHow dbt builds lineage and run orderJunior to mid
Tests (generic and singular)You ship data quality, not just SQLMid
Jinja and macrosYou can write DRY, reusable transformationsMid to senior
Incremental strategies and snapshots (SCD)You handle large tables and historySenior
Project structure (staging, intermediate, marts)You design for a team, not just yourselfSenior

Junior dbt interview questions

These check that you understand the fundamentals. Be crisp.

What is dbt and where does it fit in the data stack?

dbt is a transformation tool. It handles the "T" in ELT. You load raw data into a warehouse first, then dbt runs SQL SELECT statements to model that raw data into clean, tested, documented tables and views. dbt does not extract or load data, and it does not store data itself. It compiles your SQL and runs it inside your warehouse, then manages the dependencies, tests, and documentation around those models.

What is a dbt model?

A model is a single .sql file that contains one SELECT statement. dbt takes that SELECT, wraps it in the right DDL (like create table as or create view as), and builds the object in your warehouse. The file name becomes the model name. You never write CREATE or INSERT yourself; dbt generates that based on the model's materialization.

What are the four materializations, and when do you use each?

This is the most common dbt question. The official materializations docs define four built-in types:

  • view (the default): dbt rebuilds the model as a view with create view as. No data is stored and it always reflects the latest source rows, but it can be slow to query if the transformation is heavy or stacked on other views. Start here.
  • table: dbt rebuilds the model as a table with create table as on every run. Fast to query, but the full rebuild can be slow and new source data only appears after the next run. Good for models that BI tools hit often.
  • incremental: dbt inserts or updates only the records that changed since the last run instead of rebuilding everything. It saves a lot of build time on large tables but needs extra config. Reach for it when full rebuilds get too slow, not before.
  • ephemeral: the model is not built in the warehouse at all. dbt inlines its SQL into downstream models as a common table expression (CTE). Good for light, reusable logic used by only one or two models. The tradeoff: you can't query it directly and it's harder to debug.

What does `ref()` do?

ref() is how one model references another. Instead of hardcoding a schema and table name, you write {{ ref('stg_orders') }}. dbt does two things with that: it resolves the correct database and schema at run time, and it uses the reference to build the dependency graph so models run in the right order. The ref function docs describe it as the most important function in dbt, and that is fair.

Mid-level dbt interview questions

Now they want to see that you can build reliable, maintainable transformations.

What is the difference between `ref()` and `source()`?

You use ref() to point at another dbt model and source() to point at raw upstream tables that dbt did not create. You declare sources in a YAML file, then reference them with {{ source('jaffle_shop', 'orders') }}. Both feed the same dependency graph, so dbt tracks lineage end to end: from raw source, through staging models, all the way to your marts.

What is the DAG and why does it matter?

The DAG is the directed acyclic graph of your models. Every ref() and source() call adds an edge. dbt reads this graph to figure out build order: a model only runs after everything it depends on has run. It also powers documentation, lineage visualization, and selective runs. For example, dbt build --select stg_orders+ builds stg_orders and everything downstream of it.

What are dbt tests, and what is the difference between generic and singular tests?

Tests are assertions about your data. dbt has two kinds, covered in the data tests docs:

  • Generic tests are parameterized and reusable. You declare them in YAML against a column. dbt ships four out of the box: unique, not_null, accepted_values, and relationships (referential integrity, like a foreign key check). These should be the bulk of your test suite.
  • Singular tests are one-off SQL files in the tests/ directory. A singular test is just a SELECT that returns the rows that violate your assumption. If it returns zero rows, the test passes.

A good answer mentions both, plus the fact that a test passes when it returns no failing rows.

What is Jinja and why does dbt use it?

Jinja is the templating language dbt runs your SQL through before sending it to the warehouse. It is why you can write {{ ref(...) }}, loops, if conditions, and variables inside what looks like plain SQL. The Jinja and macros docs frame it well: Jinja turns your dbt project into a programming environment for SQL, so you can do things SQL alone can't.

What is a macro?

A macro is a reusable piece of SQL logic written in Jinja, like a function. You define it once in the macros/ folder and call it from many models, tests, or other macros. Macros keep your project DRY. A common example is a cents_to_dollars macro you apply across every monetary column instead of writing the same division everywhere.

Senior dbt interview questions

Senior questions test judgment: scale, history, and design for a team.

How does an incremental model actually work, and what are the strategies?

An incremental model only processes new or changed rows. On the first run it builds the full table. On later runs, dbt wraps your logic in an is_incremental() block where you filter to recent records, usually by comparing against the max timestamp already in the table. You set a unique_key so dbt knows how to handle updates versus inserts.

Common strategies include append (insert only, good for immutable event logs), merge (insert new rows and update existing ones by unique_key, the default on warehouses that support it), and delete+insert. You also configure on_schema_change to control what happens when source columns are added or removed: ignore, fail, append_new_columns, or sync_all_columns. The honest caveat to mention: incremental models add complexity and can drift from a full rebuild, so you schedule periodic full refreshes with --full-refresh.

What are snapshots, and how does dbt handle slowly changing dimensions?

Snapshots record changes to a mutable table over time. They implement Type 2 slowly changing dimensions: instead of overwriting a row when, say, an order's status changes from pending to shipped, dbt keeps both versions and tracks when each was valid.

dbt adds meta columns automatically: dbt_valid_from, dbt_valid_to (null for the current version), dbt_scd_id, and dbt_updated_at. There are two change-detection strategies. The timestamp strategy uses an updated_at column and is preferred because it tolerates schema changes. The check strategy compares a specified list of columns (or all) and is for tables without a reliable updated_at. The key best practice: make sure your unique_key is genuinely unique, or the snapshot history breaks.

How do you structure a dbt project so it scales across a team?

I follow the standard three-layer pattern from the how-we-structure guide, which moves data from source-conformed to business-conformed:

  • Staging (stg_<source>__<entity>): one model per source table. Light work only, like renaming, recasting, and basic cleanup. These are the atomic building blocks and are usually materialized as views.
  • Intermediate (int_<purpose>): purpose-built steps that join, pivot, or aggregate staging models. They keep complex logic out of the marts and prevent duplicated transformations.
  • Marts (<entity>): the final, business-defined tables your stakeholders and BI tools query. Often materialized as tables or incremental models for speed.

The principle behind it is that each transformation lives in exactly one place. That modularity is what keeps a project maintainable as the team and the DAG grow.

What are some dbt best practices you actually follow?

Keep staging models thin and one-to-one with sources. Reference everything with ref() and source(), never hardcoded names. Add unique and not_null tests on every primary key. Start with views and only reach for incremental or ephemeral once runs get slow. Document models in YAML, and lean on dbt build, which runs models, tests, snapshots, and seeds together in DAG order.

Beyond the SQL: who is actually interviewing you

You can have flawless answers on materializations and still do better if you know who you are talking to. Most candidates send applications into a portal and wait. A stronger move is to find the data lead behind the role and prepare for them specifically.

That is where Articuler helps. Instead of apply-and-pray, you use intent-based search across 980M+ professional profiles to find the actual person who owns the data platform you would be joining, then build a Playbook to prep for that interviewer: their background, the stack they have written about, the questions they tend to ask. A short, specific note to the right person beats another resume in the stack.

For broader prep, pair this with our data engineer interview questions and data analyst interview questions guides, and if you are early-career, our roundup of entry-level data engineer jobs.

Next step

Use Articuler to act on what you just read

Start with one concrete goal: investor intros, sales prospects, event meetings, hiring-manager outreach, or expert conversations. Articuler turns that goal into people, prep, and messages.

Start networking with intent

FAQ

Do I need to know dbt Cloud or just dbt Core for an interview?

Most interviews test dbt concepts that are identical across both: models, ref(), materializations, and tests. dbt Core is the open-source CLI and dbt Cloud adds a hosted IDE, scheduler, and CI. Know the concepts cold; mention dbt Cloud features only if the job description names them.

What is the difference between `dbt run` and `dbt build`?

dbt run executes your models. dbt build runs models, then their tests, plus snapshots and seeds, all in DAG order, stopping downstream models if an upstream test fails. dbt build is the more complete command and what most teams use in production.

Should an incremental model be my default materialization?

No. Start with views, move to tables when query speed matters, and only switch to incremental when full rebuilds get too slow or expensive. Incremental models add real complexity, so use them when the run time justifies it, not before.

How do dbt tests fail?

A dbt test is a query that selects failing rows. It passes when the query returns zero rows and fails otherwise. Generic tests like unique and not_null follow the same rule under the hood.

What is the difference between a seed and a snapshot?

A seed is a small static CSV file you check into the repo and load with dbt seed, good for lookup tables like country codes. A snapshot tracks changes to a mutable table over time and implements Type 2 slowly changing dimensions. They solve completely different problems.

Keep reading

More from Guides

Resources