Now that we have started to look more closely at dbt, we are coming across more and more toys that we are excited about.

We are all used to the fact that a table is a static object that is not refreshed. The table is filled at the moment it is created, together with all the operations on the data. As soon as the table is recreated, data is retrieved and processed again. However, what is often forgotten is that ALL data is retrieved and processed.

Calculating adjustments over and over again is obviously a waste of time, especially when millions of rows have to be run through the mill each time.

dbt has created a simple macro for this purpose: Incremental models.

Incremental models

Incremental models are built as tables in the data warehouse - the first time a model is run, the table is built by transforming all the rows of source data. On subsequent executions, dbt transforms only the rows in the source data that you let dbt filter on, and adds them to the table that has already been built (the target table).

Often the rows you filter on in an incremental run are the rows in the source data that were created or updated since the last time dbt was run. As such, the model is built incrementally with each dbt run.

The use of an incremental model reduces the amount of data that needs to be transformed, which significantly shortens the running time of your transformations. This improves warehouse performance and reduces computational costs and time!

Simplicity

Without delving too deeply into the subject of dbt, setting up an incremental is fairly simple because it is a standard macro within dbt.

If we assume the following initial situation :

select *
from raw_app_data.events

Then we can quickly adapt it to use the standard functionality of incremental models:

{{
    config(
        materialized='incremental'
    )
}}

select *
from raw_app_data.events

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  where event_time > (select max(event_time) from {{ this }})

{% endif %}

What you can see is that at the top of the code it tells you that this is an incremental model instead of a table of view that is declared in the default configuration.
Furthermore, at the bottom we see an important if statement:
If the table does not exist yet, it will be fully constructed and calculated, but if not, only the records coming from the additional query will be processed, in our case the one where :
event_time > (select max(event_time) from {{ this }})
(incidentally, {{this}} is a quick way within dbt to indicate the current table).

Conclusion

Incremental models clearly give you the option to skip recalculations within large data sets and can provide quick wins within your warehouse. So be aware of the possibilities and don't be put off by any macros within dbt!