You have your raw data in your database, and now you want to transform it. In dbt, this raw data is called a "Source". A Source is raw, as-yet untransformed data to which you will subsequently apply all kinds of transformations. One of the advantages of using Sources is that you only need to declare the part of your data warehouse that you actually need for your transformations. In dbt, your Sources look like this:

[Diagram: dbt models built on top of Sources]

What exactly are Source files in dbt? 

In dbt, Source files are simply YAML files. But what are YAML files, I hear you ask? YAML (originally "Yet Another Markup Language", now officially "YAML Ain't Markup Language") is a data serialization language, comparable to JSON or XML, intended to store data and settings/configurations in a human-readable format. YAML is simple to learn, and many of its users find it easy to read and use.

As you can see in the diagram above, we build models on Sources. So we must first configure these Sources, and we do that in YAML files.

One of the advantages of these YAML files is that you can configure multiple Sources, each with several tables, in a single file.
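
As a minimal sketch of that idea, a single YAML file could declare two Sources side by side. The file name, the second Source ('stripe') and all table names are hypothetical and only serve as illustration:

```yaml
# models/sources.yml (hypothetical file name)
version: 2

sources:
  - name: Jaffel_shop      # first Source
    tables:
      - name: orders
      - name: customers

  - name: stripe           # second Source in the same file (hypothetical)
    tables:
      - name: payments
```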

Another place in dbt where you will encounter a YAML file is 'dbt_project.yml', the place to configure your overall project in dbt. In dbt, YAML files typically have a '.yml' extension.
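
To give an impression of what such a project file contains, here is a rough sketch of a small 'dbt_project.yml'. The project name, profile name and the 'staging' folder are assumptions made up for this example, not settings taken from our project:

```yaml
# dbt_project.yml (sketch with hypothetical names)
name: my_dbt_project        # hypothetical project name
version: '1.0.0'
config-version: 2

profile: my_snowflake       # hypothetical connection profile from profiles.yml

model-paths: ["models"]     # folder where models and Source files live

models:
  my_dbt_project:
    staging:                # hypothetical subfolder models/staging
      +materialized: view   # build the models in this folder as views
```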

How to configure a Source in dbt?

Configuring a Source in dbt simply means declaring a data source (schema, table etc.) that already exists in an underlying database such as Snowflake, Postgres etc., so that dbt can refer to it. No data is copied in the process. In our case, we are working with Snowflake as the underlying database. In this example, we are going to declare the following Snowflake tables as Sources in dbt:

[Image: the Snowflake tables to be declared as Sources]

Sources in dbt are defined in a YAML file inside the 'models' folder of your project, not in 'dbt_project.yml' itself. It is recommended to use the template (YAML file) from dbt itself as a starting point and to extend it as desired. (You can find it here:

https://docs.getdbt.com/docs/building-a-dbt-project/using-sources)

[Image: the dbt Source file template]

We have added to this template as you can see below:

[Image: the Source file with our additions]
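
A Source file along these lines might look roughly like the sketch below. Only the Source name 'Jaffel_shop' and the 'Orders' table come from this article. The file path, the database and schema names and the second table are assumptions added for illustration:

```yaml
# models/staging/src_jaffel_shop.yml (hypothetical path and file name)
version: 2

sources:
  - name: Jaffel_shop        # the source_name you use when referring to the Source
    database: raw            # assumption: Snowflake database holding the raw data
    schema: jaffle_shop      # assumption: Snowflake schema containing the tables
    tables:
      - name: orders         # the 'Orders' table from this article
      - name: customers      # hypothetical second table
```

If you leave out 'schema', dbt uses the Source name as the schema name, so stating it explicitly is the safer choice when the two differ.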

That is it! Save the file with the extension '.yml' and voilà! A Source in dbt has been created!

How do you refer to a Source in dbt?

Your Source is now ready to be built upon. For this you use the dbt source function. The syntax of the source function is as follows:

[Image: syntax of the source function]
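
Written out as text, the general shape of the call is sketched below. 'source_name' and 'table_name' are placeholders for the names defined in your own Source file:

```sql
-- general form: first the name of the Source, then the name of the table
{{ source('source_name', 'table_name') }}
```

When dbt compiles a model, it replaces this expression with the fully qualified name of the table in your database.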

You refer to a Source by the name (source_name) that you gave it when configuring the Source file (in our case, 'Jaffel_shop'). Then you indicate which table (table_name) within that Source you want to use in dbt.

For example, you create a new model ('stg_orders') in dbt that refers to the 'Orders' table defined in your Source file:

[Image: the 'stg_orders' model]
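
Such a model could be as small as the sketch below. The file path is hypothetical, and the table is written as 'orders' on the assumption that this is how it is named in the Source file:

```sql
-- models/staging/stg_orders.sql (hypothetical path)
-- selects everything from the 'orders' table of the 'Jaffel_shop' Source
select *
from {{ source('Jaffel_shop', 'orders') }}
```

Because the model goes through the source function rather than naming the Snowflake table directly, dbt also knows that 'stg_orders' depends on this Source.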


Conclusion

We have seen how to create a Source in dbt. From here, you can build as complex a model as you want! You can even build hundreds of models all referring to this Source!

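Perhaps even more important to mention: if something changes in the location of the underlying table of a Source (in our case, for example, the table 'Orders' is renamed or moved to another schema in Snowflake), you only need to make this change in the Source file in dbt. As the rest of the models refer to the Source rather than to the table directly, you do not have to make this change in all of them. Saves a lot of time and effort! (A renamed column, by contrast, is best handled once in a staging model such as 'stg_orders', on which your other models then build.)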
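
As a sketch of such a change, suppose the table were renamed to 'orders_v2' and moved to a schema called 'jaffle_shop_v2' (both names are hypothetical, as is the database name). Only the Source file would need to be adjusted, using the 'identifier' property to point at the new table name:

```yaml
version: 2

sources:
  - name: Jaffel_shop
    database: raw                # assumption: Snowflake database holding the raw data
    schema: jaffle_shop_v2       # hypothetical new schema
    tables:
      - name: orders             # name used in the source function, unchanged
        identifier: orders_v2    # hypothetical new table name in Snowflake
```

Models that call the source function with 'Jaffel_shop' and 'orders' keep working without any edits.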

I hope this gave you an idea of how Sources work in dbt. If you want to know more, please do not hesitate to contact us.
