We have been fans of dbt at The Information Lab for a while, but we are still pleasantly surprised by features that we had previously underestimated.

Within dbt, the workflow is fairly self-explanatory, and certain aspects are more fun than others. Normally, you would file the less enjoyable parts under "necessary evil" and make peace with work that is not as easy as it could be. This is what we did, until we took another look at the openness of dbt and the community around the product. Don't agree with something, or see something that could be easier? Create a package and make it available!

What is a package?

Software engineers often modularize code into libraries. These libraries help programmers use their time more effectively: they can spend more time on their unique business logic and less time implementing code that someone else has already spent time perfecting.

In dbt, libraries like this are called packages. dbt packages are so powerful precisely because so many of the analytical problems you encounter are not unique to you but shared across organisations.

Here again, you see software engineering seeping through into data engineering: you use best practices that have long been established in software development to bring your data engineering to a higher level of maturity.

Where can I find these packages?

The most central place for dbt packages is a registry set up by dbt itself: the package hub.
The hub serves as a central location from which you can easily import packages into your dbt project (and keep them up to date).

Another place to find packages is GitHub. The developer of a package shares it there as a repository, and it can be installed directly from that repository.

Private and local packages are also an option: you can write packages for use only within your own team and keep them in a private or closed repository. You will generally encounter fewer of these.
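
For reference, the packages.yml file (described in the installation steps below) accepts git and local entries alongside hub packages. The fragment below is a minimal sketch: the repository URL, the revision and the local path are hypothetical placeholders, not real packages.

```yaml
packages:
  # Package installed straight from a Git repository (hypothetical URL).
  - git: "https://github.com/example-org/internal-dbt-utils.git"
    revision: "v1.0.0"  # assumed tag; a branch name or commit hash also works

  # Package that lives in a folder on your own machine (hypothetical path).
  - local: "../shared_macros"
```

Pinning a revision for Git packages serves the same purpose as the version range for hub packages: a new release cannot change your project unnoticed.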

How do I install these packages?

Installing packages is a lot easier than what we are used to from the software development side:
1. Create a file called "packages.yml" in the root of your dbt project (next to dbt_project.yml)
2. Enter the packages you want, each with a version number or version range

packages:
  - package: dbt-labs/codegen
    version: [">=0.6.0", "<0.7.0"]
  - package: tnightengale/dbt_meta_testing
    version: [">=0.3.5", "<0.4.0"]

3. Run the command "dbt deps"
4. ??
5. Profit!

How do I use these packages?

Each package has its own configuration, its own commands and its own way of working. Packages are generally well documented (especially if they are on the hub), with a clear description of what to expect and examples of use.

Are there any examples of what packages can do for you?

Two packages I am very enthusiastic about are Codegen and dbt_meta_testing.

Codegen makes one of my least favourite tasks a breeze: creating .yml files with table and column information.
It can be a lot of work to get all the columns out of a table, type them into your .yml with the correct indentation and a description, and then hope you didn't forget anything (more about that later).

Codegen

Codegen can, among many other things, very easily create the .yml for all your sources within a schema with:

{{ codegen.generate_source('<schema>',database_name = '<database>') }}

Or create the .yml for a specific model:

{{ codegen.generate_model_yaml(model_name='<model>') }}

You paste one of the above macro calls into a Scratchpad, press Compile, and the output is your complete .yml, with the correct indentation. The first time I saw it work I did a little dance of joy!
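
To give an idea of the result, the compiled output of generate_model_yaml looks roughly like the sketch below. The model name and column names (customers, customer_id, first_name) are made up for illustration, and the exact layout may differ between Codegen versions.

```yaml
version: 2

models:
  - name: customers        # hypothetical model name
    description: ""        # left empty by Codegen; fill this in yourself
    columns:
      - name: customer_id  # hypothetical column
        description: ""
      - name: first_name   # hypothetical column
        description: ""
```

The structure is generated for you; the descriptions are the only part left to write by hand.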

Codegen can be found on the package hub as dbt-labs/codegen.

dbt_meta_testing

I came across dbt_meta_testing when I mentioned in dbt's Slack channel that I was not sure my documentation was 100% up to date.
meta_testing has two functions: checking that all documentation is present and checking that all your tests are present. I use it mainly for the first: documentation.
A new column is quickly created at the request of the business, but is often forgotten in the documentation. Add a single line to the models configuration in your dbt_project.yml file:

 +required_docs: true

dbt_meta_testing then checks for 3 things:

  • The model has a non-empty description
  • The columns in the model are specified in the model .yml
  • The columns specified in the model .yml have non-empty descriptions
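
As a sketch of where the required_docs line goes: it is a model configuration, so it sits under the models block of dbt_project.yml. The project name my_project is a hypothetical placeholder; use the name of your own project.

```yaml
# dbt_project.yml (fragment)
models:
  my_project:             # hypothetical project name
    +required_docs: true  # every model in the project must be documented
```

Placing the line at project level applies the check to every model, so a column that was added in a hurry cannot slip past the documentation.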

Simply enter the following command and the check is done:

dbt run-operation required_docs

dbt_meta_testing can be found on the package hub as tnightengale/dbt_meta_testing.