Lint Models in dbt with SQLfluff

28 februari 2024
Atzmon Ben Binyamin
All
0

SQL is the cornerstone of database management and data manipulation. As data engineers and analysts, we spend a significant portion of our time crafting SQL queries to extract, transform, and analyze data. In the realm of analytics engineering, particularly with tools like dbt, maintaining clean and readable SQL code is crucial for collaboration, understanding, and maintenance. This is where SQLFluff comes to the rescue, making our dbt models concise and readable.

Enter SQLFluff

SQLFluff is an open-source linting tool for SQL code. It analyzes SQL queries and provides feedback on issues related to syntax errors, formatting inconsistencies, and adherence to coding conventions. Think of it as a spell checker for SQL – it helps ensure that your SQL code follows best practices and is consistent across your projects.

Why Use SQLFluff with dbt?

dbt is a powerful tool for managing the data transformation process in analytics projects. It allows you to define data models, tests, and documentation using SQL-based syntax. However, as projects grow in complexity and involve multiple contributors, maintaining a clean and consistent codebase becomes increasingly challenging.

SQLFluff is already implemented in dbt cloud where you can easily and seamlessly lint your sql models. we would like to achieve the same result using dbt core…

Linting your sql will help you in several ways:

Consistency: SQLFluff enforces a consistent coding style across your dbt projects. By adhering to a consistent style guide, you make it easier for team members to understand and collaborate on the codebase. Think Upper/lower keywords, leading/trailing commas, indentations, aliasing rules, etc.
Readability: Clean and well-formatted code is easier to read and understand. SQLFluff helps ensure that your SQL models in dbt are easy to comprehend, reducing the time needed for onboarding new team members or revisiting old code.
Maintainability: Consistent formatting and coding conventions make your SQL models more maintainable. When making changes or debugging issues, having a standardized codebase simplifies the process and reduces the risk of introducing errors.
Automated Checks: SQLFluff can be integrated into your dbt development workflow, allowing you to automatically check your SQL models for issues before they are committed to your version control system. This helps catch potential errors early in the development process.

Getting Started with SQLFluff and dbt

Integrating SQLFluff into your dbt workflow is quite straightforward. If you are working as the only developer, then you obviously have a clear understanding of how you want to style your code. When working with a team, you would first need to communicate and explain why it’s important to have certain style rules.

A favourable result from this discussion would involve establishing an initial set of agreed-upon rules to adhere to. Additionally, you may consider implementing these rules for a specific section of the project, especially if it’s extensive and encompasses multiple dbt projects.

Taking into account that you already have a dbt project, let’s install the necessary packages:
$ pip install sqlfluff sqlfluff-templater-dbt

You can confirm its installation by getting SQLFluff to show its version number:
$ sqlfluff version

SQLFluff is a configurable SQL linter, which means you can configure your own linting rules. To achieve this we create a configuration file named .sqlfluff and place it in the root directory of your dbt project. Now we need to tell SQLFluff what kind of dialect to use and tune desired parameters for our rules.

dbtonic linting rules

dbt labs provides a .sqlfluff config file showcasing their approach (pasted below). You can use it for your project or change it to your taste. In the example config below, I used dialect = snowflake, you can use whatever warehouse you’re familiar with. Run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff.

[sqlfluff]

dialect = snowflake

templater = dbt

runaway_limit = 10

max_line_length = 80

indent_unit = space

[sqlfluff:indentation]

tab_space_size = 4

[sqlfluff:layout:type:comma]

spacing_before = touch

line_position = trailing

[sqlfluff:rules:capitalisation.keywords] 

capitalisation_policy = lower

[sqlfluff:rules:aliasing.table]

aliasing = explicit

[sqlfluff:rules:aliasing.column]

aliasing = explicit

[sqlfluff:rules:aliasing.expression]

allow_scalar = False

[sqlfluff:rules:capitalisation.identifiers]

extended_capitalisation_policy = lower

[sqlfluff:rules:capitalisation.functions]

capitalisation_policy = lower

[sqlfluff:rules:capitalisation.literals]

capitalisation_policy = lower

[sqlfluff:rules:ambiguous.column_references]  # Number in group by

group_by_and_order_by_style = implicit

Ignoring files in SQLFluff

.sqlfluffignore file (also in the root directory of your dbt project) provides a convenient way to tailor SQLFluff‘s linting process and improve its efficiency in analyzing SQL code within your project.

By listing the files or directories in the .sqlfluffignore file, you can customize SQLFluff‘s behaviour to suit the specific needs of your project. This helps in focusing the linting process on relevant code while excluding irrelevant or generated files, test data, or other files that are not intended for linting.

Using SQLFluff

To get a feel for how to use SQLFluff, let’s look at this sql model:

with

customers as (

    select * from {{ ref("stg_customers") }}

),

fct_orders as (

    select * from {{ ref('fct_orders') }}

),

customer_orders as (

    select 

        customer_id

        , MIN(order_date) AS first_order_date

        , MAX(order_date) AS last_order_date

        , count(order_id) AS cnt_orders

        , sum(amount_usd) AS lifetime_value_usd

    from fct_orders

    GROUP BY 1

),

final as (

    select

    customers.customer_id

    , customers.first_name

    , customers.last_name

    , customer_orders.first_order_date

    , customer_orders.last_order_date

    , coalesce(customer_orders.cnt_orders, 0) as cnt_orders

    , customer_orders.lifetime_value_usd

    from customers

    left join customer_orders 

        on customers.customer_id = customer_orders.customer_id

)

select * from final

In order to test this model with SQLFluff, let’s run the below command and check results:
$ sqlfluff lint models/marts/core/dim_customers.sql

=== [dbt templater] Sorting Nodes…
=== [dbt templater] Compiling dbt project…
=== [dbt templater] Project Compiled.
== [models/marts/core/dim_customers.sql] FAIL
L: 11 | P: 11 | LT01 | Unnecessary trailing whitespace.
| [layout.spacing]
L: 13 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
L: 13 | P: 27 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 14 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
L: 14 | P: 27 | CP01 | Keywords must be lower case.
| [capitalisation.keywords]
L: 15 | P: 9 | LT04 | Found leading comma ','. Expected only trailing near
| line breaks. [layout.commas]
All Finished 📜 🎉!

Now, fixing that sql model and apply the rules we can run a fix command:
$ sqlfluff fix models/marts/core/dim_customers.sql

==== fixing violations ====
30 fixable linting violations found
Are you sure you wish to attempt to fix these? [Y/n] …
Attempting fixes…
Persisting Changes…
== [models/marts/core/dim_customers.sql] FIXED
Done. Please check your files to confirm.
All Finished 📜 🎉!

and voila! We fixed the sql model according to our style of choice, and this is the newly fixed model:


with
customers as (
    select * from {{ ref("stg_customers") }}
),

fct_orders as (
    select * from {{ ref('fct_orders') }}
),

customer_orders as (
    select
        customer_id,
        MIN(order_date) as first_order_date,
        MAX(order_date) as last_order_date,
        COUNT(order_id) as cnt_orders,
        SUM(amount_usd) as lifetime_value_usd
    from fct_orders
    group by 1
),

final as (
    select
        customers.customer_id,
        customers.first_name,
        customers.last_name,
        customer_orders.first_order_date,
        customer_orders.last_order_date,
        customer_orders.lifetime_value_usd,
        COALESCE(customer_orders.cnt_orders, 0) as cnt_orders
    from customers
    left join customer_orders
        on customers.customer_id = customer_orders.customer_id
)

select * from final

By following these steps, you can leverage SQL fluff to enhance the quality and consistency of your SQL models in dbt, ultimately improving the maintainability and readability of your analytics codebase.

Give SQL fluff a try in your dbt projects and experience the benefits firsthand!

Thank you for reading this blog. Also check out our other blogs page to view more blogs on Power BI, Tableau, Alteryx, and Snowflake here.

Work together with one of our consultants and maximize the effects of your data. Contact us, and we’ll help you right away.

Tags: dbt

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.