Dynamic Data Masking is a Column-level Snowflake Security feature that uses masking policies to selectively mask data in table and view columns at query time. Currently, Snowflake supports Dynamic Data Masking on tables and views.

dbt is a development framework on top of snowflake that allows me to build modular SQL data pipelines and deploy code using dev/prod environments. It allows git version control, documentation and much much more. Check dbt documentation to learn more about it!

Recently, I have been experimenting with a dbt package called dbt_snow_mask to implement Dynamic Data Masking in snowflake using only dbt.

Let’s not re-invent the wheel…

One of the benefits of using dbt is the robust package ecosystem. A dbt package is additional Jinja and SQL code that can be added to a project, for additional functionality. It is as simple as importing a package in python.

One such package is dbt_snow_mask which contains macros that can be used across our project. The package will help us to apply Dynamic Data Masking using dbt meta. The package use dbt_utils as a dependency so make sure to attach it as well.

Adding the package into our project is quite simple, and we’ll do it by pasting the ‘package info’ to our packages.yml file:

packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: entechlog/dbt_snow_mask
    version: 0.2.3

Execute dbt deps to install the packages.

Dynamic Data Masking: Create Masking Policy

Under the macro folder, I created snow-mask-ddl folder with an sql file containing a macro:

{% macro create_masking_policy_temp(node_database, node_schema) %}

CREATE MASKING POLICY IF NOT EXISTS {{ node_database }}.{{ node_schema }}.temp AS (val STRING)
RETURNS STRING ->
    CASE 
        WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN val
        ELSE '***********'
    END

{% endmacro %}

The name of the masking policy created by this macro is {{ node_database }}.{{ node_schema }}.<masking_policy_name>. In my example we have <masking_policy_name> as temp. The Jinja description at the beginning is necessary for dynamic database schema specification.

Adding a ‘meta’ config to our models

Let’s add a meta config in the models related .yml file to define which columns the masking policy we created will be applied to. We will add it to jaffle_shop.yml, where we centralize our jaffle_shop staging configurations. Also, We will add it to our core.yml, where we centralize our marts/core configurations.

I Added the meta config under the right column for the model I want the masking to be applied. Here, under stg_customers we added the meta config to first_name and last_name columns.

An example of jaffle_shop.yml file I have in a project:

version: 2

sources:
  - name: jaffle_shop
    description: sources for jaffle_shop
    database: raw
    schema: jaffle_shop
    #loader: <string>
    #loaded_at_field: <column_name>
    #meta: {<dictionary>}
    #tags: [<string>]
    tables: 
      - name: customers
      - name: orders
        loaded_at_field: _batched_at
        freshness:
          warn_after: {count: 12, period: hour}


models:
  - name: stg_customers
    description: Staged customer data from jaffle_shop app
    columns:
      - name: customer_id
        description: Primary key for customers
        tests:
          - unique
          - not_null
      - name: first_name
        meta:
          masking_policy: temp
      - name: last_name
        meta:
          masking_policy: temp
    
  - name: stg_orders
    description: Staged order data from jaffle_shop app
    columns:
      - name: order_id
        description: Primary key for orders
        tests:
          - unique
          - not_null
      - name: status
        description: '{{ doc("order_status") }}'
        tests:
          - accepted_values:
              values: ['shipped', 'completed', 'return_pending', 'returned', 'placed']
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id

An example of core.yml file, where we attach the meta config first_name and last_name columns in dim_customers.

version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        description: Primary key for customers
        tests:
          - unique
          - not_null
      - name: first_name
        meta:
          masking_policy: temp
      - name: last_name
        meta:
          masking_policy: temp
      - name: first_order_date
      - name: most_recent_order_date
      - name: number_of_orders
      - name: lifetime_value
  - name: fct_orders
    description: Staged order data from jaffle_shop app
    columns:
      - name: order_id
        description: Primary key for orders
        tests:
          - unique
          - not_null
      - name: customer_id
        #description:
      - name: order_date
      - name: amount

Applying Dynamic Data Masking

To apply our created macro, we would like to set pre_hook and post_hook in our dbt_project.yml configuration and create and apply our masking policy when dbt runs. Let’s have a look at the models section in our dbt_project.yml:

models:
  jaffle_shop:
    # all models should have 'unique' and 'not_null' tests
    +required_tests: {"unique.*|not_null": 2}
    staging:
      +materialized: view
    marts:
      core:
        +materialized: table
    pre-hook: 
      - "{{ dbt_snow_mask.create_masking_policy() }}"
    post-hook: 
      - "{{ dbt_snow_mask.apply_masking_policy() }}"

Alternatively, we can add a config to each specific model:

{{ config(
    pre_hook=[
      "{{ dbt_snow_mask.create_masking_policy() }}"
    ],
    post_hook=[
      "{{ dbt_snow_mask.apply_masking_policy() }}"
    ]
) }}

or using the command line:

dbt run-operation create_masking_policy --args '{"resource_type": "models"}'

Settings are now complete!

Let’s build everything with dbt build, and check our masking on snowflake..

Snowflake

After creating a new role rep1 with the right grants and permissions, we are ready to check our data masking. As a reminder, We are expecting the data to be masked for everyone except the SYSADMIN role:

CASE 
  WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN val
  ELSE '***********'
END

let’s switch to SYSADMIN and check the sensitive tables:

use role SYSADMIN;
select * from stg_customers;
select * from dim_customers;

Both tables are not masked, as expected.

Checking the same tables, this time with rep1 role:

use role rep1;
select * from stg_customers;
select * from dim_customers;

We are done! All tables behave as expected.

Tags:

Bel ons

Afspraak

Mail ons