Dynamic Data Masking is a Column-level Snowflake Security feature that uses masking policies to selectively mask data in table and view columns at query time. Currently, Snowflake supports Dynamic Data Masking on tables and views.
dbt is a development framework on top of snowflake that allows me to build modular SQL data pipelines and deploy code using dev/prod environments. It allows git version control, documentation and much much more. Check dbt documentation to learn more about it!
Recently, I have been experimenting with a dbt package called dbt_snow_mask
to implement Dynamic Data Masking in snowflake using only dbt.
Let’s not re-invent the wheel…
One of the benefits of using dbt is the robust package ecosystem. A dbt package is additional Jinja and SQL code that can be added to a project, for additional functionality. It is as simple as importing a package in python.
One such package is dbt_snow_mask
which contains macros that can be used across our project. The package will help us to apply Dynamic Data Masking using dbt meta
. The package use dbt_utils
as a dependency so make sure to attach it as well.
Adding the package into our project is quite simple, and we’ll do it by pasting the ‘package info’ to our packages.yml
file:
packages:
- package: dbt-labs/dbt_utils
version: 1.0.0
- package: entechlog/dbt_snow_mask
version: 0.2.3
Execute dbt deps
to install the packages.
Dynamic Data Masking: Create Masking Policy
Under the macro
folder, I created snow-mask-ddl
folder with an sql file containing a macro:
{% macro create_masking_policy_temp(node_database, node_schema) %}
CREATE MASKING POLICY IF NOT EXISTS {{ node_database }}.{{ node_schema }}.temp AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN val
ELSE '***********'
END
{% endmacro %}
The name of the masking policy created by this macro is {{ node_database }}.{{ node_schema }}.<masking_policy_name>. In my example we have <masking_policy_name>
as temp. The Jinja description at the beginning is necessary for dynamic database schema specification.
Adding a ‘meta’ config to our models
Let’s add a meta
config in the models
related .yml
file to define which columns the masking policy we created will be applied to. We will add it to jaffle_shop.yml
, where we centralize our jaffle_shop staging configurations. Also, We will add it to our core.yml
, where we centralize our marts/core configurations.
I Added the meta
config under the right column for the model I want the masking to be applied. Here, under stg_customers
we added the meta
config to first_name
and last_name
columns.
An example of jaffle_shop.yml
file I have in a project:
version: 2
sources:
- name: jaffle_shop
description: sources for jaffle_shop
database: raw
schema: jaffle_shop
#loader: <string>
#loaded_at_field: <column_name>
#meta: {<dictionary>}
#tags: [<string>]
tables:
- name: customers
- name: orders
loaded_at_field: _batched_at
freshness:
warn_after: {count: 12, period: hour}
models:
- name: stg_customers
description: Staged customer data from jaffle_shop app
columns:
- name: customer_id
description: Primary key for customers
tests:
- unique
- not_null
- name: first_name
meta:
masking_policy: temp
- name: last_name
meta:
masking_policy: temp
- name: stg_orders
description: Staged order data from jaffle_shop app
columns:
- name: order_id
description: Primary key for orders
tests:
- unique
- not_null
- name: status
description: '{{ doc("order_status") }}'
tests:
- accepted_values:
values: ['shipped', 'completed', 'return_pending', 'returned', 'placed']
- name: customer_id
tests:
- relationships:
to: ref('stg_customers')
field: customer_id
An example of core.yml
file, where we attach the meta
config first_name
and last_name
columns in dim_customers
.
version: 2
models:
- name: dim_customers
columns:
- name: customer_id
description: Primary key for customers
tests:
- unique
- not_null
- name: first_name
meta:
masking_policy: temp
- name: last_name
meta:
masking_policy: temp
- name: first_order_date
- name: most_recent_order_date
- name: number_of_orders
- name: lifetime_value
- name: fct_orders
description: Staged order data from jaffle_shop app
columns:
- name: order_id
description: Primary key for orders
tests:
- unique
- not_null
- name: customer_id
#description:
- name: order_date
- name: amount
Applying Dynamic Data Masking
To apply our created macro, we would like to set pre_hook
and post_hook
in our dbt_project.yml
configuration and create and apply our masking policy when dbt runs. Let’s have a look at the models section in our dbt_project.yml
:
models:
jaffle_shop:
# all models should have 'unique' and 'not_null' tests
+required_tests: {"unique.*|not_null": 2}
staging:
+materialized: view
marts:
core:
+materialized: table
pre-hook:
- "{{ dbt_snow_mask.create_masking_policy() }}"
post-hook:
- "{{ dbt_snow_mask.apply_masking_policy() }}"
Alternatively, we can add a config to each specific model:
{{ config(
pre_hook=[
"{{ dbt_snow_mask.create_masking_policy() }}"
],
post_hook=[
"{{ dbt_snow_mask.apply_masking_policy() }}"
]
) }}
or using the command line:
dbt run-operation create_masking_policy --args '{"resource_type": "models"}'
Settings are now complete!
Let’s build everything with dbt build
, and check our masking on snowflake..
Snowflake
After creating a new role rep1
with the right grants and permissions, we are ready to check our data masking. As a reminder, We are expecting the data to be masked for everyone except the SYSADMIN
role:
CASE
WHEN CURRENT_ROLE() IN ('SYSADMIN') THEN val
ELSE '***********'
END
let’s switch to SYSADMIN
and check the sensitive tables:
use role SYSADMIN;
select * from stg_customers;
select * from dim_customers;
Both tables are not masked, as expected.
Checking the same tables, this time with rep1
role:
use role rep1;
select * from stg_customers;
select * from dim_customers;
We are done! All tables behave as expected.