In the realm of data analytics and data warehousing, documentation is the unsung hero that guides data teams on their journey from raw data to actionable insights. It’s not just about the numbers; it’s about understanding the context, lineage, and business logic behind the data. In this blog post, we will look at how to sync dbt documentation for tables and columns with Snowflake and BI tools using the +persist_docs config.
Unlike traditional methods that rely on scattered documents and manual updates, dbt embeds documentation directly into the data transformation process.
With the +persist_docs config, dbt writes the descriptions of your models and columns into your data warehouse as comments on the corresponding relations and columns, and refreshes them whenever the models are built. This approach not only centralises knowledge but also keeps the documentation in the warehouse in step with your dbt project.
The Value of +persist_docs
By default, documentation persistence is disabled, but it can be enabled for specific resources or groups of resources as needed.
When you incorporate the +persist_docs config into your dbt project, you’re taking a big step towards comprehensive data documentation. It stores the descriptions of your tables, views, and columns directly in your Snowflake database as object comments. There are several advantages:
Centralized Knowledge
Your data documentation is no longer scattered across various files and documents. It’s neatly organized within your Snowflake data warehouse, making it a one-stop-shop for anyone seeking information about your data assets.
Real-time Updates
Each time dbt builds your models, the persisted descriptions are written again, so the comments in Snowflake follow the descriptions in your project. This keeps the documentation accurate and up to date, saving you from manual updates and potential inconsistencies.
Collaboration and Accessibility
With documentation stored in your database, it’s easily accessible to your entire team. Whether it’s data analysts, data scientists, or business stakeholders, everyone can read the descriptions right where they work with the data. Contributions are best made in the dbt project, since that is the source dbt writes the comments from.
Why is it Important?
Data documentation is not a luxury; it’s a necessity.
In a rapidly evolving data landscape, where data volumes and complexity are growing, having a robust documentation strategy is crucial for several reasons:
Knowledge Retention
People come and go, but data remains. Comprehensive documentation ensures that institutional knowledge about your data assets is preserved even as team members change.
Compliance and Auditing
Regulatory requirements demand transparency and traceability in data processing. Documentation aids in meeting these compliance standards and simplifies the auditing process.
Faster Decision-Making
Access to well-documented data accelerates decision-making. When users can easily understand the data’s context and purpose, they can make informed choices and leverage data effectively.
Reduced Errors
Clear documentation reduces the likelihood of errors in data analysis and reporting. When users know what the data means and how it’s transformed, they’re less likely to misinterpret or misuse it.
Integration with BI Tools
The +persist_docs config also makes it easier to integrate Business Intelligence (BI) tools like Metabase with your data warehouse. By persisting documentation directly in your database using dbt, you give BI tools that read database comments a consistent and up-to-date source of descriptions. This means that when you connect Metabase to your data warehouse, it can retrieve the latest descriptions, providing data analysts and business users with valuable context about the data they’re exploring. This integration streamlines the process of building insightful dashboards and reports, making it easier for teams to derive actionable insights from the data.
Syncing Snowflake with dbt docs
Syncing Snowflake with dbt docs can be achieved by enabling persist_docs for columns and relations in your dbt_project.yml:
models:
  +persist_docs:
    relation: true
    columns: true
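The setting above applies to every model in the project. Since persistence can also be enabled for specific resources or groups of resources, here is a minimal sketch of scoping it to one folder of models. The project name my_project and the folder marts are hypothetical placeholders, not names from this post:

```yaml
# dbt_project.yml (sketch)
# "my_project" and "marts" are hypothetical; replace them with your own
# project name and model folder.
models:
  my_project:
    marts:
      # Only models under models/marts/ get their descriptions persisted.
      +persist_docs:
        relation: true   # comment on the table or view itself
        columns: true    # comments on the individual columns
```

Scoping the config this way can be useful if you only want descriptions persisted for the models that analysts and BI tools actually query.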
Run dbt and observe that the created relation and columns are annotated with your descriptions. We can see the comments added to the created view fct_orders on Snowflake.
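The comments come from the description fields in your model properties file. As an illustration, a properties file for fct_orders might look like the sketch below. The column names and description texts are assumptions made for this example, not the actual definitions behind the view shown here:

```yaml
# models/schema.yml (sketch)
# Column names and descriptions are hypothetical examples.
version: 2

models:
  - name: fct_orders
    # Persisted as the comment on the relation when relation: true
    description: "One row per customer order."
    columns:
      - name: order_id
        # Persisted as the column comment when columns: true
        description: "Unique identifier of the order."
      - name: order_total
        description: "Total amount of the order."
```

With persist_docs enabled, editing a description here and running dbt again is all it should take to update the corresponding comment in Snowflake.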
Metabase – Alternative BI Tool
Metabase is a user-friendly, open-source business intelligence (BI) tool that empowers organizations to transform their data into actionable insights. With a simple and intuitive interface, Metabase enables users, even those without extensive technical backgrounds, to query, visualize, and share data effortlessly. It connects to a wide range of data sources, including databases, data warehouses, and spreadsheets, making it a versatile solution for data exploration and reporting.
Metabase’s open-source nature ensures flexibility and customization, making it a popular choice among businesses seeking an accessible and cost-effective BI solution.
Syncing and Scanning Databases
Metabase runs different types of queries to stay up to date with your database. Once the queries are done running, you can view and edit the synced metadata from Admin settings > Table Metadata.
In the image below, we can see that Metabase picked up all the descriptions written to Snowflake using the +persist_docs config.
Summary
In conclusion, +persist_docs in dbt isn’t just a feature; it’s a game-changer. It transforms your data documentation from a static task into an integral part of your data workflow. With centralized, real-time, and accessible documentation, you’re not just building data models; you’re building a data-driven culture that empowers everyone in your organization to make informed decisions.