Aside from the great documentation available for Snowflake users, there aren’t any dedicated challenges to helps ourselves in further developing our Snowflake skills. Luckily, there is a new kid on the block now… Frosty Friday (https://frostyfriday.org/), a series of weekly challenges released every Friday to help us practice and develop our Snowflake skills. Created by Snowflake users, for Snowflake users.
In this post, we will go through challenge 1 and better understand different features within Snowflake.
Frosty Friday – Challenge 1
For the first challenge, we are asked to create an external stage, and load the CSV files (Currently in a s3 bucket) directly from that stage into a snowflake table. For a better context, Please refer to the challenge instructions.
Let’s go through the steps in order to make that happen.
- Environment configuration
- File format to stage
- Creating Tables and copying the data
After firing up our Snowflake instance we would like to set up our environment. We should state what will be our Role, Warehouse, Database and Schema.
My configuration is Role SYSADMIN, Warehouse COMPUTE_WH, Database ATZMON_DB and Schema CHALLENGES. In case you don’t have these set up, you can easily create them.
By default, Snowflake generates a warehouse called compute_wh but you can easily create another one using the code below where we create a SNOWFLAKE_WH virtual environment with the smallest size which auto suspends after 60 seconds of idle time.
// Creating a Warehouse
CREATE WAREHOUSE SNOWFLAKE_WH
WAREHOUSE_SIZE = XSMALL
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS = 300
STATEMENT_TIMEOUT_IN_SECONDS = 600;
Similar to creating a warehouse, it’s quite straightforward to create a database.
// Creating a Database
CREATE DATABASE ATZMON_DB
COMMENT = 'Challenges tables and views';
// Creating a Schema
CREATE SCHEMA CHALLENGES
COMMENT = 'Schema for Challenges';
Let’s finalise our set up with the code below.
// Environment Configuration
USE WAREHOUSE COMPUTE_WH
USE DATABASE ATZMON_DB
USE SCHEMA CHALLENGES
File Format to Stage
Stage is a place to put things temporarily before moving them to a more stable location. Since our csv file is located in a s3 bucket, we will create an external stage (Short term temp location) and unload the data. Later on we will ingest the data and load it to a table within our database (Long term storage).
Before we ingest any data we need to understand, or at least let Snowflake understand what is the underlying structure of the data. Snowflake needs to understand the file format we expect the data to be. Having CSV data residing in our s3 bucket, we will create a CSV file format.
// Creating a CSV file format
CREATE OR REPLACE FILE FORMAT ch1_csvformat
type = 'CSV'
field_delimiter = ','
skip_header = 1;
After creating a CSV file format, we will use it while creating a stage.
// Creating a Stage Object to reference data files stored in a s3 bucket
CREATE OR REPLACE STAGE ch1_csv_stage
file_format = ch1_csvformat
url = 's3://frostyfridaychallenges/challenge_1/';
// List objects in stage
After unloading our s3 bucket on stage, we would like to push the data to a more stable place – a table. For the creation of a table, we should let Snowflake know what is the Table Schema – A brief description regarding the table’s columns and their related data type.
First, let’s have a look at the csv we uploaded to our stage:
// Checking the CSV in stage
FROM @ch1_csv_stage (file_format => ch1_csvformat);
For those who are not familiar with snowflake notations, $1 and $2 are the regular data columns in the staged file.
Creating a Table with Data
Going forward with checking the result, we see only one column with text content. let’s go ahead and create our table. For our table schema, we’ll use one column (C1) with a VARCHAR data type.
// Creating a table and table schema
CREATE OR REPLACE TABLE "ATZMON_DB"."CHALLENGES"."CH1_CSV_TABLE"
Snowflake objects names such as schema, table, stage etc. must be unique within the context of the object type and the “parent” object. To enable resolving objects that have the same identifiers in different databases/schemas, Snowflake supports object identifiers in the form of:
Once our table is created, we can go ahead and copy the data in and check the final resulting table.
// Loading the CSV data into a table within the DB
COPY INTO "ATZMON_DB"."CHALLENGES"."CH1_CSV_TABLE"
// Checking the final table
SELECT * FROM ch1_csv_table
Our first challenge is done!
The Frosty Friday challenges give us the opportunity to really understand the different features in Snowflake. There is no doubt in my mind, they are a great way to develope our skills.
Stay tuned for more Frosty challenges!