Frosty Friday’s week 4 challenge asks us to take a Json file, ingest it into the data warehouse, and parse it into a table. Resulting table should include all elements within the Json, parsed into appropiate rows and columns. A great oppurtinty to sharpen our Snowflake skills and this time we will be parsing a Json.

Upon examining the challenge and the Json file, we are prompted with a few pointers:

  • Separate columns for nicknames and consorts 1 – 3, many will be null.
  • An ID in chronological order (birth).
  • An Inter-House ID in order as they appear in the file.
  • There should be 26 rows at the end.

We’ll deal with each pointer as we go along – let’s get to it…

What is a Json Object?

Simply said, A JSON object data type is a list of key:value pairs surrounded in curly braces. Keys must be strings, and values must be a valid JSON data type (string, number, object, array, boolean or null).

  • Keys and values are separated by a colon.
  • Each key:value pair is separated by a comma.

Usually, Json objects are wrapped in an array and a basic simple representation of a Json object will look like the example below.

[ 
    { 
         "role" : "Admin", 
         "age" : 45, 
         "permissions" : [ "admin", "editor", "contributor" ] 
    }
]

Exploring the Json file in challenge 4 shows us several levels of keys and associated values. First key level is 'Era' and it has 2 values: 'Pre-Transition' and 'Post-Transition'. A second level comes with 'Houses', listing the different houses and finally another level with 'Monarchs', where most of our coulmns will come from.

Environment Configuration

After examining the challenge’s attached Json, let’s configure our environment in snowflake:

// Environment Configuration
USE WAREHOUSE COMPUTE_WH;
USE DATABASE ATZMON_DB;
USE SCHEMA challenges;

Next thing, let’s create a Json file format object with an option to remove the square brakets around our Json object.

// Create JSON file format
CREATE OR REPLACE FILE FORMAT ch4_json_ff
  type = 'json'
  strip_outer_array = TRUE;

Staging and View file

Let’s go ahead and create a stage with the s3 bucket in the url. Then we will list files in stage.

// Create a stage
CREATE OR REPLACE STAGE ch4_stg
file_format = 'ch4_json_ff'
url = 's3://frostyfridaychallenges/challenge_4/';

// And list files in the stage
LIST @ch4_stg;

We can also view the Json in the stage while using a regex pattern:

// View Json in stage
SELECT $1 as col1
FROM @ch4_stg (pattern => '.*Spanish_Monarchs.json');

Parsing a Json File

At this point we would like to parse the Json and create the requested columns, but first we need to find our way and recreate the Json keys and values into columns and rows. As mentioned earlier, we have our first level of 'Era'– Let’s query and parse it!

// Viewing first level key in Json
SELECT a.$1:Era::VARCHAR as era
FROM @ch4_stg (pattern => '.*Spanish_Monarchs.json') a

In our select statment, we are calling the variant column ($1), extracting the first level key (:Era) and at the same time casting it into a VARCHAR via the :: operator as an alternative syntax. The CAST function converts a value of one data type into another data type.

LATERAL FLATTEN

As described in Snowflake JSON Tutorial, we are adviced to use LATERAL FLATTEN, but what does it mean exactly?

LATERAL specifies an inline view within the FROM clause (aka LATERAL JOIN) and allows us to refer to multiple columns from other tables.

FLATTEN is a table function that, in conjuction with LATERAL, ‘explodes’ values into multiple rows (much like a cross-join).

Combined together, we are able to efficiently separate events into individual JSON objects while preserving the global data. FLATTEN returns a row for each object, and the LATERAL modifier joins the data with any information outside of the object. In other words, each key:value pair will turn into a new row.

More about LATERAL joins can be found in Dave Abercrombie’s great blog on ‘How To: LATERAL FLATTEN’. Another superb video made by Chris Marland, who is also responsible for this challenge, to explain exactly that.

With each LATERAL FLATTEN combination we are unlocking another level of keys within the JSON.

Rushing forward, the FLATTEN function has several different outputs available (check out the documentation) and we will use VALUE to extract what we want. Let us modify our code to unlock the next level of keys. We will add a lateral view looking at the 'Houses' level: LATERAL FLATTEN(input => a.$1:Houses).

// Viewing first and second level keys in Json
SELECT a.$1:Era::VARCHAR as era,
    h.value:House::VARCHAR as house
FROM @ch4_stg (pattern => '.*Spanish_Monarchs.json') a,
    LATERAL FLATTEN(input => a.$1:Houses) h;

Great! Now let’s parse the third level 'Monarchs'and extract the columns that we need.

Accessing object values

There are two ways to access object values:

  1. Using dot (.) notation
  2. Using bracket ([]) notation

We have already used the dot notation to parse the first two level keys 'Era' and 'Houses'(a.$1 and h.value) but in order to extract all objects within 'Monarchs' we’ll use brackets.

// Viewing all needed keys in Json 
SELECT a.$1:Era::VARCHAR as era,
    h.value:House::VARCHAR as house,
    m.value['Age at Time of Death']::VARCHAR as age_at_time_of_death,
    m.value['Birth']::DATE as birth,
    m.value['Burial Place']::VARCHAR as burial_place,
    m.value['Consort\\/Queen Consort'][0]::VARCHAR as queen_consort1,
    m.value['Consort\\/Queen Consort'][1]::VARCHAR as queen_consort2,
    m.value['Consort\\/Queen Consort'][2]::VARCHAR as queen_consort3,
    m.value['Death']::DATE as death,
    m.value['Duration']::VARCHAR as duration,
    m.value['End of Reign']::DATE as end_of_reign,
    m.value['Name']::VARCHAR as name,
    m.value['Nickname'][0]::VARCHAR as nickname1,
    m.value['Nickname'][1]::VARCHAR as nickname2,
    m.value['Nickname'][2]::VARCHAR as nickname3,
    m.value['Place of Birth']::VARCHAR as place_of_birth,
    m.value['Place of Death']::VARCHAR as place_of_death,
    m.value['Start of Reign']::VARCHAR as start_of_reign
FROM @ch4_stg (pattern => '.*Spanish_Monarchs.json') a,
    LATERAL FLATTEN(input => a.$1:Houses) h,
    LATERAL FLATTEN(input => h.value:Monarchs) m;

Please note that ‘Consort\\/Queen Consort’ and ‘Nickname’ are arrays. In order to understand how many objects are in each array we should parse it as an array first:
m.value['Consort\/Queen Consort']::ARRAY as consort

Different Types of Index

Within the instructions for the challenge, we are also asked to create an ID in the order as they appear in the file (raw_idx) and another ID in a chronological order based on 'birth' (birth_idx). The order as they appear in the file itself can be created using yet another FLATTEN output (index):
m.index+1 as raw_idx (we add 1 to the index since it starts with 0).

For the birth_idx we will use the useful window function ROW_NUMBER() ordered by 'birth':
ROW_NUMBER() OVER(ORDER BY m.value['Birth']::DATE) as birth_idx

Happy Days!

All we have to do is to extract it into a table. Let’s wrap our current code with a CREATE OR REPLACE TABLE and we are done!

// Extracting values from Json using LATERAL FLATTEN
// and creating a flat table
CREATE OR REPLACE TABLE ch4_flat_tbl as
  (        
SELECT 
    ROW_NUMBER() OVER(ORDER BY m.value['Birth']::DATE) as birth_idx,
    m.index+1 as raw_idx, 
    a.$1:Era::VARCHAR as era,
    h.value:House::VARCHAR as house,
    m.value['Age at Time of Death']::VARCHAR as age_at_time_of_death,
    m.value['Birth']::DATE as birth,
    m.value['Burial Place']::VARCHAR as burial_place,
    m.value['Consort\\/Queen Consort'][0]::VARCHAR as queen_consort1,
    m.value['Consort\\/Queen Consort'][1]::VARCHAR as queen_consort2,
    m.value['Consort\\/Queen Consort'][2]::VARCHAR as queen_consort3,
    m.value['Death']::DATE as death,
    m.value['Duration']::VARCHAR as duration,
    m.value['End of Reign']::DATE as end_of_reign,
    m.value['Name']::VARCHAR as name,
    m.value['Nickname'][0]::VARCHAR as nickname1,
    m.value['Nickname'][1]::VARCHAR as nickname2,
    m.value['Nickname'][2]::VARCHAR as nickname3,
    m.value['Place of Birth']::VARCHAR as place_of_birth,
    m.value['Place of Death']::VARCHAR as place_of_death,
    m.value['Start of Reign']::VARCHAR as start_of_reign
FROM @ch4_stg (pattern => '.*Spanish_Monarchs.json') a,
    LATERAL FLATTEN(input => a.$1:Houses) h,
    LATERAL FLATTEN(input => h.value:Monarchs) m
  );

I hope this blog helped you clarify and better understand the challenge. Until the next time!

In case you found this post interesting, please consider to:

Check out our blogs page to view more blogs on Tableau, Alteryx and Snowflake.

Work together with one of our consultants and maximise the effects of your data. 

Contact us, and we’ll help you right away.

Tags:

Bel ons

Afspraak

Mail ons