In this blog post, I will go through Frosty Challenge #6, dealing with geographic objects and the way to manipulate them for our own needs. It will also help anyone who is new to geo-spatial data and seeks to understand the basics and get started in Snowflake.

Our prime objective in this specific challenge is to build both the nations/regions and parliamentary seats into polygons, and then work out how many Westminster seats intersect with region polygons. 

So with without further ado, let’s get to it!

We will start with basic environment configuration, and stage the data.

// Environment
USE WAREHOUSE compute_wh;
USE DATABASE atzmon_db;
USE SCHEMA challenges;

// Creating a FILE FORMAT
CREATE OR REPLACE FILE FORMAT ch6_csv_ff
TYPE = 'CSV'
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"';

// Staging
CREATE OR REPLACE STAGE ch6_stage
FILE_FORMAT = ch6_csv_ff
URL = 's3://frostyfridaychallenges/challenge_6/';

If you haven’t noticed, there is a problem staging the file. There is a comma within some constituencies which prevent us from staging the CSV. To solve that we will use the FIELD_OPTIONALLY_ENCLOSED_BY option in the creation of the file format .

Ingesting the Data

We’ll start with nation and regions. First we’ll create an empty table and then copy the data into it.

// Creating Nations/Regions table
CREATE OR REPLACE TABLE nations_regions as (
  SELECT
    a.$1::VARCHAR as nation,
    a.$2::VARCHAR as type,
    a.$3::VARCHAR as sequence_num,
    a.$4::FLOAT as longitude,
    a.$5::FLOAT as latitude,
    a.$6::VARCHAR as part
  FROM @ch6_stage/nations_and_regions.csv (file_format => ch6_csv_ff) a
);

We will follow the same with the Westminster seats. This time we will create an empty table and copy the data into it.

// Creating Empty table
CREATE OR REPLACE TABLE west_constituency_pts (
  constituency VARCHAR,
  sequence_num NUMBER,
  longitude FLOAT,
  latitude FLOAT,
  part VARCHAR
  );

// Copying records
COPY INTO west_constituency_pts
FROM @ch6_stage/westminster_constituency_points.csv
  file_format = 'ch6_csv_ff',
  PURGE = FALSE;

Now that we have the data in place, we can start diving into geospatial functions and manipulations.

Basic Geo-spatial Objects – Points, Lines and Polygons

In the beginning God created the points. And God said, “Let there be lines,” and there were lines. And God said, “Let there be polygons” – And it was so.

Points, lines and polygons are the basic spatial elements when constructing geographical objects and they are usually called Vector data. Vector data structures represent specific features on the Earth’s surface, and assign attributes to those features. Vectors are composed of discrete geometric locations (x, y values) known as vertices that define the shape of the spatial object. The organisation of the vertices determines the type of vector that we are working with: point, line or polygon.

Handling geospatial data may seem a bit intimidating at first, but we’ll try to digest it piece by piece.

For start, we will alter the session and set the geography output to WKT. If it sounds like an ancient language please read the snowflake geospatial data types documentation.

Creating Regional Polygons

Let us start with creating the polygons for the nations/regions data. Since we are starting with points data, climbing the ladder to create polygons I will break it to small bits using a WITH clause.

// Set the output format back to WKT
ALTER SESSION SET geography_output_format = 'WKT';

// Construct a GEOGRAPHY object that represent a point with the specified 
// longitude and latitude
CREATE OR REPLACE TABLE nations_regions_pols AS (
WITH pts AS (
  SELECT nation, type, sequence_num, longitude, latitude, part,
    ST_MAKEPOINT(longitude, latitude) as geo_pts
  FROM nations_regions
),

// Construct the FIRST point GEOGRAPHY object
pts_0 AS (
    SELECT nation, type, sequence_num, longitude, latitude, part, 
        ST_MAKEPOINT(longitude, latitude) as geo_pts_0
    FROM nations_regions
    WHERE sequence_num = 0
    ),

// collecting all points
collect_pts AS (
    SELECT nation, type, part, 
        ARRAY_AGG(sequence_num) as seq,
        ST_COLLECT(geo_pts) as collection_pts
    FROM pts
    WHERE sequence_num != 0
    GROUP BY nation, type, part
    ),

// Joining start/End point and rest of the points in a table 
lines_tbl AS (
    SELECT cp.nation, cp.type, cp.part, 
        cp.seq, cp.collection_pts,
        pz.geo_pts_0
    FROM collect_pts cp
    LEFT JOIN pts_0 pz ON cp.nation = pz.nation
        AND cp.type = pz.type
        AND cp.part = pz.part 
    ),


// Construct a GEOGRAPHY object that represents a line 
// connecting the points in the input objects
lines AS (
    SELECT nation, type, part, seq,
        ST_MAKELINE(geo_pts_0, collection_pts) as geo_lines
    FROM lines_tbl
    ),

// Constructs a GEOGRAPHY object that represents a polygon without holes.
// Function uses the specified LineString as the outer loop
pols AS (
SELECT nation, type, part, seq, 
    ST_MAKEPOLYGON(geo_lines) as part_pols
FROM lines
    )

// Finally, getting what we need
SELECT nation, type, ST_COLLECT(part_pols) as all_pols
FROM pols
GROUP BY nation, type
    );

The documentation of ST_MAKELINE function is not that clear for what should be the two arguments in the function. Since a line is made from all collection points, the two arguments are:

  1. Start point (e.g. geo_pts_0)
  2. Other points collection with the start point as the last point (e.g. collection_pts)

It may be very helpful to see what you are doing, and there are a few free solutions out there depending on the geo format you are using. Since we are dealing with WKT, we can take advantage of WKT Playground and show our polygons.

South East Region Multi-Polygon

Westminster seats polygons

We will tackle the Westminster seats polygons in. the same manner we dealt with regions.

// Dealing with Westminster seats polygons
CREATE OR REPLACE TABLE west_const_pols AS (
WITH pts AS (
    SELECT constituency, sequence_num, longitude, latitude, part,
      ST_MAKEPOINT(longitude, latitude) as geo_pts
    FROM west_constituency_pts
),

// Constructing the FIRST point GEOGRAPHY object
pts_0 AS (
    SELECT constituency, sequence_num, longitude, latitude, part, 
      ST_MAKEPOINT(longitude, latitude) as geo_pts_0
    FROM west_constituency_pts
    WHERE sequence_num = 0
    ),

// collecting all points
collect_pts AS (
    SELECT constituency, part, 
        ARRAY_AGG(sequence_num) as seq,
        ST_COLLECT(geo_pts) as collection_pts
    FROM pts
    WHERE sequence_num != 0
    GROUP BY constituency, part
    ),
  
// Joining start/End point and rest of the points in a table 
lines_tbl AS (
    SELECT cp.constituency, cp.part, 
        cp.seq, cp.collection_pts,
        pz.geo_pts_0
    FROM collect_pts cp
    LEFT JOIN pts_0 pz ON cp.constituency = pz.constituency
        AND cp.part = pz.part 
    ),
 
// Construct a GEOGRAPHY object that represents a line 
// connecting the points in the input objects   
lines AS (
    SELECT constituency, part, seq,
        ST_MAKELINE(geo_pts_0, collection_pts) as geo_lines
    FROM lines_tbl
    ),
 
// Constructs a GEOGRAPHY object that represents a polygon without holes.
// Function uses the specified LineString as the outer loop
pols AS (
SELECT constituency, part, seq, 
    ST_MAKEPOLYGON(geo_lines) as part_pols
FROM lines
    )

SELECT constituency, ST_COLLECT(part_pols) as all_pols
FROM pols
GROUP BY constituency
    );

Happy Days! We have now two tables filled with the data we need.

We still have to work out how many Westminster seats intersect with the region polygons.

Let’s create a view doing just that with the help of ST_INTERSECTS which returns TRUE if the two GEOGRAPHY objects intersect (Regions and Westminster polygons). Since it returns a boolean, we will use it to base a left join as shown below:

CREATE OR REPLACE VIEW product AS
SELECT np.nation, COUNT(wp.constituency) as cnt_const
FROM nations_regions_pols np
LEFT JOIN west_const_pols wp ON ST_INTERSECTS(np.all_pols, wp.all_pols)
GROUP BY 1
ORDER BY 2 DESC;

// Checking the results
SELECT * FROM product; 

The challenge gave us a taste of the snowflake geospatial functionality. Hopefully, this blog post helped you clarify and mist out the creation of geo-spatial objects.

In case you found this post interesting, please consider to:

Check out our blogs page to view more blogs on Tableau, Alteryx and Snowflake.

Work together with one of our consultants and maximise the effects of your data. 

Contact us, and we’ll help you right away.

Bel ons

Afspraak

Mail ons