Snowflake Stored Procedures

15 november 2022
Atzmon Ben Binyamin
Snowflake
0

Snowflake stored procedures allow us to write procedural code that executes SQL. In a stored procedure, we can use programmatic constructs to perform branching and looping. Just like functions, stored procedures are created once and can be executed as many times as needed.

Currently, we can write stored procedures in Java, JavaScript, Python, Scala and SQL aka Snowflake Scripting. Snowflake Scripting is an extension to Snowflake SQL that adds support for procedural logic and can be used inside or outside (aka anonymous blocks) a stored procedure. A stored procedure is created with a CREATE PROCEDURE command and is executed with a CALL command.

One common use for stored procedures is to automate a task that requires multiple SQL statements and is performed frequently. It happens to be that Frosty Friday challenge #10 asks us just that!

For the challenge, we will be using Snowflake Scripting (SQL) to construct our stored procedure.

Challenge

The challenge described in challenge #10 states that a client wants to be able to load data from a stage in a manual but dynamic fashion. To be exact, he wants to:

Execute a single command (Stored procedure!)
Do so manually: NO scheduling and NO SnowPipes!
Dynamically determine the warehouse size, if a file is over 10KB then use a small warehouse, anything under that size should be handled with an xsmall warehouse.

We will start with creating the infrastructure and configure our work environment.

-- Environment
use warehouse compute_wh;
use database atzmon_db;
use schema challenges;

-- Create warehouses
create warehouse if not exists my_xsmall_wh
    with
    warehouse_size = XSMALL
    auto_suspend = 120;
    
create warehouse if not exists my_small_wh
    with
    warehouse_size = SMALL
    auto_suspend = 120;
    
-- Create the table
create or replace table ch10_tbl(
    date_time DATETIME,
    trans_amount DOUBLE
    );

-- Create file format
create or replace file format ch10_csv_ff
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1;
  
-- Create the stage
create or replace stage ch10_stg
    url = 's3://frostyfridaychallenges/challenge_10/'
    file_format = ch10_csv_ff;

list @ch10_stg;

Snowflake Scripting Block

Before diving into our stored procedure, let’s first understand the building blocks of a Snowflake scripting. A general basic structure consists of several sections (required and optionals) with different purposes. Each block must contain a BEGIN and END statements, all other statements are optional.

[ DECLARE ... ] -- Variable declarations, cursor declarations...
  ... 
BEGIN -- Snowflake Scripting and SQL statements (*Required)
  ... 
    -- Branching
    [ IF ...]
    [ CASE...]

    -- Looping 
    [ FOR ...]
    [ WHILE ...]
    [ REPEAT ...]
    [ LOOP ...]

    -- Loop termination (Within a looping construct)
    [ BREAK ]
    [ CONTINUE ]

    -- Variable assignment outside DECLARE
    [ LET ... ]

    -- Cursor management
    [ OPEN ... ]
    [ FETCH ... ]
    [ CLOSE ... ]

    -- Returning a value, resultset
    [ RETURN ... ]

    -- Handling exceptions
    [ EXCEPTION ... ]

END; (*Required)

We will touch base on some section statements as we go through our procedure script below…

Stored Procedure Steps

CREATE Procedure

We start with a CREATE or REPLACE procedure statement, specifying the variables we are going to pass as inputs (stage_name STRING, table_name STRING). In addition, we will specify the data type we are expecting to return (returns STRING) and the language (SQL) we are using for the procedure.

The last parameter we use in our procedure configuration is EXECUTE AS CALLER which specifies executing the procedure with the privileges of the caller (a “caller’s rights” stored procedure). By default, Snowflake assigns a procedure with “owner’s rights” (EXECUTE AS OWNER). Interestingly, “owner’s rights” stored procedures have less access to the caller’s environment, such as the variables we are using within the procedure.

DECLARE

Within the snowflake scripting infrastructure, we are allowed to use variables in expressions. Before using them though, We must declare them, and specify the data type for each one of them. Alternatively, We can declare variables outside the ‘Declare’ section and within the BEGIN … END section of the block (before using the variable) by using the LET command. We can declare variables, cursors, resultsets and exceptions. These variables cannot be used outside of the block.

Within the ‘declare’ section of the block, general description for a variable would be:
<variable_name> <type> { DEFAULT | := } <expression>;
while within the ‘begin.. end’ section we would use:
LET <variable_name> <type> { DEFAULT | := } <expression>;

In our script we declare the variable tbl_records as a NUMBER data type in the ‘declare’ section, and assigning it a value of 0 using the := operator. On the other hand, stg_rec is also a variable (resultset), defined using a LET command. Declared variables cannot be used outside of the BEGIN … END block.

BEGIN … END

Snowflake scripting support EXECUTE IMMEDIATE statement which executes a string that contains a SQL statement and returns the result. We begin our BEGIN … END section with an immediate execution of 'ls @' || stg_name, to dynamically list files in stage. Next, we are declaring a variable (stg_rec) using the LET command. The variable is then assigned to contain the info from the last query run.

create or replace procedure dynamic_warehouse_data_load(
      stg_name STRING,
      tbl_name STRING  
        )
  returns STRING
  language SQL
  execute as caller
  as
  decalre
    tbl_records NUMBER := 0;
  begin
    execute immediate 'ls @' || stg_name;
    let stg_rec resultset := (select "name" as name,
                                "size" as size
                              from table(result_scan(last_query_id()))
                                );
    let cur cursor for stg_rec;
    for i in cur do
        if (i.size < 10000) then 
                   execute immediate 'use warehouse my_xsmall_wh';
        else execute immediate 'use warehouse my_small_wh';
        end if;
        execute immediate 'copy into ' || tbl_name ||
                          ' from @' || stg_name || 
                          ' files = (''' || split_part(i.name, '/', -1) || ''')';
    end for;
    select count(*) into :tbl_records from identifier(:tbl_name);
    return tbl_records;
  end;

CURSOR

Our next step is setting up a cursor for the stg_rec variable, but what is a cursor actually?

A cursor is an object that enables traversal over the rows of a result set. It allows us to process and run through individual rows returned by a query.

When working with cursors, we can use Fetch Iterations syntax or For-Loops syntax according to our needs.

-- *** Cursor Fetch Iterations syntax ***
DECLARE
  <variable_name> CURSOR FOR <query>;
BEGIN
  OPEN <cursor_name> [ USING ];
  FETCH <cursor_name> INTO <variable>;
  CLOSE <cursor_name>;
END;

Having said that, we will declare a cursor for the variable (let cur cursor for stg_rec;) and use a cursor-based for-loop syntax to iterate through the rows in the cursor.

-- *** cursor-based for-loop syntax ***
FOR <row_variable> IN <cursor_name> DO    
    <statement>;
    <statement>;
END FOR;

While iterating each row variable (i), we check the condition i.size < 10000 and apply the appropriate execute immediate statement for the right warehouse size. Lastly, still within the cursor for-loop, we copy each transaction within each CSV file into the tbl_name table that we passed as input for the procedure.

Binding variables

In our last select statement,
select count(*) into :tbl_records from identifier(:tbl_name);
we bind the COUNT(*) result into our first declared variable using the : operator while referring to an object (tbl_name). When using a scripting variable as the name of an object, we need to use an IDENTIFIER(). Check snowflake documentation to find more about working with variables.

Finally, we return tbl_records, as declared at the first step of our procedure, to show the number of transactions fetched and copied into ch10_tbl.

Calling Stored Procedures

Now, let’s call our stored procedure, passing in our stg_name and tbl_name:

-- Call stored procedure
call dynamic_warehouse_data_load('ch10_stg', 'ch10_tbl');

For a more granular check of those 4000 transactions we can check ch10_tbl or/and information_schema.query_history:

-- Call stored procedure
select * from ch10_tbl;
select * from table(information_schema.query_history())
order by start_time;

We have created our very first stored procedure using Snowflake scripting which is quite a game changer in how we can develop and run SQL code in snowflake. I totally think that the more you use it, the more you will love them and change how you do things in snowflake.

I hope this blog post was clear and you found it useful!

In case you found this post interesting, please consider to:

Check out our blogs page to view more blogs on Tableau, Alteryx and Snowflake.