The OBJECT data type in Snowflake
In this blog we will explore the OBJECT data type in Snowflake. This is a very useful data type because it allows us to work with data in a semi-structured form (like with JSON). We will take a look at how to construct OBJECTs in Snowflake using Snowflake constants and compare them to python dicts and json, amongst other things. First, however, it is important to make a terminological distinction. Snowflake has 2 distinct notions of object
:
Securable Snowflake objects
Anything you create with a CREATE statement in Snowflake is a Snowflake object
. These Objects are securable in the sense that you can control what can be done with them by who. Usually, you would use GRANT and REVOKE commands for that. Note that in most cases these access control privileges are applied to ROLES, not to USER
s in Snowflake (see User Roles).
One thing to note is that some Snowflake Objects are not directly created with CREATE
Statements. An example is a table COLUMN
. It gets indirectly created, added, altered etc. For that, you would use commands that apply to the table to which the column belongs: e.g. ALTER TABLE x ADD COLUMN y VARCHAR;
. Access control is granted or revoked indirectly in a similar fashion.
Finally, it is worth to note that the SHOW OBJECTS command uses a stricter notion of object: it only shows tables and views.
The OBJECT Data Type
Now, there is also a very different and unrelated Snowflake OBJECT
: the OBJECT data type. This is one of the 3 Semi-Structured data types used by Snowflake to handle semi-structured data.
The name OBJECT
can be seen as related to the data type with the same name in JSON. (The o in JSON actually stands for object.) Just as with JSON, a Snowflake OBJECT is essentially a collection of key-value pairs. What that means, is that there is a value associated with every key. So, in the following JSON example the key the
has the value 6
.
{
"these" : 1
, "strings" : 2
, "are" : 3
, "keys" : 4
, "and" : 5
, "the" : 6
, "numbers" : 7
, "values" : 8
}
If you are familiar with Python, you will probably recognize the above as a dict
. Python dicts and JSON objects are very similar concepts. There are a few differences though: Python allows strings to be enclosed by both "
and '
, while JSON requires "
and does not allow '
.
You can enter a Snowflake OBJECT directly in a SQL script (or Snowsight). You would then use what is called an Object Constants. Interestingly enough, you are required to use '
for strings and cannot use "
in the same way. See the following SELECT
Statement:
SELECT {
'these' : 1
, 'strings' : 2
, 'are' : 3
, 'keys' : 4
, 'and' : 5
, 'the' : 6
, 'numbers' : 7
, 'values' : 8
}
;
The above code will return an OBJECT data type, that can be used in a very similar way to how JSON objects can be used (or Python dicts for that matter).
Snowflake Constants
Why does the above Snowflake code require the use of '
and disallows "
? The answer to that actually makes a lot of sense, but first we will need to look at the concept of Snowflake constants. In the previous code example constants are actually used 2 times:
- for the strings (with
''
) - for the creation of the OBJECT (with
{}
)
What are constants
? The Snowflake Docs describes them as follows:
“Constants (also known as literals) refer to fixed data values.” (Snowflake Docs accesed 2022-01-04)
What I take from this, is that if you want to refer to a fixed value in Snowflake, you need to use a constant. if the fixed value is the string the, for example, you need to use the string constant 'the'
. Similarly, if you want to use the number 6, you use the numeric constant 6
. String constants need to be enclosed in single-quotes '
in Snowflake, while numeric constants are entered as is.
Here is a list of the constants we used thus far:
So, how about OBJECT Constants? This type of constant needs to be enclosed in {}
, but it can also contain strings, numbers etc. In the above example, the OBJECT constant is created with string constants as the keys and numeric constants as the values. Since we used string constants, we needed to ad ''
s!
Single quotes and double quotes in Snowflake
So why are single quotes allowed (and required!) with string constants and double quotes not? This is, in all honesty, something that caused me quite a bit of debugging and frustration at first. Snowflake uses double quotes "
for Identifiers and single quotes for string constants. The following example will hopefully clarify the difference:
WITH "StRaNgE" AS ( SELECT {
'these' : 1
, 'strings' : 2
, 'are' : 3
, 'keys' : 4
, 'and' : 5
, 'the' : 6
, 'numbers' : 7
, 'values' : 8
} AS obj )
SELECT*FROM "StRaNgE";
The identifier StRaNgE
in the last line is the name we gave to the result of the subquery after AS
in the first line. This could also have been the name of a table, for example. Snowflake will not properly recognize StRaNgE without double quotes around it. Why? Because Snowflake would make STRANGE
from it before it tried to find it (from the preceding WITH Clause):
“When an identifier is unquoted, it is stored and resolved in uppercase.” (Snowflake Docs accessed 2022-01-04)
However, since we used double quotes around StRaNgE
immediately after WITH
, only StRaNgE
is the correct identifier. Hence, we need double quotes to prevent capitalizing:
“When an identifier is double-quoted, it is stored and resolved exactly as entered, including case.”
(Snowflake Docs accessed 2022-01-04)
Some differences between Snowflake OBJECTS, JSON and Python dicts
The following table shows some of the syntactic differences between Snowflake OBJECT constants, JSON objects and Python dicts, with respect to how fixed string values are handled.
Snowflake OBJECT Constant | single-quoted '' string constants |
JSON object | double-quoted "" strings |
Python dict | strings with single and double quotes |
In my opinion, one of the most important things to remember about using semi-structured data in Snowflake, is that, in principle, Snowflake will handle semi-structured data with one of the semi-structured data types. Since Snowflake’s SQL Data Types will mostly be used in columns (of tables, views etc.), you will usually need to access this semi-structured data by querying a column that has a semi-structured data type.
If we look at the previous SQL example and expand on it, we see that *
could actually be rewritten as obj
, since we aliased the OBJECT constant as such in the WITH Clause. Then, without going into too much detail, we can get the key the with obj:the
, convert the VARIANT (Data Type) 6 to a number with obj:the::number
. Finally, passing all if this in our query, we get the value of the key the
!
WITH "StRaNgE" AS ( SELECT {
'these' : 1
, 'strings' : 2
, 'are' : 3
, 'keys' : 4
, 'and' : 5
, 'the' : 6
, 'numbers' : 7
, 'values' : 8
} as obj )
SELECT obj:the::number "the" FROM "StRaNgE";
By the way, we need to put the last the in the last line between double quotes like "the"
if we want the column in the query results to be lowercase: the alias "the"
is an identifier and as such the previously mentioned principles apply.
If we run the above query, we get the following result:
Conclusion
We looked at the Snowflake OBJECT data type, in particular to how it relates to Snowflake securable objects. We also saw how to construct an OBJECT with several constants. Meanwhile, we discussed how single and double quotes differ in Snowflake. Finally, we saw a simple example of how an OBJECT can be queried natively in Snowflake.
Make sure to visit our site The Information Lab NL to see more blog posts, training and consultancy services regarding Tableau, Alteryx and Snowflake.