While working on some data and importing a few files from an s3 bucket, Snowflake encountered an error that took me too long to figure out: Field delimiter ',' found while expecting record delimiter '-'.
To start of , I'm using a simple csv file format :
create or replace file format my_csv_format type = csv record_delimiter = '\n' field_delimiter = ',' skip_header = 1 null_if = ('NULL', 'null') empty_field_as_null = true
The files that I'm importing are structured like this example :
As you can see, the field is delimited by a ',', empty fields are present and should be treated as NULL values, and from reviewing the file, I can see that every record has its own line.
With everything set , I started the imported but was quickly hit by the dreaded error :
Field delimiter ',' found while expecting record delimiter '\now' File 'factions.csv', line 759, character 85 Row 758, column "FACTIONS"["IS_PLAYER_FACTION":9]
the corresponding line :
|74927, "Strategic Wings, Order and Retaliation Division",1475489397,80,Cooperative,,,,1|
We can simply read that "Snowflake is finding a field delimiter while expecting a record delimiter" but what does it mean? It simply means that after an X number of Field Delimiters, Snowflake expects the next delimiter to be a record delimiter. The X here is the number of columns in the table (which is 8).
If we count the number of "," we can spot the error; this line has 9 instead of 8! One of the fields contains an extra : "Strategic Wings, Order and Retaliation Division". The field has a "," in its contents but shouldn't be used as a delimiter.
This is where a change to the file format comes in :
FIELD_OPTIONALLY_ENCLOSED_BY makes it possible to handle these kind of field values. We should let Snowflake know that the fields as a whole can be enclosed by double quotes (").
Our first instinct would tell us to use FIELD_OPTIONALLY_ENCLOSED_BY = " " or FIELD_OPTIONALLY_ENCLOSED_BY ' " ' to make this happen but this is the second reason for writing this piece. Snowflake also references this in its documentation:
Character used to enclose strings. Value can be
NONE, single quote character (
'), or double quote character (
"). To use the single quote character, use the octal or hex representation (
The hex representation of a double quote is 0x22 , making the new csv file format :
create or replace file format my_csv_format type = csv record_delimiter = '\n' field_delimiter = ',' skip_header = 1 null_if = ('NULL', 'null') empty_field_as_null = true FIELD_OPTIONALLY_ENCLOSED_BY = '0x22'