Loading
×Sorry to interrupt
CSS Error
Skip to Main Content
View This Post

AG asked a question.

Different behavior of file name while unloading data

Little Background :

While unloading the data from Snowflake to S3 bucket, there are two options available related to number of file viz. single file or multi-file governed by the parameter SINGLE = "True"/"False" in copy command.

 

In my use-case, I am using a compression format (GZIP) for a csv file defined in custom file format for copy command.

 

Problem Statement :

While using Single = True, I have to specifically mention <file-name>.csv.gzip in the name of file to make sure output file can be readable otherwise the output file would be just <file-name>

BUT

in case of Single = "False", if I am giving the <file-name>.csv.gzip as output file-name what I am getting as output is multiple files like below :

<file-name>.csv.gzip_01_01_01.csv.gz

<file-name>.csv.gzip_01_01_02.csv.gz

<file-name>.csv.gzip_01_01_03.csv.gz

and so on.

that means snowflake is explicitly adding csv.gz extension in case of muli-part files which is different from usual behavior .


  • michael.walton (Snowflake)

    I'm unsure of your question here. The behavior you are seeing is the desired output. If you don't want the .csv.gzip in your SINGLE = False output, then change your file-name. The .csv is added to the end only if you are setting a file extension in your COPY statement (this can be removed, if desired). The .gz is added because you are asking for compression. SINGLE = True and SINGLE = False have different behavior, which is documented here: https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html

      Selected as BestSelected as Best
    • michael.walton (Snowflake)

      I'm unsure of your question here. The behavior you are seeing is the desired output. If you don't want the .csv.gzip in your SINGLE = False output, then change your file-name. The .csv is added to the end only if you are setting a file extension in your COPY statement (this can be removed, if desired). The .gz is added because you are asking for compression. SINGLE = True and SINGLE = False have different behavior, which is documented here: https://docs.snowflake.net/manuals/sql-reference/sql/copy-into-location.html

        Selected as BestSelected as Best
      • @michael.walton (Snowflake)​ Thanks for looking into the request.

         

        Question here is : Behavior of COPY statement is not consistent between SINGLE = True and Single = False for same file format.

        In one case (Single = True), we have to specify the file extension explicitly and in other case (Single = False) file extension comes automatically.

         

        From programming point of view, this is not consistent and I am sure a little tweak in code can rectify this.

        Expand Post
        • michael.walton (Snowflake)

          Yes - but the case in which you'd produce a single file versus multiple files are different, as well. So, if you are dynamically generating these statements within code, you'd either code those different cases into your application, or you'd choose one. In my previous life, we always used SINGLE=False and programmed around that use-case, because it's more efficient and you don't run into the possibility of a single file being too large for S3 blobs to store.

          • michael.walton (Snowflake)

            As a note, if you'd like to recommend that this be changed, I'd suggest that you post to the Idea section of Lodge.

            Loading