
Load data from an AWS S3 bucket into a Snowflake Table

In this blog, you will learn how to load data from an AWS S3 bucket into a Snowflake table using an external stage.

Post Contents

Prerequisite

Create The IAM Role In AWS

Add Permission Policies

Assign the role name and description

Find the ARN (Amazon Resource Name)

AWS S3 Storage Integration with Snowflake

Retrieve The AWS IAM User For Your Snowflake Account

Grant The IAM User Permissions To Access Bucket Objects

Create File Format Objects

Create an external stage

Create a Table in Snowflake for Data Load

Copy command to ingest data from AWS S3

Prerequisite

Upload a file to the AWS S3 bucket, as shown in the screenshot below. The bucket name is sfbucketpbi, the folder name is files, and the file name is empData.txt.

Copy the S3 bucket folder URL and make a note of it for future reference.

Copy S3 bucket folder URL

Create The IAM Role In AWS

Log into the AWS Management Console, and from the home dashboard, select Identity & Access Management (IAM).

Alternatively, you can search for IAM. Click on Roles, then click Create role.

Create a role in AWS

After clicking the Create role button, select the trusted entity type: choose the AWS Account option, leave all other settings unchanged, and click the Next button.

AWS Account

Add Permission Policies

Search for S3, and you will see all S3-related policy names. Choose AmazonS3FullAccess and click the Next button.

Add permission in AWS

Assign the role name and description

Enter a unique role name, provide a description, and then click the Create role button.

Assign the role name in AWS

The role has now been created successfully.

Find the ARN (Amazon Resource Name)

Go to IAM > Roles, select your role, copy the ARN, and make a note of it for future use.

ARN in AWS

AWS S3 Storage Integration with Snowflake

Execute the command below in the Snowflake web UI.

CREATE OR REPLACE STORAGE INTEGRATION my_aws_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = 'S3'
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::252332417099:role/mysnowflakerole'
STORAGE_ALLOWED_LOCATIONS = ('s3://sfbucketpbi/files/')
COMMENT = 'Integration with AWS S3 buckets';

STORAGE_AWS_ROLE_ARN:  Specifies the Amazon Resource Name (ARN) of an AWS IAM role. This role is used to grant Snowflake the necessary permissions to access the specified AWS resources.

STORAGE_ALLOWED_LOCATIONS: Defines the allowed locations or paths within the specified cloud storage provider (S3 in this case). In this example, data can be loaded from or unloaded to the s3://sfbucketpbi/files/ location within the S3 bucket; refer to the Prerequisite section of this post.

Retrieve The AWS IAM User For Your Snowflake Account

Execute the command below; Snowflake will return a result set that includes information about the specified integration.

DESC INTEGRATION my_aws_integration;
Copy ARN from Snowflake

Copy the property value of STORAGE_AWS_IAM_USER_ARN, as shown in the image above.

Grant The IAM User Permissions To Access Bucket Objects

Switch back to the AWS Console, navigate to IAM, locate your role, and then click Edit trust policy.

Update the IAM role in AWS

In the trust policy, replace the Principal AWS value with your STORAGE_AWS_IAM_USER_ARN, and click the Update policy button.

IAM user ARN
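For reference, the updated trust policy generally takes this shape (a sketch of the standard Snowflake storage-integration trust relationship, not copied from the post; the placeholder marks where your own STORAGE_AWS_IAM_USER_ARN value goes):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<your STORAGE_AWS_IAM_USER_ARN>" },
      "Action": "sts:AssumeRole"
    }
  ]
}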

Create File Format Objects

Define a file format that specifies how your data is formatted in the S3 bucket files.

Let’s take a look at a sample file. The file format is TXT, and the data is in comma-separated format.

File format
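For illustration, the contents of empData.txt might look like this (hypothetical values; the real data is only visible in the screenshot, but it matches the column layout used later in this post):

emp_id,first_name,last_name,salary
101,John,Smith,55000.00
102,Jane,Doe,62500.50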

CREATE OR REPLACE FILE FORMAT aws_file_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1;

Note:

  • Type: Specifies the file format type. In this case, CSV works for a TXT file because the data is comma-separated.
  • Field Delimiter: Specifies the field delimiter used in the files. Here, the fields are separated by commas.
  • Skip Header: Specifies the number of header rows to skip in the files.
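Depending on your data, a couple of extra options can be useful. Here is a sketch extending the same file format (FIELD_OPTIONALLY_ENCLOSED_BY and NULL_IF are standard Snowflake CSV options added as assumptions; they are not required for the sample file):

CREATE OR REPLACE FILE FORMAT aws_file_format
TYPE = 'CSV'
FIELD_DELIMITER = ','
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '"' -- handle values wrapped in double quotes
NULL_IF = ('NULL', ''); -- treat these strings as SQL NULL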

Create an external stage

External stages are used to access and manage data files from external sources in Snowflake, such as Amazon S3, Azure Blob Storage, or Google Cloud Storage.

Create a stage object that tells Snowflake where to retrieve the files from:

CREATE OR REPLACE STAGE my_stage_aws
URL = 's3://sfbucketpbi/files/empData.txt'
STORAGE_INTEGRATION = my_aws_integration
FILE_FORMAT = aws_file_format;

LIST @my_stage_aws;
Create external stage for AWS
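Before loading, you can also query the staged file directly to sanity-check the format. A quick sketch using Snowflake's positional column syntax with the stage and file format created above:

-- Preview the first few rows of the staged file without loading anything.
SELECT $1, $2, $3, $4
FROM @my_stage_aws (FILE_FORMAT => 'aws_file_format')
LIMIT 10;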

Create a Table in Snowflake for Data Load

Write a CREATE TABLE statement whose columns match the order of the fields in the file, and execute it.

CREATE OR REPLACE TABLE DEVDB.PUBLIC.empData (
EMP_ID NUMBER(38,0) NOT NULL AUTOINCREMENT,
FIRST_NAME VARCHAR(50),
LAST_NAME VARCHAR(50),
SALARY NUMBER(10,2)
);

SELECT * FROM empData;

Copy command to ingest data from AWS S3

The COPY INTO command is used to load data from external files, such as CSV, TSV, or Parquet files, into a Snowflake table.

COPY INTO DEVDB.PUBLIC.empData
FROM @DEVDB.PUBLIC.my_stage_aws;
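If you want to check the files before actually loading them, COPY INTO supports a validation mode that reports errors without inserting any rows. An optional sketch, not part of the original walkthrough:

-- Dry run: report parse errors without loading any data.
COPY INTO DEVDB.PUBLIC.empData
FROM @DEVDB.PUBLIC.my_stage_aws
VALIDATION_MODE = RETURN_ERRORS;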

After that, you can execute the query below to check the data in the empData table.

SELECT * FROM empData;
Load data from AWS S3 bucket to Snowflake

Thank you for reading this post, and I hope you enjoyed it!
