Redshift COPY Command errors and how to solve them Part-1

Introduction:

Amazon Redshift is an optimized, highly scalable data warehouse system. It really offers fast and robust performance even for very large number of records. In a typical data warehouse, data loading from various source is always a challenging task. Redshift takes this challenge of data loading through COPY command. COPY command is the recommend way to load huge volume of data. Redshift is comparatively a newer technology. People face various Redshift COPY Command errors during bulk data load. In this post, I have tried to address an error along with a probable solution.

Redshift COPY Command errors description:

[Error Code: 0, SQL State: XX000]  ERROR: S3ServiceException: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint

Reason of the Redshift COPY Command errors:

During COPY command source file’s S3 bucket region is not in the same region of Redshift database cluster region.

Preparing the environment to generate the Redshift COPY Command errors:

In order to demonstrate the COPY command and to be accounted with the different error message, we will create a sample table like below:

CREATE TABLE orders
(
CustomerName     VARCHAR(50) NOT NULL,
orderid          INT NOT NULL,
orderdate        DATE NOT NULL,
city             VARCHAR(20) NOT NULL,
unitprice        FLOAT NOT NULL,
quantity         INT NOT NULL,
shipregion       VARCHAR(20) NULL
)

I have a sample text file with 100 records. You can download it from here. The file is tab delimited. I will upload this test file into S3 bucket in order to generate and fix all possible COPY related errors the file looks like below:sample file

Figure: Sample text file. 

How these Redshift COPY command errors are produced:

Now let us assume I have a cluster in “us-east” region.
redshift cluster region
Figure: 1

But my source file resides in another (us-west) region.
s3 region
Figure: 2

Now, I will execute the following COPY command to load the sample file into redshift table.

COPY orders FROM 's3://sourcedatainorig/order.txtcredentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' ;

I got the following error message:
s3 exception
Figure 3

How to solve these Redshift COPY command errors:

  1. By creating the S3 bucket in the same region of the Redshift cluster.
  2. By specifying REGION parameter in the COPY command.

 By creating the S3 bucket in the same region of the Redshift cluster.

  1. Go to redshift console and find out your cluster region (see Figure 1). Assume, my cluster in US-East region.
  2. Go to S3 console and create a bucket in the same region.
  3. Keep your file into the newly created bucket.
  4. Modify COPY command source file path accordingly and execute it.

By specifying REGION parameter in the COPY command.

If you really need to load data in to redshift table from a S3 bucket that resides in another region, you need to specify REGION parameter in your COPY command. To get the region name from file in S3 follow the steps below:

  1. Go to S3 and open the bucket.
  2. Select the file (order.txt), right click on it and choose properties.
  3. Have a look the “Link” text in the properties right pane like below:

s3 region
Figure: 4

  1. Get the region (us-west-2) from the link property and organize the COPY command like below along with “REGION” parameter.
COPY orders FROM 's3://sourcedatainorig/order.txtcredentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' REGION 'us-west-2' ;

The COPY command now executed without the “S3ServiceException“error. Still the COPY command would not get succeeded. I will try to cover the error details and the possible regulation in the subsequent post.