Amazon Redshift is an optimized, highly scalable data warehouse system. It really offers fast and robust performance even for very large number of records. In a typical data warehouse, data loading from various source is always a challenging task. Redshift takes this challenge of data loading through COPY command. COPY command is the recommend way to load huge volume of data. Redshift is comparatively a newer technology. People face various Redshift COPY Command errors during bulk data load. In this post, I have tried to address an error along with a probable solution.
Redshift COPY Command errors description:
[Error Code: 0, SQL State: XX000] ERROR: S3ServiceException: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint
Reason of the Redshift COPY Command errors:
During COPY command source file’s S3 bucket region is not in the same region of Redshift database cluster region.
Preparing the environment to generate the Redshift COPY Command errors:
In order to demonstrate the COPY command and to be accounted with the different error message, we will create a sample table like below:
CREATE TABLE orders ( CustomerName VARCHAR(50) NOT NULL, orderid INT NOT NULL, orderdate DATE NOT NULL, city VARCHAR(20) NOT NULL, unitprice FLOAT NOT NULL, quantity INT NOT NULL, shipregion VARCHAR(20) NULL )
I have a sample text file with 100 records. You can download it from here. The file is tab delimited. I will upload this test file into S3 bucket in order to generate and fix all possible COPY related errors the file looks like below:
Figure: Sample text file.
How these Redshift COPY command errors are produced:
Now, I will execute the following COPY command to load the sample file into redshift table.
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' ;
How to solve these Redshift COPY command errors:
- By creating the S3 bucket in the same region of the Redshift cluster.
- By specifying REGION parameter in the COPY command.
By creating the S3 bucket in the same region of the Redshift cluster.
- Go to redshift console and find out your cluster region (see Figure 1). Assume, my cluster in US-East region.
- Go to S3 console and create a bucket in the same region.
- Keep your file into the newly created bucket.
- Modify COPY command source file path accordingly and execute it.
By specifying REGION parameter in the COPY command.
If you really need to load data in to redshift table from a S3 bucket that resides in another region, you need to specify REGION parameter in your COPY command. To get the region name from file in S3 follow the steps below:
- Go to S3 and open the bucket.
- Select the file (order.txt), right click on it and choose properties.
- Have a look the “Link” text in the properties right pane like below:
- Get the region (us-west-2) from the link property and organize the COPY command like below along with “REGION” parameter.
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' REGION 'us-west-2' ;
The COPY command now executed without the “S3ServiceException“error. Still the COPY command would not get succeeded. I will try to cover the error details and the possible regulation in the subsequent post.