Redshift COPY Command errors and how to solve them Part-2


In my previous post I have tried to demonstrate what type of error may happen and how to solve it when we want to COPY data from a S3 region that differs from the region of the Redshift cluster. In this post I will cover more couple of COPY command exception and some possible solutions. In order to get an idea about the sample source file and Redshift target table structure, please have look on the “Preparing the environment to generate the error” section of my previous blog post.

Redshift copy command errors description:

ERROR: Load into table ‘xxxx’ failed.  Check ‘stl_load_errors’ system table for details.
Invalid digit, Value ‘O’, Pos 0, Type: Integer

How Redshift copy command errors are produced?

Now, once again, to load data into orders table execute the following COPY command (assuming S3 bucket and Redshift cluster reside in same region).

COPY orders FROM 's3://sourcedatainorig/order.txtcredentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' ;

This time we get another exception and the error message is:

ERROR: Load into table ‘xxxx’ failed.  Check ‘stl_load_errors’ system table for details.

This error is a generic error, to get the details we need to query on the “stl_load_error” system table. You can query directly to this table but the query below is bit helpful. This query picks up the most recent COPY error message  which is generated when data loading operation is performed on ‘orders’ table.

  top 1 starttime,
  le.type AS column_datatype,
  le.raw_line AS error_record_line_data,
  le.raw_field_value AS attempt_to_insert_value ,
FROM stl_load_errors le
  (SELECT TOP 1 id
   WHERE name='orders') tp
ORDER BY le.starttime DESC

Here is the output of the query:
Copy Error
Figure: Copy error details

Reason of Redshift copy command errors:

First of all, have look on the “err_reason” column. It displays “Invalid digit, Value ‘O’, Pos 0, Type: Integer”. This error message is also not that much helpful to understand what actually went wrong. Now let us have a look on the other column values of the query output.

Have a look on the “line_number” column. It indicates, the error occurs when COPY command attempts to insert the first (value=1) line. In other words, value of “line_number” column indicates which lines gets failed during the execution of COPY command. As “line_number” column value=1 so COPY command failed to insert the first line of the source file (s3://sourcedatainorig/order.txt).

Now we need to see in the source data file what value contains in the first line. To get this row data, you do not need to open the source text file. You can get this row data from “error_record_line_data” column of the query output above. “error_record_line_data” column value is as follows:

“CustomerName   OrderID   OrderDate   City   UnitPrice   Quantity   ShipRegion”

After having a look on the line #1 row of source file, we came to know that it is the file header row. COPY command failed to insert this header rows value to the pertinent columns of the orders table.

Let us do bit more analysis to find out which specific column of redshift orders table refused to store value during data load. Have look on the “colname”, “column_datatype” and “attempt_to_insert_value” columns of query output once again. Form these columns we can see colname value=orderid, column_datatype=”int4” and “attempt_to_insert_value”=”OrderID”. It is now clear that COPY command attempts to insert character type value “OrderID” into an integer typed orderid column. As a result, COPY command generates “Invalid digit, Value ‘O’, Pos 0, Type: Integer “ error and terminates the entire data loading operation.

So we came to know that “header” row is responsible for this data loading failure. But this not the only reason for this type of error. Unlike header row, if any column value of source data file contains any value which data type does not compatible (or not auto convertible) to the target redshift table destination column, this error may occur.

How to solve it:

Ignoring first row (header row) of source file during COPY command.

If you get error message like “Invalid digit, Value ‘O’, Pos 0, Type: Integer” try executing your copy command by eliminating the header row. Use IGNOREHEADER parameter in your copy command to ignore the first line of the data file. So the COPY command will look like below:

COPY orders FROM 's3://sourcedatainorig/order.txtcredentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter 't' IGNOREHEADER 1;

When you use IGNOREHEADER 1 it will ignore top 1 row during COPY command. Instead of value 1, any integer value can be used to eliminate top (N) number of rows.

Loading data using proper transformation logic that will ignore the unexpected error prone values.

You can use different popular third party ETL tool to eliminate garbage values from your source data file.

Loading data into a temp staging table that all columns are in VARCHAR type.

If your source data file contains too much unexpected error prone or garbage values and you do not want to use any third party ETL tool, you can try to create a temporary staging table with all VARCHAR columns. In that case, you may not get such error message during data load. After successful load, use custom SQL statement to clean and transfer the data into the strongly typed main table.

I hope the information above will be helpful.