Interesting aggregate query result using WITH clause in Redshift

UPDATE: This issue has been fixed in Redshift now.

Amazon Redshift does not support table variable. For a quick testing, we sometimes do not want to store data in a real table rather we usually declare a table variable and complete our testing. This approach also saves our time from cleaning up the unused test tables and records. Redshift supports Common Table Expression and we can use it as temp table (still have limitation). Using WITH block (CTE) multiple logical tables can be generated combining arbitrary select statements for each logical expression. For more information about this visit WITH Clause  in Redshift Developer Guide. In my recent project, while doing a quick test by generating multiple logical tables with some records, I noticed Interesting aggregate query result using WITH clause in Redshift. Let me show you in details.

Generating tables to see Interesting aggregate query result using WITH clause in Redshift:

I have written a script like below:

WITH t1 AS
(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
),
t2 AS

(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
)

I have generated table expression “t1” and “t2” just to do an aggregate query to take the SUM of all “col1” values. So my final query is as follows:

WITH t1 AS
(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
),
t2 AS

(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
)

SELECT SUM(sumofcol1) AS Total
FROM(
  SELECT SUM(col1) AS sumofcol1
  FROM   t1
  UNION ALL
  SELECT SUM(col1)
  FROM   t2
) t

After executing the above query surprisingly I got result “10” but the result is supposed to be 120. See the below image:

Interesting Result
Figure 01: Incorrect query result.

To be confirm about the result, I have executed the query in SQL Server environment and got the expected output as 120. Have a look:

Result In SSMS
Figure 02: Correct Query result in SQL Server Management Studio.

So really an interesting aggregate query result using WITH clause in Redshift! I don’t know why it happens, I will be grateful if anyone let me know the reason of such interesting query behavior.

How to solve it.

In order to resolve the issue, I have tried executing the query by removing the “SUM” from the last inner most sub query and it gave the correct result! Have a look on the corrected one below:

WITH t1 AS
(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
),
t2 AS

(
 SELECT 10 AS col1
 UNION ALL
 SELECT 20 AS col1
 UNION ALL
 SELECT 30 AS col1
)

SELECT SUM(sumofcol1) AS Total
FROM(
  SELECT col1 AS sumofcol1
  FROM   t1
  UNION ALL
  SELECT col1
  FROM   t2
) t

For curiosity, I wanted to see how the query behaves when temp tables (# tables) are used instead of WITH Clause in Redshift. For this I have re-written the queries as follows:

CREATE TABLE #t (col1 INT);
CREATE TABLE #t1(col1 INT);

INSERT INTO #t
SELECT 10 AS col1
UNION ALL
SELECT 20 AS col1 
UNION ALL
SELECT 30 AS col1;
 
INSERT INTO #t1
SELECT 10 AS col1
UNION ALL
SELECT 20 AS col1 
UNION ALL
SELECT 30 AS col1;
 
SELECT SUM(sumofcol1) AS Total
FROM   (
           SELECT SUM(col1) AS sumofcol1
           FROM   #t
           UNION ALL
           SELECT SUM(col1)
           FROM   #t1
       ) t

DROP TABLE #t;
DROP TABLE #t1;

After executing the query I have found the expected result which is “120”.

Finally, Redshift generates interesting aggregate query result using WITH clause with logical, non persisted rows. Bit it produces correct result using the rows reside in actual table on the disk.




This entry was posted in Amazon Redshift, Redshift Database Development by Khorshed. Bookmark the permalink.

About Khorshed

Khorshed has 12 years of cumulative professional experience in areas of database administration, development, Business Intelligent, system appreciation, requirement analysis and software development (using .net technologies). Achieved MCITP in SQL Server in both SQL Server development and administration. He has vast knowledge in SQL Server, SSIS, SSRS, SSAS. He is also proficient in Amazon cloud services like Redshift, Data Pipeline, SNS, DynamoDB, EMR and many more.

6 thoughts on “Interesting aggregate query result using WITH clause in Redshift

Leave a Reply

Your email address will not be published. Required fields are marked *