UPDATE: This issue has been fixed in Redshift now.
Amazon Redshift does not support table variable. For a quick testing, we sometimes do not want to store data in a real table rather we usually declare a table variable and complete our testing. This approach also saves our time from cleaning up the unused test tables and records. Redshift supports Common Table Expression and we can use it as temp table (still have limitation). Using WITH block (CTE) multiple logical tables can be generated combining arbitrary select statements for each logical expression. For more information about this visit WITH Clause in Redshift Developer Guide. In my recent project, while doing a quick test by generating multiple logical tables with some records, I noticed Interesting aggregate query result using WITH clause in Redshift. Let me show you in details.
Generating tables to see Interesting aggregate query result using WITH clause in Redshift:
I have written a script like below:
WITH t1 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 ), t2 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 )
I have generated table expression “t1” and “t2” just to do an aggregate query to take the SUM of all “col1” values. So my final query is as follows:
WITH t1 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 ), t2 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 ) SELECT SUM(sumofcol1) AS Total FROM( SELECT SUM(col1) AS sumofcol1 FROM t1 UNION ALL SELECT SUM(col1) FROM t2 ) t
After executing the above query surprisingly I got result “10” but the result is supposed to be 120. See the below image:
Figure 01: Incorrect query result.
To be confirm about the result, I have executed the query in SQL Server environment and got the expected output as 120. Have a look:
Figure 02: Correct Query result in SQL Server Management Studio.
So really an interesting aggregate query result using WITH clause in Redshift! I don’t know why it happens, I will be grateful if anyone let me know the reason of such interesting query behavior.
How to solve it.
In order to resolve the issue, I have tried executing the query by removing the “SUM” from the last inner most sub query and it gave the correct result! Have a look on the corrected one below:
WITH t1 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 ), t2 AS ( SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1 ) SELECT SUM(sumofcol1) AS Total FROM( SELECT col1 AS sumofcol1 FROM t1 UNION ALL SELECT col1 FROM t2 ) t
For curiosity, I wanted to see how the query behaves when temp tables (# tables) are used instead of WITH Clause in Redshift. For this I have re-written the queries as follows:
CREATE TABLE #t (col1 INT); CREATE TABLE #t1(col1 INT); INSERT INTO #t SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1; INSERT INTO #t1 SELECT 10 AS col1 UNION ALL SELECT 20 AS col1 UNION ALL SELECT 30 AS col1; SELECT SUM(sumofcol1) AS Total FROM ( SELECT SUM(col1) AS sumofcol1 FROM #t UNION ALL SELECT SUM(col1) FROM #t1 ) t DROP TABLE #t; DROP TABLE #t1;
After executing the query I have found the expected result which is “120”.
Finally, Redshift generates interesting aggregate query result using WITH clause with logical, non persisted rows. Bit it produces correct result using the rows reside in actual table on the disk.
Very very helpful for new developer in Redshift & interesting findings too.
Thanks you!
Similar kind of problem has been describe in this linkedin group.
https://www.linkedin.com/grp/post/4884099-5954352632352038915
Sounds like this has been fixed. Running the first query returns 120 on my RedShift cluster.
Thank you so much. Yes it has been fixed by this time. I put a comment in my post as well.
I think, it is due to the function of “UNION ALL”. It does not work as the same as in other databases.