Amazon Redshift suggests to define primary key or foreign key constraint wherever applicable. But they are information only. Redshift does not enforce these constraints. In other words, a column with primary key accepts duplicate values as well as a foreign key column also allows such a value that does not exists in the referenced table. So why does Redshift consider defining primary or foreign key as their best practice list? Because query optimizer uses them to choose the most suitable execution plan. But you need to be very careful while defining these constraint in your table. Let us see why with some real-life example.
Need to think before defining primary key:
As spoke earlier, it is fine to have a primary key in a Redshift table. But keep in mind it is your responsibility to make sure the column’s values are unique. Otherwise, you are likely to get wrong query result. An example gives you more insight.
I have created a tabled named employee using the script below:
CREATE TABLE employee ( emp_id INT NOT NULL PRIMARY KEY, emp_name VARCHAR(100) NOT NULL, dept_id INT );
Then executed the following script to insert some records in to it.
INSERT INTO employee VALUES(1, 'Foo',1); INSERT INTO employee VALUES(2, 'Bar',1); INSERT INTO employee VALUES(2, 'Bar',1)
You might have noticed, the emp_id is declared as primary key and in the insert scripts, ‘Bar’ is inserted twice having emp_id 2. In Redshift this is not a problem as Primary Key constraint is not enforced. Assume we have another table called “department” with following rows:
Figure-1: Department table.
Now, let’s execute the following query to see, employee name and department of employee Id = 2. The query looks like below:
SELECT DISTINCT e.emp_id, d.dept_name FROM employee e INNER JOIN dbo.department d ON e.dept_id = d.dept_id WHERE e.emp_id = 2;
See, it produces wrong result! The “DISTINCT” keyword is not honored at all! The reason behind that is Redshift assumes all values of a primary key column are inserted by ensuring the uniqueness. So database engine does not care to make them distinct during query execution. Let’s remove the primary key constraint from the table:
ALTER TABLE employee DROP CONSTRAINT employee_pkey;
So, you got an idea of what you need to think before defining primary key. And of course, always remember Redhsift developer guide’s following comments.
“Do not define primary key and foreign key constraints unless your application enforces the constraints. Amazon Redshift does not enforce unique, primary-key, and foreign-key constraints.”
I my next post I will figure out the same thing for foreign key. Until then keep playing!!!