In this Section we will see how to delete duplicate rows in PostgreSQL. deleting duplicate rows in PostgreSQL table can be accomplished by using multiple methods like distinct keyword and Partition by which will keep only distinct rows of the table, each of these methods are explained with an example.
- Keep only distinct rows in PostgreSQL table
- Delete unwanted duplicate rows in PostgreSQL table
- Delete rows with duplicate ids and keep only rows without duplicate ids
The table which we use for depiction is
examscore:
Delete Duplicate rows in PostgreSQL: Method 1 (Select Distinct /Unique Rows in PostgreSQL)
It is Most efficient and fastest way to select data without unnecessary duplicates
select distinct * from ExamScore
Output:
Method 2: delete unwanted duplicates and return only unique rows
select distinct * from examscore where studentid in ( select studentid from ( select studentid, ROW_NUMBER() OVER(PARTITION BY studentid ORDER BY studentid asc) AS Row FROM examscore ) as foo where foo.Row = 1);
- PARTITION BY divides into groups and ORDER BY sorts them by descending order.
- ROW_NUMBER() assigns an integer number to every row in each category.
- Duplicate rows with Row numbers > 1 will be removed in the where condition which will result in the table with only unique rows in PostgreSQL
Output:
Method 3: delete the ids which are duplicates
To Delete ids which are duplicates i.e. Delete rows with duplicate ids and keep only rows without duplicate ids
SELECT * FROM examscore WHERE (studentid) IN ( SELECT studentid FROM examscore GROUP BY name, studentid HAVING COUNT(*) = 1 );
In studentid column we need to remove the duplicate ids and we need to keep only non duplicate ids and then we need to keep rows of those non duplicate ids which can be achieved with above code,
Studentid 101, and 103 are removed as they occur more than once
So the resultant table will have all non duplicate ids
Output: