Delete Duplicate Rows in PostgreSQL

In this Section we will see how to delete duplicate rows in PostgreSQL. deleting duplicate rows in PostgreSQL table can be accomplished by using multiple methods like distinct keyword and Partition by which will keep only distinct rows of the table, each of these methods are explained with an example.

  • Keep only distinct rows in PostgreSQL table
  • Delete unwanted duplicate rows in PostgreSQL table
  • Delete rows with duplicate ids and keep only rows without duplicate ids

The table which we use for depiction is

 

examscore:

Delete Duplicate Rows in PostgreSQL 1

 

Delete Duplicate rows in PostgreSQL: Method 1 (Select Distinct /Unique Rows in PostgreSQL)

It is Most efficient and fastest way to select data without unnecessary duplicates

 

select distinct * from ExamScore

Output:

Delete Duplicate Rows in PostgreSQL 2

 

Method 2: delete unwanted duplicates and return only unique rows

 

select distinct * from examscore where studentid in  (

select studentid from (

select studentid,

ROW_NUMBER() OVER(PARTITION BY studentid ORDER BY studentid asc) AS Row

FROM examscore

) as foo

where

foo.Row = 1);
  • PARTITION BY divides into groups and ORDER BY sorts them by descending order.
  • ROW_NUMBER() assigns an integer number to every row in each category.
  • Duplicate rows with Row numbers > 1 will be removed in the where condition which will result in the table with only unique rows in PostgreSQL

Output:

Delete Duplicate Rows in PostgreSQL 3

 

Method 3: delete the ids which are duplicates

To Delete ids which are duplicates i.e. Delete rows with duplicate ids and keep only rows without duplicate ids

 

SELECT *

FROM examscore

WHERE (studentid) IN (

SELECT studentid

FROM examscore

GROUP BY name, studentid

HAVING COUNT(*) = 1

);

In studentid column we need to remove the duplicate ids and we need to keep only non duplicate ids and then we need to keep rows of those non duplicate ids which can be achieved with above code,

Studentid 101, and 103 are removed as they occur more than once

Delete Duplicate Rows in PostgreSQL 4

So the resultant table will have all non duplicate ids

 Output:

Delete Duplicate Rows in PostgreSQL 5

 

 

 

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.

    View all posts