In this tutorial we will learn how to delete (drop) duplicate rows of a DataFrame in Python pandas using the drop_duplicates() function, with examples. Let's learn how to
- Drop the duplicate rows
- Drop duplicates by a column name
Create dataframe:
import pandas as pd
import numpy as np

# Create a DataFrame
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
df
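The resulting DataFrame has 12 rows (index 0 to 11) and three columns: Name, Age and Score. Rows 6, 7 and 11 are exact duplicates of rows 0, 1 and 5.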
Drop the duplicate rows:
Now let's simply drop the duplicate rows in pandas as shown below
# drop duplicate rows
df.drop_duplicates()
In the above example the first occurrence of each duplicated row is kept and the subsequent occurrences are deleted, so the output contains nine rows: index 0, 1, 2, 3, 4, 5, 8, 9 and 10.
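As a quick check of this default behaviour, the sketch below rebuilds the same DataFrame and prints which index labels survive. The variable name `deduped` is illustrative and not part of the original example. Note that drop_duplicates() returns a new DataFrame and leaves `df` unchanged unless you assign the result back.

```python
import pandas as pd

# Same data as in the tutorial
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# keep='first' is the default, so the first occurrence of each row survives
deduped = df.drop_duplicates()

print(deduped.index.tolist())   # [0, 1, 2, 3, 4, 5, 8, 9, 10]
print(len(df), len(deduped))    # 12 9  (df itself is not modified)
```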
Drop the duplicate by retaining last occurrence:
# drop duplicate rows, keeping the last occurrence
df.drop_duplicates(keep='last')
In the above example the keep='last' argument keeps the last occurrence of each duplicated row and deletes the earlier ones, so the output contains rows 2, 3, 4, 6, 7, 8, 9, 10 and 11.
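To make the effect of the keep argument concrete, here is a minimal sketch on the same data. It also shows keep=False, a third accepted value that the text above does not cover and that drops every row having a duplicate. The names `last_kept` and `none_kept` are illustrative only.

```python
import pandas as pd

# Same data as in the tutorial
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Keep the last occurrence of each duplicated row
last_kept = df.drop_duplicates(keep='last')
print(last_kept.index.tolist())   # [2, 3, 4, 6, 7, 8, 9, 10, 11]

# keep=False removes all rows that have any duplicate
none_kept = df.drop_duplicates(keep=False)
print(none_kept.index.tolist())   # [2, 3, 4, 8, 9, 10]
```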
Drop the duplicate by column:
Now let's drop duplicates by column name. Only the named column is compared, and rows are dropped so that each value in that column appears exactly once, as shown below
# drop duplicates by a column name
df.drop_duplicates(['Name'], keep='last')
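In the above example rows are deleted in such a way that the Name column contains only unique values, and for each name the last row is the one retained.
So the result has eight rows (index 2, 3, 4, 7, 8, 9, 10 and 11), one per distinct name.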
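The column-based case can be checked with a short sketch like the one below. It passes the column list through the `subset` keyword, which is equivalent to the positional form used above; the name `by_name` is illustrative only.

```python
import pandas as pd

# Same data as in the tutorial
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Compare rows on 'Name' only and keep the last row for each name
by_name = df.drop_duplicates(subset=['Name'], keep='last')

print(by_name.index.tolist())     # [2, 3, 4, 7, 8, 9, 10, 11]
print(by_name['Name'].is_unique)  # True
```

Because only Name is compared, rows that differ in Age or Score are still treated as duplicates: of the three Alisa rows, only the last one (Age 23, Score 62) remains.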