Find the duplicate rows of the dataframe in python pandas

In this section we will learn how to find the duplicate rows of a dataframe in Python pandas with the duplicated() function.

Each row is marked True if it is a duplicate of an earlier row and False if it is not. Let's try it with an example.

import pandas as pd
import numpy as np

# Create a DataFrame
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}

df = pd.DataFrame(d,columns=['Name','Age','Score'])
df

So the resultant dataframe will be:

[Image: the resultant dataframe with the columns Name, Age and Score]

Find the duplicate row in pandas:

The duplicated() function is used to find the duplicate rows of a dataframe in Python pandas.


df["is_duplicate"] = df.duplicated()

df

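The above code checks whether each row is a duplicate of an earlier row, tags it True if it is and False if it is not, and assigns the result to a column named "is_duplicate" of the dataframe df. By default the first occurrence of a repeated row is tagged False and only the later copies are tagged True.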

So the resultant dataframe will be:

[Image: the resultant dataframe with the added is_duplicate column]
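
The same function can also be used to pull out the duplicate rows themselves, or to change what counts as a duplicate. The sketch below rebuilds the example data and tries the keep and subset parameters of duplicated(). The variable names (later_repeats, all_copies, same_name) are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Same data as in the example above
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77],
}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# Default (keep='first'): select only the later copies of a repeated row
later_repeats = df[df.duplicated()]

# keep=False: select every row that has an identical copy, including the first one
all_copies = df[df.duplicated(keep=False)]

# subset: compare rows on the Name column only, ignoring Age and Score
same_name = df.duplicated(subset=['Name'])

print(later_repeats)
print(all_copies)
print(same_name)
```

With keep=False the first occurrences are flagged as well, which is handy when you want to inspect all copies side by side. With subset, the third 'Alisa' row is flagged too, even though its Age and Score differ from the earlier ones.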

 


Author

  • Sridhar Venkatachalam

    Close to 10 years of experience in data science and machine learning. Has worked extensively with programming languages such as R, Python (pandas), SAS and PySpark.
