Get the unique values (distinct rows) of a dataframe in python Pandas

In this tutorial we will learn how to get the unique values (distinct rows) of a dataframe in Python pandas with the drop_duplicates() function. Let's work through an example of dropping duplicates to get the distinct rows of a dataframe.

  • Get distinct rows of dataframe in pandas python by dropping duplicates
  • Get distinct rows of the dataframe in pandas based on a particular column
#### Create Dataframe:
import pandas as pd

# Create a DataFrame
d = {
    'Name':['Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','Alex','Cathrine'],
    'Age':[26,24,23,22,23,24,26,24,22,23,24,24]
}

df = pd.DataFrame(d,columns=['Name','Age'])
df

The output is a dataframe with 12 rows and two columns, Name and Age. Three of the rows (index 6, 7 and 11) are exact copies of earlier rows.
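Before dropping anything, it can help to see which rows pandas treats as duplicates. The following is a minimal sketch using duplicated(), which is not part of the original example; the variable names is_dup, n_duplicates and dup_rows are chosen here for illustration only.

```python
import pandas as pd

# Same data as the tutorial's dataframe
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
})

# Boolean Series: True for each row that exactly repeats an earlier row
is_dup = df.duplicated()

# Count the duplicates and pull them out for inspection
n_duplicates = int(is_dup.sum())
dup_rows = df[is_dup]

print(n_duplicates)
print(dup_rows)
```

Note that ('Alisa', 23) at index 9 is not flagged, because rows are compared across all columns by default and its Age differs from the earlier 'Alisa' rows.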

 

Get the unique values (distinct rows) of the dataframe in python pandas

The drop_duplicates() function returns the unique rows of a dataframe in Python pandas.


# get the unique values (rows)
df.drop_duplicates()

The drop_duplicates() call above removes the duplicate rows and returns a new dataframe containing only unique rows; the original df is not modified. By default (keep='first') it retains the first occurrence of each duplicated row.

The output has nine rows: the repeated rows at index 6, 7 and 11 are dropped, and the remaining rows keep their original index labels.
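As a rough, self-contained sketch of the behaviour described above, the snippet below rebuilds the same dataframe and checks that drop_duplicates() returns a new object with the original index labels. The names unique_df and renumbered are illustrative, and the reset_index(drop=True) step is an optional addition for readers who want a fresh 0..n-1 index.

```python
import pandas as pd

# Same data as the tutorial's dataframe
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
})

# Returns a new dataframe; df itself still has all 12 rows
unique_df = df.drop_duplicates()
print(len(df), len(unique_df))   # 12 9

# Surviving rows keep their original index labels (6, 7 and 11 are gone)
print(list(unique_df.index))

# Optional: renumber the result from 0 if the gaps are unwanted
renumbered = unique_df.reset_index(drop=True)
print(list(renumbered.index))
```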

 

Get the unique values (rows) of the dataframe in python pandas by retaining last row:


# get the unique values (rows) by retaining last row
df.drop_duplicates(keep='last')

With the keep='last' argument, drop_duplicates() still removes the duplicate rows and returns only unique rows, but it retains the last occurrence of each duplicated row instead of the first.

The output again has nine rows, but the surviving copies of the duplicated rows are now those at index 6, 7 and 11 rather than 0, 1 and 5.
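To make the effect of keep easier to compare, here is a small sketch that runs the same dataframe through each of the accepted values. keep=False is not used in the original tutorial; it is included here because it is the third documented option and drops every row that has a duplicate. The variable names are illustrative.

```python
import pandas as pd

# Same data as the tutorial's dataframe
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
})

keep_first = df.drop_duplicates(keep='first')  # default: keep earliest copy
keep_last = df.drop_duplicates(keep='last')    # keep latest copy
keep_none = df.drop_duplicates(keep=False)     # drop all copies of duplicated rows

# Compare which index labels survive in each case
print(list(keep_first.index))
print(list(keep_last.index))
print(list(keep_none.index))
```

keep='first' and keep='last' return the same set of distinct rows and differ only in which copy (and therefore which index label and position) survives, whereas keep=False leaves only the rows that were never duplicated.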

 

Get Distinct values of the dataframe based on a column:

Here we pass a column name to the subset argument so that rows are compared on that column only, and extract the distinct rows of the dataframe based on that column.


# get distinct rows of the dataframe based on the Age column
# (keeps the first row for each Age; note this overwrites df)
df = df.drop_duplicates(subset=["Age"])
df

The resulting dataframe has one row per distinct value of the “Age” column (26, 24, 23 and 22), keeping the first row in which each age appears.
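The sketch below reproduces this step on a fresh copy of the data without overwriting df, and contrasts it with Series.unique(), which returns only the distinct values of a column rather than whole rows. The unique() and nunique() calls are additions for comparison and are not part of the original example; the variable names are illustrative.

```python
import pandas as pd

# Same data as the tutorial's dataframe
df = pd.DataFrame({
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
})

# One full row per distinct Age (first occurrence of each age is kept)
by_age = df.drop_duplicates(subset=['Age'])
print(by_age)

# Only the distinct Age values, in order of first appearance
distinct_ages = df['Age'].unique()
print(distinct_ages)

# Number of distinct Age values
n_ages = df['Age'].nunique()
print(n_ages)
```

Use drop_duplicates(subset=...) when the other columns of the surviving rows matter, and unique() when only the values themselves are needed.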

 
