Get the unique values (distinct rows) of a dataframe in python Pandas

In this tutorial we will learn how to get the unique values (distinct rows) of a dataframe in python pandas with drop_duplicates() function.  Lets see with an example on how to drop duplicates and get Distinct rows of the dataframe in pandas python.

  • Get distinct rows of dataframe in pandas python by dropping duplicates

Drop duplicates in pandas python 1

  • Get distinct value of the dataframe in pandas by particular column
  • Drop Duplicate in pandas on case sensitive values of dataframe

  • Keep only non duplicated values of the dataframe in pandas case sensitive

 

#### Create Dataframe:
import pandas as pd
import numpy as np

#Create a DataFrame
d = {
    'Name':['Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','Alex','Cathrine'],
    'Age':[26,24,23,22,23,24,26,24,22,23,24,24]
}

df = pd.DataFrame(d,columns=['Name','Age'])
df

so the output will be

Get the unique values (rows) of a dataframe in python Pandas 1

 

Get the unique values (distinct rows) of the dataframe in python pandas

drop_duplicates() function is used to get the unique values (rows) of the dataframe in python pandas.


# get the unique values (rows)
df.drop_duplicates()

The above drop_duplicates() function removes all the duplicate rows and returns only unique rows. Generally it retains the first row when duplicate rows are present.

So the output will be

Get the unique values (rows) of a dataframe in python Pandas 2

 

Get the unique values (rows) of the dataframe in python pandas by retaining last row:


# get the unique values (rows) by retaining last row
df.drop_duplicates(keep='last')

The above drop_duplicates() function with keep =’last’ argument,  removes all the duplicate rows and returns only unique rows by retaining the last row when duplicate rows are present.

So the output will be

Get the unique values (rows) of a dataframe in python Pandas 3

 

 

Get the unique values (rows) of the dataframe in python pandas by retaining first row:


# get the unique values (rows) by retaining first row
df.drop_duplicates(keep='first')

The above drop_duplicates() function with keep =’first’ argument,  removes all the duplicate rows and returns only unique rows by retaining the first row when duplicate rows are present.

So the output will be

 

drop duplicate in pandas with first keyword

 

Remove all Duplicate rows and keep only non repeating rows :

In order to keep only non repeating rows, i.e. retaining only non duplicate rows we can use  drop_duplicate() function with keep=False as shown below


# Remove all duplicate rows 
## Get all non repeating rows in pandas. - non duplicate rows
df.drop_duplicates(keep=False)

The result will have only non repeating rows of the dataframe (non duplicate rows) in pandas

Keep non repeating rows of the dataframe (non duplicate rows) in pandas

 

Get Distinct values of the dataframe based on a column:

In this we will subset a column and extract distinct values of the dataframe based on that column.


# get distinct values of the dataframe based on column
df = df.drop_duplicates(subset = ["Age"])
df

So the resultant dataframe will have distinct values based on “Age” column

Get the unique values (distinct rows) of a dataframe in python Pandas 4

 

 

 

Drop Duplicate in pandas on case sensitive values of dataframe :

Lets create the sample dataframe

#### Create Dataframe:
import pandas as pd
import numpy as np

#Create a DataFrame
d = {
'Name':['Alisa','Bobby','kumar','raghu','CATHRINE',
'ALISA','BOBBY','KUMAR','Alex','Cathrine'],
'Age':[24,26,43,22,23,24,26,43,22,23]
}

df = pd.DataFrame(d,columns=['Name','Age'])
df

dataframe will be

case sensitive drop duplicates in pandas python 1

 

First, convert all the string values to lowercase or upper case to make them case insensitive and then  deduplicate the dataframe using drop_duplicate in pandas

### Drop duplicates in pandas case insensitive 
df2 = df.apply(lambda x: x.astype(str).str.lower()).drop_duplicates(subset=['Name'], keep='first')
print(df2)

we used apply function and first converted the string to lower case and then apply drop_duplicates() function and
removed duplicates by retaining the first row when duplicates are present,

So the resultant dataframe will have distinct rows.

case sensitive drop duplicates in pandas python 2

 

 

Keep only non duplicated values of dataframe in pandas case sensitive

First convert the values in either uppercase or lower case and then keep only non duplicated values using drop_duplicate() with KEEP=False in pandas

 
### keep only non duplicated values in pandas case insensitive 

df2 = df.apply(lambda x: x.astype(str).str.lower()).drop_duplicates(subset=['Name'], keep=False)
print(df2)

So the resultant dataframe will have only non repeating rows.

case sensitive drop duplicates in pandas python 3

 

previous-small Get the unique values (rows) of a dataframe in python Pandas                                                                                                            next_small Get the unique values (rows) of a dataframe in python Pandas

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.