Scaling and normalizing a column in Pandas python

Scaling and normalizing a column in pandas is often required to bring the data onto a common scale before we model it. We will be using the preprocessing module from the scikit-learn package. Let's see an example that normalizes a column in pandas by min-max scaling, which maps the values to the range 0 to 1.

 

Create a single column dataframe:

import pandas as pd
import numpy as np
from sklearn import preprocessing

# Create a DataFrame
d = {
       'Score':[62,-47,-55,74,31,77,85,63,42,67,89,81,56]}

df = pd.DataFrame(d,columns=['Score'])
print(df)

So the resultant dataframe will be

[Image: the resulting single-column dataframe]

On plotting the score it will be

[Image: plot of the actual scores]

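Step 1:  Convert the column of the dataframe to a float array


# 1. convert the column values of the dataframe to floats
# (double brackets keep the result 2-D, one row per score, which MinMaxScaler requires)

float_array = df[['Score']].values.astype(float)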

 

Step 2:  Create a min-max scaler object. Pass the float array to its fit_transform() method, which rescales the values to the range 0 to 1, as shown below


# 2. create a min max processing object

min_max_scaler = preprocessing.MinMaxScaler()
scaled_array = min_max_scaler.fit_transform(float_array)
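
To make the scaling step less of a black box, here is a minimal sketch of the arithmetic that min-max scaling performs, assuming the default output range of 0 to 1: the column minimum is subtracted from each value, and the result is divided by the column range (maximum minus minimum). The names scores, col_min, col_max and manual_scaled are illustrative and are not part of the original example.

```python
import numpy as np

# Same scores as in the example dataframe
scores = np.array([62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56], dtype=float)

# Min-max scaling by hand: (x - min) / (max - min)
col_min = scores.min()   # -55.0
col_max = scores.max()   # 89.0
manual_scaled = (scores - col_min) / (col_max - col_min)

# The first score, 62, becomes (62 + 55) / 144 = 0.8125
print(manual_scaled[0])
```

The smallest score (-55) maps to 0 and the largest (89) maps to 1, with every other score falling proportionally in between.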

 

 

Step 3:  Convert the scaled array to a dataframe.

# 3. convert the scaled array to dataframe

df_normalized = pd.DataFrame(scaled_array)
print(df_normalized)
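
As a possible variation on the three steps, the scaled values can be stored next to the original column instead of in a separate dataframe. This is only a sketch: the column name Score_scaled is hypothetical, and it assumes scikit-learn's default behaviour of returning a NumPy array from fit_transform().

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'Score': [62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56]})

# Double brackets select a one-column DataFrame (2-D), which the scaler expects
scaler = preprocessing.MinMaxScaler()
scaled = scaler.fit_transform(df[['Score']].astype(float))

# Flatten the (13, 1) result and store it beside the original column
df['Score_scaled'] = scaled.ravel()   # 'Score_scaled' is a hypothetical column name
print(df)
```

Keeping both columns in one dataframe makes it easy to compare each raw score with its scaled value row by row.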

So the final normalized dataframe will be

[Image: the normalized dataframe, with values between 0 and 1]

On plotting the scaled score the graph will be

[Image: plot of the scaled scores]
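
The plots themselves can be reproduced along the following lines. This is a sketch under assumptions: it uses matplotlib line plots, which may not match the chart type of the original figures, the output file name is hypothetical, and it recomputes the scaled values with the min-max formula (value minus minimum, divided by the range) so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

scores = np.array([62, -47, -55, 74, 31, 77, 85, 63, 42, 67, 89, 81, 56], dtype=float)

# Min-max scaling to the range 0 to 1
scaled = (scores - scores.min()) / (scores.max() - scores.min())

# Draw the raw and scaled scores side by side
fig, (ax_raw, ax_scaled) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.plot(scores, marker="o")
ax_raw.set_title("Actual score")
ax_scaled.plot(scaled, marker="o")
ax_scaled.set_title("Scaled score")
fig.tight_layout()
fig.savefig("scores_before_after.png")  # hypothetical output file name
```

The two curves have the same shape; only the vertical axis changes, from the range -55 to 89 to the range 0 to 1.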

 


Author

  • Sridhar Venkatachalam

    With close to 10 years of experience in data science and machine learning, has worked extensively with programming languages like R, Python (Pandas), SAS, and PySpark.