Get data type of column in PySpark (single & multiple columns)

To get the data type of a column in PySpark we use the dtypes attribute and the printSchema() method. dtypes returns the column names and data types of a dataframe as a list of tuples, and printSchema() prints the schema as a tree. Combined with select(), both work for a single column as well as for multiple columns. We will explain how to get the data type of single and multiple columns in PySpark with an example.

  • Get data type of a single column in PySpark using printSchema()
  • Get data type of a single column in PySpark using dtypes
  • Get data type of multiple columns in PySpark using printSchema() and dtypes
  • Get data type of all the columns in PySpark

We will use the dataframe named df_basket1.


Get data type of a single column in PySpark using printSchema() – Method 1:

dataframe.select('columnname').printSchema() is used to print the data type of a single column

df_basket1.select('Price').printSchema()

We use select() to pick a column and printSchema() to print the data type of that particular column. So in our case the call prints the schema of the 'Price' column only.


Get data type of a single column in PySpark using dtypes – Method 2:

dataframe.select('columnname').dtypes is the syntax used to get the data type of a single column

df_basket1.select('Price').dtypes

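We use select() to pick a column and dtypes to get the data type of that particular column. dtypes is an attribute, not a function, and it returns a list of (column name, data type) tuples. So in our case we get a one-element list holding the name and data type of the 'Price' column.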



Get data type of multiple columns in PySpark using printSchema() – Method 1:

dataframe.select('columnname1','columnname2').printSchema() is used to print the data type of multiple columns

df_basket1.select('Price','Item_name').printSchema()

We use select() to pick multiple columns and printSchema() to print the data type of these columns. So in our case the call prints the schema of the 'Price' and 'Item_name' columns, in the order they were selected.

Get data type of multiple columns in PySpark using dtypes – Method 2:

dataframe.select('columnname1','columnname2').dtypes is used to get the data type of multiple columns

df_basket1.select('Price','Item_name').dtypes

We use select() to pick multiple columns and dtypes to get the data type of these columns. So in our case we get a list with one (column name, data type) tuple each for 'Price' and 'Item_name', in the order they were selected.



Get data type of all the columns in PySpark:
Method 1: using printSchema()

dataframe.printSchema() is used to print the data type of each column in the dataframe.

df_basket1.printSchema()

printSchema() prints the name, data type and nullability of each column as a tree. It only prints; it does not return the schema.


Method 2: using dtypes
dataframe.dtypes is used to get the data type of each column in the dataframe

df_basket1.dtypes

dtypes returns the name and data type of every column as a list of (column name, data type) tuples.
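A common follow-up is to group columns by their type, for example to find all string columns. A minimal sketch in plain Python; group_columns_by_type is a hypothetical helper, not part of PySpark, and the hard-coded list only mimics the shape of dtypes so the snippet runs without a Spark session:

```python
from collections import defaultdict


def group_columns_by_type(dtypes):
    """Group column names by type name.

    `dtypes` is a list of (column name, data type) tuples,
    i.e. the shape returned by DataFrame.dtypes.
    """
    grouped = defaultdict(list)
    for column_name, type_name in dtypes:
        grouped[type_name].append(column_name)
    return dict(grouped)


# Hypothetical dtypes list; with a live dataframe pass df_basket1.dtypes instead.
sample_dtypes = [("Item_name", "string"), ("Price", "double")]

columns_by_type = group_columns_by_type(sample_dtypes)
print(columns_by_type)
```

With a real dataframe the call would be group_columns_by_type(df_basket1.dtypes), and the keys would be whatever type names Spark reports for its columns.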




Author

  • Sridhar Venkatachalam

    Has close to 10 years of experience in data science and machine learning, and has worked extensively with programming languages like R, Python (Pandas), SAS and PySpark.