Select column in Pyspark (Select single & Multiple columns)

To select columns in PySpark we use the select() function. select() can return a single column or several columns, and combined with colRegex() it can select columns whose names match a pattern (column name like). Each case is explained below with an example.

  • Select single column in pyspark
  • Select multiple columns in pyspark
  • Select column name like in pyspark
  • Select column name using regular expression in pyspark

Syntax:

df.select('colname1', 'colname2')

df – dataframe
colname1..n – column name

We will use the dataframe named df_basket1.

Select single column in pyspark

Passing a single column name to select() returns a new dataframe that contains only that column.

df_basket1.select('Price').show()

select() picks the column and show() prints the result. In our case we select the 'Price' column, as shown above.

Select multiple columns in pyspark

Passing several column names to select() returns a new dataframe that contains only those columns, in the order they are given.

df_basket1.select('Price','Item_name').show()

select() picks the columns and show() prints the result. In our case we select the 'Price' and 'Item_name' columns, as shown above.

Select using Regex with column name like in pyspark (select column name like)

The colRegex() function takes a regular expression wrapped in backticks and selects the columns whose names match it.

## select using Regex with column name like

df_basket1.select(df_basket1.colRegex("`(Item)+?.+`")).show()

The code above selects the columns whose names begin with Item, much like the SQL pattern LIKE 'Item%'.
