Windows Function in R using Dplyr

Like SQL, dplyr uses windows function in R that are used to subset data within a group. It returns a vector of values. We could use min_rank() function that calculates rank in the preceding example

We will be using iris data to depict the example of group_by() function

library(dplyr)
mydata2 <-iris # windows function in R using Dplyr mydata2 %>% select(Species,Sepal.Length) %>%
    group_by(Species) %>%
    filter(min_rank(desc(Sepal.Length))<=5)

Let’s say you have a question like, “What are the top 5 sepal lengths based on the species?” To answer this question, we can simply use one of the rank functions called ‘min_rank()’ from dplyr and call it directly inside the ‘filter()’ function

So the output will be

By the way, I’m using ‘min_rank()’ function here but there is another rank function called ‘dense_rank()’ from dplyr. Both functions return the ranking number based on a given measure column (e.g. Sepal.Length), and only the difference is when there are ties like below.

As you can see when two same rank (tie) occurs min_rank() function skips one rank and assigns next to next rank wheres the dense_rank() Function assigns subsequent ranks without skipping any rank as shown above

 

Types of window functions:

Ranking and ordering functions: row_number(), min_rank (RANK in SQL), dense_rank(), cume_dist(), percent_rank(), and ntile(). These functions all take a vector to order by, and return various types of ranks.

 

Windows Function in R using Dplyr                                                                                                          Windows Function in R using Dplyr

Author

  • Sridhar Venkatachalam

    With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.