Windows Function in R using Dplyr

Like SQL, dplyr uses windows function in R that are used to subset data within a group. It returns a vector of values. We could use min_rank() function that calculates rank in the preceding example

We will be using iris data to depict the example of group_by() function

library(dplyr)
mydata2 <-iris # windows function in R using Dplyr mydata2 %>% select(Species,Sepal.Length) %>%
    group_by(Species) %>%
    filter(min_rank(desc(Sepal.Length))<=5)

Let’s say you have a question like, “What are the top 5 sepal lengths based on the species?” To answer this question, we can simply use one of the rank functions called ‘min_rank()’ from dplyr and call it directly inside the ‘filter()’ function

So the output will be

By the way, I’m using ‘min_rank()’ function here but there is another rank function called ‘dense_rank()’ from dplyr. Both functions return the ranking number based on a given measure column (e.g. Sepal.Length), and only the difference is when there are ties like below.

As you can see when two same rank (tie) occurs min_rank() function skips one rank and assigns next to next rank wheres the dense_rank() Function assigns subsequent ranks without skipping any rank as shown above

Types of window functions:

Ranking and ordering functions: row_number(), min_rank (RANK in SQL), dense_rank(), cume_dist(), percent_rank(), and ntile(). These functions all take a vector to order by, and return various types of ranks.

Author

Sridhar Venkatachalam

With close to 10 years on Experience in data science and machine learning Have extensively worked on programming languages like R, Python (Pandas), SAS, Pyspark.
View all posts

Types of window functions:

Author

Related Posts:

.