Like SQL, dplyr uses windows function in R that are used to subset data within a group. It returns a vector of values. We could use min_rank() function that calculates rank in the preceding example
We will be using iris data to depict the example of group_by() function
library(dplyr) mydata2 <-iris # windows function in R using Dplyr mydata2 %>% select(Species,Sepal.Length) %>% group_by(Species) %>% filter(min_rank(desc(Sepal.Length))<=5)
Let’s say you have a question like, “What are the top 5 sepal lengths based on the species?” To answer this question, we can simply use one of the rank functions called ‘min_rank()’ from dplyr and call it directly inside the ‘filter()’ function
So the output will be
By the way, I’m using ‘min_rank()’ function here but there is another rank function called ‘dense_rank()’ from dplyr. Both functions return the ranking number based on a given measure column (e.g. Sepal.Length), and only the difference is when there are ties like below.
As you can see when two same rank (tie) occurs min_rank() function skips one rank and assigns next to next rank wheres the dense_rank() Function assigns subsequent ranks without skipping any rank as shown above
Types of window functions:
Ranking and ordering functions: row_number(), min_rank (RANK in SQL), dense_rank(), cume_dist(), percent_rank(), and ntile(). These functions all take a vector to order by, and return various types of ranks.