Is there an easier way to ensure that a data frame's rows are ordered according to a "target" vector as the one I implemented in the short example below?
df <- data.frame(name = letters[1:4], value = c(rep(TRUE, 2), rep(FALSE, 2)))
df
# name value
# 1 a TRUE
# 2 b TRUE
# 3 c FALSE
# 4 d FALSE
target <- c("b", "c", "a", "d")
This somehow seems to be a bit too "complicated" to get the job done:
idx <- sapply(target, function(x) {
which(df$name == x)
})
df <- df[idx,]
rownames(df) <- NULL
df
# name value
# 1 b TRUE
# 2 c FALSE
# 3 a TRUE
# 4 d FALSE
We can adjust the factor levels based on target
and use it in arrange
library(dplyr)
df %>% arrange(factor(name, levels = target))
# name value
#1 b TRUE
#2 c FALSE
#3 a TRUE
#4 d FALSE
Or order
it and use it in slice
df %>% slice(order(factor(name, levels = target)))
If you don't want to use any libraries and you have reoccurrences in your data, you can use which
with sapply
as well.
new_order <- sapply(target, function(x,df){which(df$name == x)}, df=df)
df <- df[new_order,]
This method is a bit different, it provided me with a bit more flexibility than the previous answer.
By making it into an ordered factor, you can use it nicely in arrange
and such. I used reorder.factor from the gdata
package.
df <- data.frame(name=letters[1:4], value=c(rep(TRUE, 2), rep(FALSE, 2)))
target <- c("b", "c", "a", "d")
require(gdata)
df$name <- reorder.factor(df$name, new.order=target)
Next, use the fact that it is now ordered:
require(dplyr)
df %>%
arrange(name)
name value
1 b TRUE
2 c FALSE
3 a TRUE
4 d FALSE
If you want to go back to the original (alphabetic) ordering, just use as.character()
to get it back to the original state.
I prefer to use ***_join
in dplyr
whenever I need to match data. One possible try for this
left_join(data.frame(name=target),df,by="name")
Note that the input for ***_join
require tbls or data.frame
Here's a similar system for the situation where you have a variable you want to sort by, initially, but then you want to sort by a secondary variable according to the order that this secondary variable first appears in the initial sort.
In the function below, the initial sort variable is called order_by
and the secondary variable is called order_along
- as in "order by this variable along its initial order".
library(dplyr, warn.conflicts = FALSE)
df <- structure(
list(
msoa11hclnm = c(
"Bewbush", "Tilgate", "Felpham",
"Selsey", "Brunswick", "Ratton", "Ore", "Polegate", "Mile Oak",
"Upperton", "Arundel", "Kemptown"
),
lad20nm = c(
"Crawley", "Crawley",
"Arun", "Chichester", "Brighton and Hove", "Eastbourne", "Hastings",
"Wealden", "Brighton and Hove", "Eastbourne", "Arun", "Brighton and Hove"
),
shape_area = c(
1328821, 3089180, 3540014, 9738033, 448888, 10152663, 5517102,
7036428, 5656430, 2653589, 72832514, 826151
)
),
row.names = c(NA, -12L), class = "data.frame"
)
this does not give me what I need:
df %>%
dplyr::arrange(shape_area, lad20nm)
#> msoa11hclnm lad20nm shape_area
#> 1 Brunswick Brighton and Hove 448888
#> 2 Kemptown Brighton and Hove 826151
#> 3 Bewbush Crawley 1328821
#> 4 Upperton Eastbourne 2653589
#> 5 Tilgate Crawley 3089180
#> 6 Felpham Arun 3540014
#> 7 Ore Hastings 5517102
#> 8 Mile Oak Brighton and Hove 5656430
#> 9 Polegate Wealden 7036428
#> 10 Selsey Chichester 9738033
#> 11 Ratton Eastbourne 10152663
#> 12 Arundel Arun 72832514
Here’s a function:
order_along <- function(df, order_along, order_by) {
cols <- colnames(df)
df <- df %>%
dplyr::arrange({{ order_by }})
df %>%
dplyr::select({{ order_along }}) %>%
dplyr::distinct() %>%
dplyr::full_join(df) %>%
dplyr::select(dplyr::all_of(cols))
}
order_along(df, lad20nm, shape_area)
#> Joining, by = "lad20nm"
#> msoa11hclnm lad20nm shape_area
#> 1 Brunswick Brighton and Hove 448888
#> 2 Kemptown Brighton and Hove 826151
#> 3 Mile Oak Brighton and Hove 5656430
#> 4 Bewbush Crawley 1328821
#> 5 Tilgate Crawley 3089180
#> 6 Upperton Eastbourne 2653589
#> 7 Ratton Eastbourne 10152663
#> 8 Felpham Arun 3540014
#> 9 Arundel Arun 72832514
#> 10 Ore Hastings 5517102
#> 11 Polegate Wealden 7036428
#> 12 Selsey Chichester 9738033
Created on 2021-01-12 by the reprex package (v0.3.0)
Source: Stackoverflow.com