Someone should have asked this already, but I couldn't find an answer. Say I have:
x = data.frame(q=1,w=2,e=3, ...and many many columns...)
What is the most elegant way to rename an arbitrary subset of columns, whose positions I don't necessarily know, to some other arbitrary names?
E.g. say I want to rename "q" and "e" to "A" and "B": what is the most elegant code to do this?
Obviously, I can do a loop:
oldnames = c("q","e")
newnames = c("A","B")
for(i in 1:2) names(x)[names(x) == oldnames[i]] = newnames[i]
But I wonder if there is a better way? Maybe using one of the packages (plyr::rename etc.)?
I recently ran into this myself. If you're not sure whether the columns exist and only want to rename those that do:
existing <- match(oldNames,names(x))
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
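A self-contained sketch of the snippet above. The values of oldNames and newNames are assumptions for illustration, and "z" is a hypothetical name that is deliberately not a column of x, so it should simply be skipped:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldNames <- c("q", "z", "e")   # "z" is deliberately absent from x
newNames <- c("A", "Z", "B")

# Position of each old name in names(x); NA where the column doesn't exist
existing <- match(oldNames, names(x))
# Rename only the positions that were found, pairing each with its new name
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]

names(x)   # "A" "w" "B"
```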
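A shorter one-liner, which assumes the matching columns appear in x in the same order as the replacement vector (the new names are assigned in column order, not in the order of c("q","e")):
names(x)[names(x) %in% c("q","e")] <- c("A","B")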
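To illustrate that ordering caveat, here is a small sketch with a hypothetical data frame whose columns are ordered e, w, q. The %in% version pairs the new names with columns in column order, while match() looks up each old name's position and keeps the intended pairing:

```r
x <- data.frame(e = 3, w = 2, q = 1)

# %in% selects positions 1 and 3 (e and q, in column order), so "A" lands on e
y <- x
names(y)[names(y) %in% c("q", "e")] <- c("A", "B")
names(y)   # "A" "w" "B": e became "A", q became "B"

# match() returns the position of each old name, so q -> "A" and e -> "B"
z <- x
names(z)[match(c("q", "e"), names(z))] <- c("A", "B")
names(z)   # "B" "w" "A"
```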
A few answers already mention the functions dplyr::rename_with and rlang::set_names, but separately. This answer illustrates the differences between the two and the use of functions and formulas to rename columns.
rename_with from the dplyr package can use either a function or a formula to rename a selection of columns given as the .cols argument. For example, passing the function name toupper:
library(dplyr)
rename_with(head(iris), toupper, starts_with("Petal"))
is equivalent to passing the formula ~ toupper(.x):
rename_with(head(iris), ~ toupper(.x), starts_with("Petal"))
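For comparison, a base R sketch of the same idea (apply a function to a selected subset of names). Here base startsWith() stands in for the tidyselect helper starts_with(); this is a swapped-in base R equivalent, not what rename_with does internally:

```r
df <- head(iris)

# Logical selector playing the role of starts_with("Petal")
sel <- startsWith(names(df), "Petal")
# Apply the renaming function only to the selected names
names(df)[sel] <- toupper(names(df)[sel])

names(df)
# "Sepal.Length" "Sepal.Width" "PETAL.LENGTH" "PETAL.WIDTH" "Species"
```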
When renaming all columns, you can also use set_names from the rlang package. To make a different example, let's use paste0 as the renaming function. paste0 takes more than one argument (here the old name and a suffix), so there are different ways to pass the second argument depending on whether we use a function or a formula.
rlang::set_names(head(iris), paste0, "_hi")
rlang::set_names(head(iris), ~ paste0(.x, "_hi"))
The same can be achieved with rename_with by passing the data frame as the first argument .data, the function as the second argument .fn, all columns as the third argument .cols = everything(), and the function's extra parameters through the ... argument. Alternatively, you can put the function and its extra arguments into a formula given as the second argument and rely on the default .cols = everything().
rename_with(head(iris), paste0, everything(), "_hi")
rename_with(head(iris), ~ paste0(.x, "_hi"))
rename_with only works with data frames. set_names is more generic and can also rename vectors:
rlang::set_names(1:4, c("a", "b", "c", "d"))
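In base R, stats::setNames() covers both cases too, although it takes a finished character vector of names rather than a function or formula, so you apply the function to the old names yourself. A minimal sketch:

```r
# Data frame: build the new names first, then set them all at once
df <- setNames(head(iris), paste0(names(iris), "_hi"))
names(df)

# Plain vector
v <- setNames(1:4, c("a", "b", "c", "d"))
v
```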
Here is the most efficient way I have found to rename multiple columns, using a combination of purrr::set_names() and a few stringr operations.
library(tidyverse)
# Make a tibble with bad names
data <- tibble(
`Bad NameS 1` = letters[1:10],
`bAd NameS 2` = rnorm(10)
)
data
# A tibble: 10 x 2
`Bad NameS 1` `bAd NameS 2`
<chr> <dbl>
1 a -0.840
2 b -1.56
3 c -0.625
4 d 0.506
5 e -1.52
6 f -0.212
7 g -1.50
8 h -1.53
9 i 0.420
10 j 0.957
# Use purrr::set_names() with an anonymous function of stringr operations
data %>%
set_names(~ str_to_lower(.) %>%
str_replace_all(" ", "_") %>%
str_replace_all("bad", "good"))
# A tibble: 10 x 2
good_names_1 good_names_2
<chr> <dbl>
1 a -0.840
2 b -1.56
3 c -0.625
4 d 0.506
5 e -1.52
6 f -0.212
7 g -1.50
8 h -1.53
9 i 0.420
10 j 0.957
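If you'd rather avoid the tidyverse, the same three steps can be chained with base R's tolower() and gsub(). This is a swapped-in base R equivalent using a data.frame instead of a tibble, not what set_names() does internally:

```r
data <- data.frame(
  `Bad NameS 1` = letters[1:10],
  `bAd NameS 2` = rnorm(10),
  check.names = FALSE  # keep the spaces so the names match the tibble above
)

# Lower-case, replace spaces with underscores, then swap "bad" for "good"
names(data) <- gsub("bad", "good", gsub(" ", "_", tolower(names(data))))

names(data)   # "good_names_1" "good_names_2"
```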
Lots of sort-of-answers, so I just wrote a function you can copy/paste.
# note: this masks dplyr::rename() if dplyr is attached
rename <- function(x, old_names, new_names) {
  stopifnot(length(old_names) == length(new_names))
  # pull out the names that are actually in x
  old_nms <- old_names[old_names %in% names(x)]
  new_nms <- new_names[old_names %in% names(x)]
  # call out the column names that don't exist
  not_nms <- setdiff(old_names, old_nms)
  if (length(not_nms) > 0) {
    msg <- paste(paste(not_nms, collapse = ", "),
                 "are not columns in the dataframe, so won't be renamed.")
    warning(msg)
  }
  # rename: match() pairs each old name with its new name,
  # whatever order the columns are in
  names(x)[match(old_nms, names(x))] <- new_nms
  x
}
x = data.frame(q = 1, w = 2, e = 3)
rename(x, c("q", "e"), c("Q", "E"))
Q w E
1 1 2 3
Building on @user3114046's answer:
x <- data.frame(q=1,w=2,e=3)
x
# q w e
#1 1 2 3
names(x)[match(oldnames,names(x))] <- newnames
x
# A w B
#1 1 2 3
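This doesn't rely on a specific ordering of columns in the x dataset. It does, however, assume that every name in oldnames is a column of x (match() returns NA for any that isn't).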
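A sketch of what happens when one of the old names is missing, and a simple guard. "z" is a hypothetical absent column; without the guard, the NA from match() makes the assignment fail:

```r
x <- data.frame(q = 1, w = 2, e = 3)
oldnames <- c("q", "z")
newnames <- c("A", "B")

idx <- match(oldnames, names(x))   # c(1, NA): "z" is not a column of x

# The unguarded assignment errors: NAs are not allowed in subscripted assignments
res <- tryCatch({ names(x)[idx] <- newnames; "ok" }, error = function(e) "error")

# Guard: keep only the pairs whose old name was found
found <- !is.na(idx)
names(x)[idx[found]] <- newnames[found]
names(x)   # "A" "w" "e"
```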
With dplyr you would do:
library(dplyr)
df = data.frame(q = 1, w = 2, e = 3)
df %>% rename(A = q, B = e)
# A w B
#1 1 2 3
Or if you want to use vectors, as suggested by @Jelena-bioinf:
library(dplyr)
df = data.frame(q = 1, w = 2, e = 3)
oldnames = c("q","e")
newnames = c("A","B")
df %>% rename_at(vars(all_of(oldnames)), ~ newnames)
# A w B
#1 1 2 3
L. D. Nicolas May suggested a change, given that rename_at is being superseded by rename_with:
df %>%
rename_with(~ newnames[match(.x, oldnames)], .cols = all_of(oldnames))
# A w B
#1 1 2 3
If one row of the data contains the names you want to give all the columns, you can do
names(data) <- as.character(unlist(data[row, ]))
where data is your data frame and row is the number of the row containing the new names.
Then you can remove the row containing the names with
data <- data[-row,]
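A runnable sketch with hypothetical data whose first row holds the real column names. Because those columns were read as text, the last step (an addition, not part of the original answer) converts them back to their natural types with type.convert():

```r
# Hypothetical data where the first row holds the real column names
data <- data.frame(V1 = c("id", "1", "2"),
                   V2 = c("score", "10", "20"),
                   stringsAsFactors = FALSE)
row <- 1

# unlist() turns the one-row data frame into a plain character vector
names(data) <- as.character(unlist(data[row, ]))
data <- data[-row, ]

# Columns are still character; convert "1", "2", ... to numbers
data[] <- lapply(data, type.convert, as.is = TRUE)

names(data)   # "id" "score"
```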
You can get the names as a character vector, do your bulk renaming on those strings, and assign them back. A good example of this is after a long-to-wide reshape of a dataset:
labWide
Lab1 Lab10 Lab11 Lab12 Lab13 Lab14 Lab15 Lab16
1 35.75366 22.79493 30.32075 34.25637 30.66477 32.04059 24.46663 22.53063
nameVec <- names(labWide)
nameVec <- gsub("Lab","LabLat",nameVec)
names(labWide) <- nameVec
"LabLat1" "LabLat10" "LabLat11" "LabLat12" "LabLat13" "LabLat14" "LabLat15" "LabLat16"
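Note that gsub("Lab", ...) replaces "Lab" anywhere in a name. If only the prefix should change, anchoring the pattern with ^ and using sub() is a little safer; a sketch with a hypothetical extra column "MyLab":

```r
nameVec <- c("Lab1", "Lab10", "MyLab")

gsub("Lab", "LabLat", nameVec)    # "LabLat1" "LabLat10" "MyLabLat"
sub("^Lab", "LabLat", nameVec)    # "LabLat1" "LabLat10" "MyLab"
```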
Here is a function that does this: pass your data frame as rename(x) and it renames every column listed in oldNames that exists; names that aren't there are skipped without an error.
rename <-function(x){
oldNames = c("a","b","c")
newNames = c("d","e","f")
existing <- match(oldNames,names(x))
names(x)[na.omit(existing)] <- newNames[which(!is.na(existing))]
return(x)
}
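If the table contains two columns that originally had the same name (for example after a merge, which gives them .x and .y suffixes), you can rename both, giving each a distinct new name:
rename(df, newname1 = oldname.x, newname2 = oldname.y)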
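Columns with .x and .y suffixes typically come from merging two tables that share a column name. A base R sketch with hypothetical tables a and b, either renaming afterwards by name or choosing the suffixes directly via merge()'s suffixes argument:

```r
a <- data.frame(id = 1:2, value = c(10, 20))
b <- data.frame(id = 1:2, value = c(30, 40))

m <- merge(a, b, by = "id")
names(m)   # "id" "value.x" "value.y"

# Rename both suffixed columns by name, pairing old and new via match()
names(m)[match(c("value.x", "value.y"), names(m))] <- c("value_a", "value_b")

# Alternatively, choose the suffixes up front
m2 <- merge(a, b, by = "id", suffixes = c("_a", "_b"))
names(m2)  # "id" "value_a" "value_b"
```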
Another solution for data frames that are not too large (building on @thelatemail's answer) is to assign all the names at once:
x <- data.frame(q=1,w=2,e=3)
> x
q w e
1 1 2 3
colnames(x) <- c("A","w","B")
> x
A w B
1 1 2 3
Alternatively, you can also use:
names(x) <- c("C","w","D")
> x
C w D
1 1 2 3
Furthermore, you can rename a subset of the column names by position:
names(x)[2:3] <- c("E","F")
> x
C E F
1 1 2 3
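This changes every occurrence of those letters anywhere in every name (a column called "speed" would become "spBBd"), so it is only safe when the letters appear nowhere else: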
names(x) <- gsub("q", "A", gsub("e", "B", names(x) ) )
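A sketch of that pitfall with a hypothetical extra column "speed", plus an anchored variant: ^ and $ make each pattern match only a whole name:

```r
# Hypothetical data with an extra column that also contains an "e"
x <- data.frame(q = 1, speed = 2, e = 3)

unanchored <- gsub("q", "A", gsub("e", "B", names(x)))
unanchored   # "A" "spBBd" "B"

# ^ and $ anchor the pattern so only whole names match
anchored <- gsub("^q$", "A", gsub("^e$", "B", names(x)))
anchored     # "A" "speed" "B"
```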
You can use a named vector.
With base R (maybe somewhat clunky):
x = data.frame(q = 1, w = 2, e = 3)
rename_vec <- c(q = "A", e = "B")
names(x) <- ifelse(is.na(rename_vec[names(x)]), names(x), rename_vec[names(x)])
x
#> A w B
#> 1 1 2 3
Or a dplyr option with !!!:
library(dplyr)
rename_vec <- c(A = "q", B = "e") # new = old: the other way round from the base R version!
x %>% rename(!!!rename_vec)
#> A w B
#> 1 1 2 3
The latter works because the 'big-bang' operator !!! splices a list or vector into the function call, so each element becomes its own new = old argument to rename(), as described in the help page:
?`!!`
!!! forces-splice a list of objects. The elements of the list are spliced in place, meaning that they each become one single argument.
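A slightly shorter base R variant of the named-vector approach, using the same orientation as the base R example (old names as the vector's names, new names as its values). It only touches columns that have an entry in the lookup vector:

```r
x <- data.frame(q = 1, w = 2, e = 3)
rename_vec <- c(q = "A", e = "B")   # old name = new name

# Which columns have a replacement in the lookup vector?
hit <- names(x) %in% names(rename_vec)
# Look up each hit by its old name, so the order of rename_vec doesn't matter
names(x)[hit] <- unname(rename_vec[names(x)[hit]])

names(x)   # "A" "w" "B"
```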
dplyr 1.0.0 made this more flexible by adding rename_with(), where _with refers to a function as input. The trick is to turn the character vector newnames into a formula (with ~), so it is equivalent to function(x) return(newnames).
In my subjective opinion, that is the most elegant dplyr expression.
# shortest & most elegant expression
df %>% rename_with(~ newnames, all_of(oldnames))
A w B
1 1 2 3
If you reverse the order, you must name the .fn argument, because .fn comes before .cols in the argument list:
df %>% rename_with(all_of(oldnames), .fn = ~ newnames)
A w B
1 1 2 3
Side note: if you want to prepend the same string to all of the column names, you can use this simple code:
colnames(df) <- paste("renamed_", colnames(df), sep = "")
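paste0() is paste() with sep = "", so the line can be shortened, and swapping the arguments appends a suffix instead. A minimal sketch on a hypothetical df:

```r
df <- data.frame(q = 1, w = 2)

colnames(df) <- paste0("renamed_", colnames(df))   # prefix
colnames(df)   # "renamed_q" "renamed_w"

colnames(df) <- paste0(colnames(df), "_v2")        # suffix
colnames(df)   # "renamed_q_v2" "renamed_w_v2"
```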
Source: Stackoverflow.com