[r] Dynamically select data frame columns using $ and a character value

I have a vector of different column names and I want to be able to loop over each of them to extract that column from a data.frame. For example, consider the data set mtcars and some variable names stored in a character vector cols. When I try to select a variable from mtcars using a dynamic subset of cols, nether of these work

cols <- c("mpg", "cyl", "am")
col <- cols[1]
col
# [1] "mpg"

mtcars$col
# NULL
mtcars$cols[1]
# NULL

how can I get these to return the same values as

mtcars$mpg

Furthermore how can I loop over all the columns in cols to get the values in some sort of loop.

for(x in seq_along(cols)) {
   value <- mtcars[ order(mtcars$cols[x]), ]
}

This question is related to r dataframe r-faq

The answer is


Another solution is to use #get:

> cols <- c("cyl", "am")
> get(cols[1], mtcars)
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

too late.. but I guess I have the answer -

Here's my sample study.df dataframe -

   >study.df
   study   sample       collection_dt other_column
   1 DS-111 ES768098 2019-01-21:04:00:30         <NA>
   2 DS-111 ES768099 2018-12-20:08:00:30   some_value
   3 DS-111 ES768100                <NA>   some_value

And then -

> ## Selecting Columns in an Given order
> ## Create ColNames vector as per your Preference
> 
> selectCols <- c('study','collection_dt','sample')
> 
> ## Select data from Study.df with help of selection vector
> selectCols %>% select(.data=study.df,.)
   study       collection_dt   sample
1 DS-111 2019-01-21:04:00:30 ES768098
2 DS-111 2018-12-20:08:00:30 ES768099
3 DS-111                <NA> ES768100
> 

mtcars[do.call(order, mtcars[cols]), ]

Had similar problem due to some CSV files that had various names for the same column.
This was the solution:

I wrote a function to return the first valid column name in a list, then used that...

# Return the string name of the first name in names that is a column name in tbl
# else null
ChooseCorrectColumnName <- function(tbl, names) {
for(n in names) {
    if (n %in% colnames(tbl)) {
        return(n)
    }
}
return(null)
}

then...

cptcodefieldname = ChooseCorrectColumnName(file, c("CPT", "CPT.Code"))
icdcodefieldname = ChooseCorrectColumnName(file, c("ICD.10.CM.Code", "ICD10.Code"))

if (is.null(cptcodefieldname) || is.null(icdcodefieldname)) {
        print("Bad file column name")
}

# Here we use the hash table implementation where 
# we have a string key and list value so we need actual strings,
# not Factors
file[cptcodefieldname] = as.character(file[cptcodefieldname])
file[icdcodefieldname] = as.character(file[icdcodefieldname])
for (i in 1:length(file[cptcodefieldname])) {
    cpt_valid_icds[file[cptcodefieldname][i]] <<- unique(c(cpt_valid_icds[[file[cptcodefieldname][i]]], file[icdcodefieldname][i]))
}

if you want to select column with specific name then just do

A=mtcars[,which(conames(mtcars)==cols[1])]
#and then
colnames(mtcars)[A]=cols[1]

you can run it in loop as well reverse way to add dynamic name eg if A is data frame and xyz is column to be named as x then I do like this

A$tmp=xyz
colnames(A)[colnames(A)=="tmp"]=x

again this can also be added in loop


Using dplyr provides an easy syntax for sorting the data frames

library(dplyr)
mtcars %>% arrange(gear, desc(mpg))

It might be useful to use the NSE version as shown here to allow dynamically building the sort list

sort_list <- c("gear", "desc(mpg)")
mtcars %>% arrange_(.dots = sort_list)

If I understand correctly, you have a vector containing variable names and would like loop through each name and sort your data frame by them. If so, this example should illustrate a solution for you. The primary issue in yours (the full example isn't complete so I"m not sure what else you may be missing) is that it should be order(Q1_R1000[,parameter[X]]) instead of order(Q1_R1000$parameter[X]), since parameter is an external object that contains a variable name opposed to a direct column of your data frame (which when the $ would be appropriate).

set.seed(1)
dat <- data.frame(var1=round(rnorm(10)),
                   var2=round(rnorm(10)),
                   var3=round(rnorm(10)))
param <- paste0("var",1:3)
dat
#   var1 var2 var3
#1    -1    2    1
#2     0    0    1
#3    -1   -1    0
#4     2   -2   -2
#5     0    1    1
#6    -1    0    0
#7     0    0    0
#8     1    1   -1
#9     1    1    0
#10    0    1    0

for(p in rev(param)){
   dat <- dat[order(dat[,p]),]
 }
dat
#   var1 var2 var3
#3    -1   -1    0
#6    -1    0    0
#1    -1    2    1
#7     0    0    0
#2     0    0    1
#10    0    1    0
#5     0    1    1
#8     1    1   -1
#9     1    1    0
#4     2   -2   -2

Examples related to r

How to get AIC from Conway–Maxwell-Poisson regression via COM-poisson package in R? R : how to simply repeat a command? session not created: This version of ChromeDriver only supports Chrome version 74 error with ChromeDriver Chrome using Selenium How to show code but hide output in RMarkdown? remove kernel on jupyter notebook Function to calculate R2 (R-squared) in R Center Plot title in ggplot2 R ggplot2: stat_count() must not be used with a y aesthetic error in Bar graph R multiple conditions in if statement What does "The following object is masked from 'package:xxx'" mean?

Examples related to dataframe

Trying to merge 2 dataframes but get ValueError How to show all of columns name on pandas dataframe? Python Pandas - Find difference between two data frames Pandas get the most frequent values of a column Display all dataframe columns in a Jupyter Python Notebook How to convert column with string type to int form in pyspark data frame? Display/Print one column from a DataFrame of Series in Pandas Binning column with python pandas Selection with .loc in python Set value to an entire column of a pandas dataframe

Examples related to r-faq

What does "The following object is masked from 'package:xxx'" mean? What does "Error: object '<myvariable>' not found" mean? How do I deal with special characters like \^$.?*|+()[{ in my regex? What does %>% function mean in R? How to plot a function curve in R Use dynamic variable names in `dplyr` Error: unexpected symbol/input/string constant/numeric constant/SPECIAL in my code How should I deal with "package 'xxx' is not available (for R version x.y.z)" warning? How to select the row with the maximum value in each group R data formats: RData, Rda, Rds etc