New in 2014:
Especially if you're also interested in data manipulation in general (sorting, filtering, subsetting, summarizing, etc.), you should definitely take a look at dplyr, which comes with a variety of functions designed to facilitate working with data frames and certain database backends. It even offers quite an elaborate SQL interface, including a function that translates (most) R code directly into SQL.
The four joining-related functions in the dplyr package are (to quote):
inner_join(x, y, by = NULL, copy = FALSE, ...): return all rows from x where there are matching values in y, and all columns from x and y

left_join(x, y, by = NULL, copy = FALSE, ...): return all rows from x, and all columns from x and y

semi_join(x, y, by = NULL, copy = FALSE, ...): return all rows from x where there are matching values in y, keeping just columns from x.

anti_join(x, y, by = NULL, copy = FALSE, ...): return all rows from x where there are not matching values in y, keeping just columns from x

The dplyr documentation covers all of this in great detail.
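To make the difference between the four joins concrete, here is a minimal sketch on two toy data frames. The tables employees and salaries, and their columns, are invented for illustration and are not part of dplyr or the original answer.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
employees <- data.frame(
  id = c(1, 2, 3),
  name = c("Ann", "Bob", "Cid"),
  stringsAsFactors = FALSE
)
salaries <- data.frame(
  id = c(1, 3, 4),
  salary = c(100, 300, 400)
)

# Rows of employees with a match in salaries; columns from both tables
inner <- inner_join(employees, salaries, by = "id")

# All rows of employees; salary is NA where salaries has no match (id 2)
left <- left_join(employees, salaries, by = "id")

# Rows of employees with a match in salaries; only employees' columns
semi <- semi_join(employees, salaries, by = "id")

# Rows of employees without a match in salaries; only employees' columns
anti <- anti_join(employees, salaries, by = "id")
```

Note that id 4 exists only in salaries, so it appears in none of the results: all four functions are driven by the rows of x.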
Selecting columns can be done with select(df, "column"). If that's not SQL-ish enough for you, there's the sql() function, which lets you write SQL code as-is: it marks a string as literal SQL, and when passed to tbl() together with a database connection, dplyr runs that query and hands you the result as if you had been writing R all along (for more information, please refer to the dplyr/databases vignette). For example, if applied correctly, sql("SELECT * FROM hflights") will select all the columns from the "hflights" dplyr table (a "tbl").
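As a rough sketch of both approaches, the snippet below selects a column from a data frame and then runs the same kind of raw SQL query against a database table. It assumes the DBI, RSQLite and dbplyr packages are installed, uses an in-memory SQLite database, and substitutes a tiny made-up table for the real hflights data. The names flights_df, carrier, delay and con are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the hflights data (columns invented for illustration)
flights_df <- data.frame(
  carrier = c("AA", "UA", "AA"),
  delay = c(5, 20, -3),
  stringsAsFactors = FALSE
)

# Column selection: both the bare name and the quoted name work
by_name   <- select(flights_df, carrier)
by_string <- select(flights_df, "carrier")

# Assumption: DBI, RSQLite and dbplyr are installed; in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "hflights", flights_df)

# sql() marks the string as literal SQL; tbl() runs it against the connection,
# and collect() pulls the result back into a local data frame
query_result <- collect(tbl(con, sql("SELECT * FROM hflights")))

DBI::dbDisconnect(con)
```

On its own, sql() only wraps the string. It is the combination with tbl() and a connection that actually executes the query.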