[r] Global and local variables in R

<- does assignment in the current environment.

When you're inside a function R creates a new environment for you. By default it includes everything from the environment in which it was created so you can use those variables as well but anything new you create will not get written to the global environment.

In most cases <<- will assign to variables already in the global environment or create a variable in the global environment even if you're inside a function. However, it isn't quite as straightforward as that. What it does is checks the parent environment for a variable with the name of interest. If it doesn't find it in your parent environment it goes to the parent of the parent environment (at the time the function was created) and looks there. It continues upward to the global environment and if it isn't found in the global environment it will assign the variable in the global environment.

This might illustrate what is going on.

bar <- "global"
foo <- function(){
    bar <- "in foo"
    baz <- function(){
        bar <- "in baz - before <<-"
        bar <<- "in baz - after <<-"
        print(bar)
    }
    print(bar)
    baz()
    print(bar)
}
> bar
[1] "global"
> foo()
[1] "in foo"
[1] "in baz - before <<-"
[1] "in baz - after <<-"
> bar
[1] "global"

The first time we print bar we haven't called foo yet so it should still be global - this makes sense. The second time we print it's inside of foo before calling baz so the value "in foo" makes sense. The following is where we see what <<- is actually doing. The next value printed is "in baz - before <<-" even though the print statement comes after the <<-. This is because <<- doesn't look in the current environment (unless you're in the global environment in which case <<- acts like <-). So inside of baz the value of bar stays as "in baz - before <<-". Once we call baz the copy of bar inside of foo gets changed to "in baz" but as we can see the global bar is unchanged. This is because the copy of bar that is defined inside of foo is in the parent environment when we created baz so this is the first copy of bar that <<- sees and thus the copy it assigns to. So <<- isn't just directly assigning to the global environment.

<<- is tricky and I wouldn't recommend using it if you can avoid it. If you really want to assign to the global environment you can use the assign function and tell it explicitly that you want to assign globally.

Now I change the <<- to an assign statement and we can see what effect that has:

bar <- "global"
foo <- function(){
    bar <- "in foo"   
    baz <- function(){
        assign("bar", "in baz", envir = .GlobalEnv)
    }
    print(bar)
    baz()
    print(bar)
}
bar
#[1] "global"
foo()
#[1] "in foo"
#[1] "in foo"
bar
#[1] "in baz"

So both times we print bar inside of foo the value is "in foo" even after calling baz. This is because assign never even considered the copy of bar inside of foo because we told it exactly where to look. However, this time the value of bar in the global environment was changed because we explicitly assigned there.

Now you also asked about creating local variables and you can do that fairly easily as well without creating a function... We just need to use the local function.

bar <- "global"
# local will create a new environment for us to play in
local({
    bar <- "local"
    print(bar)
})
#[1] "local"
bar
#[1] "global"