To help facilitate understanding of closures it might be useful to examine how they might be implemented in a procedural language. This explanation will follow a simplistic implementation of closures in Scheme.
To start, I must introduce the concept of a namespace. When you enter a command into a Scheme interpreter, it must evaluate the various symbols in the expression and obtain their value. Example:
(define x 3)
(define y 4)
(+ x y) returns 7
The define expressions store the value 3 in the spot for x and the value 4 in the spot for y. Then when we call (+ x y), the interpreter looks up the values in the namespace and is able to perform the operation and return 7.
However, in Scheme there are expressions that allow you to temporarily override the value of a symbol. Here's an example:
(define x 3)
(define y 4)
(let ((x 5))
(+ x y)) returns 9
x returns 3
What the let keyword does is introduces a new namespace with x as the value 5. You will notice that it's still able to see that y is 4, making the sum returned to be 9. You can also see that once the expression has ended x is back to being 3. In this sense, x has been temporarily masked by the local value.
Procedural and object-oriented languages have a similar concept. Whenever you declare a variable in a function that has the same name as a global variable you get the same effect.
How would we implement this? A simple way is with a linked list - the head contains the new value and the tail contains the old namespace. When you need to look up a symbol, you start at the head and work your way down the tail.
Now let's skip to the implementation of first-class functions for the moment. More or less, a function is a set of instructions to execute when the function is called culminating in the return value. When we read in a function, we can store these instructions behind the scenes and run them when the function is called.
(define x 3)
(define (plus-x y)
(+ x y))
(let ((x 5))
(plus-x 4)) returns ?
We define x to be 3 and plus-x to be its parameter, y, plus the value of x. Finally we call plus-x in an environment where x has been masked by a new x, this one valued 5. If we merely store the operation, (+ x y), for the function plus-x, since we're in the context of x being 5 the result returned would be 9. This is what's called dynamic scoping.
However, Scheme, Common Lisp, and many other languages have what's called lexical scoping - in addition to storing the operation (+ x y) we also store the namespace at that particular point. That way, when we're looking up the values we can see that x, in this context, is really 3. This is a closure.
(define x 3)
(define (plus-x y)
(+ x y))
(let ((x 5))
(plus-x 4)) returns 7
In summary, we can use a linked list to store the state of the namespace at the time of function definition, allowing us to access variables from enclosing scopes, as well as providing us the ability to locally mask a variable without affecting the rest of the program.