[c] With arrays, why is it the case that a[5] == 5[a]?

As Joel points out in Stack Overflow podcast #34, in C Programming Language (aka: K & R), there is mention of this property of arrays in C: a[5] == 5[a]

Joel says that it's because of pointer arithmetic but I still don't understand. Why does a[5] == 5[a]?

This question is related to c arrays pointers pointer-arithmetic

The answer is


To answer the question literally. It is not always true that x == x

double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;

prints

false

In C

 int a[]={10,20,30,40,50};
 int *p=a;
 printf("%d\n",*p++);//output will be 10
 printf("%d\n",*a++);//will give an error

Pointer p is a "variable", array name a is a "mnemonic" or "synonym", so p++ is valid but a++ is invalid.

a[2] is equals to 2[a] because the internal operation on both of this is "Pointer Arithmetic" internally calculated as *(a+2) equals *(2+a)


And, of course

 ("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)


Not an answer, but just some food for thought. If class is having overloaded index/subscript operator, the expression 0[x] will not work:

class Sub
{
public:
    int operator [](size_t nIndex)
    {
        return 0;
    }   
};

int main()
{
    Sub s;
    s[0];
    0[s]; // ERROR 
}

Since we dont have access to int class, this cannot be done:

class int
{
   int operator[](const Sub&);
};

I just find out this ugly syntax could be "useful", or at least very fun to play with when you want to deal with an array of indexes which refer to positions into the same array. It can replace nested square brackets and make the code more readable !

int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a;  //  s == 5

for(int i = 0 ; i < s ; ++i) {  
           
    cout << a[a[a[i]]] << endl;
    // ... is equivalent to ...
    cout << i[a][a][a] << endl;  // but I prefer this one, it's easier to increase the level of indirection (without loop)
    
}

Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)


It has very good explanation in A TUTORIAL ON POINTERS AND ARRAYS IN C by Ted Jensen.

Ted Jensen explained it as:

In fact, this is true, i.e wherever one writes a[i] it can be replaced with *(a + i) without any problems. In fact, the compiler will create the same code in either case. Thus we see that pointer arithmetic is the same thing as array indexing. Either syntax produces the same result.

This is NOT saying that pointers and arrays are the same thing, they are not. We are only saying that to identify a given element of an array we have the choice of two syntaxes, one using array indexing and the other using pointer arithmetic, which yield identical results.

Now, looking at this last expression, part of it.. (a + i), is a simple addition using the + operator and the rules of C state that such an expression is commutative. That is (a + i) is identical to (i + a). Thus we could write *(i + a) just as easily as *(a + i). But *(i + a) could have come from i[a] ! From all of this comes the curious truth that if:

char a[20];

writing

a[3] = 'x';

is the same as writing

3[a] = 'x';

I know the question is answered, but I couldn't resist sharing this explanation.

I remember Principles of Compiler design, Let's assume a is an int array and size of int is 2 bytes, & Base address for a is 1000.

How a[5] will work ->

Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

So,

Similarly when the c code is broken down into 3-address code, 5[a] will become ->

Base Address of your Array a + (size of(data type for array a)*5)
i.e. 1000 + (2*5) = 1010 

So basically both the statements are pointing to the same location in memory and hence, a[5] = 5[a].

This explanation is also the reason why negative indexes in arrays work in C.

i.e. if I access a[-5] it will give me

Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It will return me object at location 990.


Well, this is a feature that is only possible because of the language support.

The compiler interprets a[i] as *(a+i) and the expression 5[a] evaluates to *(5+a). Since addition is commutative it turns out that both are equal. Hence the expression evaluates to true.


Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.


A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:

let V = vec 10

that actually allocated 11 words of memory, not 10. Typically V was the first, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroeth element of the array. Therefore array indirection in BCPL, expressed as

let J = V!5

really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)

The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.

Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases doing indirection. just that the binary form included an addition of the two operands before doing the indirection. Given the word oriented nature of BCPL (and B) this actually made a lot of sense. The restriction of "pointer and integer" was made necessary in C when it gained data types, and sizeof became a thing.


in c compiler

a[i]
i[a]
*(a+i)

are different ways to refer to an element in an array ! (NOT AT ALL WEIRD)


And, of course

 ("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)


For pointers in C, we have

a[5] == *(a + 5)

and also

5[a] == *(5 + a)

Hence it is true that a[5] == 5[a].


Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.


I know the question is answered, but I couldn't resist sharing this explanation.

I remember Principles of Compiler design, Let's assume a is an int array and size of int is 2 bytes, & Base address for a is 1000.

How a[5] will work ->

Base Address of your Array a + (5*size of(data type for array a))
i.e. 1000 + (5*2) = 1010

So,

Similarly when the c code is broken down into 3-address code, 5[a] will become ->

Base Address of your Array a + (size of(data type for array a)*5)
i.e. 1000 + (2*5) = 1010 

So basically both the statements are pointing to the same location in memory and hence, a[5] = 5[a].

This explanation is also the reason why negative indexes in arrays work in C.

i.e. if I access a[-5] it will give me

Base Address of your Array a + (-5 * size of(data type for array a))
i.e. 1000 + (-5*2) = 990

It will return me object at location 990.


And, of course

 ("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)


To answer the question literally. It is not always true that x == x

double zero = 0.0;
double a[] = { 0,0,0,0,0, zero/zero}; // NaN
cout << (a[5] == 5[a] ? "true" : "false") << endl;

prints

false

Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.


Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.


For pointers in C, we have

a[5] == *(a + 5)

and also

5[a] == *(5 + a)

Hence it is true that a[5] == 5[a].


And, of course

 ("ABCD"[2] == 2["ABCD"]) && (2["ABCD"] == 'C') && ("ABCD"[2] == 'C')

The main reason for this was that back in the 70's when C was designed, computers didn't have much memory (64KB was a lot), so the C compiler didn't do much syntax checking. Hence "X[Y]" was rather blindly translated into "*(X+Y)"

This also explains the "+=" and "++" syntaxes. Everything in the form "A = B + C" had the same compiled form. But, if B was the same object as A, then an assembly level optimization was available. But the compiler wasn't bright enough to recognize it, so the developer had to (A += C). Similarly, if C was 1, a different assembly level optimization was available, and again the developer had to make it explicit, because the compiler didn't recognize it. (More recently compilers do, so those syntaxes are largely unnecessary these days)


I think something is being missed by the other answers.

Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].

(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)

But the commutativity of addition is not all that obvious in this case.

When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.

But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)

The C standard's description of the + operator (N1570 6.5.6) says:

For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type.

It could just as easily have said:

For addition, either both operands shall have arithmetic type, or the left operand shall be a pointer to a complete object type and the right operand shall have integer type.

in which case both i + p and i[p] would be illegal.

In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:

pointer operator+(pointer p, integer i);

and

pointer operator+(integer i, pointer p);

of which only the first is really necessary.

So why is it this way?

C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).

So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.

Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.

And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).

Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.

So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.


A little bit of history now. Among other languages, BCPL had a fairly major influence on C's early development. If you declared an array in BCPL with something like:

let V = vec 10

that actually allocated 11 words of memory, not 10. Typically V was the first, and contained the address of the immediately following word. So unlike C, naming V went to that location and picked up the address of the zeroeth element of the array. Therefore array indirection in BCPL, expressed as

let J = V!5

really did have to do J = !(V + 5) (using BCPL syntax) since it was necessary to fetch V to get the base address of the array. Thus V!5 and 5!V were synonymous. As an anecdotal observation, WAFL (Warwick Functional Language) was written in BCPL, and to the best of my memory tended to use the latter syntax rather than the former for accessing the nodes used as data storage. Granted this is from somewhere between 35 and 40 years ago, so my memory is a little rusty. :)

The innovation of dispensing with the extra word of storage and having the compiler insert the base address of the array when it was named came later. According to the C history paper this happened at about the time structures were added to C.

Note that ! in BCPL was both a unary prefix operator and a binary infix operator, in both cases doing indirection. just that the binary form included an addition of the two operands before doing the indirection. Given the word oriented nature of BCPL (and B) this actually made a lot of sense. The restriction of "pointer and integer" was made necessary in C when it gained data types, and sizeof became a thing.


One thing no-one seems to have mentioned about Dinah's problem with sizeof:

You can only add an integer to a pointer, you can't add two pointers together. That way when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which bit has a size that needs to be taken into account.


Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.


Not an answer, but just some food for thought. If class is having overloaded index/subscript operator, the expression 0[x] will not work:

class Sub
{
public:
    int operator [](size_t nIndex)
    {
        return 0;
    }   
};

int main()
{
    Sub s;
    s[0];
    0[s]; // ERROR 
}

Since we dont have access to int class, this cannot be done:

class int
{
   int operator[](const Sub&);
};

in c compiler

a[i]
i[a]
*(a+i)

are different ways to refer to an element in an array ! (NOT AT ALL WEIRD)


Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.


I think something is being missed by the other answers.

Yes, p[i] is by definition equivalent to *(p+i), which (because addition is commutative) is equivalent to *(i+p), which (again, by the definition of the [] operator) is equivalent to i[p].

(And in array[i], the array name is implicitly converted to a pointer to the array's first element.)

But the commutativity of addition is not all that obvious in this case.

When both operands are of the same type, or even of different numeric types that are promoted to a common type, commutativity makes perfect sense: x + y == y + x.

But in this case we're talking specifically about pointer arithmetic, where one operand is a pointer and the other is an integer. (Integer + integer is a different operation, and pointer + pointer is nonsense.)

The C standard's description of the + operator (N1570 6.5.6) says:

For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to a complete object type and the other shall have integer type.

It could just as easily have said:

For addition, either both operands shall have arithmetic type, or the left operand shall be a pointer to a complete object type and the right operand shall have integer type.

in which case both i + p and i[p] would be illegal.

In C++ terms, we really have two sets of overloaded + operators, which can be loosely described as:

pointer operator+(pointer p, integer i);

and

pointer operator+(integer i, pointer p);

of which only the first is really necessary.

So why is it this way?

C++ inherited this definition from C, which got it from B (the commutativity of array indexing is explicitly mentioned in the 1972 Users' Reference to B), which got it from BCPL (manual dated 1967), which may well have gotten it from even earlier languages (CPL? Algol?).

So the idea that array indexing is defined in terms of addition, and that addition, even of a pointer and an integer, is commutative, goes back many decades, to C's ancestor languages.

Those languages were much less strongly typed than modern C is. In particular, the distinction between pointers and integers was often ignored. (Early C programmers sometimes used pointers as unsigned integers, before the unsigned keyword was added to the language.) So the idea of making addition non-commutative because the operands are of different types probably wouldn't have occurred to the designers of those languages. If a user wanted to add two "things", whether those "things" are integers, pointers, or something else, it wasn't up to the language to prevent it.

And over the years, any change to that rule would have broken existing code (though the 1989 ANSI C standard might have been a good opportunity).

Changing C and/or C++ to require putting the pointer on the left and the integer on the right might break some existing code, but there would be no loss of real expressive power.

So now we have arr[3] and 3[arr] meaning exactly the same thing, though the latter form should never appear outside the IOCCC.


I just find out this ugly syntax could be "useful", or at least very fun to play with when you want to deal with an array of indexes which refer to positions into the same array. It can replace nested square brackets and make the code more readable !

int a[] = { 2 , 3 , 3 , 2 , 4 };
int s = sizeof a / sizeof *a;  //  s == 5

for(int i = 0 ; i < s ; ++i) {  
           
    cout << a[a[a[i]]] << endl;
    // ... is equivalent to ...
    cout << i[a][a][a] << endl;  // but I prefer this one, it's easier to increase the level of indirection (without loop)
    
}

Of course, I'm quite sure that there is no use case for that in real code, but I found it interesting anyway :)


Nice question/answers.

Just want to point out that C pointers and arrays are not the same, although in this case the difference is not essential.

Consider the following declarations:

int a[10];
int* p = a;

In a.out, the symbol a is at an address that's the beginning of the array, and symbol p is at an address where a pointer is stored, and the value of the pointer at that memory location is the beginning of the array.


One thing no-one seems to have mentioned about Dinah's problem with sizeof:

You can only add an integer to a pointer, you can't add two pointers together. That way when adding a pointer to an integer, or an integer to a pointer, the compiler always knows which bit has a size that needs to be taken into account.


In C

 int a[]={10,20,30,40,50};
 int *p=a;
 printf("%d\n",*p++);//output will be 10
 printf("%d\n",*a++);//will give an error

Pointer p is a "variable", array name a is a "mnemonic" or "synonym", so p++ is valid but a++ is invalid.

a[2] is equals to 2[a] because the internal operation on both of this is "Pointer Arithmetic" internally calculated as *(a+2) equals *(2+a)


In C arrays, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) to *(3 + arr). But on the contrary [arr]3 or [3]arr is not correct and will result into syntax error, as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is dereference operator should be placed before the address yielded by the expression, not after the address.


Because array access is defined in terms of pointers. a[i] is defined to mean *(a + i), which is commutative.


In C arrays, arr[3] and 3[arr] are the same, and their equivalent pointer notations are *(arr + 3) to *(3 + arr). But on the contrary [arr]3 or [3]arr is not correct and will result into syntax error, as (arr + 3)* and (3 + arr)* are not valid expressions. The reason is dereference operator should be placed before the address yielded by the expression, not after the address.


It has very good explanation in A TUTORIAL ON POINTERS AND ARRAYS IN C by Ted Jensen.

Ted Jensen explained it as:

In fact, this is true, i.e wherever one writes a[i] it can be replaced with *(a + i) without any problems. In fact, the compiler will create the same code in either case. Thus we see that pointer arithmetic is the same thing as array indexing. Either syntax produces the same result.

This is NOT saying that pointers and arrays are the same thing, they are not. We are only saying that to identify a given element of an array we have the choice of two syntaxes, one using array indexing and the other using pointer arithmetic, which yield identical results.

Now, looking at this last expression, part of it.. (a + i), is a simple addition using the + operator and the rules of C state that such an expression is commutative. That is (a + i) is identical to (i + a). Thus we could write *(i + a) just as easily as *(a + i). But *(i + a) could have come from i[a] ! From all of this comes the curious truth that if:

char a[20];

writing

a[3] = 'x';

is the same as writing

3[a] = 'x';

Examples related to c

conflicting types for 'outchar' Can't compile C program on a Mac after upgrade to Mojave Program to find largest and second largest number in array Prime numbers between 1 to 100 in C Programming Language In c, in bool, true == 1 and false == 0? How I can print to stderr in C? Visual Studio Code includePath "error: assignment to expression with array type error" when I assign a struct field (C) Compiling an application for use in highly radioactive environments How can you print multiple variables inside a string using printf?

Examples related to arrays

PHP array value passes to next row Use NSInteger as array index How do I show a message in the foreach loop? Objects are not valid as a React child. If you meant to render a collection of children, use an array instead Iterating over arrays in Python 3 Best way to "push" into C# array Sort Array of object by object field in Angular 6 Checking for duplicate strings in JavaScript array what does numpy ndarray shape do? How to round a numpy array?

Examples related to pointers

Method Call Chaining; returning a pointer vs a reference? lvalue required as left operand of assignment error when using C++ Error: stray '\240' in program Reference to non-static member function must be called How to convert const char* to char* in C? Why should I use a pointer rather than the object itself? Function stoi not declared C pointers and arrays: [Warning] assignment makes pointer from integer without a cast Constant pointer vs Pointer to constant How to get the real and total length of char * (char array)?

Examples related to pointer-arithmetic

Pointer arithmetic for void pointer in C With arrays, why is it the case that a[5] == 5[a]?