As you already state in your question, the main difference between union and struct is that union members overlay each other's memory, so that the sizeof of a union is at least that of its largest member, while struct members are laid out one after another (with optional padding in between). A union is large enough to contain any one of its members, and has an alignment that fits all of them. So let's say int can only be stored at addresses that are a multiple of 2 and is 2 bytes wide, and long can only be stored at addresses that are a multiple of 4 and is 4 bytes wide. The following union
union test {
    int a;
    long b;
};
could have a sizeof of 4 and an alignment requirement of 4. Both a union and a struct can have padding at the end, but not at the beginning.
Writing to a member of a struct changes only the value of that member. Writing to a member of a union overwrites storage shared with all other members, so their previous values are lost. In C++, reading a member other than the one most recently written is undefined behavior. In C, C99 (as amended by TC3) and later reinterpret the stored bytes as the type of the member being read, which may still yield a trap representation. GCC explicitly documents that reading from a member other than the one most recently written works, as long as the access goes through the union type.
For an operating system, it doesn't matter whether a user program writes to a union or to a structure. This is purely a matter for the compiler.
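The layout rules above can be checked directly. The following is a minimal sketch, assuming a C11 compiler (for _Alignof); the names test_u, test_s and the three helper functions are made up for illustration and are not part of the question. It compares only relations that hold on any platform rather than concrete sizes, and it inspects the union's bytes with memcpy so that it does not depend on the rules for reading an inactive member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; same members as union test above. */
union test_u {
    int a;
    long b;
};

struct test_s {
    int a;
    long b;
};

/* Returns 1 if the size, offset and alignment relations described above hold. */
static int layout_holds(void)
{
    return sizeof(union test_u) >= sizeof(long)            /* fits its largest member */
        && sizeof(struct test_s) >= sizeof(int) + sizeof(long) /* members side by side */
        && offsetof(union test_u, a) == 0                  /* union members all start at 0 */
        && offsetof(union test_u, b) == 0
        && offsetof(struct test_s, a) == 0                 /* no padding at the beginning */
        && offsetof(struct test_s, b) >= sizeof(int)       /* b comes after a */
        && _Alignof(union test_u) >= _Alignof(long);       /* alignment fits every member */
}

/* Returns 1 if writing b replaced the bytes that a was stored in. */
static int union_write_overlays(void)
{
    union test_u u;
    int seen;

    u.a = 1;
    u.b = 0;                          /* overwrites the storage shared with a */
    memcpy(&seen, &u, sizeof seen);   /* look at the first sizeof(int) bytes */
    return seen == 0;
}

/* Returns 1 if writing b left a untouched in the struct. */
static int struct_write_is_independent(void)
{
    struct test_s s;

    s.a = 1;
    s.b = 0;
    return s.a == 1;
}
```

All three functions should return 1 on a conforming implementation; the concrete values of sizeof and _Alignof still depend on the platform.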
Another important property of union and struct is that a pointer to one of them, suitably converted, points to a member. So the following is valid:
struct test {
    int a;
    double b;
} * some_test_pointer;
If you cast some_test_pointer to int*, the result points to the first member, a, because a struct has no padding before its first member. Casting it to double* does not give you a pointer to b, since b lies at a nonzero offset. For a union, every member starts at the union's own address, so a pointer to the union can be converted to a pointer to any of its members, and back. And because a union always has an alignment suitable for all of its members, you can use a union to make pointing to some type valid:
union a {
    int a;
    double b;
};
A pointer to that union can then be used to refer to an int or a double:
union a * v = (union a*)some_int_pointer;
*some_int_pointer = 5;
v->a = 10;
return *some_int_pointer;
is actually valid (assuming some_int_pointer is suitably aligned for union a), as stated by the C99 standard (6.5, paragraph 7):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
- a type compatible with the effective type of the object
- ...
- an aggregate or union type that includes one of the aforementioned types among its members
The compiler won't optimize out the v->a = 10; as it could affect the value of *some_int_pointer (and the function will return 10 instead of 5).
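To try this out, the fragment can be wrapped in a function. This is only a sketch: set_through_union and demo are hypothetical names, and the caller deliberately passes the address of the a member of a real union a object, so that alignment and the union's existence are not in question.

```c
#include <assert.h>

union a {
    int a;
    double b;
};

/* The fragment from above as a function; the name is made up for illustration. */
static int set_through_union(int *some_int_pointer)
{
    union a *v = (union a *)some_int_pointer;

    *some_int_pointer = 5;
    v->a = 10;                  /* may alias *some_int_pointer, so it must be kept */
    return *some_int_pointer;
}

/* Calls the function with a pointer to a member of an actual union object. */
static int demo(void)
{
    union a u;

    /* A union and each of its members share the same address. */
    assert((void *)&u == (void *)&u.a);
    assert((void *)&u == (void *)&u.b);

    return set_through_union(&u.a);
}
```

Called this way, demo() returns 10, because the store through v and the read through some_int_pointer refer to the same int.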