The danger of the second expression comes if the type of byte1
is char
. In that case, some implementations can have it signed char
, which will result in sign extension when evaluating.
signed char byte1 = 0x80;
signed char byte2 = 0x10;
unsigned short value1 = ((byte2 << 8) | (byte1 & 0xFF));
unsigned short value2 = ((byte2 << 8) | byte1);
printf("value1=%hu %hx\n", value1, value1);
printf("value2=%hu %hx\n", value2, value2);
will print
value1=4224 1080 right
value2=65408 ff80 wrong!!
I tried it on gcc v3.4.6 on Solaris SPARC 64 bit and the result is the same with byte1
and byte2
declared as char
.
TL;DR
The masking is to avoid implicit sign extension.
EDIT: I checked, it's the same behaviour in C++.
EDIT2: As requested explanation of sign extension.
Sign extension is a consequence of the way C evaluates expressions. There is a rule in C called promotion rule. C will implicitly cast all small types to int
before doing the evaluation. Let's see what happens to our expression:
unsigned short value2 = ((byte2 << 8) | byte1);
byte1
is a variable containing bit pattern 0xFF. If char
is unsigned
that value is interpreted as 255, if it is signed
it is -128. When doing the calculation, C will extend the value to an int
size (16 or 32 bits generally). This means that if the variable is unsigned
and we will keep the value 255, the bit-pattern of that value as int
will be 0x000000FF. If it is signed
we want the value -128 which bit pattern is 0xFFFFFFFF. The sign was extended to the size of the tempory used to do the calculation.
And thus oring the temporary will yield the wrong result.
On x86 assembly it is done with the movsx
instruction (movzx
for the zero extend). Other CPU's had other instructions for that (6809 had SEX
).