Note that, at least on Windows, htonl() is much slower than its intrinsic counterpart, _byteswap_ulong(). The former is a function call into ws2_32.dll; the latter compiles to a single BSWAP instruction. So if you are writing platform-dependent code anyway, prefer the intrinsic for speed:
#define htonl(x) _byteswap_ulong(x)
This can matter especially for PNG image processing, where all integers are stored in big-endian order and the usual advice is "One can use htonl()..." (which, if you are not prepared for it, slows down a typical Windows program).
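For anyone who would rather not redefine htonl() itself, here is a rough sketch of the same idea as a separate helper. The names host_to_be32 and read_png_u32 are made up for this example (they are not part of Winsock or libpng), and the non-MSVC branches are only there so the snippet also builds with GCC or Clang. The MSVC branch and the final fallback both assume a little-endian host.

```c
#include <stdint.h>
#include <stdlib.h>   /* declares _byteswap_ulong on MSVC */
#include <string.h>   /* memcpy */

/* Hypothetical helper: convert a 32-bit value from host order to big-endian. */
static inline uint32_t host_to_be32(uint32_t x)
{
#if defined(_MSC_VER)
    /* Assumption: little-endian Windows target, so a swap is always needed. */
    return (uint32_t)_byteswap_ulong((unsigned long)x);
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    /* Host is already big-endian: nothing to do. */
    return x;
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    /* Plain shifts; assumes a little-endian host. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}

/* Hypothetical helper: read a big-endian 32-bit integer, such as a PNG
   chunk length, from a byte buffer. A byte swap is its own inverse, so the
   same conversion works in both directions. */
static uint32_t read_png_u32(const unsigned char *p)
{
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);   /* avoids unaligned and aliasing issues */
    return host_to_be32(raw);
}
```

For example, the length field of a PNG IHDR chunk is the four bytes 00 00 00 0D, and read_png_u32() on those bytes returns 13.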