It needs to be clarified that not only data should be cache-friendly, it is just as important for the code. This is in addition to branch predicition, instruction reordering, avoiding actual divisions and other techniques.
Typically the denser the code, the fewer cache lines will be required to store it. This results in more cache lines being available for data.
The code should not call functions all over the place as they typically will require one or more cache lines of their own, resulting in fewer cache lines for data.
A function should begin at a cache line-alignment-friendly address. Though there are (gcc) compiler switches for this be aware that if the the functions are very short it might be wasteful for each one to occupy an entire cache line. For example, if three of the most often used functions fit inside one 64 byte cache line, this is less wasteful than if each one has its own line and results in two cache lines less available for other usage. A typical alignment value could be 32 or 16.
So spend some extra time to make the code dense. Test different constructs, compile and review the generated code size and profile.