I came across this recently, and it certainly helped me understand: https://www.cs.ryerson.ca/~aharley/vis/conv/
So there's an input, then a Conv2D, a MaxPooling2D, and so on. The Flatten layers are at the end, and the visualization shows exactly how they are formed from the feature maps before them and how they go on to define the final classification (digits 0-9).
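If it helps to see the same idea as numbers, here is a rough sketch in plain Python that tracks only the tensor shapes through a stack like that. The 28x28 grayscale input, the 3x3 kernels, and the filter counts (8 and 16) are my own assumptions for illustration, not the sizes used in the linked visualization, and the helper functions are hypothetical names, not Keras calls. They just mimic what Conv2D (no padding, stride 1) and MaxPooling2D (2x2) do to the shape.

```python
# Shape bookkeeping only: no weights, no training.
# All sizes below are assumed for illustration.

def conv2d_shape(shape, filters, kernel=3):
    """Shape after a convolution with no padding and stride 1."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, filters)

def maxpool2d_shape(shape, pool=2):
    """Shape after non-overlapping pool x pool max pooling."""
    h, w, c = shape
    return (h // pool, w // pool, c)

def flatten_shape(shape):
    """Flatten unrolls height x width x channels into one long vector."""
    h, w, c = shape
    return (h * w * c,)

shape = (28, 28, 1)                      # assumed grayscale digit image
shape = conv2d_shape(shape, filters=8)   # (26, 26, 8)
shape = maxpool2d_shape(shape)           # (13, 13, 8)
shape = conv2d_shape(shape, filters=16)  # (11, 11, 16)
shape = maxpool2d_shape(shape)           # (5, 5, 16)
flat = flatten_shape(shape)              # (400,)

# A final dense layer maps the flat vector to one score per digit 0-9.
num_classes = 10
dense_params = flat[0] * num_classes + num_classes  # weights + biases

print(flat, dense_params)
```

So Flatten doesn't learn anything itself. It only rearranges the last stack of feature maps into a single vector so that a dense layer can connect every value to each of the 10 output classes.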