Stack memory is specifically the range of memory that is accessible via the Stack register of the CPU. The Stack was used as a way to implement the "Jump-Subroutine"-"Return" code pattern in assembly language, and also as a means to implement hardware-level interrupt handling. For instance, during an interrupt, the Stack was used to store various CPU registers, including Status (which indicates the results of an operation) and Program Counter (where was the CPU in the program when the interrupt occurred).
Stack memory is very much the consequence of usual CPU design. The speed of its allocation/deallocation is fast because it is strictly a last-in/first-out design. It is a simple matter of a move operation and a decrement/increment operation on the Stack register.
Heap memory was simply the memory that was left over after the program was loaded and the Stack memory was allocated. It may (or may not) include global variable space (it's a matter of convention).
Modern pre-emptive multitasking OS's with virtual memory and memory-mapped devices make the actual situation more complicated, but that's Stack vs Heap in a nutshell.