[parallel-processing] Optimal number of threads per core

One example of lots of threads ("thread pool") vs one per core is that of implementing a web-server in Linux or in Windows.

Since sockets are polled in Linux a lot of threads may increase the likelihood of one of them polling the right socket at the right time - but the overall processing cost will be very high.

In Windows the server will be implemented using I/O Completion Ports - IOCPs - which will make the application event driven: if an I/O completes the OS launches a stand-by thread to process it. When the processing has completed (usually with another I/O operation as in a request-response pair) the thread returns to the IOCP port (queue) to wait for the next completion.

If no I/O has completed there is no processing to be done and no thread is launched.

Indeed, Microsoft recommends no more than one thread per core in IOCP implementations. Any I/O may be attached to the IOCP mechanism. IOCs may also be posted by the application, if necessary.