I have a program that consists of a master server and distributed slave servers. The slave servers send status updates to the server, and if the server hasn't heard from a specific slave in a fixed period, it marks the slave as down. This is happening consistently.
From inspecting the logs, I found that the slave is able to send only one status update to the server and then never manages to send another: every later attempt fails in the call to connect() with "Cannot assign requested address" (errno 99).
Oddly enough, the slave is able to send several other updates to the server, and all of the connections are happening on the same port. It seems that the most common cause of this failure is that connections are left open, but I'm having trouble finding anything left open. Are there other possible explanations?
To clarify, here's how I'm connecting:
    struct sockaddr *sa;   // parameter
    size_t sa_size;        // parameter
    int i = 1;
    int stream;

    stream = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(stream, SOL_SOCKET, SO_REUSEADDR, &i, sizeof(i));
    bindresvport(stream, NULL);
    connect(stream, sa, sa_size);
This code is in a function to obtain a connection to another server, and a failure on any of those 4 calls causes the function to fail.
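For anyone checking the "connections left open" theory in similar code, here is a minimal sketch of how a function built from those four calls could report which call failed and close the descriptor on every failure path, so that a failed attempt does not keep a reserved port bound. The name `open_connection` and the logging through `perror` are assumptions for illustration, not the original code, and the sketch assumes glibc on Linux, where `bindresvport()` is available.

```c
#define _DEFAULT_SOURCE   /* glibc: make sure bindresvport() is declared */
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected socket, or -1 with errno set. */
int open_connection(const struct sockaddr *sa, socklen_t sa_size)
{
    int on = 1;
    int saved_errno;
    const char *failed = NULL;   /* name of the call that failed, if any */
    int stream = socket(AF_INET, SOCK_STREAM, 0);

    if (stream < 0) {
        perror("socket");
        return -1;
    }

    if (setsockopt(stream, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        failed = "setsockopt";
    else if (bindresvport(stream, NULL) < 0)
        failed = "bindresvport";
    else if (connect(stream, sa, sa_size) < 0)
        failed = "connect";

    if (failed != NULL) {
        saved_errno = errno;     /* close() may overwrite errno */
        close(stream);           /* release the descriptor and its bound port */
        errno = saved_errno;
        perror(failed);
        return -1;
    }
    return stream;               /* the caller must close() this when done */
}
```

The caller still has to close() the returned socket after each update. Note that bindresvport() normally needs root privileges, so as an unprivileged user this sketch would be expected to fail at that step with a permission error.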
It turns out that the problem really was that the address was busy - the busyness was caused by some other problems in how we are handling network communications. Your inputs have helped me figure this out. Thank you.
EDIT: to be specific, the problems in handling our network communications were that these status updates would be constantly re-sent if the first failed. It was only a matter of time until we had every distributed slave trying to send its status update at the same time, which was over-saturating our network.
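One common way to keep retries from piling up like this is exponential backoff with random jitter, so that slaves whose updates failed do not all retry at the same moment. This is a rough sketch of the idea, not what we actually deployed; the function name `retry_delay_ms` and the two tuning constants are made up for illustration.

```c
#include <stdlib.h>

/* Hypothetical tuning values, chosen only for illustration. */
#define BASE_DELAY_MS 500UL
#define MAX_DELAY_MS  60000UL

/*
 * Delay before retry number `attempt` (0-based).
 * The upper bound doubles with each attempt up to MAX_DELAY_MS, and the
 * result is picked at random from the upper half of that bound, so
 * different slaves spread their retries out over time.
 */
unsigned long retry_delay_ms(unsigned attempt)
{
    unsigned long cap = BASE_DELAY_MS;

    while (attempt-- > 0 && cap < MAX_DELAY_MS)
        cap *= 2;
    if (cap > MAX_DELAY_MS)
        cap = MAX_DELAY_MS;

    /* Half fixed, half random: the result lies in [cap/2, cap]. */
    return cap / 2 + (unsigned long)rand() % (cap / 2 + 1);
}
```

Each slave would sleep for `retry_delay_ms(n)` milliseconds before its n-th retry and reset n to 0 after a successful update. Every slave should seed rand() differently (for example from its process ID and the current time), otherwise they would all compute the same delays and the jitter would not help.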