[python] How can I debug what is causing a connection refused or a connection time out?

The problem

The problem is in the network layer. Here are the status codes explained:

  • Connection refused: The peer is not listening on the respective network port you're trying to connect to. This usually means that either a firewall is actively denying the connection or the respective service is not started on the other site or is overloaded.

  • Connection timed out: During the attempt to establish the TCP connection, no response came from the other side within a given time limit. In the context of urllib this may also mean that the HTTP response did not arrive in time. This is sometimes also caused by firewalls, sometimes by network congestion or heavy load on the remote (or even local) site.

In context

That said, it is probably not a problem in your script, but on the remote site. If it's occuring occasionally, it indicates that the other site has load problems or the network path to the other site is unreliable.

Also, as it is a problem with the network, you cannot tell what happened on the other side. It is possible that the packets travel fine in the one direction but get dropped (or misrouted) in the other.

It is also not a (direct) DNS problem, that would cause another error (Name or service not known or something similar). It could however be the case that the DNS is configured to return different IP addresses on each request, which would connect you (DNS caching left aside) to different addresses hosts on each connection attempt. It could in turn be the case that some of these hosts are misconfigured or overloaded and thus cause the aforementioned problems.

Debugging this

As suggested in the another answer, using a packet analyzer can help to debug the issue. You won't see much however except the packets reflecting exactly what the error message says.

To rule out network congestion as a problem you could use a tool like mtr or traceroute or even ping to see if packets get lost to the remote site. Note that, if you see loss in mtr (and any traceroute tool for that matter), you must always consider the first host where loss occurs (in the route from yours to remote) as the one dropping packets, due to the way ICMP works. If the packets get lost only at the last hop over a long time (say, 100 packets), that host definetly has an issue. If you see that this behaviour is persistent (over several days), you might want to contact the administrator.

Loss in a middle of the route usually corresponds to network congestion (possibly due to maintenance), and there's nothing you could do about it (except whining at the ISP about missing redundance).

If network congestion is not a problem (i.e. not more than, say, 5% of the packets get lost), you should contact the remote server administrator to figure out what's wrong. He may be able to see relevant infos in system logs. Running a packet analyzer on the remote site might also be more revealing than on the local site. Checking whether the port is open using netstat -tlp is definetly recommended then.