[python] How can I debug what is causing a connection refused or a connection time out?

I have the following code that has worked for about a year:

import urllib2

req = urllib2.Request('https://somewhere.com','<Request></Request>')
data = urllib2.urlopen(req)
print data.read()

Lately, there have been some random errors:

  • urllib2.URLError: <urlopen error [Errno 111] Connection refused>
  • <urlopen error [Errno 110] Connection timed out>

The trace of the failure is:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    data = urllib2.urlopen(req).read()
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 400, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 418, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1215, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

The above errors happen randomly, the script can run successfully the first time but then fails on the second run and vice versa.

What should I do to debug and figure out where the issue is coming from? How can I tell if the endpoint has consumed my request and returned a response but never reached me?

With telnet

I just tested with telnet, sometimes it succeeds, sometimes it doesn't, just like my Python.

On success:

$ telnet somewhere.com 443
Trying XXX.YY.ZZZ.WWW...
Connected to somewhere.com.
Escape character is '^]'.
Connection closed by foreign host.

On a refused connection:

$ telnet somewhere.com 443
Trying XXX.YY.ZZZ.WWW...
telnet: Unable to connect to remote host: Connection refused

On a timeout:

$ telnet somewhere.com 443
Trying XXX.YY.ZZZ.WWW...
telnet: Unable to connect to remote host: Connection timed out

This question is related to python networking

The answer is


The problem

The problem is in the network layer. Here are the status codes explained:

  • Connection refused: The peer is not listening on the respective network port you're trying to connect to. This usually means that either a firewall is actively denying the connection or the respective service is not started on the other site or is overloaded.

  • Connection timed out: During the attempt to establish the TCP connection, no response came from the other side within a given time limit. In the context of urllib this may also mean that the HTTP response did not arrive in time. This is sometimes also caused by firewalls, sometimes by network congestion or heavy load on the remote (or even local) site.

In context

That said, it is probably not a problem in your script, but on the remote site. If it's occuring occasionally, it indicates that the other site has load problems or the network path to the other site is unreliable.

Also, as it is a problem with the network, you cannot tell what happened on the other side. It is possible that the packets travel fine in the one direction but get dropped (or misrouted) in the other.

It is also not a (direct) DNS problem, that would cause another error (Name or service not known or something similar). It could however be the case that the DNS is configured to return different IP addresses on each request, which would connect you (DNS caching left aside) to different addresses hosts on each connection attempt. It could in turn be the case that some of these hosts are misconfigured or overloaded and thus cause the aforementioned problems.

Debugging this

As suggested in the another answer, using a packet analyzer can help to debug the issue. You won't see much however except the packets reflecting exactly what the error message says.

To rule out network congestion as a problem you could use a tool like mtr or traceroute or even ping to see if packets get lost to the remote site. Note that, if you see loss in mtr (and any traceroute tool for that matter), you must always consider the first host where loss occurs (in the route from yours to remote) as the one dropping packets, due to the way ICMP works. If the packets get lost only at the last hop over a long time (say, 100 packets), that host definetly has an issue. If you see that this behaviour is persistent (over several days), you might want to contact the administrator.

Loss in a middle of the route usually corresponds to network congestion (possibly due to maintenance), and there's nothing you could do about it (except whining at the ISP about missing redundance).

If network congestion is not a problem (i.e. not more than, say, 5% of the packets get lost), you should contact the remote server administrator to figure out what's wrong. He may be able to see relevant infos in system logs. Running a packet analyzer on the remote site might also be more revealing than on the local site. Checking whether the port is open using netstat -tlp is definetly recommended then.


Use a packet analyzer to intercept the packets to/from somewhere.com. Studying those packets should tell you what is going on.

Time-outs or connections refused could mean that the remote host is too busy.