Bug 771691 - [abrt] mercurial-1.9.3-1.fc16: ssl.py:172:getpeercert:AttributeError: 'NoneType' object has no attribute 'peer_certificate'
Summary: [abrt] mercurial-1.9.3-1.fc16: ssl.py:172:getpeercert:AttributeError: 'NoneTy...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: mercurial
Version: 16
Hardware: x86_64
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Neal Becker
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard: abrt_hash:ad2ddacea4effdbd38669f53fdd...
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-01-04 16:21 UTC by Robin Lee
Modified: 2012-02-14 08:36 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-02-14 08:36:50 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
output of "tcpdump -n port 443" during a blocked hg through https (4.97 KB, text/plain)
2012-01-05 01:41 UTC, Robin Lee
no flags Details
Output of the script in #c14 (28.87 KB, text/plain)
2012-01-06 15:15 UTC, Robin Lee
no flags Details

Description Robin Lee 2012-01-04 16:21:51 UTC
libreport version: 2.0.8
abrt_version:   2.0.7
cmdline:        /usr/bin/python /usr/bin/hg clone https://code.google.com/p/lonote/
executable:     /usr/bin/hg
kernel:         3.1.6-1.fc16.x86_64
reason:         ssl.py:172:getpeercert:AttributeError: 'NoneType' object has no attribute 'peer_certificate'
time:           Thu 05 Jan 2012 12:16:29 AM CST
uid:            500
username:       cheese

backtrace:
:ssl.py:172:getpeercert:AttributeError: 'NoneType' object has no attribute 'peer_certificate'
:
:Traceback (most recent call last):
:  File "/usr/bin/hg", line 38, in <module>
:    mercurial.dispatch.run()
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 27, in run
:    sys.exit(dispatch(request(sys.argv[1:])))
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 64, in dispatch
:    return _runcatch(req)
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 87, in _runcatch
:    return _dispatch(req)
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 688, in _dispatch
:    cmdpats, cmdoptions)
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 463, in runcommand
:    ret = _runcommand(ui, options, cmd, d)
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 742, in _runcommand
:    return checkargs()
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 696, in checkargs
:    return cmdfunc()
:  File "/usr/lib64/python2.7/site-packages/mercurial/dispatch.py", line 685, in <lambda>
:    d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
:  File "/usr/lib64/python2.7/site-packages/mercurial/util.py", line 389, in check
:    return func(*args, **kwargs)
:  File "/usr/lib64/python2.7/site-packages/mercurial/commands.py", line 1041, in clone
:    branch=opts.get('branch'))
:  File "/usr/lib64/python2.7/site-packages/mercurial/hg.py", line 217, in clone
:    srcrepo = repository(remoteui(ui, peeropts), source)
:  File "/usr/lib64/python2.7/site-packages/mercurial/hg.py", line 93, in repository
:    repo = _peerlookup(path).instance(ui, path, create)
:  File "/usr/lib64/python2.7/site-packages/mercurial/httprepo.py", line 233, in instance
:    inst._fetchcaps()
:  File "/usr/lib64/python2.7/site-packages/mercurial/httprepo.py", line 56, in _fetchcaps
:    self.caps = set(self._call('capabilities').split())
:  File "/usr/lib64/python2.7/site-packages/mercurial/httprepo.py", line 166, in _call
:    fp = self._callstream(cmd, **args)
:  File "/usr/lib64/python2.7/site-packages/mercurial/httprepo.py", line 116, in _callstream
:    resp = self.urlopener.open(req)
:  File "/usr/lib64/python2.7/urllib2.py", line 394, in open
:    response = self._open(req, data)
:  File "/usr/lib64/python2.7/urllib2.py", line 412, in _open
:    '_open', req)
:  File "/usr/lib64/python2.7/urllib2.py", line 372, in _call_chain
:    result = func(*args)
:  File "/usr/lib64/python2.7/site-packages/mercurial/url.py", line 362, in https_open
:    return self.do_open(self._makeconnection, req)
:  File "/usr/lib64/python2.7/site-packages/mercurial/keepalive.py", line 256, in do_open
:    self._start_transaction(h, req)
:  File "/usr/lib64/python2.7/site-packages/mercurial/url.py", line 348, in _start_transaction
:    return keepalive.KeepAliveHandler._start_transaction(self, h, req)
:  File "/usr/lib64/python2.7/site-packages/mercurial/keepalive.py", line 354, in _start_transaction
:    h.endheaders()
:  File "/usr/lib64/python2.7/httplib.py", line 951, in endheaders
:    self._send_output(message_body)
:  File "/usr/lib64/python2.7/httplib.py", line 811, in _send_output
:    self.send(msg)
:  File "/usr/lib64/python2.7/site-packages/mercurial/url.py", line 135, in _sendfile
:    orgsend(self, data)
:  File "/usr/lib64/python2.7/site-packages/mercurial/keepalive.py", line 537, in safesend
:    self.connect()
:  File "/usr/lib64/python2.7/site-packages/mercurial/url.py", line 337, in connect
:    sslutil.validator(self.ui, host)(self.sock)
:  File "/usr/lib64/python2.7/site-packages/mercurial/sslutil.py", line 105, in __call__
:    peercert = sock.getpeercert(True)
:  File "/usr/lib64/python2.7/ssl.py", line 172, in getpeercert
:    return self._sslobj.peer_certificate(binary_form)
:AttributeError: 'NoneType' object has no attribute 'peer_certificate'
:
:Local variables in innermost frame:
:self: <ssl.SSLSocket object at 0x1cd6c80>
:binary_form: True

comment:
:Just run:
:hg clone https://code.google.com/p/lonote/

Comment 1 Mads Kiilerich 2012-01-04 16:31:28 UTC
Are you by any chance located in China, behind the not-so-great firewall?

It shouldn't be possible to end up in a situation where _sslobj is None. We need a description of how to reproduce it ... or someone who can reproduce it and has the skills to debug it.

Comment 2 Robin Lee 2012-01-04 16:43:33 UTC
Yes, I am in Mainland China. That should be the very problem.

Comment 3 Mads Kiilerich 2012-01-04 16:58:29 UTC
If you know some python then please try to debug it.
Edit /usr/lib*/python2.7/ssl.py and try to figure out why _sslobj is None. A stacktrace showing where it is set to None would be helpful.

Comment 4 Robin Lee 2012-01-04 18:00:04 UTC
I peak at mercurial/sslutil.py of version 2.0.2. At line 107, it may be better first test the connection through sock.cipher() before running sock.getpeercert .

Comment 5 Mads Kiilerich 2012-01-04 18:08:55 UTC
Yes, that or some other workaround could have given a custom error message.

There is however no indication in the python documentation that cipher should be checked first or that getpeercert should fail. It thus looks a lot like a Python bug, but before I file a bug report or make a workaround in Mercurial I would like to understand in which situations this can occur and be sure that it isn't a bug in Mercurials use of ssl.

Please try to figure out when and why _sslobj silently is set to None.

Comment 6 Robin Lee 2012-01-04 19:15:00 UTC
At ssl.py SSLSocket.__init__:

        try:
            socket.getpeername(self)
        except socket_error, e:
            if e.errno != errno.ENOTCONN:
                raise
            # no, no connection yet                                                       
            self._connected = False
            self._sslobj = None # !!!! _sslobj was set to None

And e.errno is 107.

How about asking the question at the Python community? I may not be able to dig into the C implementation. I am not familiar with Python C API.

Comment 7 Mads Kiilerich 2012-01-04 23:57:20 UTC
Thanks.

I haven't seen the problem myself, and until now I haven't seen enough reliable evidence to know what to say and ask for. But we are getting closer. You are welcome to go ahead now and investigate and report it with a minimal test case.

To help us understand (more of) the exact conditions where it fails, can you please show the output of 
  tcpdump -n port 443
of a failing clone?

Comment 8 Robin Lee 2012-01-05 01:41:04 UTC
Created attachment 550792 [details]
output of "tcpdump -n port 443" during a blocked hg through https

Comment 9 Mads Kiilerich 2012-01-05 12:15:39 UTC
Thanks. We are getting closer.

Do the dump show several 'hg clone' attempts?
Do each attempt hang for 30 seconds before it fails?
Did all attempts fail the same way, or was it only the last one that failed with the crash we are discussing?

Depending on answers to these questions I assume that the connection attempts simply was blocked in different ways by the wall. Could you try to specify different IP addresses of code.google.com explicitly and see if they simply are blocked in different ways?

I wonder why the first connections stopped after 30 seconds. Can you try
  tcpdump -n port 443 or icmp
and see if something reports failure back over ICMP?

I assume that the following always fails nicely with a 'connection reset':

import socket, ssl
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
ssl_sock = ssl.wrap_socket(s)
ssl_sock.connect(('code.google.com', 443))
ssl_sock.getpeercert(True)

while this can fail with the NoneType AttributeError:

import socket, ssl
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('code.google.com', 443))
ssl_sock = ssl.wrap_socket(s)
ssl_sock.getpeercert(True)

?

Comment 10 Robin Lee 2012-01-06 05:52:57 UTC
(In reply to comment #9)
> Do the dump show several 'hg clone' attempts?
How to check this?

> Do each attempt hang for 30 seconds before it fails?
There are 30 seconds between last failure and an attempt to another IP address. Failures are incurred by checksum incorrect.

> Did all attempts fail the same way, or was it only the last one that failed
> with the crash we are discussing?
I can summarize the three kinds of situations:

 1) Connection timeout and hg throws NoneType. 'hg clone' tries several IP addresses of the domain and the transmissions failed mostly with checksum incorrect.

 2) Connection reset and hg fails quickly and shows the right error.

 3) Connection reset and hg fails quickly and throws NoneType.

> 
> Depending on answers to these questions I assume that the connection attempts
> simply was blocked in different ways by the wall. Could you try to specify
> different IP addresses of code.google.com explicitly and see if they simply are
> blocked in different ways?
$ hg clone https://74.125.71.100/p/lonote/
abort: error: Connection reset by peer

This is just the second situation. And the other two also occur with explicit IP addresses.

> 
> I wonder why the first connections stopped after 30 seconds. Can you try
>   tcpdump -n port 443 or icmp
> and see if something reports failure back over ICMP?
There seems no ICMP report.


> I assume that the following always fails nicely with a 'connection reset':
> 
> import socket, ssl
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> ssl_sock = ssl.wrap_socket(s)
> ssl_sock.connect(('code.google.com', 443))
> ssl_sock.getpeercert(True)
> 
> while this can fail with the NoneType AttributeError:
> 
> import socket, ssl
> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> s.connect(('code.google.com', 443))
> ssl_sock = ssl.wrap_socket(s)
> ssl_sock.getpeercert(True)
> 
> ?
> 
Result of the first script:

$ python expect_Reset.py 
Traceback (most recent call last):
  File "expect_Reset.py", line 8, in <module>
    ssl_sock.connect(('code.google.com', 443))
  File "/usr/lib64/python2.7/ssl.py", line 322, in connect
    self._real_connect(addr, False)
  File "/usr/lib64/python2.7/ssl.py", line 315, in _real_connect
    raise e
socket.error: [Errno 110] Connection timed out

Result of the second script:

$ python expect_NoneType.py 
Traceback (most recent call last):
  File "expect_NoneType.py", line 6, in <module>
    s.connect(('code.google.com', 443))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 110] Connection timed out

Since the Firewall is not static, the result may differ like the three situations.

Comment 11 Mads Kiilerich 2012-01-06 12:59:05 UTC
(In reply to comment #10)
> (In reply to comment #9)
> > Do the dump show several 'hg clone' attempts?
> How to check this?

Did you issue one or several 'hg clone' commands while tcpdump was running? 

> > Do each attempt hang for 30 seconds before it fails?
> There are 30 seconds between last failure and an attempt to another IP address.

'hg clone' is automatically failing over to the next IP address?

> Failures are incurred by checksum incorrect.

What observation leads to that conclusion? You don't receive any response at all, so how can the checksum be incorrect?

> > Did all attempts fail the same way, or was it only the last one that failed
> > with the crash we are discussing?
> I can summarize the three kinds of situations:
> 
>  1) Connection timeout and hg throws NoneType. 'hg clone' tries several IP
> addresses of the domain and the transmissions failed mostly with checksum
> incorrect.

One 'hg clone https://...' invocation will retry several IP addresses? That is very surprising and not indicated by source or documentation and I can't reproduce that.

Are you using plain Mercurial without any wrappers or modifications or extensions?

>  2) Connection reset and hg fails quickly and shows the right error.
> 
>  3) Connection reset and hg fails quickly and throws NoneType.

That is very surprising and not the behaviour I would expect from looking at the source or documentation and I can't reproduce that.

Can you correlate the different failure modes with different tcpdumps?

> > Depending on answers to these questions I assume that the connection attempts
> > simply was blocked in different ways by the wall. Could you try to specify
> > different IP addresses of code.google.com explicitly and see if they simply are
> > blocked in different ways?
> $ hg clone https://74.125.71.100/p/lonote/
> abort: error: Connection reset by peer
> 
> This is just the second situation. And the other two also occur with explicit
> IP addresses.

Is the same IP address able to fail in different ways?

> > I wonder why the first connections stopped after 30 seconds. Can you try
> >   tcpdump -n port 443 or icmp
> > and see if something reports failure back over ICMP?
> There seems no ICMP report.

So what makes the connection fail after 30 seconds? When I configure my router to reproduce the 'no response at all' scenario it keeps retrying much longer. Do you have any special configuration on your machine? Or is it plain f16?

> $ python expect_Reset.py 
> Traceback (most recent call last):
>   File "expect_Reset.py", line 8, in <module>
>     ssl_sock.connect(('code.google.com', 443))
>   File "/usr/lib64/python2.7/ssl.py", line 322, in connect
>     self._real_connect(addr, False)
>   File "/usr/lib64/python2.7/ssl.py", line 315, in _real_connect
>     raise e
> socket.error: [Errno 110] Connection timed out
> 
> Result of the second script:
> 
> $ python expect_NoneType.py 
> Traceback (most recent call last):
>   File "expect_NoneType.py", line 6, in <module>
>     s.connect(('code.google.com', 443))
>   File "/usr/lib64/python2.7/socket.py", line 224, in meth
>     return getattr(self._sock,name)(*args)
> socket.error: [Errno 110] Connection timed out
> 
> Since the Firewall is not static, the result may differ like the three
> situations.

Hmm. That seems inconsistent and contradicts my theory of what the problem is. I thought I had found a way to reproduce the problem here ... bat apprently not.

Are these behaviours as consistently reproducible as the 'hg clone' failure?

How long time does it take before they fail?

Comment 12 Mads Kiilerich 2012-01-06 13:22:09 UTC
(In reply to comment #11)
> 'hg clone' is automatically failing over to the next IP address?

Ok, yes it is.

> One 'hg clone https://...' invocation will retry several IP addresses? That is
> very surprising and not indicated by source or documentation and I can't
> reproduce that.

Ok, Pythons socket.create_connection documention has 'room for improvement'.

Most of comment #11 is no longer relevant, but:

Is the behaviour of the test scripts 100% consistent?

Can you do 
  for i in $(seq 100); do echo $i; python expect_Reset.py; done
and
  for i in $(seq 100); do echo $i; python expect_None.py; done

and see if you ever get anything else than timeout?

Comment 13 Robin Lee 2012-01-06 13:39:15 UTC
A connection reset result of the first script:

Traceback (most recent call last):
  File "expect_Reset.py", line 8, in <module>
    ssl_sock.connect(('code.google.com', 443))
  File "/usr/lib64/python2.7/ssl.py", line 322, in connect
    self._real_connect(addr, False)
  File "/usr/lib64/python2.7/ssl.py", line 315, in _real_connect
    raise e
socket.error: [Errno 104] Connection reset by peer

A connection reset result of the second script:
Traceback (most recent call last):
  File "expect_NoneType.py", line 7, in <module>
    ssl_sock = ssl.wrap_socket(s)
  File "/usr/lib64/python2.7/ssl.py", line 372, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib64/python2.7/ssl.py", line 134, in __init__
    self.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 296, in do_handshake
    self._sslobj.do_handshake()
socket.error: [Errno 104] Connection reset by peer


There are also many timeouts. No other kind of result was shown during the 100 tries.

Comment 14 Mads Kiilerich 2012-01-06 14:13:36 UTC
That was fast. I assume that means that they all fail instantaneously? Does that mean that a tcpdump today shows immediate responses from the firewall? But 'hg clone' still hangs for a long time? 

(Logging/debugging of the hanging 'hg clone' can be improved temporarily with
--- /usr/lib64/python2.7/socket.py.org	2012-01-06 14:46:01.128099111 +0100
+++ /usr/lib64/python2.7/socket.py	2012-01-06 14:13:14.732211105 +0100
@@ -551,6 +551,7 @@
     host, port = address
     err = None
     for res in getaddrinfo(host, port, 0, SOCK_STREAM):
+        print res
         af, socktype, proto, canonname, sa = res
         sock = None
         try:
@@ -564,6 +565,7 @@
 
         except error as _:
             err = _
+            print err
             if sock is not None:
                 sock.close()
)


Do you see the problem - and slow connects - with 100 times:

import socket, ssl
s = socket.create_connection(('code.google.com', 443))
ssl_sock = ssl.wrap_socket(s)
ssl_sock.getpeercert(True)

- which essentially is what Mercurial do. ?

Comment 15 Robin Lee 2012-01-06 15:15:23 UTC
Created attachment 551168 [details]
Output of the script in #c14

Comment 16 Mads Kiilerich 2012-01-06 15:40:16 UTC
(In reply to comment #15)

Ok, that shows that something like 39% of the connect attempts times out after 60 seconds (because they don't get any response at all, like the first 3 attempts in your first tcpdump). The other 61% gets a 'connection reset' (like in the 4th attempt in your first tcpdump) (which stops the create_connection looping). But we see the reset while doing the handshake, so the getpeername that failed for Mercurial must have succeeded. I guess that is because the RESET from the peer haven't been received yet. Can you try with an extra delay:

import socket, ssl, time
s = socket.create_connection(('code.google.com', 443))
time.sleep(1)
ssl_sock = ssl.wrap_socket(s)
ssl_sock.getpeercert(True)


Do 'hg clone' with the debug annotation in ssl.py also show this pattern with something like 39% timeout+failover?

Comment 17 Robin Lee 2012-01-06 15:59:31 UTC
With the delay, the output all turns to something like:

(2, 1, 6, '', ('74.125.71.101', 443))
[Errno 110] Connection timed out
(2, 1, 6, '', ('74.125.71.100', 443))
[Errno 110] Connection timed out
(2, 1, 6, '', ('74.125.71.102', 443))
[Errno 110] Connection timed out
(2, 1, 6, '', ('74.125.71.139', 443))
Traceback (most recent call last):
  File "socket-create_connection.py", line 8, in <module>
    ssl_sock.getpeercert(True)
  File "/usr/lib64/python2.7/ssl.py", line 172, in getpeercert
    return self._sslobj.peer_certificate(binary_form)
AttributeError: 'NoneType' object has no attribute 'peer_certificate'

real    3m10.466s
user    0m0.040s
sys     0m0.009s


> Do 'hg clone' with the debug annotation in ssl.py also show this pattern with
> something like 39% timeout+failover?
hg should be in the same pattern.

Comment 18 Mads Kiilerich 2012-01-06 16:15:34 UTC
Excellent, thank you. I think that makes the problem and its impact fully understood.

The crash will be fixed in the next Mercurial release Februar 1st which will be included in Fedora 17.

It will obviously still not be possible to bypass the wall. The fix will thus be purely cosmetic and I doubt it will be back-ported to Fedora 16.

Comment 19 Robin Lee 2012-01-07 05:40:43 UTC
I am happy to test it before final release.

Comment 20 Mads Kiilerich 2012-01-10 12:38:05 UTC
It has been fixed in upstream Mercurial in http://selenic.com/repo/hg/rev/0cc4ad757c77 - and improved further in later changesets.

You can test it with something like:
sudo yum-builddep mercurial
hg clone http://selenic.com/hg
cd hg
make local
./hg clone https://... /tmp/...

Comment 21 Robin Lee 2012-01-10 13:23:36 UTC
No exception is thrown now.

Comment 22 Mads Kiilerich 2012-01-10 13:28:36 UTC
Thanks for helping with the debugging and testing!

Comment 23 cnangel 2012-02-14 08:15:53 UTC
hg clone somethings from googlecode.com

Package: mercurial-1.9.3-1.fc16
Architecture: i686
OS Release: Fedora release 16 (Verne)


Note You need to log in before you can comment on or make changes to this bug.