
Bug 950501

Summary: python clients throw "[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry"
Product: Red Hat Enterprise MRG Reporter: Stuart Auchterlonie <sauchter>
Component: python-qpid Assignee: Ken Giusti <kgiusti>
Status: CLOSED ERRATA QA Contact: mick <mgoulish>
Severity: medium Docs Contact:
Priority: medium    
Version: 2.3 CC: jross, kgiusti, lzhaldyb, mgoulish, rbinkhor
Target Milestone: 3.0 Keywords: OtherQA
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: python-qpid-0.22-3.el6, python-qpid-0.22-2.el5 Doc Type: Bug Fix
Doc Text:
In circumstances when the python client code tried to re-write queued data to a previously blocked socket, it was discovered that the python client code was changing the pointer used prior to the socket blocking. The underlying OpenSSL library detected that the pointer had changed since the last call to write, and threw an exception. The python client is now modified to keep the original pointer constant when new data is added. When the socket becomes writable again, the address of the pointer passed to the socket is identical to the address passed prior to the socket blocking, as required by OpenSSL.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-09-24 15:07:22 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 785156    
Attachments:
Description Flags
Reproducer none

Description Stuart Auchterlonie 2013-04-10 11:13:57 UTC
See upstream: https://issues.apache.org/jira/browse/QPID-3175

Description of problem:

When using the SSL transport layer in Python clients, if the client sends messages to the broker in bursts and asynchronously (sync=False in Sender.send), the following exception is occasionally thrown:

[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry

The working theory is that when the client's socket gets full, the next underlying SSLSocket.write() throws an SSLError (with SSL_ERROR_WANT_WRITE as its error code) and this isn't handled properly.

Setting the socket to blocking is one possible workaround.
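
A minimal sketch of what handling the full socket could look like, assuming a non-blocking SSL socket object that exposes send(). The names send_all and wait_until_writable are made up for illustration and are not part of python-qpid. It is written for Python 3 but checks the error code in args[0], which is how the SSLError in the report carries it. SSL_ERROR_WANT_READ during a write (renegotiation) is deliberately left out.

```python
import select
import ssl


def wait_until_writable(sock, timeout=None):
    # Block until the socket reports it can accept more data.
    select.select([], [sock], [], timeout)


def send_all(sock, data, wait=wait_until_writable):
    """Write all of data; a blocked write is retried with the very same object."""
    pending = data
    while pending:
        try:
            written = sock.send(pending)
        except ssl.SSLError as exc:
            if exc.args[0] != ssl.SSL_ERROR_WANT_WRITE:
                raise
            # Socket is full: wait, then retry WITHOUT rebuilding `pending`,
            # so the retried write sees the same buffer as the blocked one.
            wait(sock)
            continue
        # Only slice after a write has completed.
        pending = pending[written:]
    return len(data)
```

The point of the loop is that the buffer is only replaced after a write returns a byte count, never between a blocked write and its retry.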

Version-Release number of selected component (if applicable):

python-qpid-0.18-4.el6

How reproducible:

"sometimes"

Steps to Reproduce:
1. Push a lot of data at the ssl socket such that a blocking socket would block, but the non blocking socket returns the relevant error.
2.
3.
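
The condition in step 1 can be shown without SSL or a broker. The sketch below is a plain-socket analogue only (Python 3, local socketpair, nobody reading the other end); fill_until_blocked is a made-up helper name. With an SSL socket the same situation surfaces as SSL_ERROR_WANT_WRITE rather than BlockingIOError.

```python
import socket


def fill_until_blocked(chunk_size=64 * 1024, max_chunks=10000):
    """Send on a non-blocking socket until it would block; return bytes sent."""
    writer, reader = socket.socketpair()
    try:
        writer.setblocking(False)
        chunk = b"m" * chunk_size
        sent = 0
        for _ in range(max_chunks):
            try:
                sent += writer.send(chunk)
            except BlockingIOError:
                # A blocking socket would have stalled here instead.
                return sent
        return None  # never blocked within max_chunks (not expected)
    finally:
        writer.close()
        reader.close()
```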
  
Actual results:

[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry

Expected results:

The code handles the full socket gracefully, or uses blocking sockets if that is appropriate.

Additional info:

Comment 1 Ken Giusti 2013-05-20 17:54:16 UTC
I suspect the root of the problem is actually due to this:

http://bugs.python.org/issue8240

which hasn't been fixed upstream to date.

I haven't been able to reproduce this, but I suspect that the SSL_ERROR_WANT_WRITE is being raised at some point, and the output buffer is being updated before the write is retried. This would cause python to re-allocate the output buffer, which would change the underlying pointer, which would then cause the SSL_write to fail with the given exception.

While we _could_ re-write the python client's SSL code to save a reference to the original buffer object and re-supply the write()/recv() call with the same arguments on re-try as required, we're still not guaranteed that the python implementation will ensure that the same physical pointer is used.
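
As a rough illustration of "save a reference to the original buffer object", here is a sketch of an output buffer that never touches the chunk handed to a blocked write when new data arrives. StableWriteBuffer and its methods are hypothetical names, not the python-qpid code, and as noted above Python itself gives no hard guarantee about the physical pointer; the sketch only avoids the re-allocation caused by appending to the in-flight data.

```python
from collections import deque


class StableWriteBuffer(object):
    """Queue outgoing bytes while keeping the in-flight chunk object unchanged."""

    def __init__(self):
        self._queue = deque()
        self._inflight = None  # chunk given to a write that has not completed

    def append(self, data):
        # New data joins the queue; the in-flight chunk is never rebuilt.
        if data:
            self._queue.append(bytes(data))

    def peek(self):
        """Return the chunk to pass to write(): the same object until consumed."""
        if self._inflight is None and self._queue:
            self._inflight = self._queue.popleft()
        return self._inflight

    def consume(self, count):
        """Record that `count` bytes of the in-flight chunk were written."""
        if self._inflight is None:
            raise ValueError("no write in flight")
        rest = self._inflight[count:]
        self._inflight = rest if rest else None
```

A caller would pass peek() to the socket write, call consume(n) only when the write returns n, and simply call peek() again after SSL_ERROR_WANT_WRITE.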

I'd much rather we turn on blocking mode.  AFAIK, non-ssl TCP connections use blocking mode.  Why does the SSL implementation use non-blocking?

Justin - Rafi is the original author of this code, could he weigh in on why non-blocking was used?

-K

Comment 2 Ken Giusti 2013-05-20 19:13:33 UTC
Created attachment 750695 [details]
Reproducer

Attached a simple reproducer.

If I run three or four of these in separate shells, each in a loop, it will trigger the exception.
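
Instead of separate shells, the same load could be driven from one small launcher. This is only a sketch (Python 3); run_in_parallel and python_cmd are made-up helper names, and the script path passed in is whatever the attachment was saved as.

```python
import subprocess
import sys


def python_cmd(script):
    # Command line that runs `script` with the current interpreter.
    return [sys.executable, script]


def run_in_parallel(cmd, copies=4, rounds=1):
    """Start `copies` instances of cmd at once, `rounds` times; return exit codes."""
    codes = []
    for _ in range(rounds):
        procs = [subprocess.Popen(cmd) for _ in range(copies)]
        codes.extend(proc.wait() for proc in procs)
    return codes
```

For example, run_in_parallel(python_cmd("reproducer.py"), copies=4, rounds=50), where reproducer.py is a placeholder file name.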

Comment 3 Ken Giusti 2013-05-22 19:27:00 UTC
Fix submitted upstream:

http://svn.apache.org/viewvc?view=revision&revision=1485331

Comment 4 mick 2013-07-26 09:18:35 UTC
Ken -- How frequently should I see the error when I run several of your reproducers in separate windows?

Comment 5 Ken Giusti 2013-07-26 14:18:47 UTC
Hi Mick,

From what I can recall - it was very hard to reproduce.

Having said that - having it happen once is certainly enough.  I (hope) that we should not get this error once the fix is in place.

But there's more:  this fix is actually a work-around for a bug in python.  They've since fixed this bug, but I'll be darned if I can tell what release of python the fix is in :(

So, if you're NOT able to reproduce it using the latest python for your environment, it -could- be because the bug has been fixed in python. I suspect that's not likely, but it should be confirmed before wasting too much time.

The python bug info is here:

http://bugs.python.org/issue8240

Looks like it was fixed back in May, but I can't find any info in that bug report about which release(s) contain the fix.

In any case, our work-around is still necessary to deal with existing (unpatched) python environments.

Comment 6 mick 2013-07-30 16:53:16 UTC
Notes on reproducing this.


===========================================
  Making the SSL info, for the broker:
===========================================

#! /bin/bash

#----------------------------------------------------
# create certificate and key databases with a single,
# simple, self-signed certificate in them
#----------------------------------------------------

CERT_DIR=test_cert_dir
CERT_PW_FILE=cert.password
TEST_HOSTNAME=127.0.0.1

rm -rf ${CERT_DIR} ${CERT_PW_FILE}
mkdir ${CERT_DIR}

echo password > ${CERT_PW_FILE}

certutil -N -d ${CERT_DIR} -f ${CERT_PW_FILE}
certutil -S -d ${CERT_DIR} -n ${TEST_HOSTNAME} \
         -s "CN=${TEST_HOSTNAME}" -t "CT,," -x \
         -f ${CERT_PW_FILE} -z /usr/bin/certutil 2> /dev/null



======================================
Running the broker
======================================
qpidd -d                                            \
  --port     5801                                   \
  --ssl-port 5802                                   \
  --load-module /usr/lib64/qpid/daemon/ssl.so       \
  --require-encryption                              \
  --auth no                                         \
  --ssl-cert-password-file /home/mick/cert.password \
  --ssl-cert-db /home/mick/test_cert_dir            \
  --ssl-cert-name 127.0.0.1



=============================================
Repro script
( May need several in separate CLIs. )
=============================================

#!/usr/bin/env python
#
from qpid.messaging import *

conn = Connection( "amqps://127.0.0.1:5671" )
try:
  conn.open()
  ssn = conn.session()
  snd = ssn.sender( "ken; {create: always}" )
  snd.send( u"m"*1024*1024*4, sync=False )
  snd.send( u"m"*1024*1024*2, sync=False )
  snd.send( u"m"*1024*1024*4, sync=False )
  snd.send( u"m"*1024*1024*2, sync=False )
  print "X"
except SendError, e:
  print e
except KeyboardInterrupt:
  pass

conn.close()

Comment 7 mick 2013-07-30 18:37:20 UTC


"stable" packages on bug-repro machine

{

cyrus-sasl-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-devel-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64
python-qpid-0.18-4.el6.noarch
python-qpid-qmf-0.18-15.el6.x86_64
python-saslwrapper-0.18-1.el6_3.x86_64
qpid-cpp-client-0.18-14.el6.x86_64
qpid-cpp-client-devel-0.18-14.el6.x86_64
qpid-cpp-client-devel-docs-0.18-14.el6.noarch
qpid-cpp-client-rdma-0.18-14.el6.x86_64
qpid-cpp-client-ssl-0.18-14.el6.x86_64
qpid-cpp-debuginfo-0.14-22.el6_3.x86_64
qpid-cpp-server-0.18-14.el6.x86_64
qpid-cpp-server-cluster-0.18-14.el6.x86_64
qpid-cpp-server-devel-0.18-14.el6.x86_64
qpid-cpp-server-rdma-0.18-14.el6.x86_64
qpid-cpp-server-ssl-0.18-14.el6.x86_64
qpid-cpp-server-store-0.18-14.el6.x86_64
qpid-cpp-server-xml-0.18-14.el6.x86_64
qpid-java-client-0.18-7.el6.noarch
qpid-java-common-0.18-7.el6.noarch
qpid-java-example-0.18-7.el6.noarch
qpid-jca-0.18-8.el6.noarch
qpid-jca-xarecovery-0.18-8.el6.noarch
qpid-proton-c-0.4-2.2.el6.x86_64
qpid-proton-c-devel-0.4-2.2.el6.x86_64
qpid-qmf-0.18-15.el6.x86_64
qpid-qmf-debuginfo-0.14-14.el6_3.x86_64
qpid-qmf-devel-0.18-15.el6.x86_64
qpid-tests-0.18-2.el6.noarch
qpid-tools-0.18-8.el6.noarch
saslwrapper-0.18-1.el6_3.x86_64
saslwrapper-devel-0.18-1.el6_3.x86_64
}






latest packages on 32-bit machine

---------------------------------------------------------

{

cyrus-sasl-2.1.23-13.el6_3.1.i686
cyrus-sasl-devel-2.1.23-13.el6_3.1.i686
cyrus-sasl-gssapi-2.1.23-13.el6_3.1.i686
cyrus-sasl-lib-2.1.23-13.el6_3.1.i686
cyrus-sasl-md5-2.1.23-13.el6_3.1.i686
cyrus-sasl-plain-2.1.23-13.el6_3.1.i686
python-qpid-0.22-4.el6.noarch
python-qpid-qmf-0.22-7.el6.i686
python-saslwrapper-0.22-3.el6.i686
qpid-cpp-client-0.22-8.el6.i686
qpid-cpp-client-devel-0.22-8.el6.i686
qpid-cpp-client-devel-docs-0.22-8.el6.noarch
qpid-cpp-client-rdma-0.22-8.el6.i686
qpid-cpp-client-ssl-0.22-8.el6.i686
qpid-cpp-debuginfo-0.22-8.el6.i686
qpid-cpp-server-0.22-8.el6.i686
qpid-cpp-server-devel-0.22-8.el6.i686
qpid-cpp-server-ha-0.22-8.el6.i686
qpid-cpp-server-rdma-0.22-8.el6.i686
qpid-cpp-server-ssl-0.22-8.el6.i686
qpid-cpp-server-store-0.22-8.el6.i686
qpid-cpp-server-xml-0.22-8.el6.i686
qpid-cpp-tar-0.22-8.el6.noarch
qpid-java-client-0.22-5.el6.noarch
qpid-java-common-0.22-5.el6.noarch
qpid-java-example-0.22-5.el6.noarch
qpid-proton-c-0.4-2.2.el6.i686
qpid-proton-c-devel-0.4-2.2.el6.i686
qpid-proton-debuginfo-0.4-2.2.el6.i686
qpid-qmf-0.22-7.el6.i686
qpid-qmf-debuginfo-0.22-7.el6.i686
qpid-qmf-devel-0.22-7.el6.i686
qpid-snmpd-1.0.0-12.el6.i686
qpid-snmpd-debuginfo-1.0.0-12.el6.i686
qpid-tests-0.22-4.el6.noarch
qpid-tools-0.22-3.el6.noarch
saslwrapper-0.22-3.el6.i686

}






latest packages on 64-bit machine

--------------------------------------------------------------------

{

cyrus-sasl-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-devel-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64
cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64
python-qpid-0.22-4.el6.noarch
python-qpid-qmf-0.22-7.el6.x86_64
python-saslwrapper-0.22-3.el6.x86_64
qpid-cpp-client-0.22-8.el6.x86_64
qpid-cpp-client-devel-0.22-8.el6.x86_64
qpid-cpp-client-devel-docs-0.22-8.el6.noarch
qpid-cpp-client-rdma-0.22-8.el6.x86_64
qpid-cpp-client-ssl-0.22-8.el6.x86_64
qpid-cpp-debuginfo-0.22-8.el6.x86_64
qpid-cpp-server-0.22-8.el6.x86_64
qpid-cpp-server-devel-0.22-8.el6.x86_64
qpid-cpp-server-ha-0.22-8.el6.x86_64
qpid-cpp-server-rdma-0.22-8.el6.x86_64
qpid-cpp-server-ssl-0.22-8.el6.x86_64
qpid-cpp-server-store-0.22-8.el6.x86_64
qpid-cpp-server-xml-0.22-8.el6.x86_64
qpid-cpp-tar-0.22-8.el6.noarch
qpid-java-client-0.22-5.el6.noarch
qpid-java-common-0.22-5.el6.noarch
qpid-java-example-0.22-5.el6.noarch
qpid-proton-c-0.4-2.2.el6.x86_64
qpid-proton-c-devel-0.4-2.2.el6.x86_64
qpid-proton-debuginfo-0.4-2.2.el6.x86_64
qpid-qmf-0.22-7.el6.x86_64
qpid-qmf-debuginfo-0.22-7.el6.x86_64
qpid-qmf-devel-0.22-7.el6.x86_64
qpid-snmpd-1.0.0-12.el6.x86_64
qpid-snmpd-debuginfo-1.0.0-12.el6.x86_64
qpid-tests-0.22-4.el6.noarch
qpid-tools-0.22-3.el6.noarch
saslwrapper-0.22-3.el6.x86_64
saslwrapper-devel-0.22-3.el6.x86_64

}

Comment 9 errata-xmlrpc 2014-09-24 15:07:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html