Red Hat Bugzilla – Bug 950501
python clients throw "[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry"
Last modified: 2014-09-24 11:07:22 EDT
See upstream: https://issues.apache.org/jira/browse/QPID-3175

Description of problem:
When a Python client uses the SSL transport and sends messages to the broker in bursts, asynchronously (sync=False in Sender.send), the following exception is occasionally thrown:

[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry

The working theory is that when the client socket's send buffer fills up, the next underlying SSLSocket.write() raises an SSLError with code SSL_ERROR_WANT_WRITE, and this is not handled properly. Setting the socket to blocking mode is one possible workaround.

Version-Release number of selected component (if applicable):
python-qpid-0.18-4.el6

How reproducible:
"sometimes"

Steps to Reproduce:
1. Push enough data at the SSL socket that a blocking socket would block, so that the non-blocking socket returns the relevant error instead.

Actual results:
[Errno 1] _ssl.c:1217: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry

Expected results:
The code handles the full socket gracefully, or uses blocking sockets if that is appropriate.
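Step 1 ("a blocking socket would block, but the non-blocking socket returns the relevant error") can be illustrated without a broker or TLS. The sketch below is an assumption-laden stand-in, not the qpid transport code: it uses a local socket.socketpair() instead of a TLS connection, and the helper names fill_until_would_block and blocking_send_all are hypothetical. The first function shows the non-blocking "would block" signal; the second shows the blocking-mode workaround, where sendall() simply waits for the peer to drain.

```python
import socket
import threading

CHUNK = b"m" * 65536  # arbitrary write size for the illustration


def fill_until_would_block(sock):
    """Write to a non-blocking socket until the OS reports 'would block'.

    Returns the number of bytes accepted before that happened. On a plain
    socket the condition surfaces as BlockingIOError (EAGAIN/EWOULDBLOCK);
    a non-blocking ssl.SSLSocket reports the same situation by raising
    ssl.SSLWantWriteError, i.e. OpenSSL's SSL_ERROR_WANT_WRITE.
    """
    sock.setblocking(False)
    total = 0
    while True:
        try:
            total += sock.send(CHUNK)
        except BlockingIOError:
            return total


def blocking_send_all(sock, data, peer):
    """Blocking-mode workaround: sendall() waits for buffer space.

    A helper thread drains *peer* so the writer can make progress.
    Returns the number of bytes the helper thread received.
    """
    sock.setblocking(True)
    received = []

    def drain():
        n = 0
        while n < len(data):
            chunk = peer.recv(65536)
            if not chunk:  # peer closed early
                break
            n += len(chunk)
        received.append(n)

    reader = threading.Thread(target=drain)
    reader.start()
    sock.sendall(data)
    reader.join()
    return received[0]
```

For example, `a, b = socket.socketpair()` followed by `fill_until_would_block(a)` returns once the local buffer is full, while `blocking_send_all(c, b"m" * (4 * 1024 * 1024), d)` on a fresh pair pushes a 4 MiB payload (the size used by the reproducer) through without any "would block" error reaching the caller.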
I suspect the root of the problem is actually this Python bug: http://bugs.python.org/issue8240, which has not been fixed upstream to date.

I haven't been able to reproduce this, but I suspect that SSL_ERROR_WANT_WRITE is being raised at some point, and the output buffer is being updated before the write is retried. That would cause Python to re-allocate the output buffer, which changes the underlying pointer, which in turn causes SSL_write to fail with the given exception.

While we _could_ rewrite the Python client's SSL code to save a reference to the original buffer object and re-supply the write()/recv() call with the same arguments on retry, as required, we're still not guaranteed that the Python implementation will pass the same physical pointer. I'd much rather we turn on blocking mode. AFAIK, non-SSL TCP connections use blocking mode. Why does the SSL implementation use non-blocking mode?

Justin - Rafi is the original author of this code; could he weigh in on why non-blocking mode was used?

-K
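The "save a reference to the original buffer and re-supply the same arguments on retry" option mentioned above can be sketched as follows. This is a hypothetical helper (SameBufferRetryWriter is not part of python-qpid, and it is not the upstream fix); it assumes a Python 3 object whose write() raises ssl.SSLWantWriteError when the socket buffer is full. In the Python 2 interpreters of this bug's era, the equivalent check is catching ssl.SSLError and testing e.args[0] == ssl.SSL_ERROR_WANT_WRITE. As the comment above points out, keeping the same Python object is necessary but does not by itself guarantee that the interpreter hands OpenSSL the same pointer.

```python
import ssl


class SameBufferRetryWriter:
    """Hypothetical helper: always retry a stalled TLS write with the same object.

    OpenSSL requires that an SSL_write() which failed with
    SSL_ERROR_WANT_WRITE be repeated with the same arguments, unless the
    connection enables SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER. Here the bytes
    handed to tls.write() are an immutable snapshot that is kept until it is
    accepted, so appending new application data never changes the retry.
    """

    def __init__(self, tls):
        self.tls = tls              # e.g. a non-blocking ssl.SSLSocket
        self.outbuf = bytearray()   # application data not yet written
        self.pending = None         # snapshot passed to the last write()

    def queue(self, data):
        # Growing outbuf may reallocate it; that is harmless because
        # write() only ever sees the separate 'pending' snapshot.
        self.outbuf += data

    def flush(self):
        """Write as much as possible; return True once outbuf is empty."""
        while self.outbuf or self.pending is not None:
            if self.pending is None:
                self.pending = bytes(self.outbuf)
            try:
                n = self.tls.write(self.pending)
            except ssl.SSLWantWriteError:
                # Send buffer is full: keep 'pending' untouched so the
                # next flush() repeats the identical write() call.
                return False
            del self.outbuf[:n]
            self.pending = None     # a completed write needs no retry
        return True
```

In an event loop, flush() would be called whenever select()/poll() reports the socket writable; a False return means "wait for writability and call flush() again". The design choice is to separate the growing application buffer from the frozen snapshot given to OpenSSL, which is exactly the invariant the "bad write retry" check enforces.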
Created attachment 750695: Reproducer

Attached a simple reproducer. If I run three or four of these in separate shells, each in a loop, it triggers the exception.
Fix submitted upstream: http://svn.apache.org/viewvc?view=revision&revision=1485331
Ken -- How frequently should I see the error when I run several of your reproducers in separate windows?
Hi Mick,

From what I recall, it was very hard to reproduce. That said, seeing it happen once is certainly enough. I hope we will not get this error once the fix is in place.

But there's more: this fix is actually a workaround for a bug in Python. They've since fixed that bug, but I'll be darned if I can tell which Python release the fix is in :( So if you're NOT able to reproduce it using the latest Python for your environment, it -could- be because the bug has been fixed in Python. I suspect that's not likely, but it should be confirmed before wasting too much time.

The Python bug info is here: http://bugs.python.org/issue8240. It looks like it was fixed back in May, but I can't find any information in that bug report about which release(s) contain the fix. In any case, our workaround is still necessary to deal with existing (unpatched) Python environments.
Notes on reproducing this.

===========================================
Making the SSL info for the broker:
===========================================
#! /bin/bash
#----------------------------------------------------
# create certificate and key databases with a single,
# simple, self-signed certificate in it
#----------------------------------------------------
CERT_DIR=test_cert_dir
CERT_PW_FILE=cert.password
TEST_HOSTNAME=127.0.0.1

rm -rf ${CERT_DIR} ${CERT_PW_FILE}
mkdir ${CERT_DIR}
echo password > ${CERT_PW_FILE}
certutil -N -d ${CERT_DIR} -f ${CERT_PW_FILE}
certutil -S -d ${CERT_DIR} -n ${TEST_HOSTNAME} \
  -s "CN=${TEST_HOSTNAME}" -t "CT,," -x \
  -f ${CERT_PW_FILE} -z /usr/bin/certutil 2> /dev/null

======================================
Running the broker
======================================
qpidd -d \
  --port 5801 \
  --ssl-port 5802 \
  --load-module /usr/lib64/qpid/daemon/ssl.so \
  --require-encryption \
  --auth no \
  --ssl-cert-password-file /home/mick/cert.password \
  --ssl-cert-db /home/mick/test_cert_dir \
  --ssl-cert-name 127.0.0.1

=============================================
Repro script (you may need to run several copies in separate terminals)
=============================================
#!/usr/bin/env python
from qpid.messaging import Connection, SendError

# Port must match the broker's --ssl-port above.
conn = Connection("amqps://127.0.0.1:5802")
try:
    conn.open()
    ssn = conn.session()
    snd = ssn.sender("ken; {create: always}")
    # Large asynchronous sends (sync=False) fill the SSL socket's send buffer.
    snd.send(u"m" * 1024 * 1024 * 4, sync=False)
    snd.send(u"m" * 1024 * 1024 * 2, sync=False)
    snd.send(u"m" * 1024 * 1024 * 4, sync=False)
    snd.send(u"m" * 1024 * 1024 * 2, sync=False)
    print("X")
except SendError as e:
    print(e)
except KeyboardInterrupt:
    pass
conn.close()
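Since the error is rare, the "three or four copies in separate shells, each in a loop" procedure can be automated. The driver below is a hypothetical convenience, not part of the original notes: it assumes the repro script has been saved as repro.py (an assumed filename) and is run with whichever interpreter has python-qpid installed, while the driver itself needs Python 3.5+ for subprocess.run. It counts the runs whose combined stdout/stderr contains the "bad write retry" text.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Substring of the failure message reported in this bug.
ERROR_MARKER = "bad write retry"


def run_loop(cmd, iterations):
    """Run *cmd* repeatedly (one 'shell loop'); count runs that hit the error."""
    hits = 0
    for _ in range(iterations):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # uncaught tracebacks go to stderr; merge them
            universal_newlines=True,
        )
        if ERROR_MARKER in result.stdout:
            hits += 1
    return hits


def run_parallel_loops(cmd, workers=4, iterations=25):
    """Run *workers* loops concurrently, like several separate shells."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_loop, cmd, iterations) for _ in range(workers)]
        return sum(f.result() for f in futures)
```

A typical invocation would be `run_parallel_loops(["python", "repro.py"], workers=4)`; a non-zero result means at least one run reproduced the error. Threads are sufficient here because each worker spends its time waiting on a child process.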
"stable" packages on bug-repro machine
{
  cyrus-sasl-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-devel-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64
  python-qpid-0.18-4.el6.noarch
  python-qpid-qmf-0.18-15.el6.x86_64
  python-saslwrapper-0.18-1.el6_3.x86_64
  qpid-cpp-client-0.18-14.el6.x86_64
  qpid-cpp-client-devel-0.18-14.el6.x86_64
  qpid-cpp-client-devel-docs-0.18-14.el6.noarch
  qpid-cpp-client-rdma-0.18-14.el6.x86_64
  qpid-cpp-client-ssl-0.18-14.el6.x86_64
  qpid-cpp-debuginfo-0.14-22.el6_3.x86_64
  qpid-cpp-server-0.18-14.el6.x86_64
  qpid-cpp-server-cluster-0.18-14.el6.x86_64
  qpid-cpp-server-devel-0.18-14.el6.x86_64
  qpid-cpp-server-rdma-0.18-14.el6.x86_64
  qpid-cpp-server-ssl-0.18-14.el6.x86_64
  qpid-cpp-server-store-0.18-14.el6.x86_64
  qpid-cpp-server-xml-0.18-14.el6.x86_64
  qpid-java-client-0.18-7.el6.noarch
  qpid-java-common-0.18-7.el6.noarch
  qpid-java-example-0.18-7.el6.noarch
  qpid-jca-0.18-8.el6.noarch
  qpid-jca-xarecovery-0.18-8.el6.noarch
  qpid-proton-c-0.4-2.2.el6.x86_64
  qpid-proton-c-devel-0.4-2.2.el6.x86_64
  qpid-qmf-0.18-15.el6.x86_64
  qpid-qmf-debuginfo-0.14-14.el6_3.x86_64
  qpid-qmf-devel-0.18-15.el6.x86_64
  qpid-tests-0.18-2.el6.noarch
  qpid-tools-0.18-8.el6.noarch
  saslwrapper-0.18-1.el6_3.x86_64
  saslwrapper-devel-0.18-1.el6_3.x86_64
}

latest packages on 32-bit machine
---------------------------------------------------------
{
  cyrus-sasl-2.1.23-13.el6_3.1.i686
  cyrus-sasl-devel-2.1.23-13.el6_3.1.i686
  cyrus-sasl-gssapi-2.1.23-13.el6_3.1.i686
  cyrus-sasl-lib-2.1.23-13.el6_3.1.i686
  cyrus-sasl-md5-2.1.23-13.el6_3.1.i686
  cyrus-sasl-plain-2.1.23-13.el6_3.1.i686
  python-qpid-0.22-4.el6.noarch
  python-qpid-qmf-0.22-7.el6.i686
  python-saslwrapper-0.22-3.el6.i686
  qpid-cpp-client-0.22-8.el6.i686
  qpid-cpp-client-devel-0.22-8.el6.i686
  qpid-cpp-client-devel-docs-0.22-8.el6.noarch
  qpid-cpp-client-rdma-0.22-8.el6.i686
  qpid-cpp-client-ssl-0.22-8.el6.i686
  qpid-cpp-debuginfo-0.22-8.el6.i686
  qpid-cpp-server-0.22-8.el6.i686
  qpid-cpp-server-devel-0.22-8.el6.i686
  qpid-cpp-server-ha-0.22-8.el6.i686
  qpid-cpp-server-rdma-0.22-8.el6.i686
  qpid-cpp-server-ssl-0.22-8.el6.i686
  qpid-cpp-server-store-0.22-8.el6.i686
  qpid-cpp-server-xml-0.22-8.el6.i686
  qpid-cpp-tar-0.22-8.el6.noarch
  qpid-java-client-0.22-5.el6.noarch
  qpid-java-common-0.22-5.el6.noarch
  qpid-java-example-0.22-5.el6.noarch
  qpid-proton-c-0.4-2.2.el6.i686
  qpid-proton-c-devel-0.4-2.2.el6.i686
  qpid-proton-debuginfo-0.4-2.2.el6.i686
  qpid-qmf-0.22-7.el6.i686
  qpid-qmf-debuginfo-0.22-7.el6.i686
  qpid-qmf-devel-0.22-7.el6.i686
  qpid-snmpd-1.0.0-12.el6.i686
  qpid-snmpd-debuginfo-1.0.0-12.el6.i686
  qpid-tests-0.22-4.el6.noarch
  qpid-tools-0.22-3.el6.noarch
  saslwrapper-0.22-3.el6.i686
}

latest packages on 64-bit machine
--------------------------------------------------------------------
{
  cyrus-sasl-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-devel-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-gssapi-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-lib-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-md5-2.1.23-13.el6_3.1.x86_64
  cyrus-sasl-plain-2.1.23-13.el6_3.1.x86_64
  python-qpid-0.22-4.el6.noarch
  python-qpid-qmf-0.22-7.el6.x86_64
  python-saslwrapper-0.22-3.el6.x86_64
  qpid-cpp-client-0.22-8.el6.x86_64
  qpid-cpp-client-devel-0.22-8.el6.x86_64
  qpid-cpp-client-devel-docs-0.22-8.el6.noarch
  qpid-cpp-client-rdma-0.22-8.el6.x86_64
  qpid-cpp-client-ssl-0.22-8.el6.x86_64
  qpid-cpp-debuginfo-0.22-8.el6.x86_64
  qpid-cpp-server-0.22-8.el6.x86_64
  qpid-cpp-server-devel-0.22-8.el6.x86_64
  qpid-cpp-server-ha-0.22-8.el6.x86_64
  qpid-cpp-server-rdma-0.22-8.el6.x86_64
  qpid-cpp-server-ssl-0.22-8.el6.x86_64
  qpid-cpp-server-store-0.22-8.el6.x86_64
  qpid-cpp-server-xml-0.22-8.el6.x86_64
  qpid-cpp-tar-0.22-8.el6.noarch
  qpid-java-client-0.22-5.el6.noarch
  qpid-java-common-0.22-5.el6.noarch
  qpid-java-example-0.22-5.el6.noarch
  qpid-proton-c-0.4-2.2.el6.x86_64
  qpid-proton-c-devel-0.4-2.2.el6.x86_64
  qpid-proton-debuginfo-0.4-2.2.el6.x86_64
  qpid-qmf-0.22-7.el6.x86_64
  qpid-qmf-debuginfo-0.22-7.el6.x86_64
  qpid-qmf-devel-0.22-7.el6.x86_64
  qpid-snmpd-1.0.0-12.el6.x86_64
  qpid-snmpd-debuginfo-1.0.0-12.el6.x86_64
  qpid-tests-0.22-4.el6.noarch
  qpid-tools-0.22-3.el6.noarch
  saslwrapper-0.22-3.el6.x86_64
  saslwrapper-devel-0.22-3.el6.x86_64
}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1296.html