Description of problem: Calling receiver.fetch(timeout=10) in a loop, when network drops packages for a while, this causes uncaught exception KeyError. High prio/sev due to https://bugzilla.redhat.com/show_bug.cgi?id=1169416#c5. Upstream qpid fails with infinite loop (see QPID-6297 for details). Version-Release number of selected component (if applicable): python-qpid-0.22-19.el6.noarch python-qpid-0.30-3.el6.noarch How reproducible: 100% Steps to Reproduce: 0. decrease TCP keepalives for faster reproducer: echo 2 > /proc/sys/net/ipv4/tcp_retries2 1. Run below script against qpidd: #!/usr/bin/env python from qpid.messaging import * import datetime conn = Connection("localhost:5672", reconnect=1) timeout=10 try: conn.open() sess = conn.session() recv = sess.receiver("testQueue;{create:always}") while (1): print "%s: before fetch, timeout=%s" %(datetime.datetime.now(), timeout) msg = Message() try: msg = recv.fetch(timeout=timeout) except ReceiverError, e: print e print "%s: after fetch, msg=%s" %(datetime.datetime.now(), msg) sess.close() except ReceiverError, e: print e except KeyboardInterrupt: pass conn.close() 2. Simulate network outage: iptables -A OUTPUT -p tcp --dport 5672 -j REJECT; date 3. Once python script writes "No handlers could be found for logger "qpid.messaging"", flush iptables (iptables -F) 4. Wait 10 seconds Actual results: python-qpid-0.22* raises exception: File "/root/Python_MRG/test_gofer_like.py", line 18, in <module> msg = recv.fetch(timeout=timeout) File "<string>", line 6, in fetch File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 1041, in fetch self._ecwait(lambda: not self.draining) File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait result = self._ewait(lambda: self.closed or predicate(), timeout) File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 993, in _ewait result = self.session._ewait(lambda: self.error or predicate(), timeout) File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 580, in _ewait result = self.connection._ewait(lambda: self.error or predicate(), timeout) File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 219, in _ewait self.check_error() File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 212, in check_error raise e InternalError: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 663, in write op.dispatch(self) File "/usr/lib/python2.7/site-packages/qpid/ops.py", line 84, in dispatch getattr(target, handler)(self, *args) File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 888, in do_session_detached sst = self._sessions.pop(dtc.channel) KeyError: 0 python-qpid-0.30* raises exception: File "Python_MRG/test_gofer_like.py", line 18, in <module> msg = recv.fetch(timeout=timeout) File "<string>", line 6, in fetch File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1066, in fetch self._ecwait(lambda: not self.draining) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 50, in _ecwait result = self._ewait(lambda: self.closed or predicate(), timeout) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 1018, in _ewait result = self.session._ewait(lambda: self.error or predicate(), timeout) File "/usr/lib/python2.6/site-packages/qpid/messaging/endpoints.py", line 595, in _ewait result = self.connection._ewait(lambda: self.error or predicate(), timeout) .. RuntimeError: maximum recursion depth exceeded Expected results: No exception raised Additional info:
The infinite recursion in python-qpid-0.30* is solved by removing the patch for https://issues.apache.org/jira/browse/QPID-5852. That patch is being removed for 3.2. Still looking into the KeyError exception.
Created attachment 1028385 [details] Check to see if channel is associated with a session before using it https://reviews.apache.org/r/34560/
Created attachment 1028876 [details] set connection.reconnect to false on socket errors From the review: Alan Conway wrote: >There is something more going on here. If the client is re-connecting then it >should be re-connecting on an entirely new connection so sessions should not be >able to clash, the new connection should have no session on it. >It sounds to me like somehow the client is re-using the original connection >after you unblock it at the firewall, which is definitely not right - there >could be all kinds of invalid state in that connections sessions. If the client >decides a connection is faulty it should definitively close it and forget it >before re-connecting and re-establishing sessions. You need to track down >how/why it is managing to use the old connection after it has failed. The engine is reusing the connection because the connection has a reconnect argument. So the new patch will not try and reconnect using the same connection on a socket error. This has the effect of raising the error back to the client.
Created attachment 1037366 [details] Enable force flag in SessionManager::attach Enables the force flag in SessionManager::attach and sends force=True in python engine if the engine is started with a connection that has existing sessions.
Committed r1684716 http://svn.apache.org/viewvc?view=revision&revision=1684716
This issue was tested on RHEL 6 i386 && x86_64 and RHEL 7 x86_64 with following packages: python-qpid-0.34-2 python-qpid-proton-0.10-2 python-qpid-proton-doc-0.10-2 python-qpid-qmf-0.34-4 qpid-cpp-client-0.34-5 qpid-cpp-client-devel-0.34-5 qpid-cpp-client-devel-docs-0.34-5 qpid-cpp-client-rdma-0.34-5 qpid-cpp-debuginfo-0.34-5 qpid-cpp-server-0.34-5 qpid-cpp-server-devel-0.34-5 qpid-cpp-server-ha-0.34-5 qpid-cpp-server-linearstore-0.34-5 qpid-cpp-server-rdma-0.34-5 qpid-cpp-server-xml-0.34-5 qpid-java-client-0.30-7 qpid-java-common-0.30-7 qpid-java-example-0.30-7 qpid-jms-client-0.5.0-2 qpid-jms-client-docs-0.5.0-2 qpid-jms-client-examples-0.5.0-2 qpid-jms-client-maven-repo-0.5.0-2 qpid-proton-c-0.10-2 qpid-proton-java-0.10.0-1 qpid-proton-java-maven-repo-0.10.0-1 qpid-qmf-0.34-4 qpid-tools-0.34-1 -> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2015-1879.html