Created attachment 519993
Script to reproduce problem.

Description of problem:
Trying to send a 'large' message from the python-qpid client to the broker results in the connection hanging and other clients not being able to connect to the broker.

Version-Release number of selected component (if applicable):
I've tried it against the following brokers:
qpid-cpp-server-0.10-8
qpid-cpp-server-0.7.946106-28.el5
Client package is python-qpid-0.10-1.

How reproducible:
Every time

Steps to Reproduce:
1. Create a queue with the following options:
   { create:receiver, node: { type: queue, durable:False, x-declare: {exclusive: True, auto-delete:True } } }
2. Send a Python object to this address which is 15k on disk (see attached script)

Actual results:
Message does not send. It also seems to render other clients unable to connect; they hang at connection.open().

Expected results:
Message is sent

Additional info:
I've tried sending small messages; they send fine. I have not yet experimented to see where the cut-off point is. I've also included the small script that I used to reproduce the error.
When I hit Ctrl-C to kill the script, this is the traceback I get:

Traceback (most recent call last):
  File "/tmp/test_beaker_qpid_client.py", line 39, in <module>
    snd.send(m_)
  File "<string>", line 6, in send
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 861, in send
    self.sync()
  File "<string>", line 6, in sync
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 872, in sync
    if not self._ewait(lambda: self.acked >= mno, timeout=timeout):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
    result = self._wait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
    return self._waiter.wait(predicate, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 57, in wait
    self.condition.wait(3)
  File "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
    sw.wait(timeout)
  File "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
    ready, _, _ = select([self], [], [], timeout)
The hosts on which we have managed to reproduce the problem are RHEL 5 Xen guests with the above versions of qpid. If you need any more information about the guests or the host hardware they are running on, I can provide it.
Fixed upstream as http://svn.apache.org/viewvc?view=rev&rev=1162060
I've tried to reproduce this problem on RHEL6.1 (i386). The python-qpid client was, indeed, unable to send a message this big.

Further testing revealed, though, that there was no connection hanging. That is, other clients were able to send and receive (average-sized) messages.

In addition, when I changed sasl_mechanisms from 'GSSAPI' to 'ANONYMOUS' in the attached script, the exact same huge message was sent successfully. 'PLAIN' worked as well. So the problem might have something to do with Kerberos.

Package versions used for testing:
python-qpid-0.10-1.el6.noarch
python-qpid-qmf-0.10-10.el6.i686
qpid-cpp-client-0.10-6.el6.i686
qpid-cpp-client-devel-0.10-6.el6.i686
qpid-cpp-client-devel-docs-0.10-6.el6.noarch
qpid-cpp-server-0.10-6.el6.i686
qpid-cpp-server-devel-0.10-6.el6.i686
qpid-cpp-server-store-0.10-6.el6.i686
qpid-cpp-server-xml-0.10-6.el6.i686
qpid-java-client-0.10-10.el6.noarch
qpid-java-common-0.10-10.el6.noarch
qpid-java-example-0.10-10.el6.noarch
qpid-java-jca-0.10-10.el6.noarch
qpid-qmf-0.10-10.el6.i686
qpid-qmf-debuginfo-0.10-10.el6.i686
qpid-qmf-devel-0.10-10.el6.i686
qpid-tools-0.10-5.el6.noarch

I'm going to try to reproduce it on RHEL5.
(In reply to comment #5)
> I've tried to reproduce this problem on RHEL6.1 (i386). The python-qpid client
> was, indeed, unable to send a message this big.
> Further testing revealed, though, that there was no connection hanging. That
> is, other clients were able to send and receive (average-sized) messages.
>

I hope I wasn't mistaken about the connection.open() comment. I may have to confirm that.
Testing this on RHEL5 showed the exact same results as on RHEL6: the Python client gets stuck when trying to send a huge message. Both main architectures (i386 and x86_64) were tested.

Package versions for RHEL5:
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-0.10-8.el5
qpid-cpp-client-devel-0.10-8.el5
qpid-cpp-client-devel-docs-0.10-8.el5
qpid-cpp-client-ssl-0.10-8.el5
qpid-cpp-mrg-debuginfo-0.10-8.el5
qpid-cpp-server-0.10-8.el5
qpid-cpp-server-cluster-0.10-8.el5
qpid-cpp-server-devel-0.10-8.el5
qpid-cpp-server-ssl-0.10-8.el5
qpid-cpp-server-store-0.10-8.el5
qpid-cpp-server-xml-0.10-8.el5
qpid-java-client-0.10-9.el5
qpid-java-common-0.10-9.el5
qpid-java-example-0.10-9.el5
qpid-java-jca-0.10-9.el5
qpid-qmf-0.10-10.el5
qpid-qmf-debuginfo-0.10-10.el5
qpid-qmf-devel-0.10-10.el5
qpid-tools-0.10-6.el5

It seems to me that sending such a big message does not, per se, prevent other clients from sending and receiving messages. But it does lead to the qpidd process using up to 100% of a CPU, and the server doesn't release it even after the "stuck" client gets interrupted. After I ran the test script 4 times in a row, all four CPUs on my test machine were used up and the server was unable to process even small messages any more.

BTW, the 'magical number' seems to be 4085. That is, if the message is 4085 bytes long or bigger, the Python client is unable to send it using GSSAPI.

The cpp client, on the other hand, doesn't seem to have this problem. I tested sending up to 10M messages without discovering any issues.
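For anyone wanting to re-check the 4085-byte cut-off on another setup, bisecting on payload size is one way to find it. The following is only a minimal sketch: `find_cutoff`, `try_send` and `fake_try_send` are hypothetical names, not part of python-qpid or the attached script. It assumes `try_send(size)` returns True when a message of that size is acknowledged, and that every size above the cut-off also fails.

```python
def find_cutoff(try_send, lo, hi):
    """Return the smallest size in (lo, hi] for which try_send(size) fails.

    Assumptions: try_send(lo) succeeds, try_send(hi) fails, and failure is
    monotonic in size (every size above the cut-off fails as well).
    """
    if not try_send(lo):
        raise ValueError("lower bound must be a size that sends successfully")
    if try_send(hi):
        raise ValueError("upper bound must be a size that fails to send")
    # Invariant: lo always sends, hi always fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if try_send(mid):
            lo = mid
        else:
            hi = mid
    return hi


def fake_try_send(size, cutoff=4085):
    # Hypothetical stand-in for a real broker round trip: it simply models
    # the behaviour reported above (sizes of 4085 bytes or more fail).
    return size < cutoff


print(find_cutoff(fake_try_send, 1, 15 * 1024))  # prints 4085
```

A real `try_send` would have to give up after a timeout, because the failing send blocks rather than raising an error. Each stuck attempt also seems to leave qpidd busy on one CPU, so restarting the broker between probes is probably wise.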
The issue has been fixed. Tested on RHEL5.7 / RHEL6.1, i386 / x86_64, on packages:

RHEL5.7:
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-0.10-9.el5
qpid-cpp-client-devel-0.10-9.el5
qpid-cpp-client-devel-docs-0.10-9.el5
qpid-cpp-client-ssl-0.10-9.el5
qpid-cpp-mrg-debuginfo-0.10-9.el5
qpid-cpp-server-0.10-9.el5
qpid-cpp-server-cluster-0.10-9.el5
qpid-cpp-server-devel-0.10-9.el5
qpid-cpp-server-ssl-0.10-9.el5
qpid-cpp-server-store-0.10-9.el5
qpid-cpp-server-xml-0.10-9.el5
qpid-java-client-0.10-11.el5
qpid-java-common-0.10-11.el5
qpid-java-example-0.10-11.el5
qpid-java-jca-0.10-11.el5
qpid-java-jca-zip-0.10-11.el5
qpid-qmf-0.10-10.el5
qpid-qmf-debuginfo-0.10-10.el5
qpid-qmf-devel-0.10-10.el5
qpid-tools-0.10-6.el5

RHEL6.1:
python-qpid-0.10-1.el6
python-qpid-qmf-0.10-10.el6
qpid-cpp-client-0.10-7.el6_1
qpid-cpp-client-devel-0.10-7.el6_1
qpid-cpp-client-devel-docs-0.10-7.el6_1
qpid-cpp-server-0.10-7.el6_1
qpid-cpp-server-devel-0.10-7.el6_1
qpid-cpp-server-store-0.10-7.el6_1
qpid-cpp-server-xml-0.10-7.el6_1
qpid-java-client-0.10-11.el6
qpid-java-common-0.10-11.el6
qpid-java-example-0.10-11.el6
qpid-java-jca-0.10-11.el6
qpid-java-jca-zip-0.10-11.el6
qpid-qmf-0.10-10.el6
qpid-qmf-debuginfo-0.10-10.el6
qpid-qmf-devel-0.10-10.el6
qpid-tools-0.10-5.el6

-> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Prior to this update, when a large message (over 4KB) was sent from the python-qpid client to the broker, the connection became unresponsive and other clients were unable to connect to the broker. This bug has been fixed, and clients no longer hang in the described scenario.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1399.html