Bug 733543 - Client freezes up when sending a large message
Summary: Client freezes up when sending a large message
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.0
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: 2.0.10
: ---
Assignee: Gordon Sim
QA Contact: Leonid Zhaldybin
URL:
Whiteboard:
Depends On:
Blocks: 742040 744036 745231
Reported: 2011-08-25 23:33 UTC by Raymond Mancy
Modified: 2014-12-08 01:09 UTC (History)

Fixed In Version: qpid-cpp-0.10-7.el6_1
Doc Type: Bug Fix
Doc Text:
Prior to this update, when a large message (over 4KB) was sent from the python-qpid client to the broker, the connection became unresponsive and other clients were unable to connect to the broker. This bug has been fixed, and clients no longer hang in the described scenario.
Clone Of:
Environment:
Last Closed: 2011-10-24 11:43:24 UTC
Target Upstream Version:


Attachments (Terms of Use)
Script to reproduce problem. (15.52 KB, application/octet-stream)
2011-08-25 23:33 UTC, Raymond Mancy


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:1399 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 2.0 bug fix update 2011-10-24 11:43:08 UTC

Description Raymond Mancy 2011-08-25 23:33:40 UTC
Created attachment 519993 [details]
Script to reproduce problem.

Description of problem:

Trying to send a 'large' message from the python-qpid client to the broker
results in the connection hanging and other clients not being able to connect to the broker.

Version-Release number of selected component (if applicable):

I've tried it against the following brokers:
  qpid-cpp-server-0.10-8
  qpid-cpp-server-0.7.946106-28.el5

Client package is python-qpid-0.10-1.

How reproducible:

Every time

Steps to Reproduce:
1. Create a queue with the following options:
{ create:receiver, 
  node: { type: queue, durable:False, 
    x-declare: {exclusive: True, auto-delete:True } 
   } 
}

2. Send a python object to this address which is 15k on disk (see attached script)
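The steps above can be sketched roughly as follows. This is not the attached script, only a minimal illustration under stated assumptions: the queue name `bz733543-test` and the helper names are hypothetical, a plain string stands in for the pickled Python object, and the broker address is whatever host the test uses. The `qpid.messaging` import is done lazily so the helper functions can be used without python-qpid installed.

```python
# Hypothetical queue name; the report does not give one.
QUEUE_NAME = "bz733543-test"


def build_address(queue_name):
    """Build an address string using the node options listed in step 1."""
    return ("%s; {create: receiver, node: {type: queue, durable: False, "
            "x-declare: {exclusive: True, auto-delete: True}}}" % queue_name)


def build_payload(size):
    """Return a string of `size` characters standing in for the real object."""
    return "x" * size


def send_large_message(broker, size, mechanism="GSSAPI"):
    """Try to send one message of `size` characters to `broker`.

    On an affected client/broker pair the send() call is where the
    hang described in this report would be expected to occur.
    """
    # Imported here so the helpers above work without python-qpid.
    from qpid.messaging import Connection, Message

    conn = Connection(broker, sasl_mechanisms=mechanism)
    conn.open()
    try:
        ssn = conn.session()
        address = build_address(QUEUE_NAME)
        # The address uses create:receiver, so a receiver is opened
        # first to declare the queue before anything is sent to it.
        rcv = ssn.receiver(address)
        snd = ssn.sender(address)
        snd.send(Message(build_payload(size)))  # waits for the broker ack
    finally:
        conn.close()
```

Calling `send_large_message("broker.example.com", 15 * 1024)` (host name is a placeholder) would attempt a send of roughly the size mentioned in step 2.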

  
Actual results:
Message does not send

Also it seems to render other clients unable to connect. They hang at the connection.open()

Expected results:
Message is sent

Additional info:
I've tried sending small messages, and they send fine. I have not yet experimented
to see where the cutoff point is.

I've also included the small script that I used to reproduce the error.

When I hit Ctrl-C to kill the script, this is the traceback I get:

Traceback (most recent call last):
  File "/tmp/test_beaker_qpid_client.py", line 39, in <module>
    snd.send(m_)
  File "<string>", line 6, in send
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 861, in send
    self.sync()
  File "<string>", line 6, in sync
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 872, in sync
    if not self._ewait(lambda: self.acked >= mno, timeout=timeout):
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 786, in _ewait
    result = self.session._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 553, in _ewait
    result = self.connection._ewait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 195, in _ewait
    result = self._wait(lambda: self.error or predicate(), timeout)
  File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 180, in _wait
    return self._waiter.wait(predicate, timeout=timeout)
  File "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 57, in wait
    self.condition.wait(3)
  File "/usr/lib/python2.7/site-packages/qpid/concurrency.py", line 96, in wait
    sw.wait(timeout)
  File "/usr/lib/python2.7/site-packages/qpid/compat.py", line 53, in wait
    ready, _, _ = select([self], [], [], timeout)

Comment 1 Graeme Gillies 2011-08-26 01:00:55 UTC
The hosts we have managed to reproduce the problem on are RHEL 5 Xen guests with the above versions of qpid.

If you need any more information about the guests or the host hardware they are running on, I can provide it.

Comment 2 Gordon Sim 2011-08-26 11:09:03 UTC
Fixed upstream as http://svn.apache.org/viewvc?view=rev&rev=1162060

Comment 5 Leonid Zhaldybin 2011-09-23 06:22:14 UTC
I've tried to reproduce this problem on RHEL6.1 (i386). The python-qpid client was, indeed, unable to send a message this big.
Further testing revealed, though, that there was no connection hanging. That is, other clients were able to send and receive (average-sized) messages.

In addition, when I changed sasl_mechanisms from 'GSSAPI' to 'ANONYMOUS' in the attached script, the exact same huge message was sent successfully.
'PLAIN' worked as well. So, the problem might have something to do with Kerberos.

Package versions used for testing:

python-qpid-0.10-1.el6.noarch
python-qpid-qmf-0.10-10.el6.i686
qpid-cpp-client-0.10-6.el6.i686
qpid-cpp-client-devel-0.10-6.el6.i686
qpid-cpp-client-devel-docs-0.10-6.el6.noarch
qpid-cpp-server-0.10-6.el6.i686
qpid-cpp-server-devel-0.10-6.el6.i686
qpid-cpp-server-store-0.10-6.el6.i686
qpid-cpp-server-xml-0.10-6.el6.i686
qpid-java-client-0.10-10.el6.noarch
qpid-java-common-0.10-10.el6.noarch
qpid-java-example-0.10-10.el6.noarch
qpid-java-jca-0.10-10.el6.noarch
qpid-qmf-0.10-10.el6.i686
qpid-qmf-debuginfo-0.10-10.el6.i686
qpid-qmf-devel-0.10-10.el6.i686
qpid-tools-0.10-5.el6.noarch

I'm going to try and reproduce it on RHEL5.

Comment 6 Raymond Mancy 2011-09-23 06:41:45 UTC
(In reply to comment #5)
> I've tried to reproduce this problem on RHEL6.1 (i386). The python-qpid client
> was, indeed, unable to send message this big.
> Further testing revealed, though, that there was no connection hanging. That
> is, other clients were able to send and receive (average-sized) messages.
> 
I hope I wasn't mistaken about the connection.open() comment. I may have to confirm that.

Comment 7 Leonid Zhaldybin 2011-09-23 15:00:48 UTC
Testing this on RHEL5 showed the exact same results as on RHEL6: the python client gets stuck when trying to send a huge message.
Both main architectures (i386 and x86_64) were tested.

Package versions for RHEL5:

python-qpid-0.10-1.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-0.10-8.el5
qpid-cpp-client-devel-0.10-8.el5
qpid-cpp-client-devel-docs-0.10-8.el5
qpid-cpp-client-ssl-0.10-8.el5
qpid-cpp-mrg-debuginfo-0.10-8.el5
qpid-cpp-server-0.10-8.el5
qpid-cpp-server-cluster-0.10-8.el5
qpid-cpp-server-devel-0.10-8.el5
qpid-cpp-server-ssl-0.10-8.el5
qpid-cpp-server-store-0.10-8.el5
qpid-cpp-server-xml-0.10-8.el5
qpid-java-client-0.10-9.el5
qpid-java-common-0.10-9.el5
qpid-java-example-0.10-9.el5
qpid-java-jca-0.10-9.el5
qpid-qmf-0.10-10.el5
qpid-qmf-debuginfo-0.10-10.el5
qpid-qmf-devel-0.10-10.el5
qpid-tools-0.10-6.el5

It seems to me that sending such a big message does not prevent other clients from sending and receiving messages, per se. But it does lead to the qpidd process using up to 100% of a CPU, and the server doesn't release it even after the "stuck" client gets interrupted. After I ran the test script 4 times in a row, all four CPUs on my test machine were used up and the server was unable to process even small messages any more.

BTW, the 'magical number' seems to be equal to 4085. That is, if the message is 4085 bytes long (or is bigger than that), then python client is unable to send it using GSSAPI.
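One way such a threshold can be located is by bisecting on message size. The sketch below is only an illustration, not the procedure actually used in this comment: `find_cutoff` and the `can_send` callback are hypothetical names, and it assumes that sending succeeds for every size below the cutoff and fails for every size at or above it.

```python
def find_cutoff(can_send, lo, hi):
    """Return the smallest size for which can_send(size) is False.

    Assumes can_send(lo) is True, can_send(hi) is False, and that
    success is monotonic in size (hypothetical callback, not part
    of python-qpid).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_send(mid):
            lo = mid   # mid still works, so the cutoff is above it
        else:
            hi = mid   # mid fails, so the cutoff is at or below it
    return hi


if __name__ == "__main__":
    # Stand-in predicate mimicking the behaviour reported here.
    print(find_cutoff(lambda size: size < 4085, 1, 16384))
```

In a real run, `can_send` would have to attempt a send with a timeout and report whether it completed. Since a stuck send leaves qpidd consuming CPU even after the client is interrupted, the broker would probably need a restart after each failing probe.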

cpp client, on the other hand, doesn't seem to have this problem. I tested sending up to 10M messages without discovering any issues.

Comment 9 Leonid Zhaldybin 2011-10-07 13:04:46 UTC
The issue has been fixed. Tested on RHEL5.7 / RHEL6.1  i386 / x86_64 on
packages:

RHEL5.7:
python-qpid-0.10-1.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-0.10-9.el5
qpid-cpp-client-devel-0.10-9.el5
qpid-cpp-client-devel-docs-0.10-9.el5
qpid-cpp-client-ssl-0.10-9.el5
qpid-cpp-mrg-debuginfo-0.10-9.el5
qpid-cpp-server-0.10-9.el5
qpid-cpp-server-cluster-0.10-9.el5
qpid-cpp-server-devel-0.10-9.el5
qpid-cpp-server-ssl-0.10-9.el5
qpid-cpp-server-store-0.10-9.el5
qpid-cpp-server-xml-0.10-9.el5
qpid-java-client-0.10-11.el5
qpid-java-common-0.10-11.el5
qpid-java-example-0.10-11.el5
qpid-java-jca-0.10-11.el5
qpid-java-jca-zip-0.10-11.el5
qpid-qmf-0.10-10.el5
qpid-qmf-debuginfo-0.10-10.el5
qpid-qmf-devel-0.10-10.el5
qpid-tools-0.10-6.el5

RHEL6.1:
python-qpid-0.10-1.el6
python-qpid-qmf-0.10-10.el6
qpid-cpp-client-0.10-7.el6_1
qpid-cpp-client-devel-0.10-7.el6_1
qpid-cpp-client-devel-docs-0.10-7.el6_1
qpid-cpp-server-0.10-7.el6_1
qpid-cpp-server-devel-0.10-7.el6_1
qpid-cpp-server-store-0.10-7.el6_1
qpid-cpp-server-xml-0.10-7.el6_1
qpid-java-client-0.10-11.el6
qpid-java-common-0.10-11.el6
qpid-java-example-0.10-11.el6
qpid-java-jca-0.10-11.el6
qpid-java-jca-zip-0.10-11.el6
qpid-qmf-0.10-10.el6
qpid-qmf-debuginfo-0.10-10.el6
qpid-qmf-devel-0.10-10.el6
qpid-tools-0.10-5.el6

-> VERIFIED

Comment 10 Tomas Capek 2011-10-17 14:06:29 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Prior to this update, when a large message (over 4KB) was sent from the python-qpid client to the broker, the connection became unresponsive and other clients were unable to connect to the broker. This bug has been fixed, and clients no longer hang in the described scenario.

Comment 11 errata-xmlrpc 2011-10-24 11:43:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1399.html

