+++ This bug was initially created as a clone of Bug #466955 +++

Consuming messages with the python client, using an accept mode of explicit (the default) as demonstrated in the python tutorial, causes the broker to consume extreme amounts of CPU. I've filed this under the broker component; however, it's entirely possible that the client is triggering this by misbehaving, e.g. forgetting to ack messages.

--- Additional comment from gsim on 2008-10-15 12:00:22 CEST ---

I believe the problem is that the broker retains a record of each delivery and keeps it until the delivery has been accepted/released (at which point the message is released) AND completed. The reason it holds the records until completion is to have a record of the bytes to be reallocated in window mode when completion occurs. However, this is only necessary for subscriptions that are in windowing mode. As the python client doesn't send completions automatically, the list of records builds up as messages are sent, and this slows down processing of subsequent accepts.

--- Additional comment from gsim on 2008-10-15 12:09:13 CEST ---

Fixed by r704838, which prevents the broker from holding onto the records until completion is received unless it is in windowing mode. Also changed the mode used in start() on the incoming queue in the python client to credit mode (which appears to be in keeping with the spirit of that method). To verify the fix I modified the pubsub python examples to allow a steady stream of messages to flow to the consumer. Prior to the change, after a large number of messages the broker CPU would start to rise and remain high; with the change the test could run for a long period without any noticeable increase in broker load.
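The diagnosis above can be illustrated with a small Python model (this is not the broker's actual code; the class and field names are hypothetical): if accepted records are only freed on completion, and the client never sends completions, every accept has to scan an ever-growing record list, whereas freeing records on accept for non-windowing subscriptions keeps the list short.

```python
# Hypothetical model of per-subscription delivery records in the broker.
# Illustrates why retaining records until completion (which the python
# client never sent) degrades accept processing over time.

class Subscription:
    def __init__(self, windowing):
        self.windowing = windowing
        self.records = []   # outstanding delivery records
        self.next_id = 0

    def deliver(self):
        self.records.append({"id": self.next_id, "accepted": False})
        self.next_id += 1

    def accept(self, msg_id):
        # Scanning the list is O(len(records)); if accepted records
        # are only freed on completion, the list grows without bound.
        for rec in self.records:
            if rec["id"] == msg_id:
                rec["accepted"] = True
                break
        if not self.windowing:
            # Modelled fix (r704838): without windowing there is no
            # need to keep the record until completion, so free it now.
            self.records = [r for r in self.records if not r["accepted"]]

old = Subscription(windowing=True)   # pre-fix behaviour: records linger
new = Subscription(windowing=False)  # post-fix: freed on accept
for i in range(1000):
    for sub in (old, new):
        sub.deliver()
        sub.accept(i)
print(len(old.records), len(new.records))  # prints: 1000 0
```

In the model the windowing subscription ends up scanning a 1000-entry list on every accept, which matches the observed pattern of broker CPU climbing as more messages flow.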
--- Additional comment from gsim on 2008-10-15 12:10:29 CEST ---

Created attachment 320409 [details]
Test subscriber

--- Additional comment from gsim on 2008-10-15 12:12:40 CEST ---

Created attachment 320410 [details]
Test publisher

The attached publisher and subscriber were the tests I used to detect the issue and verify the fix. I ran the publisher in a loop (while ./examples/pubsub/mypub.py ; do true; done) while the subscriber was running and monitored the CPU usage.

--- Additional comment from errata-xmlrpc on 2008-11-13 21:27:20 CET ---

Bug report changed to ON_QA status by Errata System. A QE request has been submitted for advisory RHEA-2008:0994-01 http://errata.devel.redhat.com/errata/show/8043

--- Additional comment from freznice on 2008-11-14 11:24:02 CET ---

RHTS test qpid_test_explicit_accept_mode_bz466955 proves that the issue has been fixed. Validated on RHEL 4.7 / 5.2 i386 / x86_64 using packages qpidd-0.3.713378-1.el5 / rhm-0.3.2783-1.el5 vs. the mrg 1.0.1 packages -> VERIFIED

--- Additional comment from errata-xmlrpc on 2009-01-26 15:52:17 CET ---

Bug report changed to RELEASE_PENDING status by Errata System. Advisory RHEA-2009:0035-05 has been changed to HOLD status. http://errata.devel.redhat.com/errata/show/8043

--- Additional comment from errata-xmlrpc on 2009-02-04 16:35:25 CET ---

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html
Running our automated test (see TCMS link above) revealed that this problem is present on the ppc64 architecture. Packages used for testing:

qpid-cpp-client-0.14-16.el6.ppc64
qpid-cpp-client-devel-0.14-16.el6.ppc64
qpid-cpp-client-rdma-0.14-16.el6.ppc64
qpid-cpp-client-ssl-0.14-16.el6.ppc64
qpid-cpp-debuginfo-0.14-16.el6.ppc64
qpid-cpp-server-0.14-16.el6.ppc64
qpid-cpp-server-devel-0.14-16.el6.ppc64
qpid-cpp-server-rdma-0.14-16.el6.ppc64
qpid-cpp-server-ssl-0.14-16.el6.ppc64
qpid-cpp-server-store-0.14-16.el6.ppc64
qpid-java-client-0.14-3.el6.noarch
qpid-java-common-0.14-3.el6.noarch
qpid-java-example-0.14-3.el6.noarch
qpid-qmf-0.14-7.el6_2.ppc64
qpid-qmf-debuginfo-0.14-7.el6_2.ppc64
qpid-qmf-devel-0.14-7.el6_2.ppc64
qpid-tests-0.14-1.el6_2.noarch
qpid-tools-0.14-2.el6_2.noarch
Created attachment 592165 [details]
reproducer script

The attached reproducer ran for about 90 seconds on an s390x machine, whereas running it on a ppc64 one took about 500 seconds.
I suspect this is fixed for ppc64 (and for any other architecture) by r704838.