Bug 832512 - problems with explicit accept mode on ppc64
Summary: problems with explicit accept mode on ppc64
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 2.1.2
Hardware: ppc64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Gordon Sim
QA Contact: MRG Quality Engineering
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2012-06-15 15:21 UTC by Leonid Zhaldybin
Modified: 2020-11-04 22:31 UTC
CC: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 466955
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Attachments
reproducer script (4.36 KB, text/x-python)
2012-06-15 15:28 UTC, Leonid Zhaldybin

Description Leonid Zhaldybin 2012-06-15 15:21:35 UTC
+++ This bug was initially created as a clone of Bug #466955 +++

Consuming messages with the python client, using an accept mode of explicit (the default) as demonstrated in the python tutorial causes the broker to consume extreme amounts of CPU.

I've filed this under the broker component, however it's entirely possible that the client is triggering this by misbehaving, e.g. forgetting to ack messages.

--- Additional comment from gsim on 2008-10-15 12:00:22 CEST ---

I believe the problem is that the broker retains a record of each delivery and keeps this until the delivery has been accepted/released (at which point the message is released) AND completed. 

The reason that it holds them until completion is to have a record of the bytes to be reallocated in window mode when completion occurs. However, this is only necessary for subscriptions that are in windowing mode.

As the python client doesn't send completions automatically, the list of records builds up as messages are sent and this slows down processing of subsequent accepts.
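The accumulation described above can be illustrated with a small Python simulation. This is a toy model of the mechanism only; the class and method names (`DeliveryRecord`, `Broker`, `deliver`, `accept`) are hypothetical and do not correspond to actual qpid-cpp APIs:

```python
# Toy model of the broker-side bookkeeping described above.
# All names here are illustrative, not real qpid-cpp classes.

class DeliveryRecord:
    def __init__(self, delivery_id):
        self.delivery_id = delivery_id
        self.accepted = False
        self.completed = False  # the python client never sends completions

class Broker:
    def __init__(self):
        self.records = []  # one record per outstanding delivery

    def deliver(self, delivery_id):
        self.records.append(DeliveryRecord(delivery_id))

    def accept(self, delivery_id):
        # Pre-fix behavior: a record is only dropped once it is BOTH
        # accepted and completed, so each accept scans an ever-growing list.
        scanned = 0
        for rec in self.records:
            scanned += 1
            if rec.delivery_id == delivery_id:
                rec.accepted = True
        self.records = [r for r in self.records
                        if not (r.accepted and r.completed)]
        return scanned

broker = Broker()
costs = []
for i in range(1000):
    broker.deliver(i)
    costs.append(broker.accept(i))

# Without completions no record is ever removed, so the work done by
# each accept grows linearly with the number of messages delivered.
print(len(broker.records), costs[0], costs[-1])  # → 1000 1 1000
```

The linearly growing per-accept cost is what shows up as steadily rising broker CPU under a continuous message stream.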

--- Additional comment from gsim on 2008-10-15 12:09:13 CEST ---

Fixed by r704838 which prevents the broker from holding onto the records until completion is received unless it is in windowing mode. Also changed the mode used in start() on the incoming queue in the python client to be credit mode (which appears to be in keeping with the spirit of that method).

To verify the fix I modified the pubsub python examples to allow a steady stream of messages to flow to the consumer. Prior to the change, after a large number of messages the broker CPU would start to rise and remain high; with the change the test could run for a long period without any noticeable increase in broker load.
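The post-fix behavior can be sketched in the same hypothetical toy model as above (the `windowing` flag and all names are illustrative, not the actual qpid-cpp implementation):

```python
# Toy model of the post-r704838 bookkeeping; names are illustrative.

class FixedBroker:
    def __init__(self, windowing=False):
        self.windowing = windowing
        self.records = []  # outstanding delivery ids

    def deliver(self, delivery_id):
        self.records.append(delivery_id)

    def accept(self, delivery_id):
        # Post-fix behavior: unless the subscription is in windowing mode
        # (where byte counts must survive until completion is received),
        # the record can be dropped as soon as the delivery is accepted.
        if not self.windowing:
            self.records.remove(delivery_id)
        # In windowing mode the record would be kept until completion
        # (not modeled here).

broker = FixedBroker(windowing=False)
for i in range(1000):
    broker.deliver(i)
    broker.accept(i)
print(len(broker.records))  # stays bounded: prints 0
```

With records released on accept, the bookkeeping stays bounded even when the client never sends completions, which matches the observed flat broker load after the change.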

--- Additional comment from gsim on 2008-10-15 12:10:29 CEST ---

Created attachment 320409 [details]
Test subscriber

--- Additional comment from gsim on 2008-10-15 12:12:40 CEST ---

Created attachment 320410 [details]
Test publisher

The attached publisher and subscriber were the tests I used to detect the issue and verify the fix. I ran the publisher in a loop (while ./examples/pubsub/mypub.py ; do true; done) while the subscriber was running and monitored the CPU usage.

--- Additional comment from errata-xmlrpc on 2008-11-13 21:27:20 CET ---

Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHEA-2008:0994-01
http://errata.devel.redhat.com/errata/show/8043

--- Additional comment from freznice on 2008-11-14 11:24:02 CET ---

RHTS test qpid_test_explicit_accept_mode_bz466955 proves that the issue has been fixed.
Validated on RHEL 4.7 / 5.2 i386 / x86_64 using packages:
qpidd-0.3.713378-1.el5/rhm-0.3.2783-1.el5 vs. mrg 1.0.1 packages
->VERIFIED

--- Additional comment from errata-xmlrpc on 2009-01-26 15:52:17 CET ---

Bug report changed to RELEASE_PENDING status by Errata System.
Advisory RHEA-2009:0035-05 has been changed to HOLD status.
http://errata.devel.redhat.com/errata/show/8043

--- Additional comment from errata-xmlrpc on 2009-02-04 16:35:25 CET ---

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html

Comment 1 Leonid Zhaldybin 2012-06-15 15:24:03 UTC
Running our automated test (see TCMS link above) revealed that this problem is present on the ppc64 architecture.

Packages used for testing:
qpid-cpp-client-0.14-16.el6.ppc64
qpid-cpp-client-devel-0.14-16.el6.ppc64
qpid-cpp-client-rdma-0.14-16.el6.ppc64
qpid-cpp-client-ssl-0.14-16.el6.ppc64
qpid-cpp-debuginfo-0.14-16.el6.ppc64
qpid-cpp-server-0.14-16.el6.ppc64
qpid-cpp-server-devel-0.14-16.el6.ppc64
qpid-cpp-server-rdma-0.14-16.el6.ppc64
qpid-cpp-server-ssl-0.14-16.el6.ppc64
qpid-cpp-server-store-0.14-16.el6.ppc64
qpid-java-client-0.14-3.el6.noarch
qpid-java-common-0.14-3.el6.noarch
qpid-java-example-0.14-3.el6.noarch
qpid-qmf-0.14-7.el6_2.ppc64
qpid-qmf-debuginfo-0.14-7.el6_2.ppc64
qpid-qmf-devel-0.14-7.el6_2.ppc64
qpid-tests-0.14-1.el6_2.noarch
qpid-tools-0.14-2.el6_2.noarch

Comment 2 Leonid Zhaldybin 2012-06-15 15:28:19 UTC
Created attachment 592165 [details]
reproducer script

The attached reproducer ran for about 90 seconds on an s390x machine, whereas running it on a ppc64 one took about 500 seconds.

Comment 3 Justin Ross 2013-02-27 11:25:52 UTC
I suspect this is fixed for ppc64 (and for any other architecture) by r704838.

