Bug 692132

Summary: Assertion in AsyncCompletion during scalability tests
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: Development
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Target Milestone: 2.0
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Kim van der Riet <kim.vdriet>
Assignee: Ken Giusti <kgiusti>
QA Contact: MRG Quality Engineering <mrgqe-bugs>
Docs Contact:
CC: gsim, jneedle, mgoulish, ppecka
Fixed In Version: qpid-cpp-mrg-0.10-4
Doc Type: Bug Fix
Last Closed: 2011-06-23 15:43:39 UTC
Attachments:
Stack trace (flags: none)
Second stack trace (flags: none)

Description Kim van der Riet 2011-03-30 13:54:23 UTC
Created attachment 488764: Stack trace

While running scalability tests against in-tree builds of r.1085065/r.4447, the following assertion was observed:

lt-qpidd: ./qpid/broker/AsyncCompletion.h:167: void qpid::broker::AsyncCompletion::end(qpid::broker::AsyncCompletion::Callback&): Assertion `completionsNeeded.get() > 0' failed.

Two test loops in a row have failed this way at around iteration 8 or 9 (see below), but other test loops have completed without showing the problem, so the failure appears to be intermittent.

Steps to reproduce:
Two boxes, mrg42 and mrg43, with 10G interfaces configured as 20.0.10.{42,43}.
Both boxes have modified environments:
limits.conf: nofile: 65536
syscfg.conf: fs.aio-max-nr: 262144

In addition, mrg42 (which runs qpid-perftest) has:
ulimits.conf: nproc: 65536

The broker is run as follows on mrg43:

rm -rf /tmp/rhm; ./qpidd --auth no -m no --max-connections 65100 --load-module /home/kpvdr/mrg/store/lib/.libs/msgstore.so --store-dir /tmp --jfile-size-pgs 48 --num-jfiles 16 --log-enable info+

The client is run 10 times in a row against the broker in a bash loop on mrg42 using the 10g interface:

./qpid-perftest --mode shared --summary --pub-confirm no --sync-publish no --sub-ack 0 -b 20.0.10.43 --npubs 1 --qt 10000 --nsubs 1 --count 100

Note that although the store is loaded, the test is a transient test.

Comment 1 Kim van der Riet 2011-03-30 14:09:28 UTC
Created attachment 488771: Second stack trace

This stack trace shows that the broker was almost idle at the time of the failure.

Comment 2 Ken Giusti 2011-03-31 13:29:20 UTC
Most likely caused by a code path that is "completing" the enqueue more than once.  No obvious candidate in the stack trace - will have to reproduce.
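
To illustrate the kind of failure suspected here, below is a minimal sketch of a completion counter that is completed one time too many. It is not the qpid-cpp AsyncCompletion class: the name CompletionCounter and its methods are hypothetical, and it throws an exception at the point where the broker asserts, so that the double completion can be observed without aborting the process.

```cpp
#include <atomic>
#include <stdexcept>

// Hypothetical, simplified stand-in for a completion counter.
// This is NOT the qpid-cpp AsyncCompletion implementation.
class CompletionCounter {
    std::atomic<int> needed;  // completions still outstanding

public:
    explicit CompletionCounter(int n) : needed(n) {}

    // Record one completion.
    // Returns true if this call was the last outstanding completion.
    // Throws if nothing is outstanding, which corresponds to the
    // `completionsNeeded.get() > 0` check failing in the broker.
    bool complete() {
        int before = needed.load();
        // Compare-and-swap loop: only decrement while the count is positive,
        // so two racing callers can never both consume the same completion.
        while (before > 0) {
            if (needed.compare_exchange_weak(before, before - 1)) {
                return before == 1;
            }
        }
        throw std::logic_error("operation completed more often than it was started");
    }

    int outstanding() const { return needed.load(); }
};
```

With a counter constructed as CompletionCounter c(1), the first c.complete() returns true and a second c.complete() throws. In this sketch that second call plays the role of the code path that "completes" the enqueue twice.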

Comment 3 mick 2011-04-01 11:47:58 UTC
Please note the similarity to the stack trace that I found in bug 692546: https://bugzilla.redhat.com/show_bug.cgi?id=692546

( Thanks for noticing it, Gordon! )

The stacks are identical from level 9 up!   

I have an ultra-low-frequency (almost useless) reproducer.

I would rather not close that bug yet, just in case that avenue leads to an idea.

Comment 4 Ken Giusti 2011-04-01 13:13:40 UTC
*** Bug 692546 has been marked as a duplicate of this bug. ***

Comment 5 Ken Giusti 2011-04-01 13:18:19 UTC
Upstream JIRA:

https://issues.apache.org/jira/browse/QPID-3174

Comment 6 Ken Giusti 2011-04-01 19:37:46 UTC
Potential fix submitted upstream:

Committed revision 1087868.
http://svn.apache.org/viewvc?view=revision&revision=1087868

Comment 7 Gordon Sim 2011-04-04 14:39:47 UTC
A further, similar change was committed as:
http://svn.apache.org/viewvc?rev=1088539&view=rev

That change and the one above have been merged to the 0.10 release branch:
http://svn.apache.org/viewvc?rev=1088634&view=rev

Comment 10 Ken Giusti 2011-05-03 17:19:57 UTC
This BZ was fixed prior to the -4 release. I missed setting the state to "MODIFIED".

Comment 12 ppecka 2011-06-08 17:01:37 UTC
Verified on RHEL 5 and RHEL 6, on both i686 and x86_64
(1000 runs)

rpm -qa | grep qpid
python-qpid-0.10-1.el5
qpid-cpp-server-xml-0.10-7.el5
qpid-qmf-devel-0.10-10.el5
qpid-cpp-client-0.10-7.el5
qpid-java-client-0.10-6.el5
qpid-cpp-client-devel-0.10-7.el5
qpid-cpp-server-devel-0.10-7.el5
qpid-java-common-0.10-6.el5
qpid-qmf-0.10-10.el5
qpid-cpp-client-ssl-0.10-7.el5
qpid-cpp-server-cluster-0.10-7.el5
qpid-cpp-server-0.10-7.el5
qpid-java-example-0.10-6.el5
python-qpid-qmf-0.10-10.el5
qpid-cpp-client-devel-docs-0.10-7.el5
qpid-cpp-server-ssl-0.10-7.el5
qpid-tools-0.10-5.el5
qpid-cpp-server-store-0.10-7.el5


--> VERIFIED

Comment 13 errata-xmlrpc 2011-06-23 15:43:39 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2011-0890.html