
Bug 648927

Summary: Clustered broker crashes in assertion in cluster/ExpiryPolicy.cpp
Product: Red Hat Enterprise MRG
Reporter: Alan Conway <aconway>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA
QA Contact: ppecka <ppecka>
Severity: medium
Priority: high
Version: 1.3
CC: freznice, iboverma, jneedle, tross
Target Milestone: 1.3.2-RC1   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: qpid-cpp-mrg-0.7.946106-26
Doc Type: Bug Fix
Doc Text:
When a message with the time to live (TTL) value set was sent to multiple queues by a fanout or topic exchange before a new member joined the cluster, it could time out too early on the new member. This could put queues into an inconsistent state, causing a broker to terminate unexpectedly. With this update, the underlying source code has been adapted to manage message expiration in a cluster correctly, and this error no longer occurs.
Last Closed: 2011-02-15 12:11:29 UTC
Bug Depends On:    
Bug Blocks: 654872    
Attachments:
Description Flags
Reliable reproducer script. none

Description Alan Conway 2010-11-02 14:29:40 UTC
Description of problem:

Don't know how to replicate as yet. Reported in:

https://issues.apache.org/jira/browse/QPID-2874

and has occurred in ptol tests:

http://mrg2.lab.bos.redhat.com:2765/qpid-cpp/11516/test.out.gz
http://mrg18.lab.bos.redhat.com:2765/qpid-cpp-test/2891/test.out.gz

Version-Release number of selected component (if applicable): r1029686

How reproducible: unknown

Steps to Reproduce: unknown

Actual results: core dump

Expected results: no core dump

Comment 1 Alan Conway 2010-11-15 20:19:32 UTC
To reproduce run

  make check TESTS=run_cluster_tests "CLUSTER_TESTS=*.test_management -DDURATION=4"

in a loop. I've seen the failure in 2-6 iterations.

Note that you must use a debug build. A release build with -DNDEBUG has assertions compiled out, so it will not show this problem. An RPM is a release build, so you won't see this issue with an RPM-installed qpidd.
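
As a small illustration of why a release build hides the failure, the sketch below uses only the standard `assert` macro. The function name `assertionsEnabled` is made up for this example and is not part of qpid-cpp. With `-DNDEBUG` the expression inside `assert` is never evaluated, so the function returns false and any assertion in the broker is silently skipped as well.

```cpp
#include <cassert>

// Hypothetical helper, not part of qpid-cpp.
// Returns true when assert() is active in this translation unit.
// The assignment inside assert() only runs when NDEBUG is not defined,
// because with NDEBUG the macro expands to a no-op.
bool assertionsEnabled() {
    bool evaluated = false;
    assert((evaluated = true));  // compiled out under -DNDEBUG
    return evaluated;
}
```

Compiled without `-DNDEBUG` this returns true; compiled with `-DNDEBUG` it returns false, which is the situation in an RPM-installed qpidd.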

Comment 2 Alan Conway 2010-11-16 16:58:56 UTC
Created attachment 460882 [details]
Reliable reproducer script.

Attached script reproduces the problem reliably, every time.

The problem concerns messages that are fanned out to multiple queues.
The cluster update process does not recognize the same message on different queues and updates it as if it were two distinct messages. The cluster expiry code expects a 1-1 correspondence between messages and expiry-ids, which are assigned per message, not per queued message.
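
The mismatch described above can be sketched with a toy model. Everything in it (`Message`, `Queue`, `ExpiryTable`, `naiveUpdate`, `sharedUpdate`, `consistent`) is hypothetical and written for illustration only; it is not the qpid-cpp code, and `sharedUpdate` is just one way the identity of a fanned-out message could be preserved, not necessarily what the actual fix does.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a broker message; not the qpid-cpp class.
struct Message {
    std::string body;
    std::uint64_t expiryId;  // assigned once per message, not per queue entry
};

// Hypothetical queue: a fanned-out message is one Message object
// referenced from several queues.
using Queue = std::vector<std::shared_ptr<Message>>;

// Toy expiry table that enforces a 1-1 mapping between IDs and message objects.
class ExpiryTable {
  public:
    // Returns false if the ID is already bound to a different message object.
    bool add(const std::shared_ptr<Message>& m) {
        auto it = byId.find(m->expiryId);
        if (it != byId.end()) return it->second == m.get();
        byId[m->expiryId] = m.get();
        return true;
    }
  private:
    std::map<std::uint64_t, const Message*> byId;
};

// Naive update: copies each queue entry independently, so one shared message
// becomes several distinct objects that all carry the same expiry ID.
std::vector<Queue> naiveUpdate(const std::vector<Queue>& queues) {
    std::vector<Queue> out;
    for (const Queue& q : queues) {
        Queue copy;
        for (const auto& m : q) copy.push_back(std::make_shared<Message>(*m));
        out.push_back(copy);
    }
    return out;
}

// Identity-preserving update: each expiry ID is copied once and shared.
std::vector<Queue> sharedUpdate(const std::vector<Queue>& queues) {
    std::map<std::uint64_t, std::shared_ptr<Message>> seen;
    std::vector<Queue> out;
    for (const Queue& q : queues) {
        Queue copy;
        for (const auto& m : q) {
            auto& slot = seen[m->expiryId];
            if (!slot) slot = std::make_shared<Message>(*m);
            copy.push_back(slot);
        }
        out.push_back(copy);
    }
    return out;
}

// Registers every queued entry; true if the 1-1 mapping holds throughout.
bool consistent(const std::vector<Queue>& queues) {
    ExpiryTable table;
    for (const Queue& q : queues)
        for (const auto& m : q)
            if (!table.add(m)) return false;
    return true;
}
```

In this model, two queues holding the same message are consistent on the original member, become inconsistent after `naiveUpdate`, and stay consistent after `sharedUpdate`.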

Comment 3 Alan Conway 2010-11-18 19:51:28 UTC
Fixed on trunk r1036589

Comment 4 Alan Conway 2010-11-18 19:51:28 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
C: Bug in code managing message expiry in a cluster.
C: If a message with TTL (Time To Live) set 
is sent to multiple queues by a fanout or topic exchange before a new member joins the cluster, it could be timed out too early on the new member. This could lead to queues becoming inconsistent causing a broker to exit with an invalid-argument error. 
F: The bug was corrected.
R: Error no longer occurs.

Comment 5 Alan Conway 2010-11-19 15:42:51 UTC
This may be the same issue as Bug 654872

Comment 7 ppecka 2011-02-01 10:06:15 UTC
VERIFIED RHEL 5.6 i386 / x86_64:

packages used
qpid-cpp-mrg-0.7.946106-27.el5.src.rpm
qpid-tools-0.7.946106-12.el5.src.rpm
openais-0.80.6-28.el5

--> VERIFIED

Comment 8 Jaromir Hradilek 2011-02-09 18:48:24 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,5 +1 @@
-C: Bug in code managing message expiry in a cluster.
+When a message with the time to live (TTL) value set was sent to multiple queues by a fanout or topic exchange before a new member joined the cluster, it could time out too early on the new member. This could put queues to an inconsistent state, causing a broker to terminate unexpectedly. With this update, the underlying source code has been adapted to manage message expiration in a cluster correctly, and this error no longer occurs.
-C: If a message with TTL (Time To Live) set 
-is sent to multiple queues by a fanout or topic exchange before a new member joins the cluster, it could be timed out too early on the new member. This could lead to queues becoming inconsistent causing a broker to exit with an invalid-argument error. 
-F: The bug was corrected.
-R: Error no longer occurs.

Comment 9 errata-xmlrpc 2011-02-15 12:11:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0217.html