Red Hat Bugzilla – Bug 509800
If journal capacity is exceeded as a result of cluster-durable mode being invoked, last man standing exits
Last modified: 2010-10-14 11:59:31 EDT
Description of problem:
The 'cluster-durable' mode is supposed to force transient messages to be persistent when cluster membership drops to one node. However, if a queue contains more messages than can fit in the journal when this happens, that last node also exits.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. start two node cluster
2. create queue with cluster-durability enabled
qpid-config add queue test-queue --durable --cluster-durable
3. fill queue with large number of transient messages
for i in `seq 1 300000`; do echo "Message$i"; done | sender
4. kill one of the cluster nodes
The other node (not the one killed) exits with:
2009-jul-06 06:13:31 notice 10.16.44.221:26093(READY) last broker standing, update queue policies
2009-jul-06 06:13:31 warning Journal "test-queue": Enqueue capacity threshold exceeded on queue "test-queue".
2009-jul-06 06:13:31 error Error delivering frames: Enqueue capacity threshold exceeded on queue "test-queue". (JournalImpl.cpp:576)
2009-jul-06 06:13:31 notice 10.16.44.221:26093(LEFT) leaving cluster grs-mrg14-test-cluster
2009-jul-06 06:13:31 notice Shut down
The last node should not exit. It should probably just log an error indicating that not all messages could be persisted.
I believe that the solution is to add exception handling in or around Queue::setLastNodeFailure(). This is the only place where there is sufficient context to know how to handle the error and log an appropriate error message.
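A minimal sketch of the idea, not the committed fix: everything here (BoundedJournal, SketchQueue, the log wording) is a hypothetical stand-in and not the real broker or journal API. It assumes the journal signals a full condition by throwing, and shows setLastNodeFailure() catching that per message so the caller never sees the exception and the node keeps running.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the journal: accepts a fixed number of enqueues,
// then throws, mimicking "Enqueue capacity threshold exceeded".
class BoundedJournal {
public:
    explicit BoundedJournal(std::size_t capacity) : capacity_(capacity) {}

    void enqueue(const std::string& body) {
        if (stored_.size() >= capacity_) {
            throw std::runtime_error("Enqueue capacity threshold exceeded");
        }
        stored_.push_back(body);
    }

    std::size_t size() const { return stored_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::string> stored_;
};

// Hypothetical transient message that may later be forced persistent.
struct Message {
    std::string body;
    bool persisted;
};

// Hypothetical queue; only the error-handling shape is the point here.
class SketchQueue {
public:
    SketchQueue(std::string name, BoundedJournal& journal)
        : name_(std::move(name)), journal_(journal) {}

    void push(const std::string& body) {
        messages_.push_back(Message{body, false});
    }

    // Called when this node becomes the last one standing. Tries to persist
    // every transient message; a journal failure is caught here, counted and
    // logged, instead of propagating and shutting the node down.
    // Returns the number of messages that could not be persisted.
    std::size_t setLastNodeFailure() {
        std::size_t failed = 0;
        std::string lastError;
        for (Message& m : messages_) {
            if (m.persisted) continue;
            try {
                journal_.enqueue(m.body);
                m.persisted = true;
            } catch (const std::exception& e) {
                ++failed;
                lastError = e.what();
            }
        }
        if (failed > 0) {
            std::cerr << "error Queue \"" << name_ << "\": " << failed
                      << " message(s) could not be persisted: " << lastError
                      << "\n";
        }
        return failed;
    }

private:
    std::string name_;
    BoundedJournal& journal_;
    std::vector<Message> messages_;
};
```

In this sketch the messages that did not fit stay on the queue as transient, which matches the expectation above that the node logs an error and carries on; whether the real fix retries, stops at the first failure, or reports per message is not stated in this report.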
Fixed with unit test
Transmitting file data ..
Committed revision 799658.
Still needs system test before it can be marked modified.
On 752581 the bug appears.
On 946106 it does not; it has been fixed.
Validated on RHEL 5.5 i386 / x86_64. Not validated on RHEL 4 because it has no clustering.
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
When the "--cluster-durable" mode was enabled, exceeding the journal capacity caused the last node to exit with the following error:
Error delivering frames: Enqueue capacity threshold exceeded on queue "queue-name". (JournalImpl.cpp:576)
With this update, the last node no longer shuts down when the journal capacity is exceeded.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.