Red Hat Bugzilla – Bug 509801
cluster-durable mode does not work for messages enqueued on more than one queue
Last modified: 2009-07-14 13:32:21 EDT
Description of problem:
If a message is routed by an exchange to more than one queue with the cluster-durable property enabled, it will only become persistent on the first of those queues should the cluster-durable functionality be invoked.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. start two node cluster
2. create some queues with cluster durability enabled
e.g. for q in `seq 1 10`; do qpid-config add queue queue-$q --durable --cluster-durable; done
3. bind those queues to some exchange such that they can be addressed as a group
e.g. for q in `seq 1 10`; do qpid-config bind amq.fanout queue-$q; done
4. send some messages to that exchange matching this binding
e.g. for i in `seq 1 10`; do echo "Message$i"; done | sender --exchange amq.fanout --send-eos 1
5. kill one node of the cluster
6. stop and recover the other cluster node
7. check that each queue has the expected messages recovered
Actual results:
Only the first queue has the messages
Expected results:
All queues have the messages
Do we know if the data has been written down to the journal correctly for all the queues? That side seems correct in the code, so I'm wondering if the patch above for 509803 might not also be the issue here for recovery.
Created attachment 350679 [details]
candidate fix for issue
The issue is that getPersistentID() was being used in Queue::setLastNodeFailure() to decide whether to enqueue a message to the store. The persistent ID gets set when the message is enqueued on the first queue, so the remaining queues are skipped. The patch above corrects this logic.
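The skipped-queue behaviour can be illustrated with a small model. This is only a sketch in Python, not the qpid C++ code: getPersistentID() and Queue::setLastNodeFailure() are the only names taken from this report, and the `Message`, `Store` and `Queue` classes below are hypothetical stand-ins. It assumes the persistence ID lives on the message and is therefore shared by every queue that holds it.

```python
class Message:
    def __init__(self, body):
        self.body = body
        # 0 means "never stored"; stand-in for what getPersistentID() returns.
        self.persistence_id = 0


class Store:
    """Toy store (hypothetical) recording (queue name, persistence id) pairs."""

    def __init__(self):
        self.next_id = 1
        self.records = set()

    def enqueue(self, queue_name, msg):
        # The first enqueue assigns the message-wide persistence id.
        if msg.persistence_id == 0:
            msg.persistence_id = self.next_id
            self.next_id += 1
        self.records.add((queue_name, msg.persistence_id))


class Queue:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.messages = []

    def set_last_node_failure(self):
        for msg in self.messages:
            # Modelled bug: this is a per-message check, so once any queue
            # has stored the message, every other queue skips it.
            if msg.persistence_id == 0:
                self.store.enqueue(self.name, msg)


def queues_with_stored_messages(num_queues=3):
    store = Store()
    queues = [Queue(f"queue-{i}", store) for i in range(1, num_queues + 1)]
    msg = Message("Message1")
    for q in queues:
        q.messages.append(msg)  # fanout: the same message object on every queue
    for q in queues:
        q.set_last_node_failure()
    return sorted({name for name, _ in store.records})
```

In this model, calling `queues_with_stored_messages()` reports a store record for the first queue only, which matches the symptom in the steps to reproduce.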
A test is needed.
Fix and unit test committed to trunk.
Committed revision 791672.
Confirmed the patch (id=350679) is a valid fix.
The proposed patch introduces another, arguably worse, issue. It results in duplicate attempts to enqueue the same message should the last-man-standing mode ever be invoked again when one or more messages that were previously 'forced persistent' are still on the queue. This then results in the last man standing dying with:
2009-07-07 09:07:05 error Error delivering frames: Queue test-queue: store() failed: jexception 0x0b00 enq_map::insert_pfid() threw JERR_MAP_DUPLICATE: Attempted to insert record into map using duplicate key. (rid=0x1 pfid=0x0) (MessageStoreImpl.cpp:1485)
2009-07-07 09:07:05 notice 192.168.0.2:5985(LEFT) leaving cluster grs
2009-07-07 09:07:05 notice Shut down
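The duplicate-enqueue failure can be sketched the same way. Again this is a hypothetical Python model, not the qpid code: the `Store` here simply rejects a key it has already seen, loosely mimicking the JERR_MAP_DUPLICATE error above, and the per-queue `forced` set is one assumed way of avoiding the second enqueue. It is not claimed to be what the committed fix does.

```python
class DuplicateKeyError(Exception):
    """Stand-in for the store's duplicate-key failure."""


class Store:
    def __init__(self):
        self.records = set()

    def enqueue(self, queue_name, msg_id):
        key = (queue_name, msg_id)
        if key in self.records:
            raise DuplicateKeyError(f"duplicate key {key}")
        self.records.add(key)


class Queue:
    def __init__(self, name, store, guard):
        self.name = name
        self.store = store
        self.guard = guard      # True: remember what was already forced persistent
        self.messages = []
        self.forced = set()     # message ids this queue has already stored

    def set_last_node_failure(self):
        for msg_id in self.messages:
            if self.guard and msg_id in self.forced:
                continue        # already forced persistent on this queue
            self.store.enqueue(self.name, msg_id)
            self.forced.add(msg_id)


def invoke_twice(guard):
    """Invoke last-man-standing twice while the message is still queued."""
    queue = Queue("test-queue", Store(), guard)
    queue.messages.append(1)
    queue.set_last_node_failure()
    try:
        queue.set_last_node_failure()
    except DuplicateKeyError:
        return "duplicate"
    return "ok"
```

Without the guard, the second invocation hits the duplicate key; with per-queue tracking, the message is stored once per queue and the second invocation is a no-op.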
The above case has been corrected on trunk, with tests:
Committed revision 791858.
Created attachment 350819 [details]
fix and unit tests for issue
Created attachment 350962 [details]
fix and unit tests for issue
This patch also corrects the requeue() case for acquired messages that the last patch regressed.
Created attachment 350963 [details]
patch for issue
removed dup patch detail from other BZ
Created attachment 350990 [details]
Fixed in qpidd-0.5.752581-25.el5
on -22 the bug appears
on -25 it has been fixed
validated on RHEL 5.3 i386 / x86_64
# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.