Bug 509801 - cluster-durable mode does not work for messages enqueued on more than one queue
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware/OS: All Linux
Priority: high   Severity: medium
Target Milestone: 1.1.6
Assigned To: Gordon Sim
QA Contact: Jiri Kolar

Reported: 2009-07-06 06:28 EDT by Gordon Sim
Modified: 2009-07-14 13:32 EDT

Doc Type: Bug Fix
Last Closed: 2009-07-14 13:32:21 EDT

Attachments
candidate fix for issue (1.64 KB, patch) - 2009-07-06 16:39 EDT, Carl Trieloff
fix and unit tests for issue (5.45 KB, patch) - 2009-07-07 11:14 EDT, Carl Trieloff
fix and unit tests for issue (7.45 KB, patch) - 2009-07-08 12:25 EDT, Carl Trieloff
patch for issue (7.00 KB, patch) - 2009-07-08 12:29 EDT, Carl Trieloff
Updated fix (10.42 KB, patch) - 2009-07-08 16:22 EDT, Gordon Sim
Description Gordon Sim 2009-07-06 06:28:40 EDT
Description of problem:

If a message is routed by an exchange to more than one queue with the cluster-durable property enabled, it will only become persistent on the first of those queues should the cluster-durable functionality be invoked.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-22.el5

How reproducible:

100%

Steps to Reproduce:
1. start a two-node cluster
2. create some queues with cluster durability enabled
  e.g. for q in `seq 1 10`; do qpid-config add queue queue-$q --durable --cluster-durable; done
3. bind those queues to some exchange such that they can be addressed as a group
  e.g. for q in `seq 1 10`; do qpid-config bind amq.fanout queue-$q; done
4. send some messages to that exchange matching this binding
  e.g. for i in `seq 1 10`; do echo "Message$i"; done | sender --exchange amq.fanout --send-eos 1
5. kill one node of the cluster
6. stop and recover the other cluster node
7. check that each queue has the expected messages recovered
  
Actual results:

Only the first queue has the messages

Expected results:

All queues have the messages
Comment 3 Carl Trieloff 2009-07-06 16:12:55 EDT

Do we know whether the data has been written down to the journal correctly for all the queues? That side seems correct in the code, so I'm wondering whether the issue addressed by the patch for bug 509803 might also be the issue here, on recovery.
Comment 4 Carl Trieloff 2009-07-06 16:39:49 EDT
Created attachment 350679 [details]
candidate fix for issue


The issue is that getPersistentID() was being used in Queue::setLastNodeFailure() to decide whether to enqueue the message to the store. The ID gets set when the first queue enqueues the message, so the remaining queues are skipped. The patch above corrects this logic.

A test is needed.
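
To make the flaw concrete, here is a minimal C++ sketch of the logic described above. Only setLastNodeFailure() and getPersistentID() are named in this report; every other type, member and helper below is a hypothetical simplification, not the actual qpid-cpp code.

// Hypothetical, simplified model of the flaw; not the real qpid-cpp classes.
#include <cstdint>
#include <memory>
#include <vector>

struct Message {
    // Shared per-message: assigned once, by whichever queue stores it first.
    uint64_t persistenceId = 0;
    uint64_t getPersistentID() const { return persistenceId; }
};

struct Queue {
    std::vector<std::shared_ptr<Message>> messages;

    void enqueueToStore(Message& m) {
        static uint64_t nextId = 1;
        if (m.persistenceId == 0) m.persistenceId = nextId++; // store assigns the ID
    }

    // Buggy check: the persistence ID belongs to the *message*, which is the
    // same object on every queue it was routed to. Once the first queue has
    // forced it persistent, every other queue sees a non-zero ID and skips
    // the store enqueue entirely.
    void setLastNodeFailure() {
        for (auto& m : messages)
            if (m->getPersistentID() == 0) // false on all but the first queue
                enqueueToStore(*m);
    }
};

With two Queue objects sharing one Message, the first call to setLastNodeFailure() sets the ID and the second call becomes a no-op, matching the observed "only the first queue has the messages". As comment 6 below shows, simply dropping the check overcorrects; the decision needs per-queue state.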
Comment 5 Carl Trieloff 2009-07-06 21:56:39 EDT
Fix and unit test committed to trunk.
Committed revision 791672.


Confirmed the patch (id=350679) is a valid fix.
Comment 6 Gordon Sim 2009-07-07 04:07:14 EDT
The proposed patch introduces another, arguably worse, issue. It results in duplicate attempts to enqueue the same message should the last-man-standing mode ever be invoked again when one or more messages that were previously 'forced persistent' are still on the queue. This then results in the last man standing dying with:

2009-07-07 09:07:05 error Error delivering frames: Queue test-queue: store() failed: jexception 0x0b00 enq_map::insert_pfid() threw JERR_MAP_DUPLICATE: Attempted to insert record into map using duplicate key. (rid=0x1 pfid=0x0) (MessageStoreImpl.cpp:1485)
2009-07-07 09:07:05 notice 192.168.0.2:5985(LEFT) leaving cluster grs
2009-07-07 09:07:05 notice Shut down
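
For illustration, here is another hedged C++ sketch of the constraint this adds (all names hypothetical, not the actual qpid-cpp or store code; only enq_map and JERR_MAP_DUPLICATE come from the log above): forcing persistence must be tracked per queue, so that a second last-node-failure event neither skips the remaining queues (the original bug) nor re-inserts an already-stored record id into the journal's enqueue map (the crash above).

// Hypothetical sketch: why a second last-node-failure event must not
// re-enqueue a message that this queue has already stored.
#include <cstdint>
#include <memory>
#include <set>
#include <stdexcept>
#include <vector>

struct Message { uint64_t persistenceId = 0; };

// Stands in for one queue's journal: its enqueue map refuses duplicate
// record ids, which is the JERR_MAP_DUPLICATE abort in the log above.
struct Journal {
    std::set<uint64_t> enqMap;
    void enqueue(uint64_t rid) {
        if (!enqMap.insert(rid).second)
            throw std::runtime_error("JERR_MAP_DUPLICATE");
    }
};

struct GuardedQueue {
    Journal journal;                      // per-queue store state
    std::vector<std::shared_ptr<Message>> messages;
    std::set<const Message*> storedHere;  // per-queue "already forced" marker

    void setLastNodeFailure() {
        static uint64_t nextId = 1;
        for (auto& m : messages) {
            if (storedHere.count(m.get())) continue; // already persisted here
            if (m->persistenceId == 0) m->persistenceId = nextId++;
            journal.enqueue(m->persistenceId);
            storedHere.insert(m.get());
        }
    }
};

Without the storedHere guard, invoking setLastNodeFailure() a second time re-enqueues the surviving messages and the journal throws on the duplicate record id, which is the shutdown shown in the log.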
Comment 7 Carl Trieloff 2009-07-07 11:06:31 EDT

The above case has been corrected on trunk, with tests:
Committed revision 791858.
Comment 8 Carl Trieloff 2009-07-07 11:14:28 EDT
Created attachment 350819 [details]
fix and unit tests for issue
Comment 9 Carl Trieloff 2009-07-08 12:25:23 EDT
Created attachment 350962 [details]
fix and unit tests for issue

This patch also corrects the requeue() case for acquired messages that the last patch regressed.
Comment 10 Carl Trieloff 2009-07-08 12:29:07 EDT
Created attachment 350963 [details]
patch for issue

Removed duplicate patch detail from the other BZ.
Comment 11 Gordon Sim 2009-07-08 16:22:30 EDT
Created attachment 350990 [details]
Updated fix
Comment 12 Gordon Sim 2009-07-09 03:08:50 EDT
Fixed in qpidd-0.5.752581-25.el5
Comment 13 Jiri Kolar 2009-07-09 05:25:26 EDT
Tested:
on -22 the bug appears
on -25 it has been fixed

validated on RHEL 5.3 i386 / x86_64

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.3-22.el5_3.8
openais-debuginfo-0.80.3-22.el5_3.8
openais-devel-0.80.3-22.el5_3.8
python-qpid-0.5.752581-3.el5
qpidc-0.5.752581-25.el5
qpidc-debuginfo-0.5.752581-22.el5
qpidc-devel-0.5.752581-25.el5
qpidc-perftest-0.5.752581-25.el5
qpidc-rdma-0.5.752581-25.el5
qpidc-ssl-0.5.752581-25.el5
qpidd-0.5.752581-25.el5
qpidd-acl-0.5.752581-25.el5
qpidd-cluster-0.5.752581-25.el5
qpidd-devel-0.5.752581-25.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-25.el5
qpidd-ssl-0.5.752581-25.el5
qpidd-xml-0.5.752581-25.el5
qpid-java-client-0.5.751061-8.el5
qpid-java-common-0.5.751061-8.el5
rhm-0.5.3206-6.el5
rhm-docs-0.5.756148-1.el5

->VERIFIED
Comment 15 errata-xmlrpc 2009-07-14 13:32:21 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html
