Bug 509801 - cluster-durable mode does not work for messages enqueued on more than one queue
Summary: cluster-durable mode does not work for messages enqueued on more than one queue
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: 1.1.6
Assignee: Gordon Sim
QA Contact: Jiri Kolar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-07-06 10:28 UTC by Gordon Sim
Modified: 2009-07-14 17:32 UTC
CC: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-07-14 17:32:21 UTC
Target Upstream Version:
Embargoed:


Attachments
candidate fix for issue (1.64 KB, patch)
2009-07-06 20:39 UTC, Carl Trieloff
fix and unit tests for issue (5.45 KB, patch)
2009-07-07 15:14 UTC, Carl Trieloff
fix and unit tests for issue (7.45 KB, patch)
2009-07-08 16:25 UTC, Carl Trieloff
patch for issue (7.00 KB, patch)
2009-07-08 16:29 UTC, Carl Trieloff
Updated fix (10.42 KB, patch)
2009-07-08 20:22 UTC, Gordon Sim


Links
Red Hat Product Errata RHBA-2009:1153 (normal, SHIPPED_LIVE): Red Hat Enterprise MRG Messaging bug fixing update. Last updated 2009-07-14 17:31:48 UTC.

Description Gordon Sim 2009-07-06 10:28:40 UTC
Description of problem:

If a message is routed by an exchange to more than one queue with the cluster-durable property enabled, it will only become persistent on the first of those queues if the cluster-durable functionality is invoked.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-22.el5

How reproducible:

100%

Steps to Reproduce:
1. start a two-node cluster
2. create some queues with cluster durability enabled
  e.g. for q in `seq 1 10`; do qpid-config add queue queue-$q --durable --cluster-durable; done
3. bind those queues to some exchange such that they can be addressed as a group
  e.g. for q in `seq 1 10`; do qpid-config bind amq.fanout queue-$q; done
4. send some messages to that exchange matching this binding
  e.g. for i in `seq 1 10`; do echo "Message$i"; done | sender --exchange amq.fanout --send-eos 1
5. kill one node of the cluster
6. stop and recover the other cluster node
7. check that each queue has the expected messages recovered
  
Actual results:

Only the first queue has the messages

Expected results:

All queues have the messages

Comment 3 Carl Trieloff 2009-07-06 20:12:55 UTC

Do we know whether the data has been written down to the journal correctly for all the queues? That side of the code seems correct, so I'm wondering whether the patch above for bug 509803 might also be relevant to the recovery failure here.

Comment 4 Carl Trieloff 2009-07-06 20:39:49 UTC
Created attachment 350679 [details]
candidate fix for issue


The issue is that Queue::setLastNodeFailure() used getPersistentID() to decide whether a message still needed to be enqueued to the store. Because the persistence ID is set as soon as the message is stored for the first queue, the store enqueue is then skipped for all remaining queues. The patch above corrects this logic.

A test is still needed.
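
To make the failure mode concrete, here is a minimal standalone sketch (compilable C++). The Queue/Message types and member names below are invented for the example; they are not the actual qpid-cpp classes:

#include <cstdint>
#include <iostream>
#include <memory>
#include <set>
#include <string>
#include <vector>

struct Queue;

struct Message {
    uint64_t persistenceId = 0;          // message-wide, set once
    std::set<const Queue*> storedOn;     // per-queue storage state
};

struct Queue {
    std::string name;
    std::vector<std::shared_ptr<Message> > messages;

    void storeEnqueue(const std::shared_ptr<Message>& m) {
        static uint64_t nextId = 1;
        if (m->persistenceId == 0) m->persistenceId = nextId++;
        m->storedOn.insert(this);
        std::cout << "  stored on " << name << "\n";
    }

    // Buggy check: the message-wide ID is already set after the first
    // queue stores the message, so every later queue skips the store.
    void setLastNodeFailureBuggy() {
        for (size_t i = 0; i < messages.size(); ++i)
            if (messages[i]->persistenceId == 0) storeEnqueue(messages[i]);
    }

    // Corrected check: ask whether the message is stored on *this* queue.
    void setLastNodeFailureFixed() {
        for (size_t i = 0; i < messages.size(); ++i)
            if (messages[i]->storedOn.count(this) == 0) storeEnqueue(messages[i]);
    }
};

int main() {
    std::shared_ptr<Message> msg = std::make_shared<Message>();
    Queue q1, q2;
    q1.name = "queue-1";
    q2.name = "queue-2";
    q1.messages.push_back(msg);   // a fanout exchange routed the same
    q2.messages.push_back(msg);   // message to both queues

    std::cout << "buggy check:\n";
    q1.setLastNodeFailureBuggy(); // stores the message on queue-1
    q2.setLastNodeFailureBuggy(); // skipped: persistenceId already set

    std::cout << "corrected check:\n";
    q2.setLastNodeFailureFixed(); // now stored on queue-2 as well
    return 0;
}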

Comment 5 Carl Trieloff 2009-07-07 01:56:39 UTC
Fix and unit test committed to trunk.
Committed revision 791672.


Confirmed the patch (id=350679) is a valid fix.

Comment 6 Gordon Sim 2009-07-07 08:07:14 UTC
The proposed patch introduces another, arguably worse, issue. It results in duplicate attempts to enqueue the same message if the last-man-standing mode is ever invoked again while one or more messages that were previously 'forced persistent' are still on the queue. The last man standing then dies with:

2009-07-07 09:07:05 error Error delivering frames: Queue test-queue: store() failed: jexception 0x0b00 enq_map::insert_pfid() threw JERR_MAP_DUPLICATE: Attempted to insert record into map using duplicate key. (rid=0x1 pfid=0x0) (MessageStoreImpl.cpp:1485)
2009-07-07 09:07:05 notice 192.168.0.2:5985(LEFT) leaving cluster grs
2009-07-07 09:07:05 notice Shut down
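
The error above is the store's journal rejecting a duplicate enqueue record, so setLastNodeFailure() has to be idempotent per queue. A rough standalone sketch of the failure mode follows; the enq_map and insert_pfid names are taken from the log, everything else is invented for the example:

#include <cstdint>
#include <iostream>
#include <map>
#include <stdexcept>

// Stand-in for the journal's enqueue map: one record per record id (rid).
struct EnqMap {
    std::map<uint64_t, uint64_t> ridToPfid;
    void insert_pfid(uint64_t rid, uint64_t pfid) {
        if (!ridToPfid.insert(std::make_pair(rid, pfid)).second)
            throw std::runtime_error(
                "JERR_MAP_DUPLICATE: attempted to insert record into map "
                "using duplicate key");
    }
};

int main() {
    EnqMap enq;
    const uint64_t rid = 1, pfid = 0;

    enq.insert_pfid(rid, pfid);      // first last-node event: OK
    try {
        enq.insert_pfid(rid, pfid);  // second event re-enqueues: throws
    } catch (const std::exception& e) {
        std::cout << "broker would shut down: " << e.what() << "\n";
    }
    return 0;
}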

Comment 7 Carl Trieloff 2009-07-07 15:06:31 UTC

The above case has been corrected on trunk, with tests:
Committed revision 791858.

Comment 8 Carl Trieloff 2009-07-07 15:14:28 UTC
Created attachment 350819 [details]
fix and unit tests for issue

Comment 9 Carl Trieloff 2009-07-08 16:25:23 UTC
Created attachment 350962 [details]
fix and unit tests for issue

This patch also corrects the requeue() case for acquired messages, which the last patch regressed.

Comment 10 Carl Trieloff 2009-07-08 16:29:07 UTC
Created attachment 350963 [details]
patch for issue

Removed duplicate patch detail from the other BZ.

Comment 11 Gordon Sim 2009-07-08 20:22:30 UTC
Created attachment 350990 [details]
Updated fix

Comment 12 Gordon Sim 2009-07-09 07:08:50 UTC
Fixed in qpidd-0.5.752581-25.el5

Comment 13 Jiri Kolar 2009-07-09 09:25:26 UTC
Tested:
on -22 the bug appears
on -25 it has been fixed

validated on RHEL 5.3 i386 / x86_64

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.3-22.el5_3.8
openais-debuginfo-0.80.3-22.el5_3.8
openais-devel-0.80.3-22.el5_3.8
python-qpid-0.5.752581-3.el5
qpidc-0.5.752581-25.el5
qpidc-debuginfo-0.5.752581-22.el5
qpidc-devel-0.5.752581-25.el5
qpidc-perftest-0.5.752581-25.el5
qpidc-rdma-0.5.752581-25.el5
qpidc-ssl-0.5.752581-25.el5
qpidd-0.5.752581-25.el5
qpidd-acl-0.5.752581-25.el5
qpidd-cluster-0.5.752581-25.el5
qpidd-devel-0.5.752581-25.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-25.el5
qpidd-ssl-0.5.752581-25.el5
qpidd-xml-0.5.752581-25.el5
qpid-java-client-0.5.751061-8.el5
qpid-java-common-0.5.751061-8.el5
rhm-0.5.3206-6.el5
rhm-docs-0.5.756148-1.el5

->VERIFIED

Comment 15 errata-xmlrpc 2009-07-14 17:32:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1153.html

