Bug 509892

Summary: byte credit calculation inconsistent for messages transferred to new joiner
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Alan Conway <aconway>
Status: CLOSED ERRATA
QA Contact: Jiri Kolar <jkolar>
Severity: high
Docs Contact:
Priority: urgent
Version: 1.1.1
CC: jkolar, tao
Target Milestone: 1.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, transferring messages in a queue to a new cluster member resulted in inconsistent credit calculations between nodes, which may have caused the brokers in the cluster to exit with errors. With this update, the "delivery-properties.exchange" is set during the update process, so that the credit calculations are consistent.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 15:59:00 UTC
Type: ---
Attachments:
Description Flags
test program none

Description Gordon Sim 2009-07-06 17:38:04 UTC
Created attachment 350650: test program

Description of problem:

When messages in a queue are transferred to a new cluster member, subsequent credit calculations are inconsistent between nodes. As a result, subscriptions using byte credit can cause inconsistent queue state and ultimately cause brokers in the cluster to exit with errors.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-22.el5

How reproducible:

100%

Steps to Reproduce:
1. start cluster node
2. send a message to a queue
   e.g. qpid-config add queue test-queue
        echo MyMessage | sender
3. add a new node to the cluster (e.g. on port 5673)
4. using attached test program, consume the message using an exact byte credit allocation from the new node
   e.g. test --byte-credit 60 --message-credit 1 --port 5673
  
Actual results:

Client gets message but the first cluster node exits with:

2009-jul-06 13:28:34 critical 10.16.44.221:20781(READY/error) error 254 did not occur on 10.16.44.221:22731
2009-jul-06 13:28:34 error Error delivering frames: Aborted by local failure that did not occur on all replicas
2009-jul-06 13:28:34 notice 10.16.44.221:20781(LEFT/error) leaving cluster grs-mrg14-test-cluster
2009-jul-06 13:28:34 notice Shut down


Expected results:

No node exits, message gets delivered and removed from queue on both nodes.

Additional info:

From experiments, it appears that the calculation for required byte credit on the newly added node is one byte less than on the first node.
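
A minimal sketch of why a one-byte disagreement matters when the client grants an exact byte credit. This is not qpid code: the `Replica` class, the `decisions` helper, and the sizes 61 and 60 are all hypothetical, chosen only so that the two nodes differ by one byte. Each replica applies the same "credit must cover the message" rule, but to its own view of the message size.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical model of one cluster node's view of a queued message."""
    name: str
    required_bytes: int  # byte credit this node believes the message costs

    def can_deliver(self, byte_credit: int) -> bool:
        # The message is deliverable only if the granted credit covers it.
        return byte_credit >= self.required_bytes


def decisions(replicas, byte_credit):
    """Return each replica's delivery decision for the same credit grant."""
    return {r.name: r.can_deliver(byte_credit) for r in replicas}


# Hypothetical sizes: the joiner computes one byte less than the first node.
first_node = Replica("first", required_bytes=61)
joiner = Replica("joiner", required_bytes=60)

# An exact allocation that matches only the smaller of the two calculations.
result = decisions([first_node, joiner], byte_credit=60)
print(result)
```

With a credit grant that sits exactly on the boundary, the two replicas reach different decisions for the same subscription, which is the kind of divergence a cluster that expects identical state on every node cannot tolerate. With any grant of 61 or more in this model, both agree.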

Comment 1 Alan Conway 2009-10-05 15:10:48 UTC
Fixed in SVN r821830

The delivery-properties.exchange was not being set during the update process.
The reproducer uses the default exchange, whose name is the empty string, so there's a 1-byte difference: the 0 count (length) byte of the exchange name.
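
The one-byte difference can be illustrated with a simplified model of a length-prefixed short string. This is a sketch, not the broker's encoder: it assumes the exchange name is encoded as a one-byte count followed by the characters, and that an unset optional field contributes no bytes at all. The helper names `encode_str8` and `optional_field_size` are hypothetical.

```python
from typing import Optional


def encode_str8(value: str) -> bytes:
    """Encode a short string as a 1-byte length followed by its bytes."""
    data = value.encode("utf-8")
    if len(data) > 255:
        raise ValueError("short string is limited to 255 bytes")
    return bytes([len(data)]) + data


def optional_field_size(value: Optional[str]) -> int:
    """Bytes an optional string field adds: nothing if unset (assumption)."""
    return 0 if value is None else len(encode_str8(value))


# Exchange left unset versus exchange explicitly set to the default ("").
unset_size = optional_field_size(None)
default_exchange_size = optional_field_size("")

print(default_exchange_size - unset_size)
```

Under these assumptions, an explicitly set empty name still costs its count byte, while an unset field costs nothing, so a node that leaves the field unset measures the same message as one byte smaller.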

Comment 3 Jiri Kolar 2010-05-27 14:02:07 UTC
Tested:
On 752581 the bug appears.
On 946106 it does not; it has been fixed.

Validated on RHEL 5.5 i386 / x86_64; not on RHEL 4 because it lacks clustering.

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-1.el5
qpid-cpp-client-devel-0.7.946106-1.el5
qpid-cpp-client-devel-docs-0.7.946106-1.el5
qpid-cpp-client-ssl-0.7.946106-1.el5
qpid-cpp-mrg-debuginfo-0.7.935473-1.el5
qpid-cpp-server-0.7.946106-1.el5
qpid-cpp-server-cluster-0.7.946106-1.el5
qpid-cpp-server-devel-0.7.946106-1.el5
qpid-cpp-server-ssl-0.7.946106-1.el5
qpid-cpp-server-store-0.7.946106-1.el5
qpid-cpp-server-xml-0.7.946106-1.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5
rhm-docs-0.7.946106-1.el5

->VERIFIED

Comment 4 Jaromir Hradilek 2010-10-08 10:08:18 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, transferring messages in a queue to a new cluster member resulted in inconsistent credit calculations between nodes, which may have caused the brokers in the cluster to exit with errors. With this update, the "delivery-properties.exchange" is set during the update process, so that the credit calculations are consistent.

Comment 6 errata-xmlrpc 2010-10-14 15:59:00 UTC
An advisory has been issued which should help with the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html