Bug 545436 - Cluster node shuts down with inconsistent error
Summary: Cluster node shuts down with inconsistent error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.7
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Assignee: Alan Conway
QA Contact: ppecka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-12-08 15:31 UTC by Rajith Attapattu
Modified: 2010-10-14 15:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
When running the 'test_failover' test case in the 'qpid-python-testkit' framework, in which cluster nodes are shut down and new members are added, a newly joined node encountered an error of the form "confirmed N but only sent N-1". The error was raised only on that member, causing it to shut down as inconsistent. With this update, inconsistent errors no longer occur.
Clone Of:
Environment:
Last Closed: 2010-10-14 15:59:07 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHSA-2010:0773
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging and Grid Version 1.3
Last Updated: 2010-10-14 15:56:44 UTC

Description Rajith Attapattu 2009-12-08 15:31:40 UTC
Description of problem:

When running the "test_failover" test case in the qpid-python-testkit framework, in which cluster nodes are shut down and new members are added, a node that has just joined encounters an error of the form "confirmed N but only sent N-1". The error is raised only on that member, causing it to shut down as inconsistent.
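
To illustrate the failure mode described above, here is a minimal sketch in Python. It is not Qpid's code: `SenderState`, `replay_confirm` and the member names are hypothetical. It also assumes, purely for illustration, that the joining member's count of sent commands lags by one; this report does not state the root cause. The sketch only shows why a confirmation can be valid on one member and raise an error on another, which the cluster then treats as an inconsistency.

```python
class InvalidArgument(Exception):
    """Stand-in for the broker's invalid-argument execution exception."""


class SenderState:
    """Hypothetical, simplified model of a session's sender-side bookkeeping."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.send_point = 0       # commands numbered below this were sent
        self.confirmed_point = 0  # commands numbered below this are confirmed

    def record_send(self):
        self.send_point += 1

    def confirm(self, point):
        # A peer cannot confirm commands that were never sent.
        if point > self.send_point:
            raise InvalidArgument(
                "%s: confirmed < %d but only sent < %d"
                % (self.session_id, point, self.send_point))
        self.confirmed_point = max(self.confirmed_point, point)


def replay_confirm(sent_counts, point):
    """Apply the same confirmation on every member.

    Returns a dict mapping member name to the error text, or None on success.
    """
    results = {}
    for member, sent in sent_counts.items():
        state = SenderState("anonymous.example")
        for _ in range(sent):
            state.record_send()
        try:
            state.confirm(point)
            results[member] = None
        except InvalidArgument as exc:
            results[member] = str(exc)
    return results


# Assumption: the established member recorded 2 sent commands, while the
# newly joined member's copy of the session recorded only 1.
results = replay_confirm({"established": 2, "newly-joined": 1}, point=2)

# The error is "inconsistent" if it occurred on some members but not all.
inconsistent = len({err is None for err in results.values()}) > 1
print(results, inconsistent)
```

In this model the same confirmation succeeds on the established member and fails on the newly joined one, which mirrors the "local error did not occur on all cluster members" condition in the log.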


Version-Release number of selected component (if applicable):
qpid trunk

How reproducible:
Always

Steps to Reproduce:
1. Check out the qpid trunk from svn
2. Build the C++ broker and cluster module
3. Go to java/testkit/bin and run ./qpid-python-testkit
  
Actual results:

cluster2-5: 2009-12-04 14:44:35 notice cluster(192.168.1.103:14491 READY) caught up, active cluster member.
cluster2-5: 2009-12-04 14:44:36 error Execution exception: invalid-argument: anonymous.e3a66a8d-330d-47c2-b96d-bb0b1315248b: confirmed < (2+0) but only sent < (1+0) (qpid/SessionState.cpp:151)
cluster2-5: 2009-12-04 14:44:36 critical cluster(192.168.1.103:14491 READY/error) local error 599 did not occur on member 192.168.1.103:14453: invalid-argument: anonymous.e3a66a8d-330d-47c2-b96d-bb0b1315248b: confirmed < (2+0) but only sent < (1+0) (qpid/SessionState.cpp:151)
cluster2-5: 2009-12-04 14:44:36 error Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.e3a66a8d-330d-47c2-b96d-bb0b1315248b: confirmed < (2+0) but only sent < (1+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
cluster2-5: 2009-12-04 14:44:36 notice cluster(192.168.1.103:14491 LEFT/error) leaving cluster cluster2-helaya:13958
cluster2-5: 2009-12-04 14:44:36 notice Shut down
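
When checking a test run for this failure, the log output above can be scanned for the mismatch message. The following is a rough sketch that assumes the line layout shown in this report (a node prefix such as `cluster2-5:` followed by the broker log line). The function name `find_confirm_errors` is hypothetical, and reading the first number of each `(N+M)` pair as the command count is an assumption.

```python
import re

# Matches the message form seen in the log excerpt, e.g.
# "confirmed < (2+0) but only sent < (1+0)".
CONFIRM_RE = re.compile(
    r"confirmed < \((\d+)\+(\d+)\) but only sent < \((\d+)\+(\d+)\)")


def find_confirm_errors(lines):
    """Return (node, confirmed, sent) for each execution-exception line."""
    hits = []
    for line in lines:
        match = CONFIRM_RE.search(line)
        # Skip the follow-up "critical" and "Error delivering frames" lines,
        # which repeat the same message text.
        if match is None or " error Execution exception" not in line:
            continue
        node = line.split(":", 1)[0]  # prefix before the first colon
        confirmed = int(match.group(1))
        sent = int(match.group(3))
        hits.append((node, confirmed, sent))
    return hits


sample = [
    "cluster2-5: 2009-12-04 14:44:35 notice cluster(192.168.1.103:14491 "
    "READY) caught up, active cluster member.",
    "cluster2-5: 2009-12-04 14:44:36 error Execution exception: "
    "invalid-argument: anonymous.e3a66a8d-330d-47c2-b96d-bb0b1315248b: "
    "confirmed < (2+0) but only sent < (1+0) (qpid/SessionState.cpp:151)",
]
errors = find_confirm_errors(sample)
print(errors)
```

An empty result for a full test run would be consistent with the expected behaviour below, though it does not by itself prove that no other inconsistency occurred.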


Expected results:
There should not be any inconsistent errors.

Additional info:

Comment 1 Alan Conway 2009-12-09 16:59:49 UTC
This is also https://issues.apache.org/jira/browse/QPID-2253.

Fixed in r888874

Comment 2 ppecka 2010-05-18 17:56:28 UTC
reproduced, verified on RHEL_5.5 - i386 / x86_64:
# rpm -qa | grep qpid
qpid-cpp-client-devel-0.7.935473-1.el5
qpid-cpp-server-xml-0.7.935473-1.el5
qpid-cpp-server-store-0.7.935473-1.el5
qpid-cpp-client-devel-docs-0.7.935473-1.el5
qpid-cpp-server-0.7.935473-1.el5
python-qpid-0.7.938298-1.el5
qpid-java-common-0.7.934605-1.el5
qpid-cpp-server-cluster-0.7.935473-1.el5
qpid-java-client-0.7.934605-1.el5
qpid-tools-0.7.934605-2.el5
qpid-cpp-server-devel-0.7.935473-1.el5
qpid-cpp-client-0.7.935473-1.el5
qpid-cpp-client-ssl-0.7.935473-1.el5
qpid-cpp-server-ssl-0.7.935473-1.el5

--> VERIFIED

Comment 3 Martin Prpič 2010-10-10 08:08:46 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
When running the 'test_failover' test case in the 'qpid-python-testkit' framework, in which cluster nodes are shut down and new members are added, a newly joined node encountered an error of the form "confirmed N but only sent N-1". The error was raised only on that member, causing it to shut down as inconsistent. With this update, inconsistent errors no longer occur.

Comment 5 errata-xmlrpc 2010-10-14 15:59:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html

