Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 519476

Summary: Invalid accept data sent by Java client after failover.
Product: Red Hat Enterprise MRG
Reporter: Alan Conway <aconway>
Component: qpid-java
Assignee: Rajith Attapattu <rattapat+nobody>
Status: CLOSED ERRATA
QA Contact: Jiri Kolar <jkolar>
Severity: high
Docs Contact:
Priority: urgent
Version: 1.1.6
CC: gsim, jkolar
Target Milestone: 1.3
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the Java client sent invalid accept data after a failover. This was caused by a race condition where data from an old disconnected connection was incorrectly sent to a new failed-over connection. With this update, the Java client no longer sends invalid data after a failover.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2010-10-14 16:01:19 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Alan Conway 2009-08-26 18:36:13 UTC
A program intended to reproduce bug 516501 turned up a new bug, possibly a client-side bug in Java failover. There appears to be a race condition in which ack data from an old, disconnected connection is incorrectly sent on a new failed-over connection. The symptom is an error of the form "confirmed N but sent 0".
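
To make the suspected race concrete, here is a minimal, hypothetical Java sketch. It is not code from the Qpid Java client: every class and method name in it (StaleAckSketch, Session, Connection, PendingAccept, flushAccepts) is invented for illustration, and the assumption that delivery ids restart on a new connection is mine. It shows one possible way a client could avoid the problem, by tagging each queued accept with the "epoch" of the connection it belongs to and dropping accepts from earlier epochs after failover. Whether the real client ever worked this way is not established by this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from the Qpid Java client.
public class StaleAckSketch {

    /** Stand-in for a transport connection; each failover creates one with a higher epoch. */
    static final class Connection {
        final int epoch;
        final List<Long> acceptsSent = new ArrayList<>();

        Connection(int epoch) {
            this.epoch = epoch;
        }
    }

    /** An accept (ack) tagged with the epoch of the connection the message arrived on. */
    static final class PendingAccept {
        final int epoch;
        final long deliveryId;

        PendingAccept(int epoch, long deliveryId) {
            this.epoch = epoch;
            this.deliveryId = deliveryId;
        }
    }

    static final class Session {
        private Connection current = new Connection(1);
        private final List<PendingAccept> pending = new ArrayList<>();

        /** Queues an accept for a message received on the current connection. */
        synchronized void onMessage(long deliveryId) {
            pending.add(new PendingAccept(current.epoch, deliveryId));
        }

        /** Simulates failover: the old connection is replaced by a fresh one. */
        synchronized void failover() {
            current = new Connection(current.epoch + 1);
        }

        /**
         * Sends queued accepts on the current connection, dropping any that were
         * queued for an earlier connection. Returns the number of accepts dropped.
         */
        synchronized int flushAccepts() {
            int dropped = 0;
            for (PendingAccept a : pending) {
                if (a.epoch == current.epoch) {
                    current.acceptsSent.add(a.deliveryId);
                } else {
                    dropped++; // stale: the new connection never delivered this message
                }
            }
            pending.clear();
            return dropped;
        }

        synchronized Connection connection() {
            return current;
        }
    }

    public static void main(String[] args) {
        Session s = new Session();
        s.onMessage(1);
        s.onMessage(2);
        s.failover();   // accepts for 1 and 2 are now stale
        s.onMessage(1); // assumption: delivery ids restart on the new connection
        int dropped = s.flushAccepts();
        System.out.println("dropped=" + dropped + " sent=" + s.connection().acceptsSent);
    }
}
```

Running main prints dropped=2 sent=[1]. Without the epoch check, all three accepts would go out on the new connection, which is the kind of mismatch the error above complains about.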

The reproducer code is at https://bugzilla.redhat.com/attachment.cgi?id=357364. Here is its description from bug 516501:

Comment #5 From Rajith Attapattu (rattapat) 2009-08-13 15:28:17 EDT

Created an attachment (id=357364)
Reproducer

The attachment contains a JMS-based reproducer.
Just untar the package and run the scramble_brokers.sh script.

It starts a JMS producer and a JMS consumer (which uses ** sync_ack **) in the
background, then rapidly changes the membership of the 4-node cluster to force
failover.

I tried with a 2-node cluster to keep things simple, but the probability of the
error happening was pretty low. Also, in that case it was hitting a known issue
in the JMS client's FailoverExchangeMethod.

The script runs the Java clients with the log level at WARN. You can easily
change that in the script to DEBUG, etc.
You could also get the brokers to log to a file.

Feel free to modify the tests as you see fit.
Please ping me if you make any improvements to the test script and I can
incorporate those changes into my nightly runs.

Comment 2 Rajith Attapattu 2009-12-14 22:19:36 UTC
I am currently unable to reproduce this issue with the latest package set.
I even tried with a broker prior to r794736.
I have done a fair amount of testing and I am yet to see this issue.

Comment 3 Jiri Kolar 2010-03-16 10:47:41 UTC
Any progress? Is there any known reproducer?

Comment 4 Rajith Attapattu 2010-03-17 16:19:47 UTC
Not that I know of.
This issue seems to be fixed, but sadly there is no way of verifying it.

Comment 5 Jiri Kolar 2010-03-24 13:26:30 UTC
Tested:
The bug does not appear on -2, nor on 1.2. We (Rajith and I) were no longer able to reproduce it. It was probably fixed on the broker side, but nobody knows when.

Discussed with Rajith and Alan; both proposed marking it as verified.

Validated on packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5
openais-debuginfo-0.80.6-16.el5
python-qpid-0.7.917557-4.el5
qpid-cpp-client-0.7.916826-2.el5
qpid-cpp-client-devel-0.7.916826-2.el5
qpid-cpp-client-rdma-0.7.916826-2.el5
qpid-cpp-client-ssl-0.7.916826-2.el5
qpid-cpp-mrg-debuginfo-0.7.916826-2.el5
qpid-cpp-server-0.7.916826-2.el5
qpid-cpp-server-cluster-0.7.916826-2.el5
qpid-cpp-server-devel-0.7.916826-2.el5
qpid-cpp-server-rdma-0.7.916826-2.el5
qpid-cpp-server-ssl-0.7.916826-2.el5
qpid-cpp-server-store-0.7.916826-2.el5
qpid-cpp-server-xml-0.7.916826-2.el5
qpid-dotnet-0.4.738274-2.el5
qpid-java-client-0.7.918215-1.el5
qpid-java-common-0.7.918215-1.el5
qpid-tools-0.7.917557-4.el5


->VERIFIED

Comment 6 Jiri Kolar 2010-04-09 13:41:11 UTC
Tested on RHEL 5.5 i386 / x86_64 and RHEL 4.8 i386 / x86_64.

Comment 7 Martin Prpič 2010-10-10 09:34:17 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, the Java client sent invalid accept data after a failover. This was caused by a race condition where data from an old disconnected connection was incorrectly sent to a new failed-over connection. With this update, the Java client no longer sends invalid data after a failover.

Comment 9 errata-xmlrpc 2010-10-14 16:01:19 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html