Bug 603835 - cluster_tests.test_management failing
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Target Release: ---
Assigned To: Alan Conway
QA Contact: Jeff Needle
Reported: 2010-06-14 12:35 EDT by Alan Conway
Modified: 2010-10-20 09:53 EDT
Doc Type: Bug Fix
Last Closed: 2010-10-20 07:29:07 EDT


Attachments:
Test logs from a failed run (4.84 KB, application/x-bzip-compressed-tar)
2010-06-14 12:35 EDT, Alan Conway

Description Alan Conway 2010-06-14 12:35:05 EDT
Created attachment 423905
Test logs from a failed run

Description of problem:

The cluster_tests.test_management test is failing:

cluster_tests.LongTests.test_management ............................................. fail
Error during test:
  Traceback (most recent call last):
    File "/home/remote/aconway/qpid2/qpid/dbg/src/tests/python/commands/qpid-python-test", line 311, in run
      phase()
    File "/home/remote/aconway/qpid2/qpid/cpp/src/tests/cluster_tests.py", line 291, in test_management
      for b in cluster[alive:]: b.ready() # Check if a broker crashed.
    File "/home/remote/aconway/qpid2/qpid/dbg/src/tests/python/qpid/brokertest.py", line 393, in ready
      except: raise RethrownException(
  RethrownException: Broker cluster1-0 failed ready test:
      cluster1-0: 2010-06-14 12:20:06 debug cluster(20.0.100.32:2063 LEFT/error) local close of replicated connection 20.0.100.32:2063-4(local)
      cluster1-0: 2010-06-14 12:20:06 debug cluster(20.0.100.32:2063 LEFT/error) deleted connection: 20.0.100.32:2063-4(local)
      cluster1-0: 2010-06-14 12:20:06 debug Shutting down CPG
      cluster1-0: 2010-06-14 12:20:06 notice Shut down



Version-Release number of selected component (if applicable): Trunk r954471

How reproducible: every time

Steps to Reproduce:
1. cd qpid/cpp/src/tests
2. source test_env.sh
3. run_cluster_tests *.test_management -DDURATION=2
  
Actual results: fail 

Expected results: pass
Comment 1 Ted Ross 2010-06-15 08:54:35 EDT
Additional information:

This can be reproduced simply by starting up a two-node cluster and running the command "qpid-stat -b" against one of the cluster nodes.

The connected node will fail with the following log:

2010-06-15 08:52:09 error Execution exception: invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151)
2010-06-15 08:52:09 critical cluster(127.0.0.1:29929 READY/error) local error 587 did not occur on member 127.0.0.1:29949: invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151)
2010-06-15 08:52:09 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
2010-06-15 08:52:09 notice cluster(127.0.0.1:29929 LEFT/error) leaving cluster TED
2010-06-15 08:52:09 notice Shut down
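
When triaging a broker log like the one above, the line that matters is the "local error N did not occur on member" entry, since it names both the node that hit the error and the member that did not. The following is a minimal sketch of pulling those entries out of a log. It is a hypothetical helper, not part of qpid, and the field layout in the regular expression is inferred only from the excerpt above.

```python
import re

# Layout inferred from the log excerpt in this bug, not from the qpid source.
LOCAL_ERROR = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) critical "
    r"cluster\((?P<local>\S+) (?P<state>[^)\s]+)\) "
    r"local error (?P<seq>\d+) did not occur on member "
    r"(?P<member>[^:\s]+:\d+): (?P<reason>.*)$"
)


def find_local_errors(lines):
    """Return one dict per log line that reports an error seen on only one node."""
    hits = []
    for line in lines:
        match = LOCAL_ERROR.match(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits
```

Called on the lines of a broker log, each returned dict carries the reporting node under "local", the disagreeing node under "member", and the error text under "reason".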
Comment 2 Ted Ross 2010-06-15 09:02:56 EDT
Even more information:

This problem is introduced at the client level.  If you revert qpid/extras/qmf/src/py/qmf/console.py to subversion rev 953702, the problem goes away.

The important difference in the console client code is that the newer version (that causes the crash) applies flow control back-pressure on multiple subscriptions.  Is it possible that credit balances for flow control are not being handled uniformly by nodes in the cluster?
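
To make the question above concrete, here is a toy model of how a divergence in per-node counters would produce the error from Comment 1 on one node only. It is a sketch of the hypothesis, not qpid's SessionState, and the class and function names are invented for illustration. Whether this is what actually happens in the broker is not established by this comment.

```python
class ToySession:
    """Toy model of one node's view of a session; not qpid's SessionState."""

    def __init__(self, name):
        self.name = name
        self.sent = 0  # commands this node believes it has sent to the client

    def send(self, count=1):
        self.sent += count

    def confirmed(self, point):
        # A confirmation may not run ahead of what this node thinks it sent.
        if point > self.sent:
            raise ValueError(
                "%s: confirmed < (%d+0) but only sent < (%d+0)"
                % (self.name, point, self.sent)
            )


def replicas_that_fail(replicas, point):
    """Apply the same client confirmation to every node; return names that error."""
    failed = []
    for replica in replicas:
        try:
            replica.confirmed(point)
        except ValueError:
            failed.append(replica.name)
    return failed


# Two nodes that disagree by one on how much was sent (assumed values).
node_a = ToySession("node-a")
node_a.send(44)
node_b = ToySession("node-b")
node_b.send(45)
print(replicas_that_fail([node_a, node_b], 45))  # prints ['node-a']
```

In this model the same confirmation is valid on one node and invalid on the other, which is the shape of the "local error did not occur on member" report.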
Comment 3 Alan Conway 2010-06-16 16:37:05 EDT
Fixed in r955370 and in the MRG 1.3 release repo:
http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=c8e4559e0a26efe70e3a462f8e49a4bd55ba46a2
Comment 4 Jiri Kolar 2010-06-23 13:54:26 EDT
Tested:
On 752581 the bug appears.
On 946106 it does not; it has been fixed.

Validated on RHEL 5.5 i386 / x86_64. Not validated on RHEL 4 because it has no clustering.

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5  
rhm-docs-0.7.946106-1.el5

->VERIFIED
