Bug 603835 - cluster_tests.test_management failing
Summary: cluster_tests.test_management failing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.3
Target Release: ---
Assignee: Alan Conway
QA Contact: Jeff Needle
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-06-14 16:35 UTC by Alan Conway
Modified: 2010-10-20 13:53 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-10-20 11:29:07 UTC
Target Upstream Version:
Embargoed:


Attachments
Test logs from a failed run (4.84 KB, application/x-bzip-compressed-tar)
2010-06-14 16:35 UTC, Alan Conway

Description Alan Conway 2010-06-14 16:35:05 UTC
Created attachment 423905
Test logs from a failed run

Description of problem:

The cluster_tests.test_management test is failing:

cluster_tests.LongTests.test_management ............................................. fail
Error during test:
  Traceback (most recent call last):
    File "/home/remote/aconway/qpid2/qpid/dbg/src/tests/python/commands/qpid-python-test", line 311, in run
      phase()
    File "/home/remote/aconway/qpid2/qpid/cpp/src/tests/cluster_tests.py", line 291, in test_management
      for b in cluster[alive:]: b.ready() # Check if a broker crashed.
    File "/home/remote/aconway/qpid2/qpid/dbg/src/tests/python/qpid/brokertest.py", line 393, in ready
      except: raise RethrownException(
  RethrownException: Broker cluster1-0 failed ready test:
      cluster1-0: 2010-06-14 12:20:06 debug cluster(20.0.100.32:2063 LEFT/error) local close of replicated connection 20.0.100.32:2063-4(local)
      cluster1-0: 2010-06-14 12:20:06 debug cluster(20.0.100.32:2063 LEFT/error) deleted connection: 20.0.100.32:2063-4(local)
      cluster1-0: 2010-06-14 12:20:06 debug Shutting down CPG
      cluster1-0: 2010-06-14 12:20:06 notice Shut down



Version-Release number of selected component (if applicable): Trunk r954471

How reproducible: every time

Steps to Reproduce:
1. cd qpid/cpp/src/tests
2. source test_env.sh
3. run_cluster_tests *.test_management -DDURATION=2
  
Actual results: fail 

Expected results: pass

Comment 1 Ted Ross 2010-06-15 12:54:35 UTC
Additional information:

This can be reproduced simply by starting up a two-node cluster and running the command "qpid-stat -b" against one of the cluster nodes.

The connected node will fail with the following log:

2010-06-15 08:52:09 error Execution exception: invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151)
2010-06-15 08:52:09 critical cluster(127.0.0.1:29929 READY/error) local error 587 did not occur on member 127.0.0.1:29949: invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151)
2010-06-15 08:52:09 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.dhcp-100-18-254.bos.redhat.com.29971.3: confirmed < (45+0) but only sent < (44+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
2010-06-15 08:52:09 notice cluster(127.0.0.1:29929 LEFT/error) leaving cluster TED
2010-06-15 08:52:09 notice Shut down
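For reference, here is a minimal sketch of the check behind the "confirmed < (45+0) but only sent < (44+0)" error. It is inferred only from the wording of the log message, and the real logic in qpid/SessionState.cpp may differ. The sending side of a session tracks how many commands it has sent, and a confirmation from the peer that reaches past that point is rejected as invalid-argument. The class and method names are hypothetical, and the "+0" suffix is reproduced literally rather than modelled.

```python
"""Hypothetical model of the sender-side session check suggested by the log
line 'confirmed < (45+0) but only sent < (44+0)'. Not qpid code."""


class InvalidArgument(Exception):
    """Stands in for the invalid-argument execution exception in the log."""


class SenderSessionState:
    """Tracks what this side has sent and what the peer has confirmed."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sent = 0        # commands [0, sent) have been sent
        self.confirmed = 0   # peer has confirmed commands [0, confirmed)

    def send_command(self) -> int:
        """Assign the next command id and record it as sent."""
        command_id = self.sent
        self.sent += 1
        return command_id

    def receive_confirmed(self, upto: int) -> None:
        """Accept a confirmation covering commands [0, upto)."""
        # A peer may only confirm commands that were actually sent.
        if upto > self.sent:
            raise InvalidArgument(
                f"{self.name}: confirmed < ({upto}+0) but only sent < ({self.sent}+0)"
            )
        self.confirmed = max(self.confirmed, upto)
```

In this model, sending 44 commands and then receiving a confirmation up to 45 raises the same shape of error seen on the connected node.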

Comment 2 Ted Ross 2010-06-15 13:02:56 UTC
Even more information:

This problem is introduced at the client level.  If you revert qpid/extras/qmf/src/py/qmf/console.py to subversion rev 953702, the problem goes away.

The important difference in the console client code is that the newer version (that causes the crash) applies flow control back-pressure on multiple subscriptions.  Is it possible that credit balances for flow control are not being handled uniformly by nodes in the cluster?
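To illustrate the question, here is a hypothetical Python model, not qpid's cluster code, of why non-uniform credit handling would take down a single node. Every member applies the same ordered events. A member whose local error does not also occur on the other members leaves the cluster, mirroring the ErrorCheck lines in comment 1. Credit is modelled as a plain counter (an assumption), so if two members hold different balances, the same delivery fails on only one of them.

```python
"""Hypothetical model: replicated members must agree on which events fail.
Names and the credit rule are illustrative assumptions, not qpid's API."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Member:
    name: str
    credit: int = 0      # flow-control credit as seen by this member
    left: bool = False   # True once the member has left the cluster

    def apply(self, event: str) -> bool:
        """Apply one replicated event; return True if it caused a local error."""
        kind, _, amount = event.partition(":")
        if kind == "credit":
            self.credit += int(amount)
            return False
        if kind == "deliver":
            if self.credit <= 0:
                return True          # local error: delivery without credit
            self.credit -= 1
            return False
        raise ValueError(f"unknown event {event!r}")


def deliver_to_cluster(members: List[Member], events: List[str]) -> List[str]:
    """Apply events in the same order on every active member.

    If an event fails on some members but not on others, the failing members
    leave ('local error did not occur on all cluster members'). If it fails
    everywhere, the error is consistent and nobody leaves in this model.
    Returns the names of members still in the cluster.
    """
    for event in events:
        active = [m for m in members if not m.left]
        errors: Dict[str, bool] = {m.name: m.apply(event) for m in active}
        if any(errors.values()) and not all(errors.values()):
            for m in active:
                if errors[m.name]:
                    m.left = True
    return [m.name for m in members if not m.left]
```

For example, `deliver_to_cluster([Member("a", credit=1), Member("b", credit=2)], ["deliver", "deliver"])` leaves only `b` running: the second delivery fails on `a` alone, so `a` shuts down, much like the connected node in comment 1.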

Comment 3 Alan Conway 2010-06-16 20:37:05 UTC
Fixed in r955370, and mrg 1.3 release repo:
http://mrg1.lab.bos.redhat.com/git/?p=qpid.git;a=commitdiff;h=c8e4559e0a26efe70e3a462f8e49a4bd55ba46a2

Comment 4 Jiri Kolar 2010-06-23 17:54:26 UTC
Tested:
on 752581 the bug appears;
on 946106 it does not, so it has been fixed.

Validated on RHEL 5.5 i386 / x86_64; not on RHEL 4, because clustering is not available there.

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-16.el5_5.1
openais-debuginfo-0.80.6-16.el5_5.1
python-qpid-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-2.el5
qpid-cpp-client-devel-0.7.946106-2.el5
qpid-cpp-client-devel-docs-0.7.946106-2.el5
qpid-cpp-client-ssl-0.7.946106-2.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-2.el5
qpid-cpp-server-cluster-0.7.946106-2.el5
qpid-cpp-server-devel-0.7.946106-2.el5
qpid-cpp-server-ssl-0.7.946106-2.el5
qpid-cpp-server-store-0.7.946106-2.el5
qpid-cpp-server-xml-0.7.946106-2.el5
qpid-java-client-0.7.946106-3.el5
qpid-java-common-0.7.946106-3.el5
qpid-tools-0.7.946106-4.el5  
rhm-docs-0.7.946106-1.el5

->VERIFIED

