Description of problem:
The following behavior is observed. Consider a 4-node cluster with one broker on each of 4 different machines. All brokers are configured identically:

  cluster-mechanism=ANONYMOUS
  auth=no
  log-to-file=/tmp/qpidd.log
  log-enable=info+
  log-enable=debug+:cluster
  cluster-name=fcluster

All brokers start up correctly with the following commands:

  :>/tmp/qpidd.log ; :>/tmp/openais.log
  service qpidd stop ; service openais restart ; service qpidd start
  netstat -nlp | grep qpidd
  qpid-cluster
  qpid-stat -b

When all four nodes were up, I ran "qpid-cluster" and all four nodes were shown as up. Then I ran "qpid-stat -b" on all nodes at approximately the same moment and saw the following crash on the last-started node:

  [root@mrg-qe-12 fcluster]# qpid-stat -b
  Brokers
    broker            cluster           uptime  conn  sess  exch  queue
    =====================================================================
    10.34.33.62:5672  fcluster(ACTIVE)  1m 41s  4     4     8     40
    10.34.33.63:5672  fcluster(ACTIVE)  1m 20s  4     4     8     60
    10.34.33.64:5672  fcluster(ACTIVE)  1m 28s  3     3     8     55
    10.34.33.65:5672  fcluster(ACTIVE)  1m 27s  3     3     8     15
  Exception in thread Thread for broker: 10.34.33.65:5672 (most likely raised during interpreter shutdown):
  Traceback (most recent call last):
    File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2536, in run
    File "/usr/lib/python2.4/Queue.py", line 125, in get
  exceptions.TypeError: 'NoneType' object is not callable
  Unhandled exception in thread started by
  Error in sys.excepthook:

  Original exception was:

After analysis I found that every broker in the cluster except the one that showed the qpid-stat -b failure above left the cluster because of the following messages:

  2010-06-21 13:58:15 error Execution exception: invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0 ) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) channel error 7633 on 10.34.33.64:60903(192.168.157.12:3793-6 shadow) must be resolved with: 192.168.157.9:4141 192.168.157.10:3473 192.168.157.11:3116 192.168.157.12:3793 : invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.9:4141
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.10:3473 192.168.157.11:3116 192.168.157.12:3793
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.10:3473
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.11:3116 192.168.157.12:3793
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.11:3116
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.12:3793
  2010-06-21 13:58:15 critical cluster(192.168.157.11:3116 READY/error) local error 7633 did not occur on member 192.168.157.12:3793: invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
  2010-06-21 13:58:15 notice cluster(192.168.157.11:3116 LEFT/error) leaving cluster fcluster
  2010-06-21 13:58:15 debug Shutting down CPG
  2010-06-21 13:58:15 notice Shut down

Version-Release number of selected component (if applicable):
  python-qmf-0.7.946106-4.el5
  python-qpid-0.7.946106-2.el5
  qmf-0.7.946106-4.el5
  qmf-devel-0.7.946106-4.el5
  qpid-cpp-client-0.7.946106-4.el5
  qpid-cpp-client-devel-0.7.946106-4.el5
  qpid-cpp-client-devel-docs-0.7.946106-4.el5
  qpid-cpp-client-ssl-0.7.946106-4.el5
  qpid-cpp-server-0.7.946106-4.el5
  qpid-cpp-server-cluster-0.7.946106-4.el5
  qpid-cpp-server-devel-0.7.946106-4.el5
  qpid-cpp-server-ssl-0.7.946106-4.el5
  qpid-cpp-server-store-0.7.946106-4.el5
  qpid-cpp-server-xml-0.7.946106-4.el5
  qpid-java-client-0.7.946106-4.el5
  qpid-java-common-0.7.946106-4.el5
  qpid-tools-0.7.946106-4.el5

How reproducible:
Quickly, on mrg-qe-09 ... 12

Steps to Reproduce:
1. Set up openais and qpidd.
2. service qpidd stop ; service openais restart ; service qpidd start
3. Loop: netstat -nlp | grep qpidd ; qpid-cluster ; qpid-stat -b
4. Within a few moments the cluster width shrinks from N members to, typically, 1.

Actual results:
The cluster does not survive qpid-stat -b management clients.

Expected results:
The cluster should survive qpid-stat -b management clients.

Additional info:
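Step 3 above can be automated with a small driver script. The sketch below is illustrative only: the function name reproduce and its iterations and delay parameters are hypothetical and not part of any qpid tool. It assumes Python 3.5+ (for subprocess.run), so it would not run as-is on the RHEL5 test machines, which ship Python 2.4.

```python
import subprocess
import time

# Commands from step 3 of the report. They run through the shell so the
# "netstat | grep" pipeline works exactly as written.
DEFAULT_COMMANDS = (
    "netstat -nlp | grep qpidd",
    "qpid-cluster",
    "qpid-stat -b",
)


def reproduce(iterations=10, delay=1.0, commands=DEFAULT_COMMANDS):
    """Run the management commands repeatedly; return (command, exit code) pairs.

    iterations and delay are hypothetical defaults, not values from the report.
    """
    results = []
    for _ in range(iterations):
        for cmd in commands:
            # shell=True so the pipe in the netstat command is honoured
            proc = subprocess.run(cmd, shell=True)
            results.append((cmd, proc.returncode))
        # brief pause between rounds so the loop does not spin too fast
        time.sleep(delay)
    return results
```

Running reproduce() on every node at roughly the same time approximates the "run qpid-stat -b on all nodes at once" scenario. A nonzero exit code from qpid-stat, or qpid-cluster listing fewer members, signals that the cluster has started to shrink.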
Created attachment 425613
Bug-related data (configurations, logs and transcripts)

The attachment contains data from a 4-node cluster run (qpidd/openais configurations, qpidd/openais logs and terminal transcripts).
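When going through the attached qpidd logs, it helps to pick out the session error and the resulting cluster departure quickly. The sketch below is a hypothetical helper (scan_log is not part of qpid). Its regular expressions are modeled only on the log lines quoted in this report, so other qpidd versions or log formats may not match.

```python
import re

# Patterns modeled on the qpidd log excerpt in this report (assumption:
# other releases may word these messages differently).
SESSION_ERROR = re.compile(
    r"confirmed < \((\d+)\+(\d+)\s*\) but only sent < \((\d+)\+(\d+)\s*\)"
)
LEAVING = re.compile(r"cluster\((\S+) LEFT/error\) leaving cluster (\S+)")


def scan_log(lines):
    """Return (session_errors, departures) found in qpidd log lines.

    session_errors: list of ((confirmed, +n), (sent, +n)) integer pairs.
    departures: list of (member address, cluster name) tuples.
    """
    errors = []
    departures = []
    for line in lines:
        m = SESSION_ERROR.search(line)
        if m:
            confirmed = (int(m.group(1)), int(m.group(2)))
            sent = (int(m.group(3)), int(m.group(4)))
            errors.append((confirmed, sent))
        m = LEAVING.search(line)
        if m:
            departures.append((m.group(1), m.group(2)))
    return errors, departures
```

For example, scan_log(open("/tmp/qpidd.log")) on each node would list every "confirmed < ... but only sent < ..." error and show whether that node left the cluster afterwards.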
*** This bug has been marked as a duplicate of bug 605763 ***