606324 – qpid qmf python qpid-stat client shutdowns the cluster except the node against which the client ran

Keywords:
Status:	CLOSED DUPLICATE of bug 605763
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	qpid-cpp
Sub Component:
Version:	Development
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	1.3
Target Release:	---
Assignee:	Alan Conway
QA Contact:	MRG Quality Engineering
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	570154

Reported:	2010-06-21 12:28 UTC by Frantisek Reznicek
Modified:	2015-11-16 01:12 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2010-06-22 13:58:49 UTC
Target Upstream Version:
Embargoed:

Attachments
The bug related data (configurations, logs and transcripts) (12.69 KB, application/x-tbz) 2010-06-21 12:30 UTC, Frantisek Reznicek

Description Frantisek Reznicek 2010-06-21 12:28:48 UTC

Description of problem:

The following behavior is observed:

Consider a 4-node broker cluster on 4 different machines, with all brokers configured the same way:
  cluster-mechanism=ANONYMOUS
  auth=no
  log-to-file=/tmp/qpidd.log
  log-enable=info+
  log-enable=debug+:cluster
  cluster-name=fcluster

All brokers start up cleanly with the following commands:
  :>/tmp/qpidd.log ; :>/tmp/openais.log
  service qpidd stop ; service openais restart ; service qpidd start

  netstat -nlp | grep qpidd

  qpid-cluster
  qpid-stat -b


When all four nodes were up, I ran "qpid-cluster" and all four nodes were shown as up. I then ran "qpid-stat -b" on all nodes at approximately the same moment and saw the last started node crashing like this:
  [root@mrg-qe-12 fcluster]# qpid-stat -b
  Brokers
    broker            cluster           uptime  conn  sess  exch  queue
    =====================================================================
    10.34.33.62:5672  fcluster(ACTIVE)  1m 41s     4     4     8    40
    10.34.33.63:5672  fcluster(ACTIVE)  1m 20s     4     4     8    60
    10.34.33.64:5672  fcluster(ACTIVE)  1m 28s     3     3     8    55
    10.34.33.65:5672  fcluster(ACTIVE)  1m 27s     3     3     8    15
  Exception in thread Thread for broker: 10.34.33.65:5672 (most likely raised during interpreter shutdown):
  Traceback (most recent call last):
    File "/usr/lib/python2.4/threading.py", line 442, in __bootstrap
    File "/usr/lib/python2.4/site-packages/qmf/console.py", line 2536, in run
    File "/usr/lib/python2.4/Queue.py", line 125, in get
  exceptions.TypeError: 'NoneType' object is not callable
  Unhandled exception in thread started by
  Error in sys.excepthook:
  
  Original exception was:

After analysis I found that all brokers in the cluster, except the one that showed the qpid-stat -b issue above, left the cluster because of the following message[s]:

  2010-06-21 13:58:15 error Execution exception: invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) channel error 7633 on 10.34.33.64:60903(192.168.157.12:3793-6 shadow) must be resolved with: 192.168.157.9:4141 192.168.157.10:3473 192.168.157.11:3116 192.168.157.12:3793 : invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.9:4141
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.10:3473 192.168.157.11:3116 192.168.157.12:3793
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.10:3473
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.11:3116 192.168.157.12:3793
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 resolved with 192.168.157.11:3116
  2010-06-21 13:58:15 debug cluster(192.168.157.11:3116 READY/error) error 7633 must be resolved with 192.168.157.12:3793
  2010-06-21 13:58:15 critical cluster(192.168.157.11:3116 READY/error) local error 7633 did not occur on member 192.168.157.12:3793: invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151)
  2010-06-21 13:58:15 critical Error delivering frames: local error did not occur on all cluster members : invalid-argument: anonymous.mrg-qe-11.lab.eng.brq.redhat.com.3165.5: confirmed < (62+0) but only sent < (61+0) (qpid/SessionState.cpp:151) (qpid/cluster/ErrorCheck.cpp:89)
  2010-06-21 13:58:15 notice cluster(192.168.157.11:3116 LEFT/error) leaving cluster fcluster
  2010-06-21 13:58:15 debug Shutting down CPG
  2010-06-21 13:58:15 notice Shut down
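
The log excerpt above can be summarized mechanically when checking several nodes. What follows is a minimal Python sketch, not part of qpid, that scans a qpidd log for the two lines that matter here: the critical "local error ... did not occur on member" line and the "leaving cluster" notice. The function name summarize_cluster_exit and both regular expressions are assumptions modeled only on the lines quoted in this report; other qpidd versions may word these messages differently. The SAMPLE text is abbreviated from the excerpt above.

```python
import re

# Assumption: these patterns mirror the log lines quoted in this report only.
MISMATCH_RE = re.compile(
    r"critical cluster\((?P<member>\S+) \S+\) local error (?P<error_id>\d+) "
    r"did not occur on member (?P<other>[\d.]+:\d+)"
)
LEFT_RE = re.compile(
    r"notice cluster\((?P<member>\S+) LEFT/error\) leaving cluster (?P<cluster>\S+)"
)


def summarize_cluster_exit(log_text):
    """Return which member left, which cluster, and the unresolved error id."""
    summary = {
        "left": None,                # member that logged "leaving cluster"
        "cluster": None,             # cluster name from the same notice
        "error_id": None,            # id of the error that could not be resolved
        "disagreeing_member": None,  # member on which the error did not occur
    }
    for line in log_text.splitlines():
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            summary["error_id"] = int(mismatch.group("error_id"))
            summary["disagreeing_member"] = mismatch.group("other")
        left = LEFT_RE.search(line)
        if left:
            summary["left"] = left.group("member")
            summary["cluster"] = left.group("cluster")
    return summary


# Two lines abbreviated from the excerpt in this report.
SAMPLE = """\
2010-06-21 13:58:15 critical cluster(192.168.157.11:3116 READY/error) local error 7633 did not occur on member 192.168.157.12:3793: invalid-argument
2010-06-21 13:58:15 notice cluster(192.168.157.11:3116 LEFT/error) leaving cluster fcluster
"""

print(summarize_cluster_exit(SAMPLE))
```

To use it on a real node, read /tmp/qpidd.log (the log-to-file path from the configuration above) and pass its contents to summarize_cluster_exit.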


Version-Release number of selected component (if applicable):
python-qmf-0.7.946106-4.el5
python-qpid-0.7.946106-2.el5
qmf-0.7.946106-4.el5
qmf-devel-0.7.946106-4.el5
qpid-cpp-client-0.7.946106-4.el5
qpid-cpp-client-devel-0.7.946106-4.el5
qpid-cpp-client-devel-docs-0.7.946106-4.el5
qpid-cpp-client-ssl-0.7.946106-4.el5
qpid-cpp-server-0.7.946106-4.el5
qpid-cpp-server-cluster-0.7.946106-4.el5
qpid-cpp-server-devel-0.7.946106-4.el5
qpid-cpp-server-ssl-0.7.946106-4.el5
qpid-cpp-server-store-0.7.946106-4.el5
qpid-cpp-server-xml-0.7.946106-4.el5
qpid-java-client-0.7.946106-4.el5
qpid-java-common-0.7.946106-4.el5
qpid-tools-0.7.946106-4.el5


How reproducible:
Quickly, on mrg-qe-09 ... 12

Steps to Reproduce:
1. Set up openais and qpidd
2. service qpidd stop ; service openais restart ; service qpidd start
3. Run in a loop: netstat -nlp | grep qpidd ; qpid-cluster ; qpid-stat -b
4. Within a few moments the cluster width drops from N to, typically, 1
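
Step 3 can be scripted so that it runs the same way on each node. The following is a minimal sketch, assuming Python 3 is available on the machine driving the test (the packages listed above target Python 2.4). The names COMMANDS, run_shell and reproduce are hypothetical helpers, not part of qpid-tools. The sketch only records exit codes and raw output; it does not parse the qpid-cluster or qpid-stat output, since that format is not specified in this report.

```python
import subprocess
import time

# Commands taken verbatim from step 3 of this report.
COMMANDS = ["netstat -nlp | grep qpidd", "qpid-cluster", "qpid-stat -b"]


def run_shell(command):
    """Run one command through the shell; return (exit_code, combined output)."""
    proc = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def reproduce(iterations=10, delay=1.0, runner=run_shell):
    """Repeat the step-3 commands and collect (iteration, command, code, output)."""
    results = []
    for i in range(iterations):
        for command in COMMANDS:
            code, output = runner(command)
            results.append((i, command, code, output))
        time.sleep(delay)  # hypothetical pacing; the report does not give one
    return results
```

Calling reproduce() on every node at roughly the same time approximates the scenario described above. The runner parameter exists so the loop can be exercised without a broker, for example by passing a function that returns a fixed (0, "") pair.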
  
Actual results:
The cluster does not survive qpid-stat -b management clients.

Expected results:
The cluster should survive qpid-stat -b management clients.

Additional info:

Comment 1 Frantisek Reznicek 2010-06-21 12:30:51 UTC

Created attachment 425613
The bug related data (configurations, logs and transcripts)

The attachment contains data from a 4-node cluster run (qpidd/openais configurations, qpidd/openais logs and terminal transcripts).

Comment 2 Alan Conway 2010-06-22 13:58:49 UTC


*** This bug has been marked as a duplicate of bug 605763 ***
