Bug 484407 - clustered qpidd broker crashes when trying to start up if management is enabled (==default)
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1.1
Target Release: ---
Assigned To: Alan Conway
QA Contact: Frantisek Reznicek
Reported: 2009-02-06 11:40 EST by Frantisek Reznicek
Modified: 2015-11-15 19:06 EST

Doc Type: Bug Fix
Last Closed: 2009-04-21 12:17:43 EDT
Description Frantisek Reznicek 2009-02-06 11:40:28 EST
Description of problem:
The clustered C++ broker (qpidd) cannot be launched if management is enabled (which it is by default).

qpidd produces the following backtrace:

Core was generated by `qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --da'.
Program terminated with signal 11, Segmentation fault.
[New process 30884]
#0  0x00b91e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
(gdb) bt
#0  0x00b91e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#1  0x00bdbb2b in qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#2  0x0983df83 in qpid::cluster::FailoverExchange::FailoverExchange () from /usr/lib/qpid/daemon/cluster.so
#3  0x097fcbf4 in qpid::cluster::Cluster::Cluster () from /usr/lib/qpid/daemon/cluster.so
#4  0x0981c6e1 in qpid::cluster::ClusterPlugin::earlyInitialize () from /usr/lib/qpid/daemon/cluster.so
#5  0x00bc7909 in qpid::broker::Broker::Broker () from /usr/lib/libqpidbroker.so.0
#6  0x0804dc70 in ?? ()
#7  0x0804c677 in __cxa_pure_virtual ()
#8  0x0086be8c in __libc_start_main () from /lib/libc.so.6
#9  0x0804c001 in __cxa_pure_virtual ()


Version-Release number of selected component (if applicable):
[root@dell-pe2850-01 _]# rpm -qa | egrep '(qpid|rhm|openais)'
qpid-java-client-0.4.738568-1.el5
qpidc-ssl-0.4.741135-1.el5
qpidd-xml-0.4.741135-1.el5
qpidd-rdma-0.4.741135-1.el5
openais-0.80.3-22.el5
qpid-java-common-0.4.738568-1.el5
qpidc-0.4.741135-1.el5
python-qpid-0.4.741135-1.el5
rhm-docs-0.4.734193-5.el5
qpidc-rdma-0.4.741135-1.el5
rhm-0.4.3108-1.el5
qpidc-perftest-0.4.741135-1.el5
qpidd-devel-0.4.741135-1.el5
openais-devel-0.80.3-22.el5
qpidc-devel-0.4.741135-1.el5
qpidd-0.4.741135-1.el5
qpidd-ssl-0.4.741135-1.el5
qpidd-cluster-0.4.741135-1.el5
qpidd-acl-0.4.741135-1.el5


How reproducible:
100%

Steps to Reproduce:
1. Configure openais on a cluster of at least 2 nodes.
2. Run "service openais start" on both nodes.
3. Start qpidd, for instance this way:
mkdir -p qpidd-data-dir.1 ; qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0 -p 0 > qpidd.1.log 2>&1
  
Actual results:
qpidd crashes

Expected results:
qpidd should not crash

Additional info:
[root@dell-pe2850-01 dell-pe2850-01.rhts.bos.redhat.com]# service openais restart
Stopping OpenAIS daemon (aisexec): [  OK  ]
Starting OpenAIS daemon (aisexec): [  OK  ]
[root@dell-pe2850-01 dell-pe2850-01.rhts.bos.redhat.com]# cd
[root@dell-pe2850-01 ~]# mkdir _
[root@dell-pe2850-01 ~]# cd _
[root@dell-pe2850-01 _]# service openais status
aisexec (pid 30850) is running...
[root@dell-pe2850-01 _]# cat /etc/ais/openais.conf
# Please read the openais.conf.5 manual page
 
totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 10.16.65.0
                mcastaddr: 226.94.1.11
                mcastport: 54051
        }
}
 
logging {
        debug: on
        timestamp: on
        to_file: yes
        logfile: /tmp/ais
}
 
amf {
        mode: disabled
}
[root@dell-pe2850-01 _]# mkdir -p qpidd-data-dir.0 ; qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0 -p 0 > qpidd.0.log 2>&1
Segmentation fault (core dumped)
[root@dell-pe2850-01 _]# mkdir -p qpidd-data-dir.1 ; qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0 -p 0 > qpidd.1.log 2>&1
Segmentation fault (core dumped)
[root@dell-pe2850-01 _]# ls
core.30874  core.30884  qpidd.0.log  qpidd.1.log  qpidd-data-dir.0  qpidd-data-dir.1
[root@dell-pe2850-01 _]# which qpidd
/usr/sbin/qpidd
[root@dell-pe2850-01 _]# gdb /usr/sbin/qpidd core.30874
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)

<snip>
 
Loaded symbols for /usr/lib/qpid/client/sslconnector.so
(no debugging symbols found)
Core was generated by `qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --da'.
Program terminated with signal 11, Segmentation fault.
[New process 30874]
[New process 30878]
[New process 30877]
[New process 30876]
[New process 30875]
#0  0x009a3e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
(gdb) bt
#0  0x009a3e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#1  0x009edb2b in qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#2  0x008e9f83 in qpid::cluster::FailoverExchange::FailoverExchange () from /usr/lib/qpid/daemon/cluster.so
#3  0x008a8bf4 in qpid::cluster::Cluster::Cluster () from /usr/lib/qpid/daemon/cluster.so
#4  0x008c86e1 in qpid::cluster::ClusterPlugin::earlyInitialize () from /usr/lib/qpid/daemon/cluster.so
#5  0x009d9909 in qpid::broker::Broker::Broker () from /usr/lib/libqpidbroker.so.0
#6  0x0804dc70 in ?? ()
#7  0x0804c677 in __cxa_pure_virtual ()
#8  0x004b2e8c in __libc_start_main () from /lib/libc.so.6
#9  0x0804c001 in __cxa_pure_virtual ()
(gdb) quit
[root@dell-pe2850-01 _]# gdb /usr/sbin/qpidd core.30884
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(no debugging symbols found)

<snip>
 
Loaded symbols for /usr/lib/qpid/client/sslconnector.so
(no debugging symbols found)
Core was generated by `qpidd --auth no --log-enable trace+ --cluster-name qpid_mnode_cluster_test --da'.
Program terminated with signal 11, Segmentation fault.
[New process 30884]
#0  0x00b91e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
(gdb) bt
#0  0x00b91e7c in qmf::org::apache::qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#1  0x00bdbb2b in qpid::broker::Exchange::Exchange () from /usr/lib/libqpidbroker.so.0
#2  0x0983df83 in qpid::cluster::FailoverExchange::FailoverExchange () from /usr/lib/qpid/daemon/cluster.so
#3  0x097fcbf4 in qpid::cluster::Cluster::Cluster () from /usr/lib/qpid/daemon/cluster.so
#4  0x0981c6e1 in qpid::cluster::ClusterPlugin::earlyInitialize () from /usr/lib/qpid/daemon/cluster.so
#5  0x00bc7909 in qpid::broker::Broker::Broker () from /usr/lib/libqpidbroker.so.0
#6  0x0804dc70 in ?? ()
#7  0x0804c677 in __cxa_pure_virtual ()
#8  0x0086be8c in __libc_start_main () from /lib/libc.so.6
#9  0x0804c001 in __cxa_pure_virtual ()
(gdb) quit
[root@dell-pe2850-01 _]# service openais status
aisexec (pid 30850) is running...
[root@dell-pe2850-01 _]# ll /tmp/ais
-rw-r--r-- 1 root root 6096 Feb  6 10:47 /tmp/ais
[root@dell-pe2850-01 _]# vi /tmp/ais
[root@dell-pe2850-01 _]# rpm -qa | egrep '(qpid|rhm|openais)'
qpid-java-client-0.4.738568-1.el5
qpidc-ssl-0.4.741135-1.el5
qpidd-xml-0.4.741135-1.el5
qpidd-rdma-0.4.741135-1.el5
openais-0.80.3-22.el5
qpid-java-common-0.4.738568-1.el5
qpidc-0.4.741135-1.el5
python-qpid-0.4.741135-1.el5
rhm-docs-0.4.734193-5.el5
qpidc-rdma-0.4.741135-1.el5
rhm-0.4.3108-1.el5
qpidc-perftest-0.4.741135-1.el5
qpidd-devel-0.4.741135-1.el5
openais-devel-0.80.3-22.el5
qpidc-devel-0.4.741135-1.el5
qpidd-0.4.741135-1.el5
qpidd-ssl-0.4.741135-1.el5
qpidd-cluster-0.4.741135-1.el5
qpidd-acl-0.4.741135-1.el5
[root@dell-pe2850-01 _]# cat /tmp/ais
Feb  6 10:45:15.616584 [MAIN ] AIS Executive Service RELEASE 'subrev 1358 version 0.80.3'
Feb  6 10:45:15.616740 [MAIN ] Copyright (C) 2002-2006 MontaVista Software, Inc and contributors.
Feb  6 10:45:15.616760 [MAIN ] Copyright (C) 2006 Red Hat, Inc.
Feb  6 10:45:15.616778 [MAIN ] AIS Executive Service: started and ready to provide service.
Feb  6 10:45:15.616795 [MAIN ] openais component openais_cpg loaded.
Feb  6 10:45:15.616811 [MAIN ] Registering service handler 'openais cluster closed process group service v1.01'
Feb  6 10:45:15.616828 [MAIN ] openais component openais_cfg loaded.
Feb  6 10:45:15.616845 [MAIN ] Registering service handler 'openais configuration service'
Feb  6 10:45:15.616862 [MAIN ] openais component openais_msg loaded.
Feb  6 10:45:15.616878 [MAIN ] Registering service handler 'openais message service B.01.01'
Feb  6 10:45:15.616895 [MAIN ] openais component openais_lck loaded.
Feb  6 10:45:15.616911 [MAIN ] Registering service handler 'openais distributed locking service B.01.01'
Feb  6 10:45:15.616928 [MAIN ] openais component openais_evt loaded.
Feb  6 10:45:15.616944 [MAIN ] Registering service handler 'openais event service B.01.01'
Feb  6 10:45:15.616961 [MAIN ] openais component openais_ckpt loaded.
Feb  6 10:45:15.616979 [MAIN ] Registering service handler 'openais checkpoint service B.01.01'
Feb  6 10:45:15.616996 [MAIN ] openais component openais_amf loaded.
Feb  6 10:45:15.617012 [MAIN ] Registering service handler 'openais availability management framework B.01.01'
Feb  6 10:45:15.617028 [MAIN ] openais component openais_clm loaded.
Feb  6 10:45:15.617045 [MAIN ] Registering service handler 'openais cluster membership service B.01.01'
Feb  6 10:45:15.617061 [MAIN ] openais component openais_evs loaded.
Feb  6 10:45:15.617077 [MAIN ] Registering service handler 'openais extended virtual synchrony service'
Feb  6 10:45:15.617109 [print.c:0344] log setup
Feb  6 10:45:15.634175 [TOTEM] Token Timeout (1000 ms) retransmit timeout (238 ms)
Feb  6 10:45:15.634218 [TOTEM] token hold (180 ms) retransmits before loss (4 retrans)
Feb  6 10:45:15.634230 [TOTEM] join (50 ms) send_join (0 ms) consensus (800 ms) merge (200 ms)
Feb  6 10:45:15.634241 [TOTEM] downcheck (1000 ms) fail to recv const (50 msgs)
Feb  6 10:45:15.634252 [TOTEM] seqno unchanged const (30 rotations) Maximum network MTU 1500
Feb  6 10:45:15.634262 [TOTEM] window size per rotation (50 messages) maximum messages per rotation (17 messages)
Feb  6 10:45:15.634273 [TOTEM] send threads (0 threads)
Feb  6 10:45:15.634283 [TOTEM] RRP token expired timeout (238 ms)
Feb  6 10:45:15.634293 [TOTEM] RRP token problem counter (2000 ms)
Feb  6 10:45:15.634303 [TOTEM] RRP threshold (10 problem count)
Feb  6 10:45:15.634313 [TOTEM] RRP mode set to none.
Feb  6 10:45:15.634323 [TOTEM] heartbeat_failures_allowed (0)
Feb  6 10:45:15.634333 [TOTEM] max_network_delay (50 ms)
Feb  6 10:45:15.634377 [TOTEM] HeartBeat is Disabled. To enable set heartbeat_failures_allowed > 0
Feb  6 10:45:15.634680 [TOTEM] Receive multicast socket recv buffer size (262142 bytes).
Feb  6 10:45:15.634696 [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Feb  6 10:45:15.637240 [TOTEM] The network interface [10.16.65.59] is now up.
Feb  6 10:45:15.637287 [TOTEM] Created or loaded sequence id 108.10.16.65.59 for this ring.
Feb  6 10:45:15.637338 [TOTEM] entering GATHER state from 15.
Feb  6 10:45:15.637566 [SERV ] Initialising service handler 'openais extended virtual synchrony service'
Feb  6 10:45:15.637585 [SERV ] Initialising service handler 'openais cluster membership service B.01.01'
Feb  6 10:45:15.637699 [SERV ] Initialising service handler 'openais availability management framework B.01.01'
Feb  6 10:45:15.637721 [SERV ] Initialising service handler 'openais checkpoint service B.01.01'
Feb  6 10:45:15.637735 [SERV ] Initialising service handler 'openais event service B.01.01'
Feb  6 10:45:15.637755 [SERV ] Initialising service handler 'openais distributed locking service B.01.01'
Feb  6 10:45:15.637769 [SERV ] Initialising service handler 'openais message service B.01.01'
Feb  6 10:45:15.637782 [SERV ] Initialising service handler 'openais configuration service'
Feb  6 10:45:15.637795 [SERV ] Initialising service handler 'openais cluster closed process group service v1.01'
Feb  6 10:45:15.637812 [SYNC ] Not using a virtual synchrony filter.
Feb  6 10:45:15.637887 [TOTEM] Creating commit token because I am the rep.
Feb  6 10:45:15.637905 [TOTEM] Saving state aru 0 high seq received 0
Feb  6 10:45:15.637927 [TOTEM] Storing new sequence id for ring 70
Feb  6 10:45:15.637982 [TOTEM] entering COMMIT state.
Feb  6 10:45:15.638003 [TOTEM] entering RECOVERY state.
Feb  6 10:45:15.638048 [TOTEM] position [0] member 10.16.65.59:
Feb  6 10:45:15.638060 [TOTEM] previous ring seq 108 rep 10.16.65.59
Feb  6 10:45:15.638071 [TOTEM] aru 0 high delivered 0 received flag 1
Feb  6 10:45:15.638090 [TOTEM] Did not need to originate any messages in recovery.
Feb  6 10:45:15.638111 [TOTEM] Sending initial ORF token
Feb  6 10:45:15.638241 [CLM  ] CLM CONFIGURATION CHANGE
Feb  6 10:45:15.638258 [CLM  ] New Configuration:
Feb  6 10:45:15.638269 [CLM  ] Members Left:
Feb  6 10:45:15.638279 [CLM  ] Members Joined:
Feb  6 10:45:15.638316 [CLM  ] CLM CONFIGURATION CHANGE
Feb  6 10:45:15.638327 [CLM  ] New Configuration:
Feb  6 10:45:15.638341 [CLM  ]  r(0) ip(10.16.65.59)
Feb  6 10:45:15.638352 [CLM  ] Members Left:
Feb  6 10:45:15.638362 [CLM  ] Members Joined:
Feb  6 10:45:15.638374 [CLM  ]  r(0) ip(10.16.65.59)
Feb  6 10:45:15.638392 [SYNC ] This node is within the primary component and will provide service.
Feb  6 10:45:15.638415 [TOTEM] entering OPERATIONAL state.
Feb  6 10:45:15.640139 [CLM  ] got nodejoin message 10.16.65.59
Feb  6 10:46:58.581059 [ipc.c:1155] connection received from libais client 8.
Feb  6 10:46:58.582255 [ipc.c:1155] connection received from libais client 9.
Feb  6 10:47:11.834853 [ipc.c:1155] connection received from libais client 8.
Feb  6 10:47:11.835862 [ipc.c:1155] connection received from libais client 9.
Comment 1 Alan Conway 2009-02-06 11:41:57 EST
Fixed in revision 741624
Comment 2 Frantisek Reznicek 2009-02-13 07:46:14 EST
The issue has been fixed; validated on RHEL 5.3 i386 / x86_64 with packages qpidd-0.4.743861-1.el5, qpidd-cluster-0.4.743861-1.el5, rhm-0.4.3116-1.el5.

->VERIFIED
Comment 3 Lana Brindley 2009-02-24 21:48:46 EST
How was this bug fixed?

LKB
Comment 4 Alan Conway 2009-02-25 08:15:08 EST
The management system was being initialized too late by the Cluster code; I moved the initialization to the correct place.
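
[Editorial illustration, not part of the original comment] The ordering problem described above can be sketched in a few lines of C++. This is not the qpid source. `ManagementAgent`, `Broker`, `Exchange`, `initManagement`, `buildTooEarly`, and `buildAfterInit` are hypothetical stand-ins, and the exchange name is made up. The idea that an exchange constructor reached for a management agent that did not exist yet is an assumption inferred from the backtrace, not something this report confirms. The sketch checks the pointer so that it stays safe to run; an unchecked dereference at the same point would be undefined behaviour and could plausibly end in a segmentation fault.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a management agent (not the real qpid class).
struct ManagementAgent {
    std::vector<std::string> registered;
    void addObject(const std::string& name) { registered.push_back(name); }
};

// Hypothetical broker: the agent only exists after initManagement() runs.
struct Broker {
    std::unique_ptr<ManagementAgent> agent;
    void initManagement() { agent.reset(new ManagementAgent()); }
};

// Hypothetical exchange that registers itself with management when constructed.
struct Exchange {
    bool managed;
    Exchange(const std::string& name, Broker& broker) : managed(false) {
        // Checked here so the sketch is safe to run. Without the check,
        // a null agent would be dereferenced.
        if (broker.agent) {
            broker.agent->addObject(name);
            managed = true;
        }
    }
};

// Wrong order: the exchange is built before management is initialized.
bool buildTooEarly() {
    Broker broker;
    Exchange failover("example.failover", broker);  // agent is still null here
    broker.initManagement();                        // too late for the exchange
    return failover.managed;
}

// Corrected order: management is initialized first, then the exchange is built.
bool buildAfterInit() {
    Broker broker;
    broker.initManagement();
    Exchange failover("example.failover", broker);
    return failover.managed;
}
```

In this sketch `buildTooEarly()` returns false because the exchange never gets registered, while `buildAfterInit()` returns true. Moving the initialization ahead of the object that depends on it is the general shape of the fix described in the comment above.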
Comment 6 errata-xmlrpc 2009-04-21 12:17:43 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html
