Bug 485689 - clustered qpidd hangs during start-up (openais-0.80.5-1.el5)
Summary: clustered qpidd hangs during start-up (openais-0.80.5-1.el5)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1
Hardware: All
OS: Linux
urgent
urgent
Target Milestone: 1.1.1
: ---
Assignee: Alan Conway
QA Contact: Frantisek Reznicek
URL:
Whiteboard:
Depends On:
Blocks: 485422
TreeView+ depends on / blocked
 
Reported: 2009-02-16 11:06 UTC by Frantisek Reznicek
Modified: 2015-11-16 00:06 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 21:05:41 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Frantisek Reznicek 2009-02-16 11:06:49 UTC
Description of problem:

After installing new openais (openais-0.80.5-1.el5) for retesting bug 485422
I can see qpidd hanging during startup in function semop().
I re-tested three times with the same results (management on/off), so
qpidd clustering is not functional!

See details below:
The cluster test detect running qpidd using netstat -nlp where qpidd does not 
appear using above listed packages.


Version-Release number of selected component (if applicable):
  [root@dell-pesc430-03 qpid_test_mnode_cluster]# rpm -qa | egrep '(qpidd|rhm|openais)'
  qpidd-xml-0.4.743861-1.el5
  qpidd-cluster-0.4.743861-1.el5
  openais-devel-0.80.5-1.el5
  qpidd-rdma-0.4.743861-1.el5
  qpidd-acl-0.4.743861-1.el5
  qpidd-0.4.743861-1.el5
  qpidd-devel-0.4.743861-1.el5
  openais-0.80.5-1.el5
  rhm-docs-0.4.734193-5.el5
  rhm-0.4.3116-1.el5
  qpidd-ssl-0.4.743861-1.el5




How reproducible:
100%

Steps to Reproduce:
1.
2.
3.
  
Actual results:
qpidd hangs ... qpidd clustering not functional

Expected results:
qpidd clustering should be working

Additional info:

[root@dell-pesc430-03 qpid_test_mnode_cluster]# ps auxw | grep qpidd
root      8913  0.0  0.9 133212  9332 pts/1    Sl+  05:44   0:00 qpidd --mgmt-enable no --auth no --log-enable debug+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0000 -p 0
root     11632  0.0  0.9 133212  9336 pts/1    Sl+  05:44   0:00 qpidd --mgmt-enable no --auth no --log-enable debug+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0001 -p 0
root     14029  0.1  0.9 133212  9336 pts/1    Sl+  05:44   0:00 qpidd --mgmt-enable no --auth no --log-enable debug+ --cluster-name qpid_mnode_cluster_test --data-dir qpidd-data-dir.0002 -p 0
root     25649  0.0  0.0  61184   736 pts/0    S+   05:45   0:00 grep qpidd
[root@dell-pesc430-03 qpid_test_mnode_cluster]# netstat -nlp | grep qpidd
[root@dell-pesc430-03 qpid_test_mnode_cluster]# netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:2049                0.0.0.0:*                   LISTEN      -
tcp        0      0 0.0.0.0:644                 0.0.0.0:*                   LISTEN      2158/rpc.statd
tcp        0      0 0.0.0.0:615                 0.0.0.0:*                   LISTEN      26724/rpc.mountd
tcp        0      0 0.0.0.0:111                 0.0.0.0:*                   LISTEN      2134/portmap
tcp        0      0 127.0.0.1:631               0.0.0.0:*                   LISTEN      2380/cupsd
tcp        0      0 0.0.0.0:59192               0.0.0.0:*                   LISTEN      -
tcp        0      0 0.0.0.0:601                 0.0.0.0:*                   LISTEN      26709/rpc.rquotad
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      2416/sendmail: acce
tcp        0      0 :::22                       :::*                        LISTEN      2370/sshd
udp        0      0 0.0.0.0:2049                0.0.0.0:*                               -
udp        0      0 0.0.0.0:641                 0.0.0.0:*                               2158/rpc.statd
udp        0      0 0.0.0.0:43916               0.0.0.0:*                               2533/avahi-daemon:
udp        0      0 10.16.64.106:54051          0.0.0.0:*                               6595/aisexec
udp        0      0 10.16.64.106:53795          0.0.0.0:*                               6595/aisexec
udp        0      0 226.94.1.11:54051           0.0.0.0:*                               6595/aisexec
udp        0      0 0.0.0.0:49828               0.0.0.0:*                               -
udp        0      0 0.0.0.0:68                  0.0.0.0:*                               2004/dhclient
udp        0      0 0.0.0.0:612                 0.0.0.0:*                               26724/rpc.mountd
udp        0      0 0.0.0.0:5353                0.0.0.0:*                               2533/avahi-daemon:
udp        0      0 0.0.0.0:111                 0.0.0.0:*                               2134/portmap
udp        0      0 0.0.0.0:631                 0.0.0.0:*                               2380/cupsd
udp        0      0 10.16.64.106:123            0.0.0.0:*                               2397/ntpd
udp        0      0 127.0.0.1:123               0.0.0.0:*                               2397/ntpd
udp        0      0 0.0.0.0:123                 0.0.0.0:*                               2397/ntpd
udp        0      0 0.0.0.0:1022                0.0.0.0:*                               26709/rpc.rquotad
udp        0      0 0.0.0.0:638                 0.0.0.0:*                               2158/rpc.statd
udp        0      0 :::53348                    :::*                                    2533/avahi-daemon:
udp        0      0 :::5353                     :::*                                    2533/avahi-daemon:
udp        0      0 fe80::212:3fff:fe77:bb3:123 :::*                                    2397/ntpd
udp        0      0 ::1:123                     :::*                                    2397/ntpd
udp        0      0 :::123                      :::*                                    2397/ntpd
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     7936   2437/gpm            /dev/gpmctl
unix  2      [ ACC ]     STREAM     LISTENING     9440   2567/gam_server     @/tmp/fam-root-
unix  2      [ ACC ]     STREAM     LISTENING     8247   2492/xfs            /tmp/.font-unix/fs7100
unix  2      [ ACC ]     STREAM     LISTENING     29988218 6595/aisexec        @libais.socket
unix  2      [ ACC ]     STREAM     LISTENING     6340   1628/mcstransd      /var/run/setrans/.setrans-unix
unix  2      [ ACC ]     STREAM     LISTENING     8396   2544/hald           @/var/run/hald/dbus-yEXhA9MDCN
unix  2      [ ACC ]     STREAM     LISTENING     8397   2544/hald           @/var/run/hald/dbus-Lrmdw0HasC
unix  2      [ ACC ]     STREAM     LISTENING     6845   2081/audispd        /var/run/audispd_events
unix  2      [ ACC ]     STREAM     LISTENING     7226   2215/dbus-daemon    /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     7308   2232/sdpd           /var/run/sdp
unix  2      [ ACC ]     STREAM     LISTENING     7400   2295/pcscd          /var/run/pcscd.comm
unix  2      [ ACC ]     STREAM     LISTENING     7520   2345/acpid          /var/run/acpid.socket
unix  2      [ ACC ]     STREAM     LISTENING     7677   2380/cupsd          /var/run/cups/cups.sock
unix  2      [ ACC ]     STREAM     LISTENING     8362   2533/avahi-daemon:  /var/run/avahi-daemon/socket
[root@dell-pesc430-03 qpid_test_mnode_cluster]# service openais status
aisexec (pid 6595) is running...
[root@dell-pesc430-03 qpid_test_mnode_cluster]# pstree -p 8913
qpidd(8913)─┬─{qpidd}(8966)
            └─{qpidd}(8967)
[root@dell-pesc430-03 qpid_test_mnode_cluster]# pstack 8913
Thread 3 (Thread 0x41140940 (LWP 8966)):
#0  0x0000003ee560ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180cbf in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x41b41940 (LWP 8967)):
#0  0x0000003ee560a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180a6c in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b924f4857a0 (LWP 8913)):
#0  0x0000003ee4ed4477 in semop () from /lib64/libc.so.6
#1  0x00002b924ff91bc4 in openais_msg_send_reply_receive ()
#2  0x00002b924ff924d1 in cpg_local_get () from /usr/lib64/openais/libcpg.so.2
#3  0x00002b924fd43a0b in qpid::cluster::Cpg::self ()
#4  0x00002b924fd15bc7 in qpid::cluster::Cluster::Cluster ()
#5  0x00002b924fd33263 in qpid::cluster::ClusterPlugin::earlyInitialize ()
#6  0x0000003fdc0cb1ae in qpid::broker::Broker::Broker ()
#7  0x00000000004068bc in ?? ()
#8  0x0000000000405438 in __cxa_pure_virtual ()
#9  0x0000003ee4e1d974 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000404eb9 in __cxa_pure_virtual ()
#11 0x00007fff5b63df68 in ?? ()
#12 0x0000000000000000 in ?? ()
[root@dell-pesc430-03 qpid_test_mnode_cluster]# pstack 8913
Thread 3 (Thread 0x41140940 (LWP 8966)):
#0  0x0000003ee560ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180cbf in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x41b41940 (LWP 8967)):
#0  0x0000003ee560a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180a6c in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b924f4857a0 (LWP 8913)):
#0  0x0000003ee4ed4477 in semop () from /lib64/libc.so.6
#1  0x00002b924ff91bc4 in openais_msg_send_reply_receive ()
#2  0x00002b924ff924d1 in cpg_local_get () from /usr/lib64/openais/libcpg.so.2
#3  0x00002b924fd43a0b in qpid::cluster::Cpg::self ()
#4  0x00002b924fd15bc7 in qpid::cluster::Cluster::Cluster ()
#5  0x00002b924fd33263 in qpid::cluster::ClusterPlugin::earlyInitialize ()
#6  0x0000003fdc0cb1ae in qpid::broker::Broker::Broker ()
#7  0x00000000004068bc in ?? ()
#8  0x0000000000405438 in __cxa_pure_virtual ()
#9  0x0000003ee4e1d974 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000404eb9 in __cxa_pure_virtual ()
#11 0x00007fff5b63df68 in ?? ()
#12 0x0000000000000000 in ?? ()
[root@dell-pesc430-03 qpid_test_mnode_cluster]# strace -p 8913
Process 8913 attached - interrupt to quit
semop(425997, 0x7fff5b63c7a0, 1 <unfinished ...>
Process 8913 detached
[root@dell-pesc430-03 qpid_test_mnode_cluster]# strace -p 8913
Process 8913 attached - interrupt to quit
semop(425997, 0x7fff5b63c7a0, 1 <unfinished ...>
Process 8913 detached
[root@dell-pesc430-03 qpid_test_mnode_cluster]# pstack 8913
Thread 3 (Thread 0x41140940 (LWP 8966)):
#0  0x0000003ee560ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180cbf in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x41b41940 (LWP 8967)):
#0  0x0000003ee560a899 in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000003fdc180a6c in qpid::broker::Timer::run ()
#2  0x0000003fdbb67e5a in ?? () from /usr/lib64/libqpidcommon.so.0
#3  0x0000003ee5606367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003ee4ed30ad in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b924f4857a0 (LWP 8913)):
#0  0x0000003ee4ed4477 in semop () from /lib64/libc.so.6
#1  0x00002b924ff91bc4 in openais_msg_send_reply_receive ()
#2  0x00002b924ff924d1 in cpg_local_get () from /usr/lib64/openais/libcpg.so.2
#3  0x00002b924fd43a0b in qpid::cluster::Cpg::self ()
#4  0x00002b924fd15bc7 in qpid::cluster::Cluster::Cluster ()
#5  0x00002b924fd33263 in qpid::cluster::ClusterPlugin::earlyInitialize ()
#6  0x0000003fdc0cb1ae in qpid::broker::Broker::Broker ()
#7  0x00000000004068bc in ?? ()
#8  0x0000000000405438 in __cxa_pure_virtual ()
#9  0x0000003ee4e1d974 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000404eb9 in __cxa_pure_virtual ()
#11 0x00007fff5b63df68 in ?? ()
#12 0x0000000000000000 in ?? ()
[root@dell-pesc430-03 qpid_test_mnode_cluster]# rpm -q openais
openais-0.80.5-1.el5

Here goes qpidd logfile with and without management (debug+)
-- w mgmt
[root@dell-pesc430-03 qpid_test_mnode_cluster]# cat dell-pesc430-03.rhts.bos.redhat.com/j0000.sh.log
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/acl.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so
2009-feb-16 05:38:37 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it.
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/xml.so
2009-feb-16 05:38:37 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so
2009-feb-16 05:38:37 info Management enabled
2009-feb-16 05:38:37 info ManagementBroker generated broker ID: 4cb368c5-18ec-4d9a-ae95-7524836df7d6
2009-feb-16 05:38:37 debug ManagementBroker boot sequence: 1
2009-feb-16 05:38:37 debug ManagementBroker added package org.apache.qpid.broker
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:system
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:broker
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:agent
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:vhost
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:queue
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:exchange
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:binding
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:connection
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:link
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:bridge
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:session
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:clientConnect
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:clientConnectFail
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:clientDisconnect
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:brokerLinkUp
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:brokerLinkDown
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:queueDeclare
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:queueDelete
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:exchangeDeclare
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:exchangeDelete
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:bind
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:unbind
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:subscribe
2009-feb-16 05:38:37 debug ManagementBroker added class org.apache.qpid.broker:unsubscribe
2009-feb-16 05:38:37 notice Journal "TplStore": Created
2009-feb-16 05:38:37 debug Journal "TplStore": Journal directory = "qpidd-data-dir.0000/rhm/tpl/"; Base file name = "tpl"
2009-feb-16 05:38:37 notice Store module initialized; dir=qpidd-data-dir.0000
2009-feb-16 05:38:37 info > Default files per journal: 8
2009-feb-16 05:38:37 info > Default journal file size: 24 (wpgs)
2009-feb-16 05:38:37 info > Default write cache page size: 32 (Kib)
2009-feb-16 05:38:37 info > Default number of write cache pages: 32
2009-feb-16 05:38:37 info > TPL files per journal: 8
2009-feb-16 05:38:37 info > TPL journal file size: 24 (wpgs)
2009-feb-16 05:38:37 info > TPL write cache page size: 4 (Kib)
2009-feb-16 05:38:37 info > TPL number of write cache pages: 64
2009-feb-16 05:38:37 debug Initialized CPG handle 0x714e8a7400000000


-- w/o mgmt
[root@dell-pesc430-03 qpid_test_mnode_cluster]# cat dell-pesc430-03.rhts.bos.redhat.com/j0000.sh.log
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/rdma.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/client/rdmaconnector.so
2009-feb-16 05:44:29 info SSL connector not enabled, you must set QPID_SSL_CERT_DB to enable it.
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/client/sslconnector.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/cluster.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/replicating_listener.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/msgstore.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/ssl.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/xml.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/replication_exchange.so
2009-feb-16 05:44:29 info Loaded Module: /usr/lib64/qpid/daemon/acl.so
2009-feb-16 05:44:29 debug Initialized CPG handle 0x51934f8700000000

Comment 1 Alan Conway 2009-02-16 17:58:23 UTC
raised bug 485750 against openais

Comment 2 Alan Conway 2009-02-17 14:35:41 UTC
Steven Dake wrote:
> The problem was what was built was not what was tested.  3 revisions
> were missing from the source rpm that I expected to be there.  One of
> the revisions resulted in a segfault which is the cause of all the
> breakage you are seeing.  I have manually added them so we should be
> ready to rock now.
>

Comment 3 Frantisek Reznicek 2009-02-20 12:31:48 UTC
Watch the https://bugzilla.redhat.com/show_bug.cgi?id=485750#c11 for review of the final qpidd selinux policy module.

This is the qpidd <-> openais 0.80.5-2 selinux policy module which fixes the bug:

file: qpidd.te
---------------------------------------------------------------------------
module qpidd 1.0;

require {
        type initrc_t;
        type ccs_t;
        type tmpfs_t;
        class sem { unix_read read write unix_write associate };
        class shm { unix_read read write unix_write associate };
        class file { read write };
        class capability ipc_owner;
}

#============= ccs_t ==============
allow ccs_t tmpfs_t:file { read write };
allow ccs_t initrc_t:sem { unix_read read write unix_write associate };
allow ccs_t initrc_t:shm { unix_read read write unix_write associate };
allow ccs_t self:capability ipc_owner;

---------------------------------------------------------------------------
#eof

Comment 4 David Sommerseth 2009-02-27 15:36:13 UTC
Tested against 0.4.747892-1.el5.x86_64.rpms (https://brewweb.devel.redhat.com/taskinfo?taskID=1705562) and the current candidate release (0.4.744917-1.el5) rpms.

Reproduction:

- Make sure SELinux is Enforced (getenforce)
- Upgrade openais to openais-0.80.5-2.el5
- Configure openais (/etc/ais/openais.conf) and a corresponding network interface.
- Configure qpidd (/etc/qpidd.conf).  Add cluster-name=foo
- Test 1: Try to start qpidd from command line (qpidd -t --cluster-name=foo)
- Test 2: Try to start qpidd via service (service qpidd start)

For the candidate release, both tests failed due to qpidd hanging.

For the brewbuilds qpidd starts perfectly well via the service command (Test 2) but hangs when starting via command line (Test 1).

This is the correct behavior, since the SELinux context will be different when started from command line and via service/init rc.

Comment 5 Justin Ross 2011-06-27 21:05:41 UTC
Fixed and verified; closing.


Note You need to log in before you can comment on or make changes to this bug.