Hide Forgot
Description of problem: While reproducing the bug 667428 I triggered very frequently state when qpidd broker process is running, but no client can be served. (all get 'Connection refused') The scenario which triggers this problem is bug 667428 looped by shell script (see attachement). Once the testing script finishes looping one broker on port 10000 should be up and running. (the testing script launches brokers at ports 5672, then 10000, then launches transactional java sender and kills the head (first/5672) broker) The process backtrace from the unresponsive: (gdb) 4 Thread 0x424c6940 (LWP 27463) 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x42ec7940 (LWP 27465) 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x438c8940 (LWP 27467) 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 * 1 Thread 0x2b745ec5efe0 (LWP 27457) 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 (gdb) Thread 4 (Thread 0x424c6940 (LWP 27463)): #0 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b745e9c75ba in qpid::sys::Timer::run (this=0xffffffffffffff92) at ../include/qpid/sys/posix/Condition.h:69 #2 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable ( p=0x166a4014) at qpid/sys/posix/Thread.cpp:35 #3 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x42ec7940 (LWP 27465)): #0 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b745e9c75ba in qpid::sys::Timer::run (this=0xffffffffffffff92) at ../include/qpid/sys/posix/Condition.h:69 #2 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable ( p=0x166d3494) at qpid/sys/posix/Thread.cpp:35 #3 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x438c8940 (LWP 27467)): #0 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b745e8f24c1 in qpid::sys::Poller::wait (this=0x1669d4f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563 #2 0x00002b745e8f2f57 in qpid::sys::Poller::run (this=0x1669d4f0) at qpid/sys/epoll/EpollPoller.cpp:515 #3 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable (p=0x8) at qpid/sys/posix/Thread.cpp:35 #4 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b745ec5efe0 (LWP 27457)): #0 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b745e8f24c1 in qpid::sys::Poller::wait (this=0x1669d4f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563 #2 0x00002b745e8f2f57 in qpid::sys::Poller::run (this=0x1669d4f0) at qpid/sys/epoll/EpollPoller.cpp:515 #3 0x00002b745e433462 in qpid::broker::Broker::run ( this=<value optimized out>) at qpid/broker/Broker.cpp:344 #4 0x0000000000406acc in QpiddBroker::execute (this=<value optimized out>, options=0x1669c790) at posix/QpiddBroker.cpp:187 #5 0x000000000040559f in main (argc=9, argv=0x7fff34af8c78) at qpidd.cpp:80 (gdb) Detaching from program: /usr/sbin/qpidd, process 27457 (gdb) quit This issue is reliably reproduced on RHEL 5.6 x86_64 (KVM). Version-Release number of selected component (if applicable): openais-0.80.6-28.el5 python-qmf-0.7.946106-14.el5 python-qpid-0.7.946106-15.el5 qmf-0.7.946106-27.el5 qmf-devel-0.7.946106-27.el5 qpid-cpp-client-0.7.946106-27.el5 qpid-cpp-client-devel-0.7.946106-27.el5 qpid-cpp-client-devel-docs-0.7.946106-27.el5 qpid-cpp-client-ssl-0.7.946106-27.el5 qpid-cpp-mrg-debuginfo-0.7.946106-27.el5 qpid-cpp-server-0.7.946106-27.el5 qpid-cpp-server-cluster-0.7.946106-27.el5 qpid-cpp-server-devel-0.7.946106-27.el5 qpid-cpp-server-ssl-0.7.946106-27.el5 qpid-cpp-server-store-0.7.946106-27.el5 qpid-cpp-server-xml-0.7.946106-27.el5 qpid-java-client-0.7.946106-14.el5 qpid-java-common-0.7.946106-14.el5 qpid-java-example-0.7.946106-14.el5 qpid-tools-0.7.946106-12.el5 How reproducible: ~ 50% Steps to Reproduce: 1. configure qpidd, configure openais 2. run the attached reproducer ./bz667428.sh 3 3. ps auxw | grep qpidd Actual results: qpidd reaches easily unresponsive state. Expected results: qpidd should not reach unresponsive state. Additional info: [root@dhcp-26-233 bz667428]# ps auxw | grep qpidd root 27457 0.4 4.2 207016 16128 pts/2 Sl 17:22 0:00 qpidd --auth=no --truncate yes --cluster-name=bz667428_at_dhcp-26-233.brq.redhat.com --log-enable=debug+ -p 10000 --data-dir=./dd10000 root 27673 0.0 0.1 61228 748 pts/2 R+ 17:24 0:00 grep qpidd [root@dhcp-26-233 bz667428]# qpid-config [root@dhcp-26-233 bz667428]# netstat -nlp | grep qpidd tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27457/qpidd Failed: error: (111, 'Connection refused') [root@dhcp-26-233 bz667428]# qpid-cluster Failed: error - (111, 'Connection refused') [root@dhcp-26-233 bz667428]# dump_process 27457 [root@dhcp-26-233 bz667428]# cat dump_pid_27457.20110124_172448.qpidd GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-32.el5) Copyright (C) 2009 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu". For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. (gdb) Attaching to process 27457 Reading symbols from /usr/sbin/qpidd...Reading symbols from /usr/lib/debug/usr/sbin/qpidd.debug...done. (gdb) Using memory regions provided by the target. There are no memory regions defined. (gdb) From To Syms Read Shared Object Library 0x00002b745e3bf760 0x00002b745e5508b8 Yes (*) /usr/lib64/libqpidbroker.so.3 0x00002b745e8d4680 0x00002b745e9d3458 Yes (*) /usr/lib64/libqpidcommon.so.3 0x0000003132807820 0x0000003132814e58 Yes (*) /usr/lib64/libqpidtypes.so.1 0x0000003c78a10aa0 0x0000003c78a2dae8 Yes (*) /usr/lib64/libboost_program_options.so.2 0x0000003c78e04810 0x0000003c78e0cff8 Yes (*) /usr/lib64/libboost_filesystem.so.2 0x000000311ac01500 0x000000311ac02918 Yes (*) /lib64/libuuid.so.1 0x0000003116c00e10 0x0000003116c01a08 Yes (*) /lib64/libdl.so.2 0x0000003118002220 0x0000003118005d98 Yes (*) /lib64/librt.so.1 0x0000003c79c046e0 0x0000003c79c13be8 Yes (*) /usr/lib64/libsasl2.so.2 0x000000311b84f430 0x000000311b8c30b8 Yes (*) /usr/lib64/libstdc++.so.6 0x0000003117003e60 0x0000003117043e38 Yes (*) /lib64/libm.so.6 0x0000003119401e50 0x000000311940b018 Yes (*) /lib64/libgcc_s.so.1 0x000000311681d780 0x000000311690a3a8 Yes (*) /lib64/libc.so.6 0x0000003116400a70 0x00000031164166ce Yes (*) /lib64/ld-linux-x86-64.so.2 0x00000031174051f0 0x0000003117410258 Yes (*) /lib64/libpthread.so.0 0x0000003c792032a0 0x0000003c7920e2d8 Yes (*) /lib64/libresolv.so.2 0x0000003119c009f0 0x0000003119c06918 Yes (*) /lib64/libcrypt.so.1 0x00002b745eca47f0 0x00002b745ed14388 Yes (*) /usr/lib64/qpid/daemon/cluster.so 0x0000003c78601430 0x0000003c78603538 Yes (*) /usr/lib64/openais/libcpg.so.2 0x00002b745ef4f110 0x00002b745ef51b98 Yes (*) /usr/lib64/libcman.so.2 0x00002b745f191350 0x00002b745f205428 Yes (*) /usr/lib64/libqpidclient.so.3 0x00002b745f436830 0x00002b745f440608 Yes (*) /usr/lib64/qpid/daemon/xml.so 0x00002b745f7c9070 0x00002b745f94a758 Yes (*) /usr/lib64/libxerces-c.so.28 0x00002b745fddf090 0x00002b745ff6ab28 Yes (*) /usr/lib64/libxqilla.so.3 0x00002b74602b0f20 0x00002b74602d1318 Yes (*) /usr/lib64/qpid/daemon/acl.so 0x00002b74604e45f0 0x00002b74604ec838 Yes (*) /usr/lib64/qpid/daemon/ssl.so 0x00002b746070f640 0x00002b7460726098 Yes (*) /usr/lib64/libsslcommon.so.3 0x000000311b018310 0x000000311b0f7288 Yes (*) /usr/lib64/libnss3.so 0x0000003117c08540 0x0000003117c2b868 Yes (*) /usr/lib64/libssl3.so 0x00002b7460939eb0 0x00002b7460958568 Yes (*) /usr/lib64/libnspr4.so 0x000000311a008500 0x000000311a012e18 Yes (*) /usr/lib64/libnssutil3.so 0x000000311a801370 0x000000311a802968 Yes (*) /usr/lib64/libplc4.so 0x000000311a400e40 0x000000311a401c78 Yes (*) /usr/lib64/libplds4.so 0x0000003117801fd0 0x000000311780cac8 Yes (*) /usr/lib64/libz.so.1 0x00002b7460b6d5e0 0x00002b7460b70b78 Yes (*) /usr/lib64/qpid/daemon/watchdog.so 0x00002b7460d78680 0x00002b7460d7cc48 Yes (*) /usr/lib64/qpid/daemon/replication_exchange.so 0x00002b7460fb7d70 0x00002b7461045528 Yes (*) /usr/lib64/qpid/daemon/msgstore.so 0x00002b74612ab5d0 0x00002b7461364428 Yes (*) /usr/lib64/libdb_cxx-4.3.so 0x00002b746158f510 0x00002b746158f6d1 Yes (*) /usr/lib64/libaio.so.1 0x00002b7461796b70 0x00002b746179c898 Yes (*) /usr/lib64/qpid/daemon/replicating_listener.so (*): Shared library is missing debugging information. (gdb) 4 Thread 0x424c6940 (LWP 27463) 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x42ec7940 (LWP 27465) 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x438c8940 (LWP 27467) 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 * 1 Thread 0x2b745ec5efe0 (LWP 27457) 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 (gdb) Thread 4 (Thread 0x424c6940 (LWP 27463)): #0 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b745e9c75ba in qpid::sys::Timer::run (this=0xffffffffffffff92) at ../include/qpid/sys/posix/Condition.h:69 #2 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable ( p=0x166a4014) at qpid/sys/posix/Thread.cpp:35 #3 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x42ec7940 (LWP 27465)): #0 0x000000311740b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00002b745e9c75ba in qpid::sys::Timer::run (this=0xffffffffffffff92) at ../include/qpid/sys/posix/Condition.h:69 #2 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable ( p=0x166d3494) at qpid/sys/posix/Thread.cpp:35 #3 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #4 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x438c8940 (LWP 27467)): #0 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b745e8f24c1 in qpid::sys::Poller::wait (this=0x1669d4f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563 #2 0x00002b745e8f2f57 in qpid::sys::Poller::run (this=0x1669d4f0) at qpid/sys/epoll/EpollPoller.cpp:515 #3 0x00002b745e8e91aa in qpid::sys::(anonymous namespace)::runRunnable (p=0x8) at qpid/sys/posix/Thread.cpp:35 #4 0x000000311740673d in start_thread () from /lib64/libpthread.so.0 #5 0x00000031168d40cd in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2b745ec5efe0 (LWP 27457)): #0 0x00000031168d44b8 in epoll_wait () from /lib64/libc.so.6 #1 0x00002b745e8f24c1 in qpid::sys::Poller::wait (this=0x1669d4f0, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:563 #2 0x00002b745e8f2f57 in qpid::sys::Poller::run (this=0x1669d4f0) at qpid/sys/epoll/EpollPoller.cpp:515 #3 0x00002b745e433462 in qpid::broker::Broker::run ( this=<value optimized out>) at qpid/broker/Broker.cpp:344 #4 0x0000000000406acc in QpiddBroker::execute (this=<value optimized out>, options=0x1669c790) at posix/QpiddBroker.cpp:187 #5 0x000000000040559f in main (argc=9, argv=0x7fff34af8c78) at qpidd.cpp:80 (gdb) Detaching from program: /usr/sbin/qpidd, process 27457 (gdb) quit /etc/qpidd.conf: cluster-mechanism=ANONYMOUS cluster-name=freznice-virt-cluster #log-enable=info+ log-enable=trace+ log-to-file=/tmp/qpidd.log mgmt-pub-interval=1
Created attachment 474987 [details] The issue reproducer and logs This is the script which reproduces the problem. Dependencies: openais configured and running. java-openjdk 1.6 installed main + devel qpid-java-* pkgs installed python-qpid installed How to run: ./bz667428 3 Then do: ps auxw | grep qpidd netstat -nlp | grep qpidd qpid-config qpid-cluster and you should see something like this: [root@dhcp-26-233 bz667428]# ps auxw | grep qpidd root 27457 0.4 4.2 207016 16128 pts/2 Sl 17:22 0:00 qpidd --auth=no --truncate yes --cluster-name=bz667428_at_dhcp-26-233.brq.redhat.com --log-enable=debug+ -p 10000 --data-dir=./dd10000 root 27673 0.0 0.1 61228 748 pts/2 R+ 17:24 0:00 grep qpidd [root@dhcp-26-233 bz667428]# netstat -nlp | grep qpidd tcp 0 0 0.0.0.0:10000 0.0.0.0:* LISTEN 27457/qpidd [root@dhcp-26-233 bz667428]# qpid-config Failed: error: (111, 'Connection refused') [root@dhcp-26-233 bz667428]# qpid-cluster Failed: error - (111, 'Connection refused')
In comment 1, you can see from the ps and output that the broker that is still running is listening on port 10000. There is no broker listening on the default port 5672. So it is normal to get "connection refused" when running qpid-config or qpid-cluster with no arguments, as they will try to connect to 5672. However these work fine for me: qpid-config -a localhost:10000 qpid-cluster -a localhost:10000 I repeated about 15 times and didn't see a non-responsiveness problem. Am I missing something?
I apologize, this is indeed correct and I concur. The defect has no content and I'm going to close it as NOT-A-BUG.