Red Hat Bugzilla – Bug 510241
clustered qpidd crash in qpid::sys::Poller::run()
Last modified: 2015-11-15 20:11:37 EST
Created attachment 350930 [details]
reproducer

Description of problem:
During BZ 506758 validation, this crash was seen in qpid::sys::Poller::run().
Seen on RHEL 5.3 x86_64 on a Quad-Core AMD Opteron(tm) Processor 2376.

Version-Release number of selected component (if applicable):
[root@mrg-qe-02 bz506758]# rpm -qa | egrep '(qpid|rhm|openais)' | sort -u
openais-0.80.3-22.el5_3.8
openais-debuginfo-0.80.3-22.el5_3.8
python-qpid-0.5.752581-3.el5
qpidc-0.5.752581-22.el5
qpidc-debuginfo-0.5.752581-22.el5
qpidc-devel-0.5.752581-22.el5
qpidc-perftest-0.5.752581-22.el5
qpidc-rdma-0.5.752581-22.el5
qpidc-ssl-0.5.752581-22.el5
qpidd-0.5.752581-22.el5
qpidd-acl-0.5.752581-22.el5
qpidd-cluster-0.5.752581-22.el5
qpidd-devel-0.5.752581-22.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-22.el5
qpidd-ssl-0.5.752581-22.el5
qpidd-xml-0.5.752581-22.el5
qpid-java-client-0.5.751061-8.el5
qpid-java-common-0.5.751061-8.el5
rhm-0.5.3206-5.el5
rhm-docs-0.5.756148-1.el5

How reproducible:
very hard (<1%)

Steps to Reproduce:
0. install and set up openais
1. run attached reproducer ./run.sh 5 100 (5-node cluster, 100 instances of subscribe running in parallel)
2. wait for crash

Actual results:
Crash

Expected results:
No crash

Additional info (threaded backtrace):
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/lib64/libqpidbroker.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidbroker.so.0
Reading symbols from /usr/lib64/libqpidcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidcommon.so.0
Reading symbols from /usr/lib64/libboost_program_options.so.2...done.
Loaded symbols for /usr/lib64/libboost_program_options.so.2
Reading symbols from /usr/lib64/libboost_filesystem.so.2...done.
Loaded symbols for /usr/lib64/libboost_filesystem.so.2
Reading symbols from /lib64/libuuid.so.1...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib64/libsasl2.so.2...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /usr/lib64/qpid/daemon/replicating_listener.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/replicating_listener.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/replicating_listener.so
Reading symbols from /usr/lib64/qpid/daemon/rdma.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/rdma.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/rdma.so
Reading symbols from /usr/lib64/librdmawrap.so.0...Reading symbols from /usr/lib/debug/usr/lib64/librdmawrap.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/librdmawrap.so.0
Reading symbols from /usr/lib64/librdmacm.so.1...done.
Loaded symbols for /usr/lib64/librdmacm.so.1
Reading symbols from /usr/lib64/libibverbs.so.1...done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Reading symbols from /usr/lib64/qpid/daemon/cluster.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/cluster.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/cluster.so
Reading symbols from /usr/lib64/openais/libcpg.so.2...Reading symbols from /usr/lib/debug/usr/lib64/openais/libcpg.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/openais/libcpg.so.2
Reading symbols from /usr/lib64/libcman.so.2...done.
Loaded symbols for /usr/lib64/libcman.so.2
Reading symbols from /usr/lib64/libqpidclient.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidclient.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidclient.so.0
Reading symbols from /usr/lib64/qpid/client/rdmaconnector.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/client/rdmaconnector.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/client/rdmaconnector.so
Reading symbols from /usr/lib64/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/client/sslconnector.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/client/sslconnector.so
Reading symbols from /usr/lib64/libsslcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libsslcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libsslcommon.so.0
Reading symbols from /usr/lib64/libnss3.so...done.
Loaded symbols for /usr/lib64/libnss3.so
Reading symbols from /usr/lib64/libssl3.so...done.
Loaded symbols for /usr/lib64/libssl3.so
Reading symbols from /usr/lib64/libnspr4.so...done.
Loaded symbols for /usr/lib64/libnspr4.so
Reading symbols from /usr/lib64/libnssutil3.so...done.
Loaded symbols for /usr/lib64/libnssutil3.so
Reading symbols from /usr/lib64/libplc4.so...done.
Loaded symbols for /usr/lib64/libplc4.so
Reading symbols from /usr/lib64/libplds4.so...done.
Loaded symbols for /usr/lib64/libplds4.so
Reading symbols from /usr/lib64/qpid/daemon/replication_exchange.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/replication_exchange.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/replication_exchange.so
Reading symbols from /usr/lib64/qpid/daemon/xml.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/xml.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/xml.so
Reading symbols from /usr/lib64/libxerces-c.so.28...done.
Loaded symbols for /usr/lib64/libxerces-c.so.28
Reading symbols from /usr/lib64/libxqilla.so.3...done.
Loaded symbols for /usr/lib64/libxqilla.so.3
Reading symbols from /usr/lib64/qpid/daemon/acl.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/acl.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/acl.so
Reading symbols from /usr/lib64/qpid/daemon/ssl.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/ssl.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/ssl.so
Reading symbols from /usr/lib64/qpid/daemon/msgstore.so...done.
Loaded symbols for /usr/lib64/qpid/daemon/msgstore.so
Reading symbols from /usr/lib64/libdb_cxx-4.3.so...done.
Loaded symbols for /usr/lib64/libdb_cxx-4.3.so
Reading symbols from /usr/lib64/libaio.so.1...done.
Loaded symbols for /usr/lib64/libaio.so.1
Reading symbols from /usr/lib64/sasl2/libplain.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libplain.so.2
Reading symbols from /usr/lib64/sasl2/libsasldb.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libsasldb.so.2
Reading symbols from /usr/lib64/sasl2/libanonymous.so.2...done.
Loaded symbols for /usr/lib64/sasl2/libanonymous.so.2
Reading symbols from /usr/lib64/sasl2/liblogin.so.2...done.
Loaded symbols for /usr/lib64/sasl2/liblogin.so.2
Core was generated by `qpidd -p 5672 --auth no --log-enable info+ --cluster-name mrg-qe-02.lab.eng.brq'.
Program terminated with signal 11, Segmentation fault.
[New process 25133]
[New process 25132]
[New process 25131]
[New process 25130]
[New process 25129]
[New process 25128]
[New process 25127]
[New process 25126]
[New process 25125]
[New process 25123]
[New process 25122]
[New process 25121]
[New process 25120]
#0 0x0000000002058740 in ?? ()
(gdb)

Thread 13 (process 25120):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030f04c941e in qpid::broker::Broker::run (this=<value optimized out>) at qpid/broker/Broker.cpp:319
#4 0x00000000004069b8 in QpiddBroker::execute (this=<value optimized out>, options=0x1241d30) at posix/QpiddBroker.cpp:166
#5 0x00000000004054a8 in main (argc=11, argv=0x7fffed422d38) at qpidd.cpp:77

Thread 12 (process 25121):
#0 0x00000036e100ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000030f0591379 in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 11 (process 25122):
#0 0x00000036e100ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000030f0591379 in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 10 (process 25123):
#0 0x00000036e100ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000030f0591379 in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 9 (process 25125):
#0 0x00000036e100ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00000030f0591379 in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 8 (process 25126):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 7 (process 25127):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 6 (process 25128):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 5 (process 25129):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 4 (process 25130):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 3 (process 25131):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 2 (process 25132):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:439
#2 0x00000030eff7dc87 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:405
#3 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#5 0x00000036e04d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 25133):
#0 0x0000000002058740 in ?? ()
#1 0x00000030eff7dcb3 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/Poller.h:122
#2 0x00000030eff73cea in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3 0x00000036e1006367 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036e04d30ad in clone () from /lib64/libc.so.6
(gdb) quit

[09:08:45] get_cpu_info():CPU information:
processor : 0 1 2 3 4 5 6 7
vendor_id : AuthenticAMD
model name : Quad-Core AMD Opteron(tm) Processor 2376
cpu MHz : 800.000
cpu cores : 4
bogomips : 4592.50 4588.47 4588.55 4588.32 4589.26 4587.98 4588.54 4590.52
[09:08:45] Memory info:
             total       used       free     shared    buffers     cached
Mem:       8247168     549744    7697424          0      30376     361600
-/+ buffers/cache:     157768    8089400
Swap:     10289144          0   10289144
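
In a dump like the one above, only Thread 1 has an unresolved frame #0 (`?? ()`), reached from qpid::sys::Poller::run(); every other thread is parked in epoll_wait or pthread_cond_timedwait. For longer dumps, a small script can pick out such threads automatically. The following is a minimal sketch, not part of the attached reproducer: the function name `unresolved_threads` and the `SAMPLE` text are hypothetical, and it assumes the backtrace has one frame per line in the "Thread N (process P):" layout that gdb 6.8 prints here.

```python
import re

# Matches the per-thread header printed by gdb, e.g. "Thread 1 (process 25133):"
THREAD_RE = re.compile(r"^Thread (\d+) \(process (\d+)\):")
# Matches the innermost frame, e.g. "#0 0x0000000002058740 in ?? ()"
FRAME0_RE = re.compile(r"^#0\s+0x[0-9a-f]+ in (\S+)")


def unresolved_threads(backtrace: str):
    """Return (thread, pid) pairs whose frame #0 has no symbol ("??")."""
    hits = []
    current = None  # (thread number, pid) of the thread being scanned
    for line in backtrace.splitlines():
        line = line.strip()
        m = THREAD_RE.match(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            continue
        m = FRAME0_RE.match(line)
        # Frames seen before any thread header (gdb's startup summary) are skipped.
        if m and current is not None and m.group(1) == "??":
            hits.append(current)
    return hits


# Hypothetical sample: two threads taken from the dump, frame arguments abbreviated.
SAMPLE = """\
Thread 2 (process 25132):
#0 0x00000036e04d3498 in epoll_wait () from /lib64/libc.so.6
#1 0x00000030eff7d0dd in qpid::sys::Poller::wait (...) at qpid/sys/epoll/EpollPoller.cpp:439
Thread 1 (process 25133):
#0 0x0000000002058740 in ?? ()
#1 0x00000030eff7dcb3 in qpid::sys::Poller::run (...) at qpid/sys/Poller.h:122
"""

if __name__ == "__main__":
    print(unresolved_threads(SAMPLE))  # [(1, 25133)]
```

An unresolved frame #0 only indicates that the program counter landed outside any loaded symbol; it does not by itself identify the cause of the crash.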
This is believed to have been fixed by changes in the 1.3 rebase (as part of the cleanup of deletion of dispatch handles) and we are requesting verification of that.
VERIFIED on RHEL 5.5, both i386 and x86_64 (tested for over 4 days):

# rpm -qa | grep qpid | sort -u
python-qpid-0.7.946106-14.el5
qpid-cpp-client-0.7.946106-15.el5
qpid-cpp-client-devel-0.7.946106-15.el5
qpid-cpp-client-devel-docs-0.7.946106-15.el5
qpid-cpp-client-ssl-0.7.946106-15.el5
qpid-cpp-mrg-debuginfo-0.7.946106-15.el5
qpid-cpp-server-0.7.946106-15.el5
qpid-cpp-server-cluster-0.7.946106-15.el5
qpid-cpp-server-devel-0.7.946106-15.el5
qpid-cpp-server-ssl-0.7.946106-15.el5
qpid-cpp-server-store-0.7.946106-15.el5
qpid-cpp-server-xml-0.7.946106-15.el5
qpid-java-client-0.7.946106-9.el5
qpid-java-common-0.7.946106-9.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-10.el5

--> VERIFIED
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: The clustered qpidd service no longer terminates unexpectedly in qpid::sys::Poller::run().
An advisory has been issued which should help with the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0773.html