Bug 492327

Summary: clustered broker crashes in DispatchHandle III (triggered by failover soak)
Product: Red Hat Enterprise MRG
Reporter: Frantisek Reznicek <freznice>
Component: qpid-cpp
Assignee: Andrew Stitcher <astitcher>
Status: CLOSED DUPLICATE
QA Contact: Frantisek Reznicek <freznice>
Severity: high
Priority: high
Version: 1.1
CC: esammons, gsim
Target Milestone: 1.1.1
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2009-05-05 14:30:00 UTC
Bug Blocks: 487706    
Attachments:
Description                                            Flags
mrg8 persistant failover soak core backtraces          none
dhcp-lab-200 persistant failover soak core backtraces  none

Description Frantisek Reznicek 2009-03-26 14:04:53 UTC
Description of problem:
The failover soak test running on RHEL 5.3 x86_64 (dell-pesc1425-01)
triggers a qpidd crash in DispatchHandle very similar to bug 490457.

The crash was detected after approximately 1000 runs of the failover soak test
(the last run exited with code 8). The test itself can be found here:
 https://bugzilla.redhat.com/attachment.cgi?id=335676  (part of bug 490855).


Version-Release number of selected component (if applicable):
[root@dell-pesc1425-01 fsoak]# rpm -qa | egrep '(qpid|rhm|openai)' | sort -u
openais-0.80.3-22.el5_3.3
openais-debuginfo-0.80.3-22.el5_3.3
openais-devel-0.80.3-22.el5_3.3
python-qpid-0.5.752581-1.el5
qpidc-0.5.752581-1.el5
qpidc-debuginfo-0.5.752581-1.el5
qpidc-devel-0.5.752581-1.el5
qpidc-perftest-0.5.752581-1.el5
qpidc-rdma-0.5.752581-1.el5
qpidc-ssl-0.5.752581-1.el5
qpidd-0.5.752581-1.el5
qpidd-acl-0.5.752581-1.el5
qpidd-cluster-0.5.752581-1.el5
qpidd-devel-0.5.752581-1.el5
qpidd-rdma-0.5.752581-1.el5
qpidd-ssl-0.5.752581-1.el5
qpidd-xml-0.5.752581-1.el5
qpid-java-client-0.5.751061-1.el5
qpid-java-common-0.5.751061-1.el5
rhm-0.5.3153-1.el5
rhm-docs-0.5.756148-1.el5


How reproducible:
Hard to reproduce (1 occurrence within 1010 runs on dell-pesc1425-01.rhts.bos.redhat.com)

Steps to Reproduce:
1. install, set up, and start openais
2. ulimit -c unlimited
3. extract the fsoak.tar.bz2
4. compile clients using fsoak/run_fs.sh
5. launch using fsoak/run_failover_soak
  
Actual results:
qpidd crashes.

Expected results:
qpidd should not crash.

Additional info: (Backtrace below)

[root@dell-pesc1425-01 fsoak]# gdb `which qpidd` core.4166
GNU gdb Fedora (6.8-27.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Reading symbols from /usr/lib64/libqpidbroker.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidbroker.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidbroker.so.0
Reading symbols from /usr/lib64/libqpidcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidcommon.so.0
Reading symbols from /usr/lib64/libboost_program_options.so.2...done.
Loaded symbols for /usr/lib64/libboost_program_options.so.2
Reading symbols from /usr/lib64/libboost_filesystem.so.2...done.
Loaded symbols for /usr/lib64/libboost_filesystem.so.2
Reading symbols from /lib64/libuuid.so.1...done.
Loaded symbols for /lib64/libuuid.so.1
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/librt.so.1...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /usr/lib64/libsasl2.so.2...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /usr/lib64/libstdc++.so.6...done.
Loaded symbols for /usr/lib64/libstdc++.so.6
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libgcc_s.so.1...done.
Loaded symbols for /lib64/libgcc_s.so.1
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libpthread.so.0...done.
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
Reading symbols from /usr/lib64/qpid/daemon/cluster.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/daemon/cluster.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/daemon/cluster.so
Reading symbols from /usr/lib64/openais/libcpg.so.2...Reading symbols from /usr/lib/debug/usr/lib64/openais/libcpg.so.2.0.0.debug...done.
done.
Loaded symbols for /usr/lib64/openais/libcpg.so.2
Reading symbols from /usr/lib64/libcman.so.2...done.
Loaded symbols for /usr/lib64/libcman.so.2
Reading symbols from /usr/lib64/libqpidclient.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libqpidclient.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libqpidclient.so.0
Reading symbols from /usr/lib64/qpid/client/rdmaconnector.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/client/rdmaconnector.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/client/rdmaconnector.so
Reading symbols from /usr/lib64/librdmawrap.so.0...Reading symbols from /usr/lib/debug/usr/lib64/librdmawrap.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/librdmawrap.so.0
Reading symbols from /usr/lib64/librdmacm.so.1...done.
Loaded symbols for /usr/lib64/librdmacm.so.1
Reading symbols from /usr/lib64/libibverbs.so.1...done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Reading symbols from /usr/lib64/qpid/client/sslconnector.so...Reading symbols from /usr/lib/debug/usr/lib64/qpid/client/sslconnector.so.debug...done.
done.
Loaded symbols for /usr/lib64/qpid/client/sslconnector.so
Reading symbols from /usr/lib64/libsslcommon.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libsslcommon.so.0.1.0.debug...done.
done.
Loaded symbols for /usr/lib64/libsslcommon.so.0
Reading symbols from /usr/lib64/libnss3.so...done.
Loaded symbols for /usr/lib64/libnss3.so
Reading symbols from /usr/lib64/libssl3.so...done.
Loaded symbols for /usr/lib64/libssl3.so
Reading symbols from /usr/lib64/libnspr4.so...done.
Loaded symbols for /usr/lib64/libnspr4.so
Reading symbols from /usr/lib64/libnssutil3.so...done.
Loaded symbols for /usr/lib64/libnssutil3.so
Reading symbols from /usr/lib64/libplc4.so...done.
Loaded symbols for /usr/lib64/libplc4.so
Reading symbols from /usr/lib64/libplds4.so...done.
Loaded symbols for /usr/lib64/libplds4.so
Core was generated by `qpidd --no-module-dir --load-module /usr/lib64/qpid/daemon/cluster.so --cluster'.
Program terminated with signal 6, Aborted.
[New process 4173]
[New process 4172]
[New process 4171]
[New process 4170]
[New process 4168]
[New process 4167]
[New process 4166]
#0  0x0000003c13030215 in raise () from /lib64/libc.so.6
(gdb) thread apply all backtrace

Thread 7 (process 4166):
#0  0x0000003c130d3498 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003c16172e8d in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>)
    at qpid/sys/epoll/EpollPoller.cpp:432
#2  0x0000003c16173c67 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:398
#3  0x0000003c166ccb86 in qpid::broker::Broker::run (this=<value optimized out>) at qpid/broker/Broker.cpp:319
#4  0x0000000000406948 in QpiddBroker::execute (this=<value optimized out>, options=0x61c0740) at posix/QpiddBroker.cpp:165
#5  0x0000000000405438 in main (argc=15, argv=0x7fff34280698) at qpidd.cpp:77

Thread 6 (process 4167):
#0  0x0000003c1380ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003c16788e6f in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3  0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003c130d30ad in clone () from /lib64/libc.so.6

Thread 5 (process 4168):
#0  0x0000003c1380ab00 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000003c16788e6f in qpid::broker::Timer::run (this=<value optimized out>) at qpid/sys/posix/Condition.h:69
#2  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#3  0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#4  0x0000003c130d30ad in clone () from /lib64/libc.so.6

Thread 4 (process 4170):
#0  0x0000003c130d3498 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003c16172e8d in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>)
    at qpid/sys/epoll/EpollPoller.cpp:432
#2  0x0000003c16173c67 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:398
#3  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4  0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003c130d30ad in clone () from /lib64/libc.so.6

Thread 3 (process 4171):
#0  0x0000003c130d3498 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003c16172e8d in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>)
    at qpid/sys/epoll/EpollPoller.cpp:432
#2  0x0000003c16173c67 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:398
#3  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4  0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003c130d30ad in clone () from /lib64/libc.so.6

Thread 2 (process 4172):
#0  0x0000003c130d3498 in epoll_wait () from /lib64/libc.so.6
#1  0x0000003c16172e8d in qpid::sys::Poller::wait (this=<value optimized out>, timeout=<value optimized out>)
    at qpid/sys/epoll/EpollPoller.cpp:432
#2  0x0000003c16173c67 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/epoll/EpollPoller.cpp:398
#3  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#4  0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003c130d30ad in clone () from /lib64/libc.so.6

Thread 1 (process 4173):
#0  0x0000003c13030215 in raise () from /lib64/libc.so.6
#1  0x0000003c13031cc0 in abort () from /lib64/libc.so.6
#2  0x0000003c140bec44 in __gnu_cxx::__verbose_terminate_handler () from /usr/lib64/libstdc++.so.6
#3  0x0000003c140bcdb6 in ?? () from /usr/lib64/libstdc++.so.6
#4  0x0000003c140bcde3 in std::terminate () from /usr/lib64/libstdc++.so.6
#5  0x0000003c140bceca in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6  0x0000003c161c0eee in ~ScopedLock (this=<value optimized out>) at qpid/sys/posix/Mutex.h:120
#7  0x0000003c161c07b6 in qpid::sys::DispatchHandle::processEvent (this=<value optimized out>, type=<value optimized out>)
    at qpid/sys/DispatchHandle.cpp:420
#8  0x0000003c16173c93 in qpid::sys::Poller::run (this=<value optimized out>) at qpid/sys/Poller.h:122
#9  0x0000003c1616ac4a in runRunnable (p=<value optimized out>) at qpid/sys/posix/Thread.cpp:35
#10 0x0000003c13806367 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003c130d30ad in clone () from /lib64/libc.so.6
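
Note on Thread 1: frames #4 to #7 show an exception thrown from ~ScopedLock
(qpid/sys/posix/Mutex.h:120), reached from DispatchHandle::processEvent, going
straight to std::terminate. One plausible reading, not confirmed in this report,
is that the unlock performed by the destructor failed and nothing on the poller
thread caught the resulting exception. The sketch below is not qpid code:
CheckedMutex, ScopedLock and run_with_handler are hypothetical stand-ins that
only illustrate how a throwing lock-guard destructor behaves. It assumes a
C++11 or later compiler, where noexcept(false) is needed to mimic the older
behaviour of destructors that were allowed to throw.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a mutex wrapper that reports errors by throwing.
// It only tracks a flag; no real locking takes place.
class CheckedMutex {
    bool locked_;
public:
    CheckedMutex() : locked_(false) {}
    void lock() {
        if (locked_) throw std::runtime_error("lock: mutex already locked");
        locked_ = true;
    }
    void unlock() {
        if (!locked_) throw std::runtime_error("unlock: mutex not locked");
        locked_ = false;
    }
};

// Hypothetical lock guard whose destructor is allowed to throw.
class ScopedLock {
    CheckedMutex& mutex_;
public:
    explicit ScopedLock(CheckedMutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() noexcept(false) { mutex_.unlock(); }
};

// Runs a guarded section inside a try/catch. When `invalidate` is true the
// mutex state is changed behind the guard's back (an assumed failure mode),
// so the guard's destructor throws. Returns the error text, or "" on success.
std::string run_with_handler(bool invalidate) {
    CheckedMutex m;
    try {
        ScopedLock guard(m);
        if (invalidate) m.unlock();  // simulate the lock becoming invalid
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "";
}
```

With a handler in place, run_with_handler(true) returns the error text instead
of aborting. Without the try/catch the same throw would have no handler and
would end in std::terminate and abort, which is the shape of frames #0 to #5
above.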

Comment 1 Frantisek Reznicek 2009-04-17 08:51:12 UTC
Created attachment 339970 [details]
mrg8 persistant failover soak core backtraces

Comment 2 Frantisek Reznicek 2009-04-17 08:52:40 UTC
Created attachment 339971 [details]
dhcp-lab-200 persistant failover soak core backtraces

Comment 3 Andrew Stitcher 2009-05-05 14:30:00 UTC

*** This bug has been marked as a duplicate of bug 490457 ***