Bug 1712228
| Summary: | sporadic qpidd segfault during shutdown while attaching new link | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Pavel Moravec <pmoravec> |
| Component: | qpid-cpp | Assignee: | Mike Cressman <mcressma> |
| Status: | CLOSED ERRATA | QA Contact: | Messaging QE <messaging-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.2 | CC: | gsim, jfrancin, jross, mcressma, zkraus |
| Target Milestone: | 3.2.13 | Keywords: | OtherQA |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | qpid-cpp-1.36.0-22 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2019-07-15 07:54:52 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Perspective patch: during the shutdown, cancel all timers first? (In reply to Pavel Moravec from comment #1) > Perspective patch: during the shutdown, cancel all timers first? +1, that change now committed upstream (see QPID-8321). This also fixes https://bugzilla.redhat.com/show_bug.cgi?id=1624896 Testing scratch build http://brew-task-repos.usersys.redhat.com/repos/scratch/mcressma/qpid-cpp/1.36.0/22.el6/qpid-cpp-1.36.0-22.el6-scratch.repo : no segfault on this build within 2 hours test. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:1770 |
Description of problem: When shutting down a broker that is just about to attach a link to a federated broker, a concurrency bug can cause segfault. Backtraces (see Additional info) show the problem is with concurrent manipulation of qpid::sys::HandleSet where new link adds an entry to the set at the same time when broker shutdown clears all the set content. The segfault is quite rare (I reproduced it once upon few hours, customer with 2% probability) and happens probably only during shutdown. Version-Release number of selected component (if applicable): qpid-cpp-server-1.36.0-21.el6.x86_64 (but expected in any version) How reproducible: low probability Steps to Reproduce: 1) Add to /etc/hosts: 127.0.0.1 test-1 test-2 test-3 test-4 test-5 test-6 test-7 test-8 test-9 test-0 2) Run this script to (A) start one broker accepting just SSL connections, and (B) repeatedly restart 2nd broker creating many links to this broker (all of them will fail due to noSSL and some re-attach attempt will happen during this broker shutdown at the "right" ztime): killall qpidd sleep 1 commonOptions="--log-to-stderr=no --log-to-stdout=no --log-enable=critical+" qpidd --require-encryption=yes --ssl-require-client-authentication=yes --ssl-port=5671 --log-to-file=qpidd.log $commonOptions & while true; do echo "$(date): starting 2nd broker.." qpidd --worker-threads 12 --port=5673 --pid-dir ~/.qpidd5673 --data-dir ~/.qpidd5673 --store-dir ~/.qpidd5673 --log-to-file=qpidd.5673.log --link-maintenance-interval 0.01 $commonOptions & pid=$! sleep 1 echo "$(date): adding some links.." for i in $(seq 0 9); do qpid-route link add localhost:5673 test-${i}:5672 & done while [ $(ps aux | grep python | grep -c qpid-route) -gt 0 ]; do sleep 1 done sleep 10 echo "$(date): killing 2nd broker.." kill $pid sleep 5 done Actual results: Sometimes, coredump is generated during shutdown of the broker. Expected results: No shutdown. Additional info: Backtraces: Program terminated with signal 11, Segmentation fault. #0 std::_Rb_tree_insert_and_rebalance (__insert_left=<value optimized out>, __x=0x7fdd4c0164a0, __p=<value optimized out>, __header=...) at ../../../../libstdc++-v3/src/tree.cc:248 248 __root->_M_color = _S_black; Missing separate debuginfos, use: debuginfo-install libibverbs-1.1.8-4.el6.x86_64 librdmacm-1.0.21-0.el6.x86_64 xerces-c-3.0.1-20.el6.x86_64 xqilla-2.2.3-8.el6.x86_64 (gdb) bt #0 std::_Rb_tree_insert_and_rebalance (__insert_left=<value optimized out>, __x=0x7fdd4c0164a0, __p=<value optimized out>, __header=...) at ../../../../libstdc++-v3/src/tree.cc:248 #1 0x00007fdd8584b894 in std::_Rb_tree<qpid::sys::PollerHandle*, qpid::sys::PollerHandle*, std::_Identity<qpid::sys::PollerHandle*>, std::less<qpid::sys::PollerHandle*>, std::allocator<qpid::sys::PollerHandle*> >::_M_insert_ (this=0x746400, __x=<value optimized out>, __p=0x7fdd4c003710, __v=@0x7fdd80c46878) at /usr/include/c++/4.4.7/bits/stl_tree.h:883 #2 0x00007fdd8584a753 in _M_insert_unique (this=0x7463d8, h=0x7fdd4cf55368) at /usr/include/c++/4.4.7/bits/stl_tree.h:1182 #3 insert (this=0x7463d8, h=0x7fdd4cf55368) at /usr/include/c++/4.4.7/bits/stl_set.h:411 #4 qpid::sys::HandleSet::add (this=0x7463d8, h=0x7fdd4cf55368) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:173 #5 0x00007fdd8584a839 in qpid::sys::Poller::registerHandle (this=0x73bec0, handle=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:341 #6 0x00007fdd858ab7df in qpid::sys::DispatchHandle::startWatch (this=0x7fdd4cf55368, poller0=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/DispatchHandle.cpp:55 #7 0x00007fdd85821e8e in qpid::sys::posix::AsynchConnector::start (this=<value optimized out>, poller=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/AsynchIO.cpp:188 #8 0x00007fdd85eb319f in qpid::sys::SocketConnector::connect (this=<value optimized out>, poller=<value optimized out>, name=<value optimized out>, host="test-1", port="5672", fact= 0x7fdd80c46bb0, failed=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/SocketTransport.cpp:212 #9 0x00007fdd85d6e58c in qpid::broker::Broker::connect (this=<value optimized out>, name="qpid.tcp:test-1:5672", host="test-1", port="5672", transport=<value optimized out>, f=0x745f98, failed=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Broker.cpp:1367 #10 0x00007fdd85d6e874 in qpid::broker::Broker::connect (this=0x745820, name="qpid.tcp:test-1:5672", host="test-1", port=<value optimized out>, transport=<value optimized out>, failed=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Broker.cpp:1358 #11 0x00007fdd85e07c00 in qpid::broker::Link::startConnectionLH (this=0x7fdd50028f70) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Link.cpp:227 #12 0x00007fdd85e087fe in qpid::broker::Link::maintenanceVisit (this=0x7fdd50028f70) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Link.cpp:518 #13 0x00007fdd85e0d574 in qpid::broker::LinkTimerTask::fire (this=0x7fdd50018d20) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Link.cpp:70 #14 0x00007fdd858af65b in qpid::sys::Timer::fire (this=<value optimized out>, t=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/Timer.cpp:222 #15 0x00007fdd858b0c85 in qpid::sys::Timer::run (this=0x7460f0) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/Timer.cpp:156 #16 0x00007fdd8583f7aa in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/Thread.cpp:35 #17 0x00007fdd85254a51 in start_thread (arg=0x7fdd80c48700) at pthread_create.c:301 #18 0x00007fdd84fa196d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115 (gdb) thread 2 [Switching to thread 2 (Thread 0x7fdd7f82a700 (LWP 26621))]#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 136 2: movl %edx, %eax (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007fdd852565a3 in _L_lock_892 () from /lib64/libpthread.so.0 #2 0x00007fdd85256487 in __pthread_mutex_lock (mutex=0x7463d8) at pthread_mutex_lock.c:82 #3 0x00007fdd85846905 in qpid::sys::Mutex::lock (this=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/Mutex.h:116 #4 0x00007fdd8584a39b in ScopedLock (this=<value optimized out>, h=0x7fdd4cf17578) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/Mutex.h:33 #5 qpid::sys::HandleSet::remove (this=<value optimized out>, h=0x7fdd4cf17578) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:177 #6 0x00007fdd8584a546 in qpid::sys::Poller::unregisterHandle (this=0x73bec0, handle=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:352 #7 0x00007fdd858ab2b1 in qpid::sys::DispatchHandle::stopWatch (this=0x7fdd4cf17578) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/DispatchHandle.cpp:191 #8 0x00007fdd85826922 in qpid::sys::posix::AsynchConnector::connComplete (this=0x7fdd4cf17570, h=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/AsynchIO.cpp:213 #9 0x00007fdd858ac2a3 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:1013 #10 0x00007fdd858aabc7 in qpid::sys::DispatchHandle::processEvent (this=0x7fdd4cf17578, type=qpid::sys::Poller::DISCONNECTED) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/DispatchHandle.cpp:291 #11 0x00007fdd858493e6 in process (this=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/Poller.h:131 #12 qpid::sys::HandleSet::cleanup (this=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:187 #13 0x00007fdd8584b3ed in qpid::sys::Poller::run (this=0x73bec0) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/epoll/EpollPoller.cpp:529 #14 0x00007fdd8583f7aa in qpid::sys::(anonymous namespace)::runRunnable (p=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/Thread.cpp:35 #15 0x00007fdd85254a51 in start_thread (arg=0x7fdd7f82a700) at pthread_create.c:301 #16 0x00007fdd84fa196d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115