Description of problem:
Frequent crashes of qpidd with the msgstore module are observed in the mrg::journal::wmgr::~wmgr() destructor. Backtrace:

#0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
#1  0x000000360dc7276b in free () from /lib64/libc.so.6
#2  0x00002adb693c382d in deallocate (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/ext/new_allocator.h:94
#3  _M_deallocate_node (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:419
#4  std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::_M_destroy_nodes (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:524
#5  0x00002adb693c3a1b in std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::~_Deque_base (this=0x5040d90, __in_chrg=<value optimized out>) at /usr/include/c++/4.1.2/bits/stl_deque.h:445
#6  0x00002adb693d1b7c in ~deque (this=0x5040c18, __in_chrg=<value optimized out>) at /usr/include/c++/4.1.2/bits/stl_deque.h:725
#7  mrg::journal::wmgr::~wmgr (this=0x5040c18, __in_chrg=<value optimized out>) at jrnl/wmgr.cpp:91
#8  0x00002adb693aa44a in mrg::journal::jcntl::~jcntl (this=0x50408a8, __in_chrg=<value optimized out>) at jrnl/jcntl.cpp:84
...

The crash was seen in an automated test that stresses the msgstore module by restarting the broker with different msgstore arguments/parameters, with message flow in and out after each restart. The reproducer is identical to the one for bug 595438. The issue was seen multiple times on RHEL 5.5 x86_64.

Version-Release number of selected component (if applicable):
python-qmf-0.7.946106-3.el5
python-qpid-0.7.946106-1.el5
qmf-0.7.946106-1.el5
qmf-devel-0.7.946106-1.el5
qpid-cpp-client-0.7.946106-1.el5
qpid-cpp-client-devel-0.7.946106-1.el5
qpid-cpp-client-devel-docs-0.7.946106-1.el5
qpid-cpp-client-ssl-0.7.946106-1.el5
qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
qpid-cpp-server-0.7.946106-1.el5
qpid-cpp-server-cluster-0.7.946106-1.el5
qpid-cpp-server-devel-0.7.946106-1.el5
qpid-cpp-server-ssl-0.7.946106-1.el5
qpid-cpp-server-store-0.7.946106-1.el5
qpid-cpp-server-xml-0.7.946106-1.el5
qpid-java-client-0.7.946106-1.el5
qpid-java-common-0.7.946106-1.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-2.el5

How reproducible:
quite rapidly (>80%)

Steps to Reproduce:
1. run MRG/Messaging/qpid_ptest_restart_with_msgstore_sweep
   http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_ptest_restart_with_msgstore_sweep/runtest.sh?rev=HEAD
2. wait for crashes

Actual results:
Broker crashes.

Expected results:
Broker should not crash.

Additional info:
./core.8640: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'qpidd'

GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:

warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
Core was generated by `/usr/sbin/qpidd --data-dir /mnt/tests/distribution/MRG/Messaging/qpid_ptest_res'.
Program terminated with signal 11, Segmentation fault.
#0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
(gdb)
rax            0xe0c9290            235704976
rbx            0xe0c9080            235704448
rcx            0x11afd5f78ff767f0   1274472478818134000
rdx            0x360df53048         232162406472
rsi            0x1                  1
rdi            0x360df529e0         232162404832
rbp            0x210                0x210
rsp            0x7fff78c26ef0       0x7fff78c26ef0
r8             0x43132940           1125329216
r9             0xe0bb3e0            235647968
r10            0x0                  0
r11            0x360e808740         232171538240
r12            0xe0c9290            235704976
r13            0x360dd22438         232160109624
r14            0xf80                3968
r15            0x360df529e0         232162404832
rip            0x360dc72449         0x360dc72449 <_int_free+2137>
eflags         0x10206              [ PF IF RF ]
cs             0x33                 51
ss             0x2b                 43
ds             0x0                  0
es             0x0                  0
fs             0x0                  0
gs             0x0                  0
fctrl          0x37f                895
fstat          0x0                  0
ftag           0xffff               65535
fiseg          0x0                  0
fioff          0x0                  0
foseg          0x0                  0
fooff          0x0                  0
fop            0x0                  0
mxcsr          0x1f80               [ IM DM ZM OM UM PM ]
(gdb) Using memory regions provided by the target.
There are no memory regions defined.
(gdb)
16   AT_HWCAP     Machine-dependent CPU capability hints 0x178bfbff
6    AT_PAGESZ    System page size               4096
17   AT_CLKTCK    Frequency of times()           100
3    AT_PHDR      Program headers for program    0x400040
4    AT_PHENT     Size of program header entry   56
5    AT_PHNUM     Number of program headers      8
7    AT_BASE      Base address of interpreter    0x0
8    AT_FLAGS     Flags                          0x0
9    AT_ENTRY     Entry point of program         0x404ff0
11   AT_UID       Real user ID                   0
12   AT_EUID      Effective user ID              0
13   AT_GID       Real group ID                  0
14   AT_EGID      Effective group ID             0
23   AT_SECURE    Boolean, was exec setuid-like? 0
15   AT_PLATFORM  String identifying platform    0x7fff78c27d89 "x86_64"
0    AT_NULL      End of vector                  0x0
(gdb) Stack level 0, frame at 0x7fff78c26f80:
 rip = 0x360dc72449 in _int_free; saved rip 0x360dc7276b
 called by frame at 0x7fff78c26fc0
 Arglist at 0x7fff78c26ee8, args:
 Locals at 0x7fff78c26ee8, Previous frame's sp is 0x7fff78c26f80
 Saved registers:
  rbx at 0x7fff78c26f48, rbp at 0x7fff78c26f50, r12 at 0x7fff78c26f58,
  r13 at 0x7fff78c26f60, r14 at 0x7fff78c26f68, r15 at 0x7fff78c26f70,
  rip at 0x7fff78c26f78
(gdb)
From                To                  Syms Read   Shared Object Library
0x000000360ec9ee10  0x000000360ee1fab8  Yes (*)     /usr/lib64/libqpidbroker.so.2
0x000000361350eab0  0x000000361360db18  Yes (*)     /usr/lib64/libqpidcommon.so.2
0x0000003610010aa0  0x000000361002dae8  Yes (*)     /usr/lib64/libboost_program_options.so.2
0x0000003610404810  0x000000361040cff8  Yes (*)     /usr/lib64/libboost_filesystem.so.2
0x0000003622201500  0x0000003622202918  Yes (*)     /lib64/libuuid.so.1
0x000000360e400e10  0x000000360e401a08  Yes (*)     /lib64/libdl.so.2
0x000000360f802220  0x000000360f805cc8  Yes (*)     /lib64/librt.so.1
0x0000003622a046e0  0x0000003622a13be8  Yes (*)     /usr/lib64/libsasl2.so.2
0x000000361744f430  0x00000036174c3058  Yes (*)     /usr/lib64/libstdc++.so.6
0x000000360e003e60  0x000000360e043e38  Yes (*)     /lib64/libm.so.6
0x0000003614801e50  0x000000361480b018  Yes (*)     /lib64/libgcc_s.so.1
0x000000360dc1d780  0x000000360dd09ff8  Yes (*)     /lib64/libc.so.6
0x000000360d800a70  0x000000360d81671e  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x000000360e8051f0  0x000000360e810258  Yes (*)     /lib64/libpthread.so.0
0x0000003610c032a0  0x0000003610c0e2d8  Yes (*)     /lib64/libresolv.so.2
0x00000036184009f0  0x0000003618406918  Yes (*)     /lib64/libcrypt.so.1
0x00002b5751f755e0  0x00002b5751f78b68  Yes (*)     /usr/lib64/qpid/daemon/watchdog.so
0x00002b57521b2580  0x00002b575223eda8  Yes (*)     /usr/lib64/qpid/daemon/msgstore.so
0x00002b57524a85d0  0x00002b5752561288  Yes (*)     /usr/lib64/libdb_cxx-4.3.so
0x00002b575278c510  0x00002b575278c6d1  Yes (*)     /usr/lib64/libaio.so.1
0x00002b5752992610  0x00002b5752996bd8  Yes (*)     /usr/lib64/qpid/daemon/replication_exchange.so
0x00002b5752bdb4e0  0x00002b5752c45ee8  Yes (*)     /usr/lib64/qpid/daemon/cluster.so
0x00002b5752e833d0  0x00002b5752e85338  Yes (*)     /usr/lib64/openais/libcpg.so.2
0x00002b5753087110  0x00002b5753089b78  Yes (*)     /usr/lib64/libcman.so.2
0x000000361083f920  0x00000036108b8a48  Yes (*)     /usr/lib64/libqpidclient.so.2
0x00002b57532937b0  0x00002b575329d578  Yes (*)     /usr/lib64/qpid/daemon/xml.so
0x00002b575362a070  0x00002b57537ab758  Yes (*)     /usr/lib64/libxerces-c.so.28
0x00002b5753c40090  0x00002b5753dcbb28  Yes (*)     /usr/lib64/libxqilla.so.3
0x00002b575410d570  0x00002b5754115728  Yes (*)     /usr/lib64/qpid/daemon/ssl.so
0x00002b575433c640  0x00002b57543530a8  Yes (*)     /usr/lib64/libsslcommon.so.2
0x0000003620a18080  0x0000003620af4968  Yes (*)     /usr/lib64/libnss3.so
0x0000003621e07ee0  0x0000003621e28ae8  Yes (*)     /usr/lib64/libssl3.so
0x000000362120ced0  0x000000362122b018  Yes (*)     /usr/lib64/libnspr4.so
0x0000003620e080d0  0x0000003620e12428  Yes (*)     /usr/lib64/libnssutil3.so
0x0000003621a01370  0x0000003621a02978  Yes (*)     /usr/lib64/libplc4.so
0x0000003621600e10  0x0000003621601c08  Yes (*)     /usr/lib64/libplds4.so
0x00002b5754560b70  0x00002b5754566738  Yes (*)     /usr/lib64/qpid/daemon/replicating_listener.so
0x00002b5754777f10  0x00002b57547984e8  Yes (*)     /usr/lib64/qpid/daemon/acl.so
(*): Shared library is missing debugging information.
(gdb)
* 1 Thread 8640  0x000000360dc72449 in _int_free () from /lib64/libc.so.6

Thread 1 (Thread 8640):
#0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
#1  0x000000360dc7276b in free () from /lib64/libc.so.6
#2  0x00002b575221882d in deallocate (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/ext/new_allocator.h:94
#3  _M_deallocate_node (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:419
#4  std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::_M_destroy_nodes (this=<value optimized out>, __nstart=0x1, __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:524
#5  0x00002b5752218a1b in std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::~_Deque_base (this=0xe0c8d90, __in_chrg=<value optimized out>) at /usr/include/c++/4.1.2/bits/stl_deque.h:445
#6  0x00002b5752226b7c in ~deque (this=0xe0c8c18, __in_chrg=<value optimized out>) at /usr/include/c++/4.1.2/bits/stl_deque.h:725
#7  mrg::journal::wmgr::~wmgr (this=0xe0c8c18, __in_chrg=<value optimized out>) at jrnl/wmgr.cpp:91
#8  0x00002b57521ff44a in mrg::journal::jcntl::~jcntl (this=0xe0c88a8, __in_chrg=<value optimized out>) at jrnl/jcntl.cpp:84
#9  0x00002b57521bade3 in mrg::msgstore::JournalImpl::~JournalImpl (this=0xe0c88a0, __in_chrg=<value optimized out>) at JournalImpl.cpp:128
#10 0x00002b57521ddfbc in mrg::msgstore::TplJournalImpl::~TplJournalImpl (this=0x360df529e0, __in_chrg=<value optimized out>) at JournalImpl.h:243
#11 0x00002b57521c9ec9 in release (this=0xe0b4570, __in_chrg=<value optimized out>) at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
#12 ~shared_count (this=0xe0b4570, __in_chrg=<value optimized out>) at /usr/include/boost/detail/shared_count.hpp:159
#13 ~shared_ptr (this=0xe0b4570, __in_chrg=<value optimized out>) at /usr/include/boost/shared_ptr.hpp:106
#14 mrg::msgstore::MessageStoreImpl::~MessageStoreImpl (this=0xe0b4570, __in_chrg=<value optimized out>) at MessageStoreImpl.cpp:446
#15 0x00002b57521b28ae in ~shared_count () at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
#16 ~shared_ptr () at /usr/include/boost/shared_ptr.hpp:106
#17 ~StorePlugin () at StorePlugin.cpp:36
#18 __tcf_1 () at StorePlugin.cpp:77
#19 0x000000360dc333a5 in exit () from /lib64/libc.so.6
#20 0x000000360dc1d99b in __libc_start_main () from /lib64/libc.so.6
#21 0x0000000000405019 in _start ()
(gdb) quit

Full crash list here:
https://beaker.engineering.redhat.com/logs/2010/34/834/1556/11356/45363///test_log--distribution-MRG-Messaging-qpid_ptest_restart_with_msgstore_sweep.log
Kim, was this also fixed along with bug 595438? If so, please move it to MODIFIED. Thanks.
I cannot reproduce this. The reproducer is identical to the one for bug 595438, and I always get an outcome similar to that bug. Perhaps it is hardware dependent? I suspect that this is another manifestation of the bug that caused bug 595438 (i.e. bug 596765), and that with that bug fixed, this one should also be solved. Without a working reproducer, I can't prove it, though. I'll change this to MODIFIED. QE: Are you able to retest on the same hardware that produced this bug and check whether it is solved? If it is not fixed, set this back to ASSIGNED.
Stress testing currently ongoing...
After retesting on the machine, I can confirm the issue is gone. Last tested on RHEL 4.8 / 5.5 with packages:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-*0.7.946106-14.el5
qpid-cpp-*-0.7.946106-14.el5
qpid-java-*-0.7.946106-8.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-10.el5

-> VERIFIED