Bug 596758 - qpidd+msgstore crashes in mrg::journal::wmgr::~wmgr() -> free()
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Development
All Linux
Priority: high, Severity: high
: 1.3
Assigned To: Kim van der Riet
Frantisek Reznicek
Reported: 2010-05-27 09:12 EDT by Frantisek Reznicek
Modified: 2015-11-15 20:12 EST
Doc Type: Bug Fix
Last Closed: 2010-10-20 07:29:09 EDT
Description Frantisek Reznicek 2010-05-27 09:12:31 EDT
Description of problem:

qpidd with the msgstore module crashes quite frequently in the mrg::journal::wmgr::~wmgr() destructor. The backtrace is:


  #0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
  #1  0x000000360dc7276b in free () from /lib64/libc.so.6
  #2  0x00002adb693c382d in deallocate (this=<value optimized out>, 
      __nstart=0x1, __nfinish=0x360df53048)
      at /usr/include/c++/4.1.2/ext/new_allocator.h:94
  #3  _M_deallocate_node (this=<value optimized out>, __nstart=0x1, 
      __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:419
  #4  std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::_M_destroy_nodes (this=<value optimized out>, __nstart=0x1, 
      __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:524
  #5  0x00002adb693c3a1b in std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::~_Deque_base (this=0x5040d90, 
      __in_chrg=<value optimized out>)
      at /usr/include/c++/4.1.2/bits/stl_deque.h:445
  #6  0x00002adb693d1b7c in ~deque (this=0x5040c18, 
      __in_chrg=<value optimized out>)
      at /usr/include/c++/4.1.2/bits/stl_deque.h:725
  #7  mrg::journal::wmgr::~wmgr (this=0x5040c18, 
      __in_chrg=<value optimized out>) at jrnl/wmgr.cpp:91
  #8  0x00002adb693aa44a in mrg::journal::jcntl::~jcntl (this=0x50408a8, 
      __in_chrg=<value optimized out>) at jrnl/jcntl.cpp:84
  ...

The crash was seen in an automated test that stresses the msgstore module by restarting the broker with different msgstore arguments/parameters and running message flow in and out after each restart.

The reproducer is identical to the one for bug 595438.

The issue was seen multiple times on RHEL 5.5 x86_64.


Version-Release number of selected component (if applicable):
 python-qmf-0.7.946106-3.el5
 python-qpid-0.7.946106-1.el5
 qmf-0.7.946106-1.el5
 qmf-devel-0.7.946106-1.el5
 qpid-cpp-client-0.7.946106-1.el5
 qpid-cpp-client-devel-0.7.946106-1.el5
 qpid-cpp-client-devel-docs-0.7.946106-1.el5
 qpid-cpp-client-ssl-0.7.946106-1.el5
 qpid-cpp-mrg-debuginfo-0.7.946106-1.el5
 qpid-cpp-server-0.7.946106-1.el5
 qpid-cpp-server-cluster-0.7.946106-1.el5
 qpid-cpp-server-devel-0.7.946106-1.el5
 qpid-cpp-server-ssl-0.7.946106-1.el5
 qpid-cpp-server-store-0.7.946106-1.el5
 qpid-cpp-server-xml-0.7.946106-1.el5
 qpid-java-client-0.7.946106-1.el5
 qpid-java-common-0.7.946106-1.el5
 qpid-tests-0.7.946106-1.el5
 qpid-tools-0.7.946106-2.el5


How reproducible:
Reproduces quickly (>80%)

Steps to Reproduce:
1. run MRG/Messaging/qpid_ptest_restart_with_msgstore_sweep
http://cvs.devel.redhat.com/cgi-bin/cvsweb.cgi/tests/distribution/MRG/Messaging/qpid_ptest_restart_with_msgstore_sweep/runtest.sh?rev=HEAD
2. wait for crashes
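
For readers without access to the internal test, the general shape of such a restart sweep can be sketched as below. This is only an illustration, not the actual runtest.sh: the helper names (`parameter_sweep`, `run_sweep`) and the three callbacks are hypothetical, and the real store option names and values are not given in this report. Note that the backtrace shows the crash happening under exit(), so the harness has to check the broker's exit status at shutdown, not only while it is running.

```python
import itertools


def parameter_sweep(options):
    """Yield one dict per combination of the given option values.

    `options` maps a (hypothetical) option name to a list of values to try.
    """
    names = sorted(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def run_sweep(options, start_broker, drive_traffic, stop_broker):
    """Restart the broker once per combination; return the configs that crashed.

    The caller supplies three callbacks (all assumptions of this sketch):
      start_broker(config) -> handle
      drive_traffic(handle)            # send and receive some messages
      stop_broker(handle) -> exit code
    A negative exit code is treated as death by signal, following the
    subprocess.Popen.returncode convention.
    """
    crashed = []
    for config in parameter_sweep(options):
        handle = start_broker(config)
        drive_traffic(handle)
        if stop_broker(handle) < 0:
            crashed.append(config)
    return crashed
```

In a real run, `start_broker` would launch `/usr/sbin/qpidd --data-dir ...` (the only option visible in the core file's command line) plus the store options under test, and `stop_broker` would shut it down and return the process exit status.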

  
Actual results:
Broker crashes.

Expected results:
Broker should not crash.

Additional info:


./core.8640: ELF 64-bit LSB core file AMD x86-64, version 1 (SYSV), SVR4-style, from 'qpidd'
  GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
  Copyright (C) 2009 Free Software Foundation, Inc.
  License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
  This is free software: you are free to change and redistribute it.
  There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
  and "show warranty" for details.
  This GDB was configured as "x86_64-redhat-linux-gnu".
  For bug reporting instructions, please see:
  warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .gnu.liblist not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .dynbss not found in /usr/lib/debug/usr/sbin/qpidd.debug
  warning: section .gnu.conflict not found in /usr/lib/debug/usr/sbin/qpidd.debug
  Core was generated by `/usr/sbin/qpidd --data-dir /mnt/tests/distribution/MRG/Messaging/qpid_ptest_res'.
  Program terminated with signal 11, Segmentation fault.
  #0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
  (gdb) rax            0xe0c9290	235704976
  rbx            0xe0c9080	235704448
  rcx            0x11afd5f78ff767f0	1274472478818134000
  rdx            0x360df53048	232162406472
  rsi            0x1	1
  rdi            0x360df529e0	232162404832
  rbp            0x210	0x210
  rsp            0x7fff78c26ef0	0x7fff78c26ef0
  r8             0x43132940	1125329216
  r9             0xe0bb3e0	235647968
  r10            0x0	0
  r11            0x360e808740	232171538240
  r12            0xe0c9290	235704976
  r13            0x360dd22438	232160109624
  r14            0xf80	3968
  r15            0x360df529e0	232162404832
  rip            0x360dc72449	0x360dc72449 <_int_free+2137>
  eflags         0x10206	[ PF IF RF ]
  cs             0x33	51
  ss             0x2b	43
  ds             0x0	0
  es             0x0	0
  fs             0x0	0
  gs             0x0	0
  fctrl          0x37f	895
  fstat          0x0	0
  ftag           0xffff	65535
  fiseg          0x0	0
  fioff          0x0	0
  foseg          0x0	0
  fooff          0x0	0
  fop            0x0	0
  mxcsr          0x1f80	[ IM DM ZM OM UM PM ]
  (gdb) Using memory regions provided by the target.
  There are no memory regions defined.
  (gdb) 16   AT_HWCAP             Machine-dependent CPU capability hints 0x178bfbff
  6    AT_PAGESZ            System page size               4096
  17   AT_CLKTCK            Frequency of times()           100
  3    AT_PHDR              Program headers for program    0x400040
  4    AT_PHENT             Size of program header entry   56
  5    AT_PHNUM             Number of program headers      8
  7    AT_BASE              Base address of interpreter    0x0
  8    AT_FLAGS             Flags                          0x0
  9    AT_ENTRY             Entry point of program         0x404ff0
  11   AT_UID               Real user ID                   0
  12   AT_EUID              Effective user ID              0
  13   AT_GID               Real group ID                  0
  14   AT_EGID              Effective group ID             0
  23   AT_SECURE            Boolean, was exec setuid-like? 0
  15   AT_PLATFORM          String identifying platform    0x7fff78c27d89 "x86_64"
  0    AT_NULL              End of vector                  0x0
  (gdb) Stack level 0, frame at 0x7fff78c26f80:
   rip = 0x360dc72449 in _int_free; saved rip 0x360dc7276b
   called by frame at 0x7fff78c26fc0
   Arglist at 0x7fff78c26ee8, args: 
   Locals at 0x7fff78c26ee8, Previous frame's sp is 0x7fff78c26f80
   Saved registers:
    rbx at 0x7fff78c26f48, rbp at 0x7fff78c26f50, r12 at 0x7fff78c26f58,
    r13 at 0x7fff78c26f60, r14 at 0x7fff78c26f68, r15 at 0x7fff78c26f70,
    rip at 0x7fff78c26f78
  (gdb) From                To                  Syms Read   Shared Object Library
  0x000000360ec9ee10  0x000000360ee1fab8  Yes (*)     /usr/lib64/libqpidbroker.so.2
  0x000000361350eab0  0x000000361360db18  Yes (*)     /usr/lib64/libqpidcommon.so.2
  0x0000003610010aa0  0x000000361002dae8  Yes (*)     /usr/lib64/libboost_program_options.so.2
  0x0000003610404810  0x000000361040cff8  Yes (*)     /usr/lib64/libboost_filesystem.so.2
  0x0000003622201500  0x0000003622202918  Yes (*)     /lib64/libuuid.so.1
  0x000000360e400e10  0x000000360e401a08  Yes (*)     /lib64/libdl.so.2
  0x000000360f802220  0x000000360f805cc8  Yes (*)     /lib64/librt.so.1
  0x0000003622a046e0  0x0000003622a13be8  Yes (*)     /usr/lib64/libsasl2.so.2
  0x000000361744f430  0x00000036174c3058  Yes (*)     /usr/lib64/libstdc++.so.6
  0x000000360e003e60  0x000000360e043e38  Yes (*)     /lib64/libm.so.6
  0x0000003614801e50  0x000000361480b018  Yes (*)     /lib64/libgcc_s.so.1
  0x000000360dc1d780  0x000000360dd09ff8  Yes (*)     /lib64/libc.so.6
  0x000000360d800a70  0x000000360d81671e  Yes (*)     /lib64/ld-linux-x86-64.so.2
  0x000000360e8051f0  0x000000360e810258  Yes (*)     /lib64/libpthread.so.0
  0x0000003610c032a0  0x0000003610c0e2d8  Yes (*)     /lib64/libresolv.so.2
  0x00000036184009f0  0x0000003618406918  Yes (*)     /lib64/libcrypt.so.1
  0x00002b5751f755e0  0x00002b5751f78b68  Yes (*)     /usr/lib64/qpid/daemon/watchdog.so
  0x00002b57521b2580  0x00002b575223eda8  Yes (*)     /usr/lib64/qpid/daemon/msgstore.so
  0x00002b57524a85d0  0x00002b5752561288  Yes (*)     /usr/lib64/libdb_cxx-4.3.so
  0x00002b575278c510  0x00002b575278c6d1  Yes (*)     /usr/lib64/libaio.so.1
  0x00002b5752992610  0x00002b5752996bd8  Yes (*)     /usr/lib64/qpid/daemon/replication_exchange.so
  0x00002b5752bdb4e0  0x00002b5752c45ee8  Yes (*)     /usr/lib64/qpid/daemon/cluster.so
  0x00002b5752e833d0  0x00002b5752e85338  Yes (*)     /usr/lib64/openais/libcpg.so.2
  0x00002b5753087110  0x00002b5753089b78  Yes (*)     /usr/lib64/libcman.so.2
  0x000000361083f920  0x00000036108b8a48  Yes (*)     /usr/lib64/libqpidclient.so.2
  0x00002b57532937b0  0x00002b575329d578  Yes (*)     /usr/lib64/qpid/daemon/xml.so
  0x00002b575362a070  0x00002b57537ab758  Yes (*)     /usr/lib64/libxerces-c.so.28
  0x00002b5753c40090  0x00002b5753dcbb28  Yes (*)     /usr/lib64/libxqilla.so.3
  0x00002b575410d570  0x00002b5754115728  Yes (*)     /usr/lib64/qpid/daemon/ssl.so
  0x00002b575433c640  0x00002b57543530a8  Yes (*)     /usr/lib64/libsslcommon.so.2
  0x0000003620a18080  0x0000003620af4968  Yes (*)     /usr/lib64/libnss3.so
  0x0000003621e07ee0  0x0000003621e28ae8  Yes (*)     /usr/lib64/libssl3.so
  0x000000362120ced0  0x000000362122b018  Yes (*)     /usr/lib64/libnspr4.so
  0x0000003620e080d0  0x0000003620e12428  Yes (*)     /usr/lib64/libnssutil3.so
  0x0000003621a01370  0x0000003621a02978  Yes (*)     /usr/lib64/libplc4.so
  0x0000003621600e10  0x0000003621601c08  Yes (*)     /usr/lib64/libplds4.so
  0x00002b5754560b70  0x00002b5754566738  Yes (*)     /usr/lib64/qpid/daemon/replicating_listener.so
  0x00002b5754777f10  0x00002b57547984e8  Yes (*)     /usr/lib64/qpid/daemon/acl.so
  (*): Shared library is missing debugging information.
  (gdb) * 1 Thread 8640  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
  Thread 1 (Thread 8640):
  #0  0x000000360dc72449 in _int_free () from /lib64/libc.so.6
  #1  0x000000360dc7276b in free () from /lib64/libc.so.6
  #2  0x00002b575221882d in deallocate (this=<value optimized out>, 
      __nstart=0x1, __nfinish=0x360df53048)
      at /usr/include/c++/4.1.2/ext/new_allocator.h:94
  #3  _M_deallocate_node (this=<value optimized out>, __nstart=0x1, 
      __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:419
  #4  std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::_M_destroy_nodes (this=<value optimized out>, __nstart=0x1, 
      __nfinish=0x360df53048) at /usr/include/c++/4.1.2/bits/stl_deque.h:524
  #5  0x00002b5752218a1b in std::_Deque_base<mrg::journal::data_tok*, std::allocator<mrg::journal::data_tok*> >::~_Deque_base (this=0xe0c8d90, 
      __in_chrg=<value optimized out>)
      at /usr/include/c++/4.1.2/bits/stl_deque.h:445
  #6  0x00002b5752226b7c in ~deque (this=0xe0c8c18, 
      __in_chrg=<value optimized out>)
      at /usr/include/c++/4.1.2/bits/stl_deque.h:725
  #7  mrg::journal::wmgr::~wmgr (this=0xe0c8c18, 
      __in_chrg=<value optimized out>) at jrnl/wmgr.cpp:91
  #8  0x00002b57521ff44a in mrg::journal::jcntl::~jcntl (this=0xe0c88a8, 
      __in_chrg=<value optimized out>) at jrnl/jcntl.cpp:84
  #9  0x00002b57521bade3 in mrg::msgstore::JournalImpl::~JournalImpl (
      this=0xe0c88a0, __in_chrg=<value optimized out>) at JournalImpl.cpp:128
  #10 0x00002b57521ddfbc in mrg::msgstore::TplJournalImpl::~TplJournalImpl (
      this=0x360df529e0, __in_chrg=<value optimized out>) at JournalImpl.h:243
  #11 0x00002b57521c9ec9 in release (this=0xe0b4570, 
      __in_chrg=<value optimized out>)
      at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
  #12 ~shared_count (this=0xe0b4570, __in_chrg=<value optimized out>)
      at /usr/include/boost/detail/shared_count.hpp:159
  #13 ~shared_ptr (this=0xe0b4570, __in_chrg=<value optimized out>)
      at /usr/include/boost/shared_ptr.hpp:106
  #14 mrg::msgstore::MessageStoreImpl::~MessageStoreImpl (this=0xe0b4570, 
      __in_chrg=<value optimized out>) at MessageStoreImpl.cpp:446
  #15 0x00002b57521b28ae in ~shared_count ()
      at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:145
  #16 ~shared_ptr () at /usr/include/boost/shared_ptr.hpp:106
  #17 ~StorePlugin () at StorePlugin.cpp:36
  #18 __tcf_1 () at StorePlugin.cpp:77
  #19 0x000000360dc333a5 in exit () from /lib64/libc.so.6
  #20 0x000000360dc1d99b in __libc_start_main () from /lib64/libc.so.6
  #21 0x0000000000405019 in _start ()
  (gdb) quit


Full crash list here:
https://beaker.engineering.redhat.com/logs/2010/34/834/1556/11356/45363///test_log--distribution-MRG-Messaging-qpid_ptest_restart_with_msgstore_sweep.log
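
When going through a long crash list like this, it can help to reduce each gdb backtrace to a signature of function names so that cores with different load addresses (compare 0x00002adb693c382d and 0x00002b575221882d in the two backtraces above) are grouped together. A minimal sketch, assuming backtraces in the gdb text format shown in this report; `crash_signature` and `FRAME_RE` are names made up for this example:

```python
import re

# Matches the first line of a gdb frame, with or without an address, e.g.
#   "#5  0x00002adb693c3a1b in std::...::~_Deque_base (this=0x5040d90,"
#   "#7  mrg::journal::wmgr::~wmgr (this=0x5040c18,"
# Continuation lines (arguments, "at file:line") do not start with '#'
# and are skipped.
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+?) \(")


def crash_signature(backtrace, depth=8):
    """Return the function names of the innermost `depth` frames as a tuple."""
    names = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(2))
    return tuple(names[:depth])
```

Two cores whose signatures are equal very likely hit the same code path, which is a quick way to check whether every entry in the crash list is this wmgr::~wmgr() crash or something else.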
Comment 1 Jan Sarenik 2010-06-01 04:50:56 EDT
Kim, was this also fixed along with bug 595438? If so, please move
it to MODIFIED. Thanks.
Comment 2 Kim van der Riet 2010-06-01 07:21:03 EDT
I cannot reproduce this.

The reproducer is identical to that of bug 595438, and I always get an outcome similar to that bug. Perhaps it is hardware dependent? I do suspect that this is another manifestation of the bug that caused bug 595438 (i.e. bug 596765), and with that bug fixed, this one should also be solved.

Without a reproducer that works, I can't prove it, though.

I'll change this to MODIFIED. QE: Are you able to retest on the same hardware that produced this bug and check if it is solved? If it is not fixed, set this back to ASSIGNED.
Comment 3 Frantisek Reznicek 2010-06-21 10:16:16 EDT
Stress testing currently ongoing...
Comment 5 Frantisek Reznicek 2010-09-15 09:10:07 EDT
After retesting on the machine, I can confirm the issue is gone. Last tested on RHEL 4.8 / 5.5 with packages:
python-qmf-0.7.946106-13.el5
python-qpid-0.7.946106-14.el5
qmf-*0.7.946106-14.el5
qpid-cpp-*-0.7.946106-14.el5
qpid-java-*-0.7.946106-8.el5
qpid-tests-0.7.946106-1.el5
qpid-tools-0.7.946106-10.el5

-> VERIFIED
