Bug 460113 - qpidd segfault during RHTS run (RHEL 4)
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
: 1.0.1
: ---
Assigned To: messaging-bugs
Kim van der Riet
:
Depends On: 456454
Blocks:
 
Reported: 2008-08-26 04:53 EDT by Gordon Sim
Modified: 2011-06-27 15:57 EDT
4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-06-27 15:57:47 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Gordon Sim 2008-08-26 04:53:25 EDT
+++ This bug was initially created as a clone of Bug #456454 +++

Got this on the console:

qpidd[9845]: segfault at 00000000063cc000 rip 00002b0f7687844b rsp
0000000043145698 error 4

and caught the core dump, which is attached.  Off to find a qpidd with symbols
so I can get a meaningful backtrace.

qpidd-0.2.676581-1.el5

--- Additional comment from jneedle@redhat.com on 2008-07-23 15:36:43 EDT ---

Created an attachment (id=312515)
qpidd core dump


--- Additional comment from jneedle@redhat.com on 2008-07-23 15:42:09 EDT ---

qpidd.debug doesn't give me much more.  Deferring to the experts.

Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir
/tmp/rhts_qpidd/qpid-data/pt_broker.8'.

--- Additional comment from davids@redhat.com on 2008-07-24 04:42:00 EDT ---

This seems to happen only on RHEL5 (i386 and x86_64); the RHEL4 boxes do not
seem to run into this problem. It happens on boxes with 8 CPU cores.

Wild guess (based on earlier chat with Andrew): Could it be connected to pthread
libraries?  Different pthread versions on RHEL4 and RHEL5?

--- Additional comment from gsim@redhat.com on 2008-07-24 04:58:58 EDT ---

Stack trace from david:

#0  0x0053876c in memcpy () from /lib/libc.so.6
#1  0x001e2e54 in std::string::_Rep::_M_clone () from /usr/lib/libstdc++.so.6
#2  0x001e37b7 in std::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
#3  0x0098972c in qpid::management::Journal::getPackageName () from
/usr/lib/qpidd/libbdbstore.so
#4  0x00e39726 in ?? ()
#5  0xb6ba0c6c in ?? ()
#6  0x09355308 in ?? ()
#7  0x00000001 in ?? ()
#8  0x00002b9c in ?? ()
#9  0xb6ba0c98 in ?? ()
#10 0x00534030 in free () from /lib/libc.so.6
#11 0x00982691 in qpid::management::Journal::writeStatistics () from
/usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()


--- Additional comment from davids@redhat.com on 2008-07-24 05:04:47 EDT ---

On this last run, we got 2 core files.  Only one of them gave as much as the
comment #5 backtrace.

Both cores agree on frame #0, and that's the only similarity.

#0  0x005cf76c in memcpy () from /lib/libc.so.6
#1  0x00504874 in ?? ()
#2  0xa7a73014 in ?? ()
#3  0x0888b724 in ?? ()
#4  0x0888ba30 in ?? ()
#5  0x00557ff4 in ?? ()
#6  0x0888b718 in ?? ()
#7  0x08892c98 in ?? ()
#8  0xb6b8fb48 in ?? ()
#9  0x005051d7 in ?? ()
#10 0x0888b718 in ?? ()
#11 0xb6b8fb3f in ?? ()
#12 0x00000000 in ?? ()


--- Additional comment from jneedle@redhat.com on 2008-07-24 09:24:51 EDT ---

Created an attachment (id=312556)
Core from 5.2 i386 run


--- Additional comment from jneedle@redhat.com on 2008-07-24 09:25:43 EDT ---

Created an attachment (id=312557)
Second qpidd core from 5.2 i386 run


--- Additional comment from kim.vdriet@redhat.com on 2008-07-24 09:30:53 EDT ---

A partial backtrace of the attached core file (from #1 above) shows:
#0  0x00002b0f7687844b in ?? ()
#1  0x00002b0f761082b0 in ?? ()
#2  0x0000000005ed0100 in ?? ()
#3  0x0000000043145700 in ?? ()
#4  0x00000000431459e0 in ?? ()
#5  0x00002b0f761089af in ?? ()
#6  0x000000000000003d in ?? ()
#7  0x0000000000000024 in ?? ()
#8  0x0000000000610838 in std::string::_Rep::_S_empty_rep_storage ()
#9  0x0000000043145700 in ?? ()
#10 0x0000000005f2b600 in ?? ()
#11 0x00002b0f7725ea90 in qpid::management::Journal::getPackageName () from
/usr/lib64/qpidd/libbdbstore.so
#12 0x00002b0f741728aa in qpid::management::ManagementObject::writeTimestamps
(this=0x2aaab4000010, buf=@0x0) at qpid/management/ManagementObject.cpp:32
#13 0x00002b0f7725851d in qpid::management::Journal::writeStatistics () from
/usr/lib64/qpidd/libbdbstore.so
#14 0x00002b0f7416735b in qpid::management::ManagementBroker::PeriodicProcessing
(this=0x2aaaaaaab010) at qpid/management/ManagementBroker.cpp:314
#15 0x00002b0f74167938 in qpid::management::ManagementBroker::Periodic::fire
(this=0x2aaaac0563d0) at qpid/management/ManagementBroker.cpp:181
#16 0x00002b0f741575f5 in qpid::broker::Timer::run (this=0x2aaaaaaab138) at
qpid/broker/Timer.cpp:64
#17 0x00002b0f74500cda in qpid::sys::Thread::runRunnable (p=0x2aaab44fbea8) at
qpid/sys/posix/Thread.cpp:27
#18 0x00002b0f76b572f7 in ?? ()
#19 0x0000000000000000 in ?? ()

On the surface, this looks like a thread-timing issue in management, i.e. a timer
is firing and, through qpid::management::Journal::writeStatistics(), making a
call on a non-existent or deleted journal management object (or on part of one;
the crash seems to happen in a std::string operation of some sort).

--- Additional comment from jneedle@redhat.com on 2008-07-24 10:28:23 EDT ---

Playing the "Let's randomly install debuginfo packages until this is useful"
game (added gcc-debuginfo, glibc-debuginfo, and glibc-debuginfo-common) yields
this somewhat more useful trace for core.11161.  Fingers are starting to point
in Ted's general direction here...

Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir
/tmp/rhts_qpidd/qpid-data/pt_broker.1'.
Program terminated with signal 11, Segmentation fault.
#0  0x0053876c in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0x0053876c in memcpy () from /lib/libc.so.6
#1  0x001e2e54 in std::string::_Rep::_M_clone (this=0x9306718, 
    __alloc=@0xb6ba0c1f, __res=0)
    at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/char_traits.h:269
#2  0x001e37b7 in basic_string (this=0xb6ba0c6c, __str=@0x9a9cec)
    at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/basic_string.h:219
#3  0x0098972c in qpid::management::Journal::getPackageName ()
   from /usr/lib/qpidd/libbdbstore.so
#4  0x00e39726 in ?? ()
#5  0xb6ba0c6c in ?? ()
#6  0x09355308 in ?? ()
#7  0x00000001 in ?? ()
#8  0x00002b9c in ?? ()
#9  0xb6ba0c98 in ?? ()
#10 0x00534030 in *__GI___libc_free (mem=0x9355308) at malloc.c:3545
#11 0x00982691 in qpid::management::Journal::writeStatistics ()
   from /usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()
Current language:  auto; currently c


--- Additional comment from gsim@redhat.com on 2008-07-25 10:58:48 EDT ---

Interestingly, packageName is a static string, and the problem seems to occur
while copying it...

--- Additional comment from kim.vdriet@redhat.com on 2008-07-28 08:06:13 EDT ---

(In reply to comment #11)
> Interestingly, packageName is a static string, and the problem seems to occur
> while copying it...
But the function getPackageName() itself is not static (though it could be).
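Frames #1 to #3 of the traces above show the crash inside the std::string copy constructor called from getPackageName(). The sketch below models that pattern, plus one common alternative. Everything in it is hypothetical: the class JournalLike, the value "example.package", and the helper packageNameOnFirstUse() are illustrative names, not qpid's generated management code. The construct-on-first-use variant was not the fix applied in this bug; it is included only because it sidesteps static destruction order by never destroying the string at all.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for a generated management class. getPackageName()
// returns a copy of a class-scope static, so the copy constructor reads the
// static's buffer. If exit-time static destruction has already freed that
// buffer while another thread still calls this, the read is a use-after-free.
class JournalLike {
public:
    std::string getPackageName() const { return packageName; }

private:
    static const std::string packageName;
};
const std::string JournalLike::packageName = "example.package";  // hypothetical value

// Alternative (NOT the fix applied here): construct-on-first-use.
// The string is heap-allocated on first call and intentionally leaked,
// so no static destructor ever frees it, even after main() returns.
inline const std::string& packageNameOnFirstUse() {
    static const std::string* name = new std::string("example.package");
    return *name;
}

}  // namespace sketch
```

The trade-off is a deliberate, bounded leak in exchange for a value that stays valid during process exit; it does not address the underlying problem of a thread still running while statics are being destroyed.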



--- Additional comment from gsim@redhat.com on 2008-07-28 09:29:27 EDT ---

It appears that the statics in qpid::management::Journal are deleted before the
destructor of qpid::management::ManagementBroker is called. Because the timer
controlled by the ManagementBroker instance is not stopped until that instance
is deleted, its thread can still invoke methods on the Journal instance it has
registered, and some of these, notably getPackageName, access the now-deleted
statics.

Either we need to ensure that the ManagementBroker instance is always deleted
before the statics, or at least we must ensure that the thread it controls is
stopped before those statics are deleted.

--- Additional comment from gsim@redhat.com on 2008-07-28 09:30:21 EDT ---

Suggest either:

Index: src/qpidd.cpp
===================================================================
--- src/qpidd.cpp       (revision 680266)
+++ src/qpidd.cpp       (working copy)
@@ -272,6 +272,7 @@
             if (options->broker.port == 0)
                 cout << uint16_t(brokerPtr->getPort()) << endl;
             brokerPtr->run();
+            brokerPtr.reset();
             QPID_LOG(notice, "Shutting down.");
         }
         return 0;


or:

Index: src/qpid/management/ManagementBroker.cpp
===================================================================
--- src/qpid/management/ManagementBroker.cpp    (revision 680266)
+++ src/qpid/management/ManagementBroker.cpp    (working copy)
@@ -125,6 +125,7 @@

         broker->mExchange.reset ();
         broker->dExchange.reset ();
+        broker->timer.stop();
         agent.reset ();
     }
 }
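Both patches enforce the same rule: the timer thread must be stopped and joined before anything its callback touches is destroyed. A minimal sketch of that rule in standard C++11, assuming a std::thread-based timer; PeriodicTimer and ManagementLike are hypothetical names, and qpid's own qpid::broker::Timer and ManagementBroker are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Periodic timer whose stop() signals the worker and then joins it.
// Once stop() returns, the task can no longer run, so whatever it uses
// may be torn down safely afterwards.
class PeriodicTimer {
public:
    PeriodicTimer(std::chrono::milliseconds period, std::function<void()> task)
        : period_(period), task_(std::move(task)), worker_([this] { run(); }) {}

    ~PeriodicTimer() { stop(); }

    // Idempotent when called from a single owning thread: safe to call
    // explicitly from a shutdown path and again from the destructor.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            // Sleep one period, waking early if stop() is called.
            if (cv_.wait_for(lock, period_, [this] { return stopping_; })) break;
            lock.unlock();
            task_();  // e.g. a PeriodicProcessing()-style callback
            lock.lock();
        }
    }

    std::chrono::milliseconds period_;
    std::function<void()> task_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
    std::thread worker_;  // declared last: every member above exists before the thread starts
};

// Hypothetical owner mirroring the applied patch: stop the timer first,
// then release whatever the periodic task reads.
struct ManagementLike {
    std::atomic<int> fires{0};  // state used by the task, declared before the timer
    PeriodicTimer timer{std::chrono::milliseconds(1), [this] { ++fires; }};

    void shutdown() {
        timer.stop();
        // Only now is it safe to reset agents, exchanges, etc.
    }
};
```

Declaring the timer after the state its task uses means the default destruction order (reverse of declaration) also stops the thread before that state goes away; the explicit shutdown() corresponds to calling broker->timer.stop() before agent.reset() in the second patch.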



--- Additional comment from gsim@redhat.com on 2008-07-28 10:00:38 EDT ---

The latter patch above was applied to qpid.0-10 as r680362.
Comment 2 Frantisek Reznicek 2008-09-08 06:08:57 EDT
No more qpidd segfaults observed during MRG_Messaging/qpid_testmatrix1 runs.
No qpidd segfaults at all observed during RHTS testing.
See RHTS jobs 28372, 28374, 28425-9, 28432.
Comment 4 Justin Ross 2011-06-27 15:57:47 EDT
Long since resolved; closing.
