Bug 456454
| Field | Value | Field | Value |
|---|---|---|---|
| Summary | qpidd segfault during RHTS run | | |
| Product | Red Hat Enterprise MRG | Reporter | Jeff Needle <jneedle> |
| Component | qpid-cpp | Assignee | Kim van der Riet <kim.vdriet> |
| Status | CLOSED ERRATA | QA Contact | Kim van der Riet <kim.vdriet> |
| Severity | high | Docs Contact | |
| Priority | urgent | | |
| Version | 1.0 | CC | davids, freznice |
| Target Milestone | 1.0.1 | | |
| Target Release | --- | | |
| Hardware | All | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | | Doc Type | Bug Fix |
| Doc Text | | Story Points | --- |
| Clone Of | | Environment | |
| Last Closed | 2008-10-06 19:08:21 UTC | Type | --- |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
| Bug Depends On | | | |
| Bug Blocks | 460113 | | |
Description
Jeff Needle
2008-07-23 19:36:43 UTC
Created attachment 312515
qpidd core dump
qpidd.debug doesn't give me much more. Deferring to the experts.
Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir /tmp/rhts_qpidd/qpid-data/pt_broker.8'.

This seems to happen only on RHEL5 (i386 and x86_64). RHEL4 does not seem to run into this trouble. It happens on boxes with 8 CPU cores.
Wild guess (based on earlier chat with Andrew): Could it be connected to pthread libraries? Different pthread versions on RHEL4 and RHEL5?

Stack trace from david:
#0 0x0053876c in memcpy () from /lib/libc.so.6
#1 0x001e2e54 in std::string::_Rep::_M_clone () from /usr/lib/libstdc++.so.6
#2 0x001e37b7 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
#3 0x0098972c in qpid::management::Journal::getPackageName () from /usr/lib/qpidd/libbdbstore.so
#4 0x00e39726 in ?? ()
#5 0xb6ba0c6c in ?? ()
#6 0x09355308 in ?? ()
#7 0x00000001 in ?? ()
#8 0x00002b9c in ?? ()
#9 0xb6ba0c98 in ?? ()
#10 0x00534030 in free () from /lib/libc.so.6
#11 0x00982691 in qpid::management::Journal::writeStatistics () from /usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()

On this last run, we got 2 core files. Only one of them gave as much as the comment #5 backtrace. Both cores are equal on #0, and that's the only similarity.
#0 0x005cf76c in memcpy () from /lib/libc.so.6
#1 0x00504874 in ?? ()
#2 0xa7a73014 in ?? ()
#3 0x0888b724 in ?? ()
#4 0x0888ba30 in ?? ()
#5 0x00557ff4 in ?? ()
#6 0x0888b718 in ?? ()
#7 0x08892c98 in ?? ()
#8 0xb6b8fb48 in ?? ()
#9 0x005051d7 in ?? ()
#10 0x0888b718 in ?? ()
#11 0xb6b8fb3f in ?? ()
#12 0x00000000 in ?? ()

Created attachment 312556
Core from 5.2 i386 run
Created attachment 312557
Second qpidd core from 5.2 i386 run
A partial backtrace of the attached core file (from #1 above) shows:
#0 0x00002b0f7687844b in ?? ()
#1 0x00002b0f761082b0 in ?? ()
#2 0x0000000005ed0100 in ?? ()
#3 0x0000000043145700 in ?? ()
#4 0x00000000431459e0 in ?? ()
#5 0x00002b0f761089af in ?? ()
#6 0x000000000000003d in ?? ()
#7 0x0000000000000024 in ?? ()
#8 0x0000000000610838 in std::string::_Rep::_S_empty_rep_storage ()
#9 0x0000000043145700 in ?? ()
#10 0x0000000005f2b600 in ?? ()
#11 0x00002b0f7725ea90 in qpid::management::Journal::getPackageName () from /usr/lib64/qpidd/libbdbstore.so
#12 0x00002b0f741728aa in qpid::management::ManagementObject::writeTimestamps (this=0x2aaab4000010, buf=@0x0) at qpid/management/ManagementObject.cpp:32
#13 0x00002b0f7725851d in qpid::management::Journal::writeStatistics () from /usr/lib64/qpidd/libbdbstore.so
#14 0x00002b0f7416735b in qpid::management::ManagementBroker::PeriodicProcessing (this=0x2aaaaaaab010) at qpid/management/ManagementBroker.cpp:314
#15 0x00002b0f74167938 in qpid::management::ManagementBroker::Periodic::fire (this=0x2aaaac0563d0) at qpid/management/ManagementBroker.cpp:181
#16 0x00002b0f741575f5 in qpid::broker::Timer::run (this=0x2aaaaaaab138) at qpid/broker/Timer.cpp:64
#17 0x00002b0f74500cda in qpid::sys::Thread::runRunnable (p=0x2aaab44fbea8) at qpid/sys/posix/Thread.cpp:27
#18 0x00002b0f76b572f7 in ?? ()
#19 0x0000000000000000 in ?? ()

On the surface, this looks like a thread timing issue in management, i.e. a timer is firing and making a call on a non-existent or deleted journal management object (or part of an object; the crash seems to be happening on a std::string operation of some sort) through qpid::management::Journal::writeStatistics().

Playing the "Let's randomly install debuginfo packages until this is useful"
game (added gcc-debuginfo, glibc-debuginfo, and glibc-debuginfo-common) yields
this somewhat more useful trace for core.11161. Fingers are starting to point
in Ted's general direction here...
Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir
/tmp/rhts_qpidd/qpid-data/pt_broker.1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0053876c in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x0053876c in memcpy () from /lib/libc.so.6
#1 0x001e2e54 in std::string::_Rep::_M_clone (this=0x9306718,
__alloc=@0xb6ba0c1f, __res=0)
at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/char_traits.h:269
#2 0x001e37b7 in basic_string (this=0xb6ba0c6c, __str=@0x9a9cec)
at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/basic_string.h:219
#3 0x0098972c in qpid::management::Journal::getPackageName ()
from /usr/lib/qpidd/libbdbstore.so
#4 0x00e39726 in ?? ()
#5 0xb6ba0c6c in ?? ()
#6 0x09355308 in ?? ()
#7 0x00000001 in ?? ()
#8 0x00002b9c in ?? ()
#9 0xb6ba0c98 in ?? ()
#10 0x00534030 in *__GI___libc_free (mem=0x9355308) at malloc.c:3545
#11 0x00982691 in qpid::management::Journal::writeStatistics ()
from /usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()
Current language: auto; currently c
Interestingly, packageName is a static string, and the problem seems to occur when it is copied...

(In reply to comment #11)
> Interestingly, packageName is a static string, and the problem seems to occur
> when it is copied...
But the function getPackageName() is itself not static (but it could be).

It appears that the statics in qpid::management::Journal are deleted before the destructor of qpid::management::ManagementBroker is called. As the timer controlled by the ManagementBroker instance is not stopped until that instance is deleted, the timer thread can still invoke methods on the Journal instance it has registered, and some of these, notably getPackageName, access the now-deleted statics. Either we need to ensure that the ManagementBroker instance is always deleted before the statics, or at least we must ensure that the thread it controls is stopped before those statics are deleted. Suggest either:
Index: src/qpidd.cpp
===================================================================
--- src/qpidd.cpp (revision 680266)
+++ src/qpidd.cpp (working copy)
@@ -272,6 +272,7 @@
if (options->broker.port == 0)
cout << uint16_t(brokerPtr->getPort()) << endl;
brokerPtr->run();
+ brokerPtr.reset();
QPID_LOG(notice, "Shutting down.");
}
return 0;
or:
Index: src/qpid/management/ManagementBroker.cpp
===================================================================
--- src/qpid/management/ManagementBroker.cpp (revision 680266)
+++ src/qpid/management/ManagementBroker.cpp (working copy)
@@ -125,6 +125,7 @@
broker->mExchange.reset ();
broker->dExchange.reset ();
+ broker->timer.stop();
agent.reset ();
}
}
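For illustration only, the idea behind the second patch (stop the timer thread before anything it touches goes away) can be sketched in standalone C++. This is not qpid code: `FakeJournal`, `PeriodicTimer` and `runDemo` are hypothetical names invented for this sketch, and the real `qpid::broker::Timer` interface may differ. The sketch assumes a C++11 compiler with thread support.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a management object that exposes a static string.
struct FakeJournal {
    static const std::string packageName;
    // Returns a copy of the static; unsafe once the static has been destroyed.
    std::string getPackageName() const { return packageName; }
};
const std::string FakeJournal::packageName = "example.package";

// Hypothetical periodic timer: runs `task` every `period` until stop() is called.
class PeriodicTimer {
public:
    PeriodicTimer(std::chrono::milliseconds period, std::function<void()> task)
        : period_(period), task_(std::move(task)), worker_([this] { loop(); }) {}

    ~PeriodicTimer() { stop(); }

    // Signals the worker thread and waits for it to exit. Safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopped_) {
            // wait_for returns true if stop() was requested during the wait.
            if (cv_.wait_for(lock, period_, [this] { return stopped_; })) break;
            lock.unlock();
            task_();  // run the callback without holding the lock
            lock.lock();
        }
    }

    std::chrono::milliseconds period_;
    std::function<void()> task_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopped_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Runs the timer briefly, stops it explicitly, and returns the number of ticks seen.
int runDemo() {
    FakeJournal journal;
    std::atomic<int> ticks{0};
    PeriodicTimer timer(std::chrono::milliseconds(5), [&] {
        if (journal.getPackageName() == FakeJournal::packageName) ++ticks;
    });
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    timer.stop();  // once this returns, the callback can no longer run
    const int seen = ticks.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    assert(ticks.load() == seen);  // no ticks after stop()
    return seen;
}
```

Calling `runDemo()` from `main` mirrors the intent of `broker->timer.stop()`: the worker thread is joined while `journal` and the static `packageName` are still alive, so static destruction at process exit cannot race with the callback.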
Latter patch from above applied to qpid.0-10 as r680362.

No more qpidd segfaults observed during MRG_Messaging/qpid_testmatrix1 runs.

No more qpidd segfaults at all observed during RHTS testing. See RHTS jobs 28372, 28374, 28425-9, 28432.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0640.html