Bug 468821

Summary: SIGSEGV running kolchak (from qpid::sys::SystemInfo::getLocalIpAddresses)
Product: Red Hat Enterprise MRG
Reporter: Gordon Sim <gsim>
Component: qpid-cpp
Assignee: Andrew Stitcher <astitcher>
Status: CLOSED DUPLICATE
QA Contact: Kim van der Riet <kim.vdriet>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 1.0
CC: astitcher, duck
Target Milestone: 1.1
Target Release: ---
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-11-18 14:24:25 UTC

Description Gordon Sim 2008-10-28 09:15:51 UTC
Taken from email from freznice:

A SIGSEGV was reached after 3.23 days of running Kolchak on dell-pesc420-01.rhts.bos.redhat.com (RHEL52_i386).
MRG was built from trunk on 'Oct 23 10:17' (Boston timezone).

Shortened log:
2008-oct-25 01:08:17 warning SASL: No Authentication Performed
==9801== 
==9801== Process terminating with default action of signal 11 (SIGSEGV)
==9801==  Access not within mapped region at address 0x74588C3C2
==9801==    at 0x3897443704: vfprintf (in /lib64/libc-2.5.so)
==9801==    by 0x38974693C9: vsnprintf (in /lib64/libc-2.5.so)
==9801==    by 0x389744D052: snprintf (in /lib64/libc-2.5.so)
==9801==    by 0x38974E6F6A: inet_ntoa (in /lib64/libc-2.5.so)
==9801==    by 0x538A19A: qpid::sys::SystemInfo::getLocalIpAddresses(unsigned short, std::vector<qpid::Address, std::allocator<qpid::Address> >&) (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x53A748B: qpid::Url::getIpAddressesUrl(unsigned short) (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x4DAF874: qpid::broker::Broker::getKnownBrokersImpl() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DBAC47: boost::_mfi::mf0<std::vector<qpid::Url, std::allocator<qpid::Url> >, qpid::broker::Broker>::operator()(qpid::broker::Broker*) const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DBACF6: std::vector<qpid::Url, std::allocator<qpid::Url> > boost::_bi::list1<boost::_bi::value<qpid::broker::Broker*> >::operator()<std::vector<qpid::Url, std::allocator<qpid::Url> >, boost::_mfi::mf0<std::allocator<qpid::Url>, qpid::broker::Broker>, boost::_bi::list0>(boost::_bi::type<boost::_mfi::mf0<std::allocator<qpid::Url>, qpid::broker::Broker> >, boost::_mfi::mf0<std::allocator<qpid::Url>, qpid::broker::Broker>&, boost::_bi::list0&, long) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DBAD43: boost::_bi::bind_t<std::vector<qpid::Url, std::allocator<qpid::Url> >, boost::_mfi::mf0<std::vector<qpid::Url, std::allocator<qpid::Url> >, qpid::broker::Broker>, boost::_bi::list1<boost::_bi::value<qpid::broker::Broker*> > >::operator()() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DBAD71: boost::detail::function::function_obj_invoker0<boost::_bi::bind_t<std::vector<qpid::Url, std::allocator<qpid::Url> >, boost::_mfi::mf0<std::vector<qpid::Url, std::allocator<qpid::Url> >, qpid::broker::Broker>, boost::_bi::list1<boost::_bi::value<qpid::broker::Broker*> > >, std::vector<qpid::Url, std::allocator<qpid::Url> > >::invoke(boost::detail::function::any_pointer) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DEA6FF: boost::function0<std::vector<qpid::Url, std::allocator<qpid::Url> >, std::allocator<void> >::operator()() const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801== 
==9801== ERROR SUMMARY: 30 errors from 15 contexts (suppressed: 36 from 1)
==9801== 
==9801== 2 errors in context 1 of 15:
==9801== Use of uninitialised value of size 8
==9801==    at 0x4E15874: qpid::sys::operator<(qpid::sys::AbsTime const&, qpid::sys::AbsTime const&) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4E13F1E: qpid::broker::Message::hasExpired() const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DD1571: qpid::broker::Queue::purgeExpired() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDD964: void boost::_mfi::mf0<void, qpid::broker::Queue>::call<boost::shared_ptr<qpid::broker::Queue> const>(boost::shared_ptr<qpid::broker::Queue> const&, void const*) const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDD988: void boost::_mfi::mf0<void, qpid::broker::Queue>::operator()<boost::shared_ptr<qpid::broker::Queue> const>(boost::shared_ptr<qpid::broker::Queue> const&) const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDD9CF: void boost::_bi::list1<boost::arg<1> >::operator()<boost::_mfi::mf0<void, qpid::broker::Queue>, boost::_bi::list1<boost::shared_ptr<qpid::broker::Queue> const&> >(boost::_bi::type<void>, boost::_mfi::mf0<void, qpid::broker::Queue>&, boost::_bi::list1<boost::shared_ptr<qpid::broker::Queue> const&>&, int) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDDA15: void boost::_bi::bind_t<void, boost::_mfi::mf0<void, qpid::broker::Queue>, boost::_bi::list1<boost::arg<1> > >::operator()<boost::shared_ptr<qpid::broker::Queue> const>(boost::shared_ptr<qpid::broker::Queue> const&) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDDAFA: void qpid::broker::QueueRegistry::eachQueue<boost::_bi::bind_t<void, boost::_mfi::mf0<void, qpid::broker::Queue>, boost::_bi::list1<boost::arg<1> > > >(boost::_bi::bind_t<void, boost::_mfi::mf0<void, qpid::broker::Queue>, boost::_bi::list1<boost::arg<1> > >) const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDD4DD: qpid::broker::QueueCleaner::fired() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DDD558: qpid::broker::QueueCleaner::Task::fire() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4E5D004: qpid::broker::Timer::run() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x5389197: qpid::sys::(anonymous namespace)::runRunnable(void*) (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801== 
==9801== 2 errors in context 2 of 15:
==9801== Thread 1:
==9801== Use of uninitialised value of size 8
==9801==    at 0x5F32711: __pthread_mutex_unlock_usercnt (in /lib64/libpthread-2.5.so)
==9801==    by 0x4D76BE8: qpid::sys::Mutex::unlock() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DD9EB8: qpid::sys::ScopedLock<qpid::sys::Monitor>::~ScopedLock() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DD9F74: qpid::broker::PersistableMessage::isEnqueueComplete() (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4DCEFED: qpid::broker::Queue::getMessageCount() const (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x4E46D6C: qpid::broker::SessionAdapter::QueueHandlerImpl::query(std::string const&) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
==9801==    by 0x53062FF: qpid::framing::QueueQueryResult qpid::framing::QueueQueryBody::invoke<qpid::framing::AMQP_ServerOperations::QueueHandler>(qpid::framing::AMQP_ServerOperations::QueueHandler&) const (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x53013F9: qpid::framing::AMQP_ServerOperations::QueueHandler::Invoker::visit(qpid::framing::QueueQueryBody const&) (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x53694C7: qpid::framing::QueueQueryBody::accept(qpid::framing::MethodBodyConstVisitor&) const (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x5302730: qpid::framing::AMQP_ServerOperations::Invoker::visit(qpid::framing::QueueQueryBody const&) (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x53694C7: qpid::framing::QueueQueryBody::accept(qpid::framing::MethodBodyConstVisitor&) const (in /root/mrg_installed/lib/libqpidcommon.so.0.1.0)
==9801==    by 0x4E56103: qpid::framing::Invoker::Result qpid::framing::invoke<qpid::broker::SessionAdapter>(qpid::broker::SessionAdapter&, qpid::framing::AMQMethodBody const&) (in /root/mrg_installed/lib/libqpidbroker.so.0.1.0)
...

Full report can be found in mrg5.lab.bos.redhat.com:/root/qpid_test_kolchak_VALGRIND_RHEL52_i386_fails081027.tar.gz
see tail of qpid_test_kolchak/qpidd.log file

Comment 1 Gordon Sim 2008-10-30 11:26:24 UTC
Also tracked by https://issues.apache.org/jira/browse/QPID-1415.

Comment 2 Andrew Stitcher 2008-11-12 21:48:04 UTC
It's not clear to me that this represents a real bug in qpidd as the end of the log seems to suggest that this might be a bug in valgrind itself:

--9801-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--9801-- si_code=80;  Faulting address: 0x0;  sp: 0x404EA2628

valgrind: m_signals.c:1772 (sync_signalhandler): Assertion 'tid != 0' failed.
==9801==    at 0x380176D7: report_and_quit (m_libcassert.c:136)
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???
==9801==    by 0xDEADBEEFDEADBEEE: ???

Comment 3 Andrew Stitcher 2008-11-12 21:50:08 UTC
So I'd suggest running the same soak test *without* valgrind and with core file size set to unlimited.

Then if we get a similar SIGSEGV and stack trace we should investigate again.

Comment 4 Gordon Sim 2008-11-13 12:54:56 UTC
The fact that the stack trace (first one) has getKnownBrokersImpl() near the top makes me suspect that the fix to BZ471247 *may* resolve this in practice. I can't quite work out why concurrent calls to qpid::sys::SystemInfo::getLocalIpAddresses would cause problems, so haven't marked this as a dup.

Comment 5 Andrew Stitcher 2008-11-18 14:24:25 UTC
qpid::sys::SystemInfo::getLocalIpAddresses contains a call to inet_ntoa(), which returns its result in a statically allocated buffer (and this is indeed where the crash occurs), so multiple simultaneous calls to getLocalIpAddresses will indeed cause problems.
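
For illustration only, here is a minimal sketch of a reentrant alternative to inet_ntoa(), assuming IPv4 addresses. The helper name ipv4ToString is hypothetical; it is not taken from the qpid source or from the fix for bug 471247. It relies on the POSIX call inet_ntop(), which writes into a buffer supplied by the caller rather than into shared static storage.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string>

// Hypothetical helper (not from the qpid source): formats an IPv4 address
// into a buffer owned by the calling thread, so concurrent callers never
// share storage the way they do with inet_ntoa()'s static buffer.
std::string ipv4ToString(const in_addr& addr) {
    char buf[INET_ADDRSTRLEN];  // large enough for "255.255.255.255" plus NUL
    const char* text = inet_ntop(AF_INET, &addr, buf, sizeof(buf));
    // inet_ntop() returns a null pointer on failure; report that as an
    // empty string.
    return text ? std::string(text) : std::string();
}
```

Because the buffer lives on each caller's stack, two threads formatting addresses at the same time cannot overwrite each other's result. Serialising the calls to inet_ntoa() with a lock would be another option.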

*** This bug has been marked as a duplicate of bug 471247 ***