Bug 518410 - Seg fault in clustered broker where ttl is used in conjunction with lvq
Summary: Seg fault in clustered broker where ttl is used in conjunction with lvq
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.6
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: 1.2
Target Release: ---
Assignee: mick
QA Contact: Jiri Kolar
URL:
Whiteboard:
Duplicates: 521854
Depends On:
Blocks: 527551
Reported: 2009-08-20 09:50 UTC by Gordon Sim
Modified: 2018-10-20 04:23 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Messaging bug fix
C: Messages with a time to live (TTL) value were sent to a last value queue (LVQ) on a clustered broker, and replaced older messages also with a TTL set.
C: The broker would experience a segfault and crash.
F: This bug resulted from a partial application of svn rev 760087. Some code was omitted. The fix is to use all the code from that rev.
R: TTL messages sent to a clustered broker no longer cause crashes.
Messages with a time to live (TTL) value were sent to a last value queue (LVQ) on a clustered broker, and replaced older messages also with a TTL set. When this occurred, the broker would experience a segfault and crash. This bug occurred because some code was omitted from a patch. The entire patch has now been applied, and TTL messages sent to a clustered broker no longer cause crashes.
Clone Of:
Environment:
Last Closed: 2009-12-03 09:17:26 UTC
Target Upstream Version:
Embargoed:


Attachments
patch to fix problem (1.32 KB, patch)
2009-09-10 15:43 UTC, mick


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHEA-2009:1633 | 0 | normal | SHIPPED_LIVE | Red Hat Enterprise MRG Messaging and Grid Version 1.2 | 2009-12-03 09:15:33 UTC

Description Gordon Sim 2009-08-20 09:50:21 UTC
Description of problem:

When messages with a ttl set are sent to an LVQ on a clustered broker (and replace previous messages that also have a ttl set), the broker crashes with a segfault.

Version-Release number of selected component (if applicable):

qpidd-0.5.752581-26.el5 (but likely present since 1.1.1)

How reproducible:

100%

Steps to Reproduce:
1. start clustered broker node
2. qpid-config add queue test-queue --order lvq
3. for i in `seq 1 10`; do echo "my-value-$i"; done | sender --ttl 5000 --lvq-match-value my-key
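
The input side of step 3 can be checked on its own, without a broker. This is a minimal sketch, assuming a POSIX shell with seq available; gen_values is a hypothetical helper name used only for illustration and is not part of the qpid tools. It prints the ten values that the pipeline in step 3 feeds to sender on stdin.

```shell
# Generator half of the step 3 pipeline, wrapped in a function.
# gen_values is an illustrative name, not a qpid command.
gen_values() {
    # Print my-value-1 .. my-value-N, one value per line.
    for i in $(seq 1 "$1"); do
        echo "my-value-$i"
    done
}

# Same input as in step 3: ten values, one per line.
gen_values 10
```

Piping this output into the sender command from step 3, unchanged, gives the same reproducer as above.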
  
Actual results:

After about 5 seconds the broker crashes with a segfault.

Expected results:

Broker doesn't crash.

Additional info:

Backtrace from core:

#0  0x00000032f6eb7672 in __gnu_cxx::__exchange_and_add () from /usr/lib64/libstdc++.so.6
#1  0x00000032f4f4176b in qpid::broker::Message::setExpiryPolicy (this=<value optimized out>, e=<value optimized out>)
    at /usr/include/boost/detail/atomic_count_gcc.hpp:48
#2  0x00002b9dfbf4fe6b in qpid::cluster::ExpiryPolicy::deliverExpire (this=0xdff7b70, id=<value optimized out>)
    at qpid/cluster/ExpiryPolicy.cpp:71
#3  0x00000032fa3494b2 in qpid::framing::AMQP_AllOperations::ClusterHandler::Invoker::visit (this=<value optimized out>,
    body=<value optimized out>) at gen/qpid/framing/ClusterMessageExpiredBody.h:62
#4  0x00002b9dfbf161cf in qpid::framing::invoke<qpid::cluster::ClusterDispatcher> (target=<value optimized out>,
    body=<value optimized out>) at qpid/framing/Invoker.h:80
#5  0x00002b9dfbf0b3d8 in qpid::cluster::Cluster::processFrame (this=0xdff7c90, e=@0x446972a0, l=@0x44697310)
    at qpid/cluster/Cluster.cpp:431
#6  0x00002b9dfbf0d015 in qpid::cluster::Cluster::deliveredFrame (this=0xdff7c90, efConst=<value optimized out>)
    at qpid/cluster/Cluster.cpp:420
#7  0x00002b9dfbf1724a in boost::function1<void, qpid::cluster::EventFrame const&, std::allocator<void> >::operator() (
    this=0xffffffff, a0=@0xffffffff) at /usr/include/boost/function/function_template.hpp:576
#8  0x00002b9dfbf1e54b in qpid::cluster::PollableQueue<qpid::cluster::EventFrame>::handleBatch (this=0xdff8130,
    values=@0xdff81e8) at qpid/cluster/PollableQueue.h:52
#9  0x00002b9dfbf13e56 in boost::detail::function::function_obj_invoker1<boost::_bi::bind_t<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, boost::_mfi::mf1<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, qpid::cluster::PollableQueue<qpid::cluster::EventFrame>, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>, boost::_bi::list2<boost::_bi::value<qpid::cluster::PollableQueue<qpid::cluster::EventFrame>*>, boost::arg<1> > >, __gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&>::invoke (function_obj_ptr=<value optimized out>, a0=@0xffffffff)
    at /usr/include/boost/bind/mem_fn_template.hpp:149
#10 0x00002b9dfbf1702a in boost::function1<__gnu_cxx::__normal_iterator<qpid::cluster::EventFrame const*, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > >, std::vector<qpid::cluster::EventFrame, std::allocator<qpid::cluster::EventFrame> > const&, std::allocator<void> >::operator() (this=0xffffffff, a0=@0xffffffff)
    at /usr/include/boost/function/function_template.hpp:576
#11 0x00002b9dfbf1dc6c in qpid::sys::PollableQueue<qpid::cluster::EventFrame>::process (this=0xdff8130)
    at qpid/sys/PollableQueue.h:153
#12 0x00002b9dfbf1f6ed in qpid::sys::PollableQueue<qpid::cluster::EventFrame>::dispatch (this=0xdff8130, cond=@0xdff81a0)
    at qpid/sys/PollableQueue.h:138
#13 0x00000032fa37b10f in boost::function1<void, qpid::sys::PollableCondition&, std::allocator<boost::function_base> >::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:576
#14 0x00000032fa3c9d07 in boost::function1<void, qpid::sys::DispatchHandle&, std::allocator<boost::function_base> >::operator() (this=<value optimized out>, a0=<value optimized out>) at /usr/include/boost/function/function_template.hpp:576

Comment 1 mick 2009-09-10 15:42:01 UTC
This problem goes away in svn rev 760087.

But somehow, part of Alan's delta in that rev was omitted from our -26 patch to svn rev 752581, which means that it was also omitted from our 1.1.6 release.

I am attaching the patch that should be applied to rev 752581 *after* the -26 patch, to fix this problem.

Gordon's reproducer, in his comment above, triggered the crash for me every time, always within 1 or 2 seconds.  With this patch applied, the problem did not recur in ten iterations of his reproducer.

Comment 2 mick 2009-09-10 15:43:59 UTC
Created attachment 360524
patch to fix problem

This patch fixes the problem.

Comment 3 mick 2009-09-23 19:32:34 UTC
*** Bug 521854 has been marked as a duplicate of this bug. ***

Comment 4 mick 2009-09-23 19:59:16 UTC
The LVQ aspect is not causal.  If you leave it out (use the same reproducer as above, but without the LVQ option) you get duplicate bug 521854, which has the same behavior and cure as this one.

Comment 5 Jiri Kolar 2009-10-14 11:37:13 UTC
Tested:
on -26 the bug appears (but only on x86_64)
on -28 it has been fixed

validated on RHEL5-Server-U4 i386 / x86_64

packages:

# rpm -qa | grep -E '(qpid|openais|rhm)' | sort -u

openais-0.80.6-8.el5
python-qpid-0.5.752581-3.el5
qpidc-0.5.752581-28.el5
qpidc-debuginfo-0.5.752581-28.el5
qpidc-devel-0.5.752581-28.el5
qpidc-rdma-0.5.752581-28.el5
qpidc-ssl-0.5.752581-28.el5
qpidd-0.5.752581-28.el5
qpidd-acl-0.5.752581-28.el5
qpidd-cluster-0.5.752581-28.el5
qpidd-devel-0.5.752581-28.el5
qpid-dotnet-0.4.738274-2.el5
qpidd-rdma-0.5.752581-28.el5
qpidd-ssl-0.5.752581-28.el5
qpidd-xml-0.5.752581-28.el5
qpid-java-client-0.5.751061-9.el5
qpid-java-common-0.5.751061-9.el5
rhm-0.5.3206-14.el5
rhm-docs-0.5.756148-1.el5
rh-tests-distribution-MRG-Messaging-qpid_common-1.5-15

->VERIFIED

Comment 6 Irina Boverman 2009-10-28 17:37:33 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The clustered broker is no longer crashing when ttl is used in conjunction with lvq. This crash was occurring when messages with a ttl were sent to an lvq on a clustered broker to replace previous messages also with a ttl set (518410)

Comment 7 Lana Brindley 2009-11-24 03:00:03 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1,8 @@
-The clustered broker is no longer crashing when ttl is used in conjunction with lvq. This crash was occurring when messages with a ttl were sent to an lvq on a clustered broker to replace previous messages also with a ttl set (518410)
+Messaging bug fix
+
+C: Messages with a time to live (TTL) value were sent to a last value queue (LVQ) on a clustered broker, and replaced older messages also with a TTL set
+C: The broker would experience a segfault and crash.
+F: 
+R:
+
+MORE INFORMATION REQUIRED FOR RELNOTE

Comment 9 Lana Brindley 2009-12-01 23:30:32 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -2,7 +2,7 @@
 
 C: Messages with a time to live (TTL) value were sent to a last value queue (LVQ) on a clustered broker, and replaced older messages also with a TTL set
 C: The broker would experience a segfault and crash.
-F: 
-R:
+F: This bug resulted from a partial application of svn rev 760087. Some code was omitted. The fix is to use all the code from that rev.
+R: TTL messages sent to a clustered broker no longer cause crashes.
 
-MORE INFORMATION REQUIRED FOR RELNOTE
+Messages with a time to live (TTL) value were sent to a last value queue (LVQ) on a clustered broker, and replaced older messages also with a TTL set. When this occurred, the broker would experience a segfault and crash. This bug occurred because some code was omitted from a patch. The entire patch has now been applied, and TTL messages sent to a clustered broker no longer cause crashes.

Comment 10 errata-xmlrpc 2009-12-03 09:17:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-1633.html

