Bug 1713560

Summary: qpidd segfault when processing QMF message from closed connection
Product: Red Hat Enterprise MRG Reporter: Pavel Moravec <pmoravec>
Component: qpid-cpp Assignee: Mike Cressman <mcressma>
Status: CLOSED ERRATA QA Contact: Zdenek Kraus <zkraus>
Severity: high Docs Contact:
Priority: high    
Version: 3.2 CC: crolke, jfrancin, jross, mcressma, zkraus
Target Milestone: 3.2.13   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: qpid-cpp-1.36.0-22 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-07-15 07:54:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
reproducer client none

Description Pavel Moravec 2019-05-24 06:50:21 UTC
Description of problem:
User story: run two concurrent instances of a program that:
1) Creates a queue on the broker "HelloQueue"
2) Creates a second queue called "HelloQueue.AutoDelete" with auto-delete set and the alternate exchange set to "qmf.default.direct", and holds open the Receiver that is subscribed to it.
3) Puts a QMF message into the "HelloQueue.AutoDelete" queue that will delete the "HelloQueue" queue when it is processed.
4) Waits 10 seconds.
5) Closes the receiver, triggering the auto-delete of "HelloQueue.AutoDelete".

The QMF message is then rerouted to "qmf.default.direct" because of the alternate exchange, which deletes "HelloQueue" regardless of whether other subscribers are connected to it. With high probability, the broker then attempts to process the second QMF request, which came from the connection that was just dropped, and segfaults.


Version-Release number of selected component (if applicable):
qpid-cpp 1.36.0-15 (also -21 and -21+hf2); I expect any version is affected


How reproducible:
75% in my case


Steps to Reproduce:
1. Compile attached program.
2. qpidd &
3. ./QmfBrokerCrashRepro localhost:5672 & ./QmfBrokerCrashRepro localhost:5672 &


Actual results:
The client program aborts every time (unhandled exception, not a concern here), but very often qpidd segfaults as well, with this backtrace:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f9b5cdca752 in qpid::management::(anonymous namespace)::ScopedManagementContext::getUserId (this=<value optimized out>)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/management/ManagementAgent.cpp:105
#2  0x00007f9b5cde8055 in qpid::management::ManagementAgent::dispatchAgentCommand (this=0x1680930, msg=..., viaLocal=true)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/management/ManagementAgent.cpp:2347
#3  0x00007f9b5cde8958 in qpid::management::ManagementAgent::dispatchCommand (this=0x1680930, deliverable=<value optimized out>, routingKey="broker", topic=false, qmfVersion=2)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/management/ManagementAgent.cpp:1255
#4  0x00007f9b5cdfb219 in qpid::broker::ManagementDirectExchange::route (this=0x168b6f0, msg=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/management/ManagementDirectExchange.cpp:48
#5  0x00007f9b5cccfa2a in qpid::broker::Exchange::routeWithAlternate (this=0x168b768, msg=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Exchange.cpp:410
#6  0x00007f9b5ccfddb5 in qpid::broker::Queue::reroute (e=<value optimized out>, m=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:1761
#7  0x00007f9b5ccfe006 in qpid::broker::Queue::abandoned (this=0x16ba740, message=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:1156
#8  0x00007f9b5ccf16cd in operator() (this=0x16ba740, maxCount=0, p=..., f=..., type=<value optimized out>, triggerAutoDelete=false, maxTests=0)
    at /usr/include/boost/function/function_template.hpp:1013
#9  qpid::broker::Queue::remove (this=0x16ba740, maxCount=0, p=..., f=..., type=<value optimized out>, triggerAutoDelete=false, maxTests=0)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:795
#10 0x00007f9b5ccf49d5 in qpid::broker::Queue::destroyed (this=0x16ba740) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:1167
#11 0x00007f9b5cd73b09 in qpid::broker::QueueRegistry::destroyIfUntouched (this=0x167f2f8, targetQ=<value optimized out>, version=<value optimized out>, connectionId="", userId="")
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/QueueRegistry.cpp:156
#12 0x00007f9b5ccee336 in qpid::broker::Queue::tryAutoDelete (this=0x16ba740, expectedVersion=1) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:1358
#13 0x00007f9b5ccee834 in qpid::broker::Queue::scheduleAutoDelete (this=0x16ba740, immediate=false) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:1342
#14 0x00007f9b5ccef626 in qpid::broker::Queue::cancel (this=0x16ba740, c=..., connectionId="qpid.[::1]:5672-[::1]:54658", userId="anonymous@QPID")
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/Queue.cpp:638
#15 0x00007f9b5cd90eca in qpid::broker::SemanticState::cancel (this=0x7f9b4c00a078, c=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/SemanticState.cpp:475
#16 0x00007f9b5cd98775 in qpid::broker::SemanticState::closed (this=0x7f9b4c00a078) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/SemanticState.cpp:111
#17 0x00007f9b5cdb0301 in qpid::broker::SessionState::~SessionState (this=0x7f9b4c009eb0, __in_chrg=<value optimized out>)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/SessionState.cpp:107
#18 0x00007f9b5cdb08a9 in qpid::broker::SessionState::~SessionState (this=0x7f9b4c009eb0, __in_chrg=<value optimized out>)
    at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/SessionState.cpp:110
#19 0x00007f9b5cdb5c44 in ~auto_ptr (this=0x7f9b4c009d00) at /usr/include/c++/4.4.7/backward/auto_ptr.h:168
#20 qpid::broker::SessionHandler::handleDetach (this=0x7f9b4c009d00) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/SessionHandler.cpp:110
#21 0x00007f9b5cd1b564 in qpid::broker::amqp_0_10::Connection::closed (this=0x7f9b4c003e30) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/broker/amqp_0_10/Connection.cpp:378
#22 0x00007f9b5c7f374d in qpid::sys::AsynchIOHandler::disconnect (this=0x168f270) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/AsynchIOHandler.cpp:201
#23 0x00007f9b5c7f3ca9 in qpid::sys::AsynchIOHandler::eof (this=0x168f270, a=<value optimized out>) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/AsynchIOHandler.cpp:184
#24 0x00007f9b5c770e3a in operator() (this=0x168fc90, h=...) at /usr/include/boost/function/function_template.hpp:1013
#25 qpid::sys::posix::AsynchIO::readable (this=0x168fc90, h=...) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/posix/AsynchIO.cpp:486
#26 0x00007f9b5c7f79e3 in boost::function1<void, qpid::sys::DispatchHandle&>::operator() (this=<value optimized out>, a0=<value optimized out>)
    at /usr/include/boost/function/function_template.hpp:1013
#27 0x00007f9b5c7f6676 in qpid::sys::DispatchHandle::processEvent (this=0x168fc98, type=qpid::sys::Poller::READABLE) at /usr/src/debug/qpid-cpp-1.36.0/src/qpid/sys/DispatchHandle.cpp:280
..

Here, the context (of type qpid::broker::amqp_0_10::Connection) points to the 2nd client connection, which was dropped. Qpid trace logs show the connection was already closed and its management object deleted, but is a reference to it still kept because of this QMF method?


Expected results:
no segfault


Additional info:

Comment 1 Pavel Moravec 2019-05-24 06:55:01 UTC
Created attachment 1572802 [details]
reproducer client

Comment 2 Pavel Moravec 2019-05-24 07:19:43 UTC
An even simpler reproducer:

- give the auto-delete queue a timeout (put ".. auto-delete:True, arguments:{'qpid.auto_delete_timeout':10}" in the code)
- run the client program just once

Explanation:
- the client's connection will already have been gone for some time when the auto-delete happens
- so the message rerouted to the QMF exchange will refer to an invalid connection

Put simply, the handling of QMF methods and requests does not account for connections that are already closed.
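
For illustration, the queue declaration hinted at above could be written as a qpid::messaging address string along these lines. This is only a sketch: the attached reproducer is not shown in this report, so the helper name autoDeleteQueueAddress, the "create: always" policy and the exact nesting under x-declare are assumptions; only the queue name, the alternate exchange and the qpid.auto_delete_timeout argument come from the comments above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: builds an address string for an auto-delete queue
// with an alternate exchange and an auto-delete timeout.
// The layout (create policy, x-declare nesting) is an assumption, not a
// copy of the attached reproducer.
std::string autoDeleteQueueAddress(const std::string& queue,
                                   const std::string& alternateExchange,
                                   int timeoutSeconds)
{
    assert(timeoutSeconds >= 0);  // a negative timeout makes no sense here
    std::ostringstream addr;
    addr << queue
         << "; {create: always, node: {x-declare: {auto-delete: True"
         << ", alternate-exchange: '" << alternateExchange << "'"
         << ", arguments: {'qpid.auto_delete_timeout': " << timeoutSeconds
         << "}}}}";
    return addr.str();
}
```

Calling autoDeleteQueueAddress("HelloQueue.AutoDelete", "qmf.default.direct", 10) yields a string that could be passed to Session::createReceiver in a client; with the timeout set, the queue is deleted some time after the last receiver closes, which is what lets a single client run trigger the reroute.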

Comment 3 Chuck Rolke 2019-05-31 17:04:18 UTC
Research shows that function Queue::remove (stack frame #9) contains a comment by Gordon Sim from 2012:
         
    if (f) f(*i);//ERROR? need to clear old persistent context?

Clearing the message's publisher context seems to avoid the crash.
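
The idea can be modelled outside the broker as follows. This is a toy sketch of the proposal in this comment, not the broker code and not the fix that eventually went upstream: PublisherContext, Message, userIdFor and rerouteAbandoned are all hypothetical stand-ins chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-connection context a message refers to.
struct PublisherContext {
    std::string userId;
};

// Hypothetical stand-in for a queued message.
struct Message {
    std::string body;
    const PublisherContext* publisher;  // not owned; the connection may go away first
    Message(const std::string& b, const PublisherContext* p)
        : body(b), publisher(p) {}
};

// Models the management side: reads the user id only if a context is present.
std::string userIdFor(const Message& m)
{
    return m.publisher ? m.publisher->userId : std::string();
}

// Models rerouting a message when its queue is deleted. The publishing
// connection may already be closed, so the stale pointer is cleared before
// the message is handed on, instead of being dereferenced later.
std::string rerouteAbandoned(Message& m)
{
    m.publisher = nullptr;           // clear the old publisher context
    assert(m.publisher == nullptr);  // nothing downstream can reach the old connection
    return userIdFor(m);
}
```

In this model, a message rerouted after its connection has been destroyed is processed with an empty user id rather than through a dangling pointer, which corresponds to the crash in frame #1 (getUserId) no longer being reachable.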

Comment 4 Mike Cressman 2019-06-11 21:17:19 UTC
Fix now upstream, a bit different from the first proposal: see https://issues.apache.org/jira/browse/QPID-8319

Comment 5 Pavel Moravec 2019-06-17 07:10:00 UTC
Testing scratch build http://brew-task-repos.usersys.redhat.com/repos/scratch/mcressma/qpid-cpp/1.36.0/22.el6/qpid-cpp-1.36.0-22.el6-scratch.repo :

No segfault hit in repeated tests; the BZ seems fixed.

Comment 8 Zdenek Kraus 2019-07-08 11:31:19 UTC
Tested on RHEL 6 and 7 with the following packages:

qpid-cpp-server-1.36.0-22

The fix works as expected.

->VERIFIED

Comment 10 errata-xmlrpc 2019-07-15 07:54:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1770