Bug 466822 - When using RDMA, topic exchange can cause deadlocks
Summary: When using RDMA, topic exchange can cause deadlocks
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: 1.1
Target Release: ---
Assignee: Gordon Sim
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-10-13 20:53 UTC by Gordon Sim
Modified: 2009-02-04 15:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 15:35:21 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch to rearrange the locking order in TopicExchange::route() (1.70 KB, patch)
2008-11-26 20:52 UTC, Andrew Stitcher
Test program that fails to reproduce the bug (6.92 KB, text/x-c++src)
2008-11-26 21:01 UTC, Andrew Stitcher
Test program to trigger deadlock (5.74 KB, text/x-c++src)
2008-11-27 16:30 UTC, Gordon Sim


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2009:0035 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging 1.1 Release 2009-02-04 15:33:44 UTC

Description Gordon Sim 2008-10-13 20:53:30 UTC
TopicExchange holds a read lock while routing a message. When RDMA is in use, it is possible that a command on another connection is handled on that same thread before returning from route, and if this then tries to add/remove a binding from the topic exchange it will deadlock.

Comment 1 Frantisek Reznicek 2008-10-31 16:21:44 UTC
No test info. Putting NEEDINFO flag.

Comment 2 Andrew Stitcher 2008-11-07 15:30:29 UTC
The original bug is theoretical, based on code examination. When I succeed in getting a good reproducer, I'll attach it here.

[This is also a gate on the bug fix]

Comment 3 Andrew Stitcher 2008-11-26 20:52:39 UTC
Created attachment 324797
Patch to rearrange the locking order in TopicExchange::route()

Comment 4 Andrew Stitcher 2008-11-26 20:54:43 UTC
I have produced a fix for this putative bug and attached it here. It builds and passes the standard checks.

It also doesn't cause a noticeable regression when running perftest through the topic exchange.

I attach it here.

It has not been applied because I am unable to actually reproduce the bug itself.

I attach here also a program which attempts (but fails) to reproduce the bug.

Comment 5 Andrew Stitcher 2008-11-26 21:01:18 UTC
Created attachment 324799
Test program that fails to reproduce the bug

It does run for a very long time though and uses a lot of CPU in the broker (on a single CPU as there is only one connection)

Comment 6 Andrew Stitcher 2008-11-26 21:04:24 UTC
Gordon - Any more thoughts on this "bug"?

Comment 7 Gordon Sim 2008-11-27 16:30:34 UTC
Created attachment 324903
Test program to trigger deadlock

To hit the deadlock we need to have concurrent transfers through, and bind/unbind operations on, the topic exchange. The attached topic_test uses a single connection and would never hit the issue unless multiple instances were run concurrently (with appropriate staggering between them such that the start-up and end phases of one run would be concurrent with the transfers of another).

The attached test case targets the problem more directly and resulted in the suspected deadlock on mrg12. Will now test with the patch applied.

Comment 8 Gordon Sim 2008-11-27 17:10:51 UTC
Running my test case with --messages 50000 seemed to reliably (5 deadlocks out of 5 attempts) cause the deadlock (the default, 10000, passed on one occasion).

With the patch applied it passed 5 out of 5 tries of the same; patch committed as r721243.

FYI, when reproducing, the IB interface must be used as the broker address. E.g. on mrg12: ./bz466822 -b 192.168.10.36 --messages 50000

Comment 10 David Sommerseth 2008-12-08 20:08:24 UTC
Tried to reproduce the deadlock, without luck. Based on RPMs from SVN r722891. Test verified on mrg14.

Comment 12 errata-xmlrpc 2009-02-04 15:35:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0035.html

