Bug 861838 - Broker can delete a dynamic bridge upon error instead of attempting to recover
Summary: Broker can delete a dynamic bridge upon error instead of attempting to recover
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: 2.3
Target Release: ---
Assignee: Chuck Rolke
QA Contact: Leonid Zhaldybin
URL:
Whiteboard:
Depends On:
Blocks: 698367
Reported: 2012-10-01 03:23 UTC by Jason Dillaman
Modified: 2014-11-09 22:38 UTC

Fixed In Version: qpid-cpp-0.18-4
Doc Type: Bug Fix
Doc Text:
Cause: Dynamic bridges are improperly destroyed after a binding error. This often occurs during a broker restart, when resources are recreated sequentially and bindings cannot succeed until all of the resources have been recreated.
Consequence: Configured bridges are lost instead of being recovered during a maintenance cycle.
Fix: Do not delete the dynamic bridge after a binding error.
Result: After the required resources have been restored, the dynamic bridge is created properly during a periodic retry.
Clone Of:
Environment:
Last Closed: 2013-03-06 18:52:10 UTC
Target Upstream Version:
Embargoed:


Attachments
Quick patch to prevent the bridge from being destroyed (735 bytes, patch)
2012-10-01 17:52 UTC, Jason Dillaman


Links
System: Red Hat Product Errata
ID: RHSA-2013:0561
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise MRG Messaging 2.3 security update
Last Updated: 2013-03-06 23:48:13 UTC

Description Jason Dillaman 2012-10-01 03:23:18 UTC
Description of problem:
If a dynamic bridge's session has been detached when the broker attempts to propagate a binding event, the broker deletes the bridge. Normally, a detached bridge session is automatically recovered during the periodic link maintenance cycle, if possible. Auto-deleting the bridge on a session error prevents this normal recovery path from running.

This event can occur in a production system during broker startup/federation and also during source broker recovery since there is a potential race condition between creation of the source exchange and the creation of the dynamic bridge on the destination broker.

Log Message:
Sep 30 19:41:40 localhost qpidd[10497]: 2012-09-30 19:41:40 [Broker] error Cannot propagate binding for dynamic bridge as session has been detached, deleting dynamic bridge

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
100%

Steps to Reproduce:
1. Create a dynamic bridge between two brokers. Destination broker should have a valid destination exchange but the source broker should be missing the source exchange.
2. Create a new binding on the destination exchange.
  
Actual results:
Bridge is deleted because the session was previously detached due to the missing exchange.

Expected results:
After the source exchange is created, the session error is recovered during the periodic maintenance cycle and the binding event propagates properly.
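
The expected behaviour can be illustrated with a small model. This is only a sketch and is not taken from the qpid-cpp sources: `DynamicBridge`, `onBindingEvent` and `maintenanceTick` are hypothetical names, and the `pending` queue is an assumption about how deferred bindings might be replayed, since this report does not describe how the broker re-propagates bindings after recovery.

```cpp
#include <string>
#include <vector>

// Illustrative model only: these types and functions are hypothetical
// and do not correspond to actual qpid-cpp classes.
struct DynamicBridge {
    bool sessionAttached = false;         // false models a detached session
    std::vector<std::string> pending;     // binding keys awaiting recovery (assumed)
    std::vector<std::string> propagated;  // binding keys sent to the source broker
};

// Binding event on the destination exchange.
// Returns true when the bridge is kept. The behaviour reported in this bug
// would correspond to returning false (bridge deleted) in the detached case.
bool onBindingEvent(DynamicBridge& bridge, const std::string& key) {
    if (bridge.sessionAttached) {
        bridge.propagated.push_back(key);
    } else {
        bridge.pending.push_back(key);  // keep the bridge and defer the work
    }
    return true;
}

// One pass of the periodic maintenance: re-attach the session once the
// source exchange exists, then flush whatever was deferred.
void maintenanceTick(DynamicBridge& bridge, bool sourceExchangeExists) {
    if (!bridge.sessionAttached && sourceExchangeExists) {
        bridge.sessionAttached = true;
    }
    if (bridge.sessionAttached) {
        for (const std::string& key : bridge.pending) {
            bridge.propagated.push_back(key);
        }
        bridge.pending.clear();
    }
}
```

In this model, a binding event that arrives while the session is detached leaves the bridge in place, and the first maintenance pass after the source exchange appears both re-attaches the session and propagates the deferred binding.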

Additional info:

Comment 1 Jason Dillaman 2012-10-01 17:52:36 UTC
Created attachment 619918
Quick patch to prevent the bridge from being destroyed

Comment 2 Chuck Rolke 2012-10-02 20:10:36 UTC
What does one do to have the session appear detached?

With these commands the brokers retry until the source exchange is created, and then things proceed normally.

# src broker: localhost:5801
# dst broker: localhost:5803
#
# Create exchange in dst broker
#
qpid-config -b localhost:5803 add exchange topic fed.topic

#
# create dynamic bridge
#
qpid-route dynamic add localhost:5803 localhost:5801 fed.topic

#
# create dst queue as bind target
#
qpid-config -b localhost:5803 add queue fed.topic.queue

#
# create binding on dest exchange
#
qpid-config -b localhost:5803 bind fed.topic fed.topic.queue

Comment 3 Jason Dillaman 2012-10-11 15:13:57 UTC
The session may not be invalidated at the moment you create the binding: it might be in the process of recovering on the link maintenance interval (which I believe you can increase). I recommend that you continue to send binding events until you see the log message above.

Comment 4 Chuck Rolke 2012-10-18 20:46:53 UTC
Proposed patch committed upstream QPID-4378, r1399837. Checked by Ted Ross.

I never reproduced the bug through normal session errors, but simply binding and unbinding caused an issue that this patch corrects.

Comment 6 Chuck Rolke 2012-10-21 22:50:48 UTC
The code prints a warning message when links are fine and a bridge is unbound. This is wrong: no warning is required, and the else clause of the original patch should do nothing.

This is included in QPID-4378, r1400736.

Comment 8 Leonid Zhaldybin 2012-11-28 13:22:47 UTC
Tested on RHEL5.9 and RHEL6.3, both i386 and x86_64. The broker does not delete a dynamic bridge if its session is detached.

Packages used for testing:

RHEL5.9
python-qpid-0.18-4.el5
python-qpid-qmf-0.18-9.el5
qpid-cpp-client-0.18-10.el5
qpid-cpp-client-devel-0.18-10.el5
qpid-cpp-client-ssl-0.18-10.el5
qpid-cpp-server-0.18-10.el5
qpid-cpp-server-cluster-0.18-10.el5
qpid-cpp-server-devel-0.18-10.el5
qpid-cpp-server-ha-0.18-10.el5
qpid-cpp-server-ssl-0.18-10.el5
qpid-cpp-server-store-0.18-10.el5
qpid-cpp-server-xml-0.18-10.el5
qpid-java-client-0.18-5.el5
qpid-java-common-0.18-5.el5
qpid-java-example-0.18-5.el5
qpid-qmf-0.18-9.el5
qpid-qmf-devel-0.18-9.el5
qpid-tools-0.18-7.el5

RHEL6.3
python-qpid-0.18-4.el6
python-qpid-qmf-0.18-10.el6_3
qpid-cpp-client-0.18-10.el6_3
qpid-cpp-client-devel-0.18-10.el6_3
qpid-cpp-client-ssl-0.18-10.el6_3
qpid-cpp-server-0.18-10.el6_3
qpid-cpp-server-cluster-0.18-10.el6_3
qpid-cpp-server-devel-0.18-10.el6_3
qpid-cpp-server-ha-0.18-10.el6_3
qpid-cpp-server-ssl-0.18-10.el6_3
qpid-cpp-server-store-0.18-10.el6_3
qpid-cpp-server-xml-0.18-10.el6_3
qpid-java-client-0.18-5.el6
qpid-java-common-0.18-5.el6
qpid-java-example-0.18-5.el6
qpid-qmf-0.18-10.el6_3
qpid-qmf-devel-0.18-10.el6_3
qpid-tools-0.18-7.el6_3.noarch

-> VERIFIED

Comment 10 errata-xmlrpc 2013-03-06 18:52:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0561.html

