Bug 861838

Summary: Broker can delete a dynamic bridge upon error instead of attempting to recover
Product: Red Hat Enterprise MRG
Component: qpid-cpp
Version: Development
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: high
Target Milestone: 2.3
Target Release: ---
Reporter: Jason Dillaman <jdillama>
Assignee: Chuck Rolke <crolke>
QA Contact: Leonid Zhaldybin <lzhaldyb>
Docs Contact:
CC: esammons, freznice, jross, lzhaldyb, mcressma
Whiteboard:
Fixed In Version: qpid-cpp-0.18-4
Doc Type: Bug Fix
Doc Text:
Cause: Dynamic bridges are improperly destroyed after a binding error. This often occurs during a broker restart, when resources are recreated sequentially and bindings cannot succeed until all the resources have been recreated.
Consequence: Configured bridges are lost instead of being recovered during a maintenance cycle.
Fix: Do not delete the dynamic bridge after a binding error.
Result: Once the required resources have been restored, the dynamic bridge is created properly during a periodic retry.
Last Closed: 2013-03-06 18:52:10 UTC
Type: Bug
Bug Blocks: 698367    
Attachments:
Description Flags
Quick patch to prevent the bridge from being destroyed none

Description Jason Dillaman 2012-10-01 03:23:18 UTC
Description of problem:
If a dynamic bridge's session is detached when the broker attempts to propagate a binding event, the broker deletes the bridge. Normally, a detached bridge session is recovered automatically, if possible, during periodic link maintenance. Auto-deleting the bridge upon a session error prevents this normal recovery path from occurring.

This event can occur in a production system during broker startup/federation and also during source broker recovery since there is a potential race condition between creation of the source exchange and the creation of the dynamic bridge on the destination broker.

Log Message:
Sep 30 19:41:40 localhost qpidd[10497]: 2012-09-30 19:41:40 [Broker] error Cannot propagate binding for dynamic bridge as session has been detached, deleting dynamic bridge

Version-Release number of selected component (if applicable):
Qpid 0.18

How reproducible:
100%

Steps to Reproduce:
1. Create a dynamic bridge between two brokers. Destination broker should have a valid destination exchange but the source broker should be missing the source exchange.
2. Create a new binding on the destination exchange.
  
Actual results:
Bridge is deleted because the session was previously detached due to the missing exchange.

Expected results:
After the source exchange is created, the session error is recovered during periodic link maintenance and the binding event propagates properly.
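
The difference between the actual and expected results can be sketched as a small Python model. This is not the qpid-cpp code: the `Bridge` and `FederationLink` classes, the `delete_on_detach` switch, and the assumption that recovery replays the destination exchange's current bindings are all hypothetical, chosen only to illustrate the two outcomes described in this report.

```python
class Bridge:
    """Hypothetical stand-in for a dynamic bridge; not a qpid class."""

    def __init__(self, exchange):
        self.exchange = exchange
        self.session_attached = False  # detached while the source exchange is missing
        self.propagated = []           # binding keys the source broker has received


class FederationLink:
    """Hypothetical model of one link carrying dynamic bridges."""

    def __init__(self, delete_on_detach):
        self.delete_on_detach = delete_on_detach  # True models the reported bug
        self.source_exchange_exists = False
        self.bridges = {}
        self.bindings = []  # bindings on the destination exchange (one exchange assumed)

    def add_bridge(self, exchange):
        self.bridges[exchange] = Bridge(exchange)

    def bind(self, exchange, key):
        """Create a binding on the destination exchange and try to propagate it."""
        self.bindings.append(key)
        bridge = self.bridges.get(exchange)
        if bridge is None:
            return
        if bridge.session_attached:
            bridge.propagated.append(key)
        elif self.delete_on_detach:
            # Reported behaviour: the bridge is destroyed and cannot recover.
            del self.bridges[exchange]
        # Otherwise keep the bridge and leave recovery to maintenance.

    def maintenance_visit(self):
        """Periodic retry: re-attach detached sessions once the source exchange exists."""
        if not self.source_exchange_exists:
            return
        for bridge in self.bridges.values():
            if not bridge.session_attached:
                bridge.session_attached = True
                # Assumption: recovery replays the current bindings.
                bridge.propagated = list(self.bindings)


def simulate(delete_on_detach):
    """Follow the reproduction steps, then create the source exchange and retry."""
    link = FederationLink(delete_on_detach)
    link.add_bridge("fed.topic")
    link.bind("fed.topic", "fed.topic.queue")
    link.source_exchange_exists = True
    link.maintenance_visit()
    bridge = link.bridges.get("fed.topic")
    return (bridge is not None, bridge.propagated if bridge else [])
```

Under these assumptions, `simulate(True)` loses the bridge, while `simulate(False)` keeps it and the binding reaches the source broker on the next maintenance pass.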

Additional info:

Comment 1 Jason Dillaman 2012-10-01 17:52:36 UTC
Created attachment 619918 [details]
Quick patch to prevent the bridge from being destroyed

Comment 2 Chuck Rolke 2012-10-02 20:10:36 UTC
What does one do to have the session appear detached?

With these commands the brokers retry until the source exchange is created and then things proceed normally.

# src broker: localhost:5801
# dst broker: localhost:5803
#
# Create exchange in dst broker
#
qpid-config -b localhost:5803 add exchange topic fed.topic

#
# create dynamic bridge
#
qpid-route dynamic add localhost:5803 localhost:5801 fed.topic

#
# create dst queue as bind target
#
qpid-config -b localhost:5803 add queue fed.topic.queue

#
# create binding on dest exchange
#
qpid-config -b localhost:5803 bind fed.topic fed.topic.queue

Comment 3 Jason Dillaman 2012-10-11 15:13:57 UTC
It is possible that the session is not yet invalidated when you create the binding (it might be in the process of recovering via the link maintenance interval, which I believe you can increase). I recommend that you continue to send binding events until you see the log message above.
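
When repeating binding events like this, a small helper can check the broker log for the message quoted in the description. The sketch below is only an illustration: the function name `bridge_deletions` is hypothetical, and since this report does not say where the broker log is written, the caller is assumed to supply the log lines.

```python
# Message text taken from the log line quoted in the bug description.
BRIDGE_DELETED_MESSAGE = (
    "Cannot propagate binding for dynamic bridge as session has been "
    "detached, deleting dynamic bridge"
)


def bridge_deletions(log_lines):
    """Return the log lines that report a dynamic bridge being deleted."""
    return [line for line in log_lines if BRIDGE_DELETED_MESSAGE in line]
```

An empty result after a binding event suggests the session had not yet been detached, so another attempt is needed.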

Comment 4 Chuck Rolke 2012-10-18 20:46:53 UTC
Proposed patch committed upstream QPID-4378, r1399837. Checked by Ted Ross.

I never reproduced the bug through normal session errors, but simply binding and unbinding caused an issue that this patch corrects.

Comment 6 Chuck Rolke 2012-10-21 22:50:48 UTC
The code prints a warning message when links are fine and a bridge is unbound. This is wrong. No warning is required, and the else clause of the original patch should do nothing.

This is included in QPID-4378, r1400736.

Comment 8 Leonid Zhaldybin 2012-11-28 13:22:47 UTC
Tested on RHEL5.9 and RHEL6.3, both i386 and x86_64. The broker does not delete a dynamic bridge if its session is detached.

Packages used for testing:

RHEL5.9
python-qpid-0.18-4.el5
python-qpid-qmf-0.18-9.el5
qpid-cpp-client-0.18-10.el5
qpid-cpp-client-devel-0.18-10.el5
qpid-cpp-client-ssl-0.18-10.el5
qpid-cpp-server-0.18-10.el5
qpid-cpp-server-cluster-0.18-10.el5
qpid-cpp-server-devel-0.18-10.el5
qpid-cpp-server-ha-0.18-10.el5
qpid-cpp-server-ssl-0.18-10.el5
qpid-cpp-server-store-0.18-10.el5
qpid-cpp-server-xml-0.18-10.el5
qpid-java-client-0.18-5.el5
qpid-java-common-0.18-5.el5
qpid-java-example-0.18-5.el5
qpid-qmf-0.18-9.el5
qpid-qmf-devel-0.18-9.el5
qpid-tools-0.18-7.el5

RHEL6.3
python-qpid-0.18-4.el6
python-qpid-qmf-0.18-10.el6_3
qpid-cpp-client-0.18-10.el6_3
qpid-cpp-client-devel-0.18-10.el6_3
qpid-cpp-client-ssl-0.18-10.el6_3
qpid-cpp-server-0.18-10.el6_3
qpid-cpp-server-cluster-0.18-10.el6_3
qpid-cpp-server-devel-0.18-10.el6_3
qpid-cpp-server-ha-0.18-10.el6_3
qpid-cpp-server-ssl-0.18-10.el6_3
qpid-cpp-server-store-0.18-10.el6_3
qpid-cpp-server-xml-0.18-10.el6_3
qpid-java-client-0.18-5.el6
qpid-java-common-0.18-5.el6
qpid-java-example-0.18-5.el6
qpid-qmf-0.18-10.el6_3
qpid-qmf-devel-0.18-10.el6_3
qpid-tools-0.18-7.el6_3.noarch

-> VERIFIED

Comment 10 errata-xmlrpc 2013-03-06 18:52:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0561.html