Bug 824988

Summary: [RFE] Federated link heartbeat interval is hard coded to 120 seconds
Product: Red Hat Enterprise MRG
Reporter: Jason Dillaman <jdillama>
Component: qpid-cpp
Assignee: Ted Ross <tross>
Status: CLOSED ERRATA
QA Contact: Leonid Zhaldybin <lzhaldyb>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 2.0
CC: agoldste, esammons, jdillama, jross, lzhaldyb, tross
Target Milestone: 3.0
Keywords: FutureFeature, Patch
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qpid-0.18
Doc Type: Enhancement
Doc Text:
This enhancement introduces a configurable link heartbeat interval for the qpidd broker. Previously, the federated link heartbeat interval was hard-coded to 120 seconds, so in the worst case a failed link was detected and the system recovered only after approximately 240 seconds (twice the heartbeat interval). For High Availability environments this delay was considered too long, and a user-configurable interval was required. The qpidd broker now has an option, `link-heartbeat-interval`, which sets the heartbeat interval in seconds. This feature is documented in the "Broker HA Options" section of the Messaging Installation and Configuration Guide.
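A minimal sketch of how the new option might be supplied, assuming the usual qpidd convention that a command-line option `--name value` corresponds to a `name=value` line in the broker configuration file. The helper `link_heartbeat_setting` is hypothetical (it is not part of qpid), and the 10-second value is an arbitrary illustration rather than a recommendation; confirm the option name and default with `qpidd --help` for the broker version in use.

```python
from __future__ import annotations

# Hypothetical helper (not part of qpid): renders the link-heartbeat-interval
# setting as qpidd command-line arguments and as a configuration-file line,
# assuming "--name value" on the command line maps to "name=value" in the
# broker configuration file.


def link_heartbeat_setting(seconds: int) -> tuple[list[str], str]:
    """Return (command-line arguments, config-file line) for an interval in seconds."""
    if seconds <= 0:
        raise ValueError("the heartbeat interval must be a positive number of seconds")
    cli_args = ["--link-heartbeat-interval", str(seconds)]
    config_line = f"link-heartbeat-interval={seconds}"
    return cli_args, config_line


if __name__ == "__main__":
    args, line = link_heartbeat_setting(10)
    print(" ".join(["qpidd", *args]))  # command-line form
    print(line)                        # configuration-file form
```

Either form should be used, not both; a value given on the command line normally takes precedence over the configuration file.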
Story Points: ---
Clone Of:
Clones: 957950
Environment:
Last Closed: 2014-09-24 15:04:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 698367, 957950

Description Jason Dillaman 2012-05-24 18:28:27 UTC
Description of problem:
The federated link's connection heartbeat interval appears to be hard-coded to 120 seconds. In the worst case, this means the link only recovers after approximately 240 seconds. In a high-availability environment, the heartbeat interval needs to be configurable to a lower value to reduce system unavailability.
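The 240-second figure is consistent with a link being declared dead only after two full heartbeat intervals pass with no traffic from the peer. The model below is an assumption inferred from the 120 s and ~240 s numbers in this report, not code taken from qpid; `TIMEOUT_MULTIPLIER`, `is_link_dead`, and `detection_time` are hypothetical names.

```python
# Assumed timeout rule (inferred from 120 s -> ~240 s in this report, not
# from the qpid sources): the peer is declared dead once no frame has been
# received for TIMEOUT_MULTIPLIER heartbeat intervals.
TIMEOUT_MULTIPLIER = 2


def is_link_dead(last_frame_at: float, now: float, interval: float) -> bool:
    """True if the silence since the last received frame reaches the timeout."""
    return now - last_frame_at >= TIMEOUT_MULTIPLIER * interval


def detection_time(interval: float, check_period: float = 1.0) -> float:
    """Simulate a watchdog that checks the link every check_period seconds.

    Worst case: the peer host is hard-killed immediately after its last
    frame arrives at t = 0, so the whole timeout must elapse.
    """
    last_frame_at = 0.0
    now = 0.0
    while not is_link_dead(last_frame_at, now, interval):
        now += check_period
    return now


if __name__ == "__main__":
    for interval in (120, 10, 5):
        print(f"interval={interval:>3}s -> link declared dead after {detection_time(interval):.0f}s")
```

Under this assumed rule, a 120-second interval gives the 240-second worst case described above, while a 5-second interval would bring it down to 10 seconds.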

Version-Release number of selected component (if applicable):
qpid-cpp-server-0.12-6_ptc_hotfix_3.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Establish a federated link between two brokers running on separate hosts.
2. Hard-kill one of the hosts
  
Actual results:
The surviving broker will take ~240 seconds to declare the link dead.

Expected results:
In a clustered HA environment, the surviving broker should fail over to another broker within the cluster in a short period of time.

Additional info:

Comment 1 Ted Ross 2012-06-06 18:36:53 UTC
Fixed upstream at revision 1347044.

Comment 6 Leonid Zhaldybin 2014-01-09 16:20:55 UTC
Tested on RHEL 6.5 (both i386 and x86_64). This feature was implemented and works as expected. The qpidd broker now has an option, link-heartbeat-interval, which sets the heartbeat interval in seconds.

Packages used for testing:

python-qpid-0.22-9.el6
python-qpid-qmf-0.22-25.el6
qpid-cpp-client-0.22-31.el6
qpid-cpp-client-devel-0.22-31.el6
qpid-cpp-client-devel-docs-0.22-31.el6
qpid-cpp-client-rdma-0.22-31.el6
qpid-cpp-client-ssl-0.22-31.el6
qpid-cpp-server-0.22-31.el6
qpid-cpp-server-devel-0.22-31.el6
qpid-cpp-server-ha-0.22-31.el6
qpid-cpp-server-linearstore-0.22-31.el6
qpid-cpp-server-rdma-0.22-31.el6
qpid-cpp-server-ssl-0.22-31.el6
qpid-cpp-server-xml-0.22-31.el6
qpid-proton-c-0.6-1.el6
qpid-qmf-0.22-25.el6
qpid-tools-0.22-7.el6

-> VERIFIED

Comment 8 errata-xmlrpc 2014-09-24 15:04:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1296.html