Bug 504590 - qpidd does not use heartbeats to detect loss of clients
Summary: qpidd does not use heartbeats to detect loss of clients
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: 1.1.2
Target Release: ---
Assignee: Andrew Stitcher
QA Contact: Martin Kudlej
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-06-08 12:11 UTC by Gordon Sim
Modified: 2009-06-12 17:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 17:39:02 UTC
Target Upstream Version:
Embargoed:


Attachments
exclusive subscribe example (1.01 KB, text/x-c++src)
2009-06-08 12:16 UTC, Gordon Sim
heartbeat echo from the java and python clients (1.05 KB, patch)
2009-06-08 14:27 UTC, Rafael H. Schloming
Exclusive subscribe examples in c++, python and java (2.58 KB, application/x-compressed-tar)
2009-06-08 19:54 UTC, Gordon Sim
Patch to add client heartbeat/detection (14.66 KB, patch)
2009-06-08 20:15 UTC, Andrew Stitcher
Fix for issues in previous patch for 1.1.2 (5.03 KB, patch)
2009-06-11 05:52 UTC, Andrew Stitcher


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:1097 0 normal SHIPPED_LIVE Red Hat Enterprise MRG Messaging bug fixing update 2009-06-12 17:38:48 UTC

Description Gordon Sim 2009-06-08 12:11:35 UTC
E.g. have a client request an exclusive subscription to a queue on a remote broker, power off the machine on which the client is running, then from another machine try to request an exclusive subscription to that same queue.

At present it can take 14 minutes for the broker to detect that the first client's session has been lost and grant an exclusive subscription to the second client. It should happen much faster.

Comment 1 Gordon Sim 2009-06-08 12:16:25 UTC
Created attachment 346857
exclusive subscribe example

This is a simple client that will request an exclusive subscription to the specified queue. While one instance of this is active, attempts to start further instances on the same queue will fail. If the first process is killed, that will allow another to take the exclusive subscription.

However, without bi-directional heartbeats (and with default TCP settings), if the machine the client is on is powered down (or unplugged from the network), the queue remains 'locked' until TCP times out on retries.
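
To illustrate why relying on TCP retries alone takes on the order of minutes, here is a rough sketch of the retransmission backoff. It assumes Linux-style defaults that are not stated in this report: 15 retries (the usual tcp_retries2 value), a minimum retransmission timeout of 200 ms, and a cap of 120 s. The function name is hypothetical, and real timeouts depend on the measured round-trip time, so treat the result as an estimate only.

```python
def tcp_retransmit_budget(retries=15, rto_min=0.2, rto_max=120.0):
    """Estimate seconds until TCP gives up on an unacknowledged segment.

    Assumed model (not taken from this bug report): the retransmission
    timeout starts at rto_min, doubles after every attempt, and is
    capped at rto_max.
    """
    total = 0.0
    rto = rto_min
    # One initial wait plus one wait per retransmission.
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    seconds = tcp_retransmit_budget()
    print(f"{seconds:.1f} s, about {seconds / 60:.1f} minutes")
```

Under these assumptions the estimate comes out at roughly a quarter of an hour, which is in the same range as the 14 minutes observed.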

Comment 2 Rafael H. Schloming 2009-06-08 14:27:04 UTC
Created attachment 346886
heartbeat echo from the java and python clients

Comment 3 Gordon Sim 2009-06-08 19:54:01 UTC
Created attachment 346919
Exclusive subscribe examples in c++, python and java

This tarball includes an equivalent example to the one attached earlier in java and python as well as c++. 

(For java the ant file included will both compile and run the test e.g. ant run -Dport=6672 -Dhost=mrg-xx)

Comment 4 Andrew Stitcher 2009-06-08 20:15:54 UTC
Created attachment 346927
Patch to add client heartbeat/detection

This patch against the cpp directory of the qpid 1.1.2 tree adds c++ client heartbeat and c++ broker detection of heartbeat timeout.

Comment 5 Andrew Stitcher 2009-06-08 20:16:39 UTC
Fixed in 1.2 as well.

Comment 6 Carl Trieloff 2009-06-09 00:49:53 UTC
This test should also be run on a node of a cluster

Comment 9 Frantisek Reznicek 2009-06-10 06:36:51 UTC
Putting back to ON_QA.

Comment 10 Andrew Stitcher 2009-06-10 21:04:42 UTC
I've found a problem with the first fix:

Reproducer:

Run 3 clustered brokers:

qpidd --auth no --cluster-name ams --port 21022 --no-data-dir
qpidd --auth no --cluster-name ams --port 21023 --no-data-dir
qpidd --auth no --cluster-name ams --port 21024 --no-data-dir

Run this loop against the brokers:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 2 ; kill -CONT %%; done

Comment 11 Andrew Stitcher 2009-06-11 05:42:37 UTC
The above test is a bit too fierce and doesn't leave enough time to be sure that the broker should kill the client connection, as a heartbeat of 1s has a 2s timeout.

This means that BZ505210 can interfere.

Use this instead:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done
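
The timing argument can be sketched as below. This only models the rule stated in this comment (a heartbeat of 1s has a 2s timeout, i.e. twice the interval); the function names are hypothetical and this is not the broker's actual code.

```python
def heartbeat_timeout(interval_s, factor=2):
    """Timeout assumed to be twice the heartbeat interval, per the comment above."""
    return interval_s * factor


def broker_should_drop(silence_s, interval_s):
    """True only when the client has been silent for clearly longer than the timeout."""
    return silence_s > heartbeat_timeout(interval_s)


if __name__ == "__main__":
    # Client suspended for 2s with --heartbeat 1: right on the 2s boundary.
    print(broker_should_drop(2, 1))
    # Client suspended for 4s with --heartbeat 1: well past the 2s timeout.
    print(broker_should_drop(4, 1))
```

With a 2s pause the outcome sits on the boundary, so the test cannot tell whether the broker ought to have dropped the connection; a 4s pause removes that ambiguity.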

Comment 12 Andrew Stitcher 2009-06-11 05:52:04 UTC
Created attachment 347332
Fix for issues in previous patch for 1.1.2

Patch which fixes the issues in the previous client heartbeat patch.

Comment 13 Martin Kudlej 2009-06-11 14:08:29 UTC
Tested on RHEL 4.7 and 5.3 on i386/x86_64 with qpidd-0.5.752581-16 and it works as we expect. After about 3 heartbeats, clients can exclusively connect to the queue again. -->VERIFIED

Comment 15 errata-xmlrpc 2009-06-12 17:39:02 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html

