Bug 504590 - qpidd does not use heartbeats to detect loss of clients
Status: CLOSED ERRATA
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.1.1
Hardware: All Linux
Priority: urgent Severity: medium
Target Milestone: 1.1.2
Target Release: ---
Assigned To: Andrew Stitcher
QA Contact: Martin Kudlej
Reported: 2009-06-08 08:11 EDT by Gordon Sim
Modified: 2009-06-12 13:39 EDT
Doc Type: Bug Fix
Last Closed: 2009-06-12 13:39:02 EDT


Attachments
exclusive subscribe example (1.01 KB, text/x-c++src)
2009-06-08 08:16 EDT, Gordon Sim
heartbeat echo from the java and python clients (1.05 KB, patch)
2009-06-08 10:27 EDT, Rafael H. Schloming
Exclusive subscribe examples in c++, python and java (2.58 KB, application/x-compressed-tar)
2009-06-08 15:54 EDT, Gordon Sim
Patch to add client heartbeat/detection (14.66 KB, patch)
2009-06-08 16:15 EDT, Andrew Stitcher
Fix for issues in previous patch for 1.1.2 (5.03 KB, patch)
2009-06-11 01:52 EDT, Andrew Stitcher

Description Gordon Sim 2009-06-08 08:11:35 EDT
For example: have a client request an exclusive subscription to a queue on a remote broker, power off the machine on which the client is running, then from another machine try to request an exclusive subscription to that same queue.

At present it can take 14 minutes for the broker to detect that the first client's session has been lost and grant an exclusive subscription to the second client. It should happen much faster.
Comment 1 Gordon Sim 2009-06-08 08:16:25 EDT
Created attachment 346857 [details]
exclusive subscribe example

This is a simple client that will request an exclusive subscription to the specified queue. While one instance of this is active, attempts to start further instances on the same queue will fail. If the first process is killed, that will allow another to take the exclusive subscription.

However, without bi-directional heartbeats (and with default TCP settings), if the machine the client is on is powered down (or unplugged from the network), the queue remains 'locked' until TCP times out on retries.
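
The broker-side detection that is missing here can be sketched roughly as below. This is only an illustration, not the code from any patch attached to this bug: the class HeartbeatMonitor, its method names and the connection ids are hypothetical, and the timeout of twice the heartbeat interval is an assumption based on the "heartbeat of 1s has a 2s timeout" remark in comment 11.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper: remembers when each connection was last heard from."""

    def __init__(self, heartbeat_interval, clock=time.monotonic):
        self.heartbeat_interval = heartbeat_interval
        # Assumption: a peer counts as lost after twice the heartbeat interval.
        self.timeout = 2 * heartbeat_interval
        # The clock is injectable so the logic can be exercised without sleeping.
        self.clock = clock
        self.last_seen = {}

    def on_traffic(self, conn_id):
        """Record that a heartbeat (or any other frame) arrived on conn_id."""
        self.last_seen[conn_id] = self.clock()

    def expired(self):
        """Return the connections that have been silent longer than the timeout."""
        now = self.clock()
        return [c for c, seen in self.last_seen.items() if now - seen > self.timeout]

    def reap(self):
        """Forget expired connections and return them so the caller can close
        them, which would release any exclusive subscriptions they held."""
        dead = self.expired()
        for conn_id in dead:
            del self.last_seen[conn_id]
        return dead
```

A broker loop would call on_traffic() for every frame received and reap() on a periodic timer. With this in place a powered-off client is noticed after roughly two heartbeat intervals rather than after the TCP retry timeout.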
Comment 2 Rafael H. Schloming 2009-06-08 10:27:04 EDT
Created attachment 346886 [details]
heartbeat echo from the java and python clients
Comment 3 Gordon Sim 2009-06-08 15:54:01 EDT
Created attachment 346919 [details]
Exclusive subscribe examples in c++, python and java

This tarball includes examples equivalent to the one attached earlier, in Java and Python as well as C++.

(For Java, the included ant file will both compile and run the test, e.g. ant run -Dport=6672 -Dhost=mrg-xx)
Comment 4 Andrew Stitcher 2009-06-08 16:15:54 EDT
Created attachment 346927 [details]
Patch to add client heartbeat/detection

This patch against the cpp directory of the qpid 1.1.2 tree adds c++ client heartbeat and c++ broker detection of heartbeat timeout.
Comment 5 Andrew Stitcher 2009-06-08 16:16:39 EDT
fixed in 1.2 as well
Comment 6 Carl Trieloff 2009-06-08 20:49:53 EDT
This test should also be run on a node of a cluster
Comment 9 Frantisek Reznicek 2009-06-10 02:36:51 EDT
Putting back to ON_QA.
Comment 10 Andrew Stitcher 2009-06-10 17:04:42 EDT
I've found a problem with the first fix:

Reproducer:

Run 3 clustered brokers:

qpidd --auth no --cluster-name ams --port 21022 --no-data-dir
qpidd --auth no --cluster-name ams --port 21023 --no-data-dir
qpidd --auth no --cluster-name ams --port 21024 --no-data-dir

Run this line against brokers:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 2 ; kill -CONT %%; done
Comment 11 Andrew Stitcher 2009-06-11 01:42:37 EDT
The above test is a bit too fierce: it doesn't leave enough time to be sure that the broker should kill the client connection, as a heartbeat of 1s has a 2s timeout.

This means that BZ505210 can interfere.

Use this instead:

while true; do src/tests/perftest --port 21022 --heartbeat 1 & sleep 2 ; kill -STOP %% ; sleep 4 ; kill -CONT %%; done
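
The timing argument can be worked through with a small calculation. This is a simplified model, not a description of what qpidd actually does: the function name drop_expectation is hypothetical, the timeout is assumed to be twice the heartbeat interval (as stated for the 1s case), and the silence seen by the broker is assumed to lie between the stop duration and the stop duration plus one heartbeat interval, ignoring network latency and timer granularity.

```python
def drop_expectation(heartbeat_s, stopped_s, timeout_factor=2):
    """Classify whether the broker is expected to drop a client that was
    frozen (kill -STOP) for stopped_s seconds.

    Assumptions: timeout = timeout_factor * heartbeat_s, and the client's last
    heartbeat was sent somewhere between 0 and heartbeat_s before the stop.
    """
    timeout = timeout_factor * heartbeat_s
    min_silence = stopped_s                 # heartbeat sent just before the stop
    max_silence = stopped_s + heartbeat_s   # heartbeat was just about due
    if min_silence > timeout:
        return "always"
    if max_silence > timeout:
        return "maybe"
    return "never"


if __name__ == "__main__":
    # The 2s stop from comment 10 versus the 4s stop from comment 11.
    for stopped in (2, 4):
        print(stopped, drop_expectation(1, stopped))
```

Under these assumptions the 2s stop sits on the boundary, so the broker may or may not drop the connection, while the 4s stop is clearly past the timeout.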
Comment 12 Andrew Stitcher 2009-06-11 01:52:04 EDT
Created attachment 347332 [details]
Fix for issues in previous patch for 1.1.2

Patch which fixes the issues in the previous client heartbeat patch
Comment 13 Martin Kudlej 2009-06-11 10:08:29 EDT
Tested on RHEL 4.7 and 5.3 on i386/x86_64 with qpidd-0.5.752581-16 and it works as expected. After about 3 heartbeats, clients can connect exclusively to the queue again. -->VERIFIED
Comment 15 errata-xmlrpc 2009-06-12 13:39:02 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1097.html
