Bug 467878 - Cluster to support message TTL
Summary: Cluster to support message TTL
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: beta
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: 1.1.1
Assignee: Alan Conway
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-10-21 14:05 UTC by Alan Conway
Modified: 2009-04-21 16:17 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-04-21 16:17:36 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
Red Hat Product Errata RHEA-2009:0434 (priority normal, status SHIPPED_LIVE): Red Hat Enterprise MRG Messaging and Grid Version 1.1.1, last updated 2009-04-21 16:15:50 UTC

Description Alan Conway 2008-10-21 14:05:16 UTC
Description of problem:

AMQP messages can have a TTL and expire when that time is up. The current broker uses the local system clock to determine expiry, so clock skew between hosts could cause inconsistencies across the cluster.
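
The skew precondition is easy to check from the shell. A minimal sketch, assuming ntpdate is available (it is used the same way in the test comments below); the peer host name is a placeholder:

----------------------------------------------------
# Print the local clock, then query (without adjusting) the offset to a
# peer cluster node. Any significant offset is the skew that lets
# per-node TTL expiry decisions diverge.
date -u
ntpdate -q nodeB.example.com    # -q: query only, do not set the clock
----------------------------------------------------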

How reproducible:

Difficult to reproduce; it requires hosts with deliberately skewed clocks. It is a real race condition, however.

Additional info:

Cluster members need to exchange time messages for timed events so there is an agreed "cluster time" relative to CPG message delivery.

Comment 1 Alan Conway 2008-10-31 17:56:59 UTC
Test info: the issue is that existing cluster nodes each calculate TTL expiry independently, so there is a small window in which clients on one node can see different results from clients on another node: one client's actions may land just before the message expires according to its node, while the other's land just after expiry according to its node.

This is very difficult to test, since the timing conditions may not occur and everything may appear fine.

The best hope for testing TTL might be to set a TTL as part of a "stress test" where a cluster is subjected to intense activity for a long period of time. Due to the time-consuming nature of such tests, it may be best to combine the features needing this kind of testing into a single stress test that can exercise multiple potential problems at once.
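
A minimal sketch of such a stress loop for one node, assuming a second node deliberately skews its clock (as in the test comments below) and that the test messages carry a TTL; the loop itself is illustrative only:

----------------------------------------------------
# Keep the cluster under sustained produce/consume load so that any
# TTL expiry race between nodes has many chances to show up.
while true
do
	perftest        # drive traffic against the local broker
	sleep 1
done
----------------------------------------------------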

Comment 2 Alan Conway 2009-02-10 13:24:34 UTC
Fixed in revision 742774

Comment 4 Jan Sarenik 2009-03-03 10:32:17 UTC
Testing method:

  * use two (virtual) machines: A and B
  * set up OpenAIS in /etc/ais/openais.conf and run it on both A and B
  * ensure /root/.qpidd is empty
    # rm -rf /root/.qpidd
  * on both machines run qpidd (order is not important):
    # qpidd -t --auth=no --cluster-name="test"
  * on machine A run perftest in another console
    root@A:~# perftest
  * on machine B run date and adjust clock
    root@B:~# date `date +%m%d`0000.00
  * if nothing happens, then try to update the clock back
    root@B:~# ntpdate time.fi.muni.cz
  * both qpidd daemons should be interrupted:
      on machine A with "Cannot mcast to CPG group ahoj: access denied."
      on machine B with "Segmentation fault"

For this I used stable (1.1) versions of qpidd-cluster and qpidc-perftest:
  qpidd-cluster-0.4.732838-1.el5
  qpidc-perftest-0.4.732838-1.el5

Alan, is this the same as what you have been experiencing?

Comment 5 Jan Sarenik 2009-03-03 11:51:48 UTC
I am not able to reproduce anything similar with the latest 1.1.1 candidate:
  qpidd-0.4.744917-1.el5
  qpidc-perftest-0.4.744917-1.el5

This is despite running two brokers in a cluster, with the following script running on one of them (B in the previous example):

----------------------------------------------------
while true
do
	# Jump the clock to a random hour between 10:00 and 23:00 today
	date `date +%m%d`$(((($RANDOM)%14)+10))00.00
	sleep 1
	# Resynchronize the clock from the local NTP server
	ntpdate time.englab.brq.redhat.com
	sleep 1
done
----------------------------------------------------

Comment 6 Alan Conway 2009-03-03 14:47:42 UTC
I had not tried altering the clocks while a test is running. I'm not clear from your comment above: is it working correctly with the latest candidate?

Comment 7 Jan Sarenik 2009-03-03 15:02:12 UTC
Sorry for the confusing wording.
Yes, it is working correctly with the latest candidate.

Comment 9 errata-xmlrpc 2009-04-21 16:17:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHEA-2009-0434.html

