In a clustered messaging configuration with transient queues and transient messages, the last surviving node of the cluster is expected to automatically switch all its queues to durable mode and flush the current state of the queues to persistent storage. In such a scenario, while the queues' states are being committed from memory to persistent storage, that process should run at a much higher priority than any process that accepts new messages into the queues.
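For concreteness, a hedged sketch of how such a queue might be declared from the command line (the broker address and queue name are examples only; the exact flags appear in the verification steps later in this report):

    # Declare a durable queue that also persists transient messages on the
    # last surviving cluster node (example address and queue name):
    qpid-config -a localhost:5672 add queue myqueue --durable --cluster-durable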
The implementation has been completed, with tests.
The expectations are clear, but the "how to test" information is vague; setting the NEEDINFO flag.
How to verify: this can be tested by using qpid-config to pre-create a queue, then using perftest to create some load on a cluster with two brokers connected. Then fail the broker that perftest is not talking to, and the messages should become durable; this can be observed through qpid-tool. Then add a node back to the cluster, and the messages should become transient on the next perftest run; this can again be observed via qpid-tool. Carl.
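Sketched out as commands (the port and load parameters here are examples, following the detailed procedure in the next comment):

    # Pre-create the queue on a two-broker cluster (example port):
    qpid-config -a localhost:5672 add queue perftest0 --durable --cluster-durable

    # Generate load against broker A while broker B is still up:
    perftest -p 5672 --mode shared --count 1000 --size 256

    # Kill broker B (the one perftest is not talking to), rerun the load,
    # and inspect the persistence counters on broker A:
    perftest -p 5672 --mode shared --count 1000 --size 256
    qpid-tool localhost:5672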
This bug is verified. Test procedure:

1. Configured openais on a single computer.
2. Started two brokers with a common --cluster-name and different --port and --data-dir parameters.
3. Configured a --durable --cluster-durable queue with qpid-config against one of the brokers:
   qpid-config -a localhost:<port> add queue perftest0 --cluster-durable --durable
4. Verified that the directory <data-dir>/rhm/jrnl/001a/perftest0 is found in each of the data-dirs.
5. Ran perftest -p <port> --mode shared --count 1000 --size 256 against one of the brokers.
6. Shut down one of the cluster nodes.
7. Ran perftest -p <port> --mode shared --count 1000 --size 256 against the still-available broker.
8. Ran qpid-tool against the still-alive broker and got these results:

qpid: show com.redhat.rhm.store:journal
Object of type com.redhat.rhm.store:journal: (last sample time: 17:08:08)
    Type       Element                   112                     118
    ==============================================================================================
    property   queueRef                  0                       111
    property   name                      TplStore                perftest0
    property   directory                 /root/qpid2//rhm/tpl/   /root/qpid2//rhm/jrnl/001a/perftest0/
    property   baseFileName              tpl                     JournalData
    property   writePageSize             0 bytes                 32768
    property   writePages                0 wpages                32
    property   readPageSize              65536 bytes             65536
    property   readPages                 16 rpages               16
    property   initialFileCount          0 files                 8
    property   dataFileSize              0 bytes                 1572864
    property   currentFileCount          0 files                 8
    statistic  recordDepth               0 records               0
    statistic  recordDepthHigh           0                       0
    statistic  recordDepthLow            0                       0
    statistic  enqueues                  0                       1000
    statistic  dequeues                  0                       1000
    statistic  txnEnqueues               0                       0
    statistic  txnDequeues               0                       1000
    statistic  txnCommits                0                       0
    statistic  txnAborts                 0                       0
    statistic  outstandingAIOs           0 aio_ops               0
    statistic  outstandingAIOsHigh       0                       0
    statistic  outstandingAIOsLow        0                       0
    statistic  freeFileCount             0 files                 0
    statistic  freeFileCountHigh         0                       0
    statistic  freeFileCountLow          0                       0
    statistic  availableFileCount        0                       0
    statistic  availableFileCountHigh    0                       0
    statistic  availableFileCountLow     0                       0
    statistic  writeWaitFailures         0 records               0
    statistic  writeBusyFailures         0                       0
    statistic  readRecordCount           0                       0
    statistic  readBusyFailures          0                       0
    statistic  writePageCacheDepth       0 wpages                0
    statistic  writePageCacheDepthHigh   0                       0
    statistic  writePageCacheDepthLow    0                       0
    statistic  readPageCacheDepth        0 rpages                0
    statistic  readPageCacheDepthHigh    0                       0
    statistic  readPageCacheDepthLow     0                       0

qpid: show 111   (111 is taken from the property 'queueRef' in the listing above)
Object of type org.apache.qpid.broker:queue: (last sample time: 17:08:08)
    Type       Element                  111
    ==============================================================================================
    property   vhostRef                 103
    property   name                     perftest0
    property   durable                  True
    property   autoDelete               False
    property   exclusive                False
    property   arguments                {u'qpid.file_size': 24L, u'qpid.file_count': 8L, u'qpid.persist_last_node': 1L}
    statistic  msgTotalEnqueues         2000 messages
    statistic  msgTotalDequeues         2000
    statistic  msgTxnEnqueues           0
    statistic  msgTxnDequeues           0
    statistic  msgPersistEnqueues       1000
    statistic  msgPersistDequeues       1000
    statistic  msgDepth                 0
    statistic  byteDepth                0 octets
    statistic  byteTotalEnqueues        512000
    statistic  byteTotalDequeues        512000
    statistic  byteTxnEnqueues          0
    statistic  byteTxnDequeues          0
    statistic  bytePersistEnqueues      256000
    statistic  bytePersistDequeues      256000
    statistic  consumerCount            0 consumers
    statistic  consumerCountHigh        0
    statistic  consumerCountLow         0
    statistic  bindingCount             1 binding
    statistic  bindingCountHigh         1
    statistic  bindingCountLow          1
    statistic  unackedMessages          0 messages
    statistic  unackedMessagesHigh      0
    statistic  unackedMessagesLow       0
    statistic  messageLatencySamples    0
    statistic  messageLatencyMin        0
    statistic  messageLatencyMax        0
    statistic  messageLatencyAverage    0

9. Concluded this is correct. We sent 1000 messages with both brokers available, then 1000 with only one of the two brokers available. The total number of messages processed is 2000, and 1000 of them were written to and read from disk (msgPersistEnqueues/msgPersistDequeues). I also checked with qpid-tool between the two perftest runs that the message count was 1000 and that none had been written to or read from disk.

The bug is moved back to ASSIGNED, as the docs need to be updated to state that such a failover solution needs the queue to be declared as both --durable and --cluster-durable. Forgetting to also include --durable will not make the queue work as expected.
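To illustrate the pitfall (broker address and queue name are examples):

    # Error-prone: last-node persistence is requested, but the queue itself
    # is not durable, so messages will not be persisted as expected:
    qpid-config -a localhost:5672 add queue q1 --cluster-durable

    # Correct: declare the queue with both options:
    qpid-config -a localhost:5672 add queue q1 --durable --cluster-durable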
I am going to see if we can make --cluster-durable imply --durable, to make the configuration less error-prone.
Reviewing the code, I believe this is best left as verified, since creating the store late is probably not a good idea for consistency in managing queues. Assigning to Doc:
- Please add the following to the queue section of the documentation, chapter 3.
- That should be it on this BZ.
----------------------------
Persist Last Node

This option is used in conjunction with clustering. It allows a queue to be configured to persist transient messages if the cluster fails down to the last node. If additional nodes in the cluster are restored, it will stop persisting transient messages.

Note
* If a cluster is started with only one active node, this mode will not be triggered. It is only triggered the first time the cluster fails down to one node.
* The queue MUST be configured durable when this mode is used.

Example:

#include "qpid/client/QueueOptions.h"

QueueOptions qo;
qo.setPersistLastNode();

session.queueDeclare(arg::queue=queue, arg::durable=true, arg::arguments=qo);
<formalpara id="form-Messaging_User_Guide-Queues-Enforcing_persistence_on_the_last_node_in_a_cluster">
  <title>Enforcing persistence on the last node in a cluster</title>
  <para>
    <command>Persist Last Node</command> is used if a cluster fails down to a single node. In this situation, the queue treats all transient messages as persistent until additional nodes in the cluster are restored.
  </para>
</formalpara>
<para>
  This mode will not be triggered if a cluster is started with only one node. It is only triggered when active nodes fail until a single node remains.
</para>
<para>
  If this mode is used, the queue must be configured to be durable; otherwise messages will not be persisted.
</para>
<example id="exam-Messaging_User_Guide-Queues-Using_Persist_Last_Node">
  <title>Using <command>Persist Last Node</command></title>
  <para>
    This example demonstrates the use of <command>Persist Last Node</command>.
    <programlisting>
#include "qpid/client/QueueOptions.h"

QueueOptions qo;
qo.setPersistLastNode();

session.queueDeclare(arg::queue=queue, arg::durable=true, arg::arguments=qo);
    </programlisting>
  </para>
</example>

Available for review on the stage shortly. LKB
Back to ASSIGNED; we can hopefully move it to VERIFIED, as that is what this bug's status should say. This bug has become a unified bug for developers, QE, and docs, and it is an errata bug.
Moving to the proper VERIFIED status.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html