In a clustered messaging configuration with transient queues and transient messages, the last surviving node of the cluster is expected to automatically switch all its queues to durable mode and flush the current state of the queues to persistent storage. In such a scenario, while the queues' states are being committed from memory to persistent storage, that process should run at a much higher priority than any process that accepts new messages into the queues.
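For concreteness, a hedged sketch of how such a queue might be declared from the command line (the broker address and queue name are examples only; the exact flags appear in the verification steps later in this report):

    # Declare a durable queue that also persists transient messages on the
    # last surviving cluster node (example address and queue name):
    qpid-config -a localhost:5672 add queue myqueue --durable --cluster-durable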
The implementation has been completed, with tests.
The expectations are clear, but the "how to test" information is vague; setting the NEEDINFO flag.
How to verify: this can be tested by using qpid-config to pre-create a queue, then using perftest to create some load on a cluster with two brokers connected. Then fail the broker that perftest is not talking to, and the messages should become durable; this can be observed through qpid-tool. Then add a node back to the cluster, and the messages should become transient on the next perftest run; this can again be observed via qpid-tool. Carl.
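Sketched out as commands (the port and load parameters here are examples, following the detailed procedure in the next comment):

    # Pre-create the queue on a two-broker cluster (example port):
    qpid-config -a localhost:5672 add queue perftest0 --durable --cluster-durable

    # Generate load against broker A while broker B is still up:
    perftest -p 5672 --mode shared --count 1000 --size 256

    # Kill broker B (the one perftest is not talking to), rerun the load,
    # and inspect the persistence counters on broker A:
    perftest -p 5672 --mode shared --count 1000 --size 256
    qpid-tool localhost:5672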
This bug is verified. Test procedure:

1. Configured openais on a single computer.
2. Started two brokers with a common --cluster-name and different --port and --data-dir parameters.
3. Configured a --durable --cluster-durable queue with qpid-config against one of the brokers:
   qpid-config -a localhost:<port> add queue perftest0 --cluster-durable --durable
4. Verified that the directory <data-dir>/rhm/jrnl/001a/perftest0 is found in each of the data-dirs.
5. Ran perftest -p <port> --mode shared --count 1000 --size 256 against one of the brokers.
6. Shut down one of the cluster nodes.
7. Ran perftest -p <port> --mode shared --count 1000 --size 256 against the still-available broker.
8. Ran qpid-tool against the still-alive broker and got these results:

qpid: show com.redhat.rhm.store:journal
Object of type com.redhat.rhm.store:journal: (last sample time: 17:08:08)
    Type       Element                   112                     118
    ==============================================================================================
    property   queueRef                  0                       111
    property   name                      TplStore                perftest0
    property   directory                 /root/qpid2//rhm/tpl/   /root/qpid2//rhm/jrnl/001a/perftest0/
    property   baseFileName              tpl                     JournalData
    property   writePageSize             0 bytes                 32768
    property   writePages                0 wpages                32
    property   readPageSize              65536 bytes             65536
    property   readPages                 16 rpages               16
    property   initialFileCount          0 files                 8
    property   dataFileSize              0 bytes                 1572864
    property   currentFileCount          0 files                 8
    statistic  recordDepth               0 records               0
    statistic  recordDepthHigh           0                       0
    statistic  recordDepthLow            0                       0
    statistic  enqueues                  0                       1000
    statistic  dequeues                  0                       1000
    statistic  txnEnqueues               0                       0
    statistic  txnDequeues               0                       1000
    statistic  txnCommits                0                       0
    statistic  txnAborts                 0                       0
    statistic  outstandingAIOs           0 aio_ops               0
    statistic  outstandingAIOsHigh       0                       0
    statistic  outstandingAIOsLow        0                       0
    statistic  freeFileCount             0 files                 0
    statistic  freeFileCountHigh         0                       0
    statistic  freeFileCountLow          0                       0
    statistic  availableFileCount        0                       0
    statistic  availableFileCountHigh    0                       0
    statistic  availableFileCountLow     0                       0
    statistic  writeWaitFailures         0 records               0
    statistic  writeBusyFailures         0                       0
    statistic  readRecordCount           0                       0
    statistic  readBusyFailures          0                       0
    statistic  writePageCacheDepth       0 wpages                0
    statistic  writePageCacheDepthHigh   0                       0
    statistic  writePageCacheDepthLow    0                       0
    statistic  readPageCacheDepth        0 rpages                0
    statistic  readPageCacheDepthHigh    0                       0
    statistic  readPageCacheDepthLow     0                       0

qpid: show 111   (111 is taken from the property 'queueRef' in the listing above)
Object of type org.apache.qpid.broker:queue: (last sample time: 17:08:08)
    Type       Element                  111
    ==============================================================================================
    property   vhostRef                 103
    property   name                     perftest0
    property   durable                  True
    property   autoDelete               False
    property   exclusive                False
    property   arguments                {u'qpid.file_size': 24L, u'qpid.file_count': 8L, u'qpid.persist_last_node': 1L}
    statistic  msgTotalEnqueues         2000 messages
    statistic  msgTotalDequeues         2000
    statistic  msgTxnEnqueues           0
    statistic  msgTxnDequeues           0
    statistic  msgPersistEnqueues       1000
    statistic  msgPersistDequeues       1000
    statistic  msgDepth                 0
    statistic  byteDepth                0 octets
    statistic  byteTotalEnqueues        512000
    statistic  byteTotalDequeues        512000
    statistic  byteTxnEnqueues          0
    statistic  byteTxnDequeues          0
    statistic  bytePersistEnqueues      256000
    statistic  bytePersistDequeues      256000
    statistic  consumerCount            0 consumers
    statistic  consumerCountHigh        0
    statistic  consumerCountLow         0
    statistic  bindingCount             1 binding
    statistic  bindingCountHigh         1
    statistic  bindingCountLow          1
    statistic  unackedMessages          0 messages
    statistic  unackedMessagesHigh      0
    statistic  unackedMessagesLow       0
    statistic  messageLatencySamples    0
    statistic  messageLatencyMin        0
    statistic  messageLatencyMax        0
    statistic  messageLatencyAverage    0

9. Concluded this is correct. We sent 1000 messages with both brokers available, then 1000 with only one of the two brokers available. The total number of messages processed is 2000, and 1000 of them were written to and read from disk (msgPersistEnqueues/msgPersistDequeues). I also checked with qpid-tool between the two perftest runs that the message count was 1000 and that none had been written to or read from disk.

The bug is moved back to ASSIGNED, as the docs need to be updated to state that such a failover solution needs the queue to be declared as both --durable and --cluster-durable. Forgetting to also include --durable will not make the queue work as expected.
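To illustrate the pitfall (broker address and queue name are examples):

    # Error-prone: last-node persistence is requested, but the queue itself
    # is not durable, so messages will not be persisted as expected:
    qpid-config -a localhost:5672 add queue q1 --cluster-durable

    # Correct: declare the queue with both options:
    qpid-config -a localhost:5672 add queue q1 --durable --cluster-durable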
I am going to see if we can make --cluster-durable imply --durable, to make the configuration less error-prone.
Reviewing the code, I believe this is best left as verified, since creating the store late is probably not a good idea for consistency in managing queues. Assigning to Doc:
- Please add the following to the queue section of the documentation, chapter 3.
- That should be it on this BZ.
----------------------------
Persist Last Node

This option is used in conjunction with clustering. It allows a queue to be configured to persist transient messages if the cluster fails down to the last node. If additional nodes in the cluster are restored, it will stop persisting transient messages.

Note
* If a cluster is started with only one active node, this mode will not be triggered. It is only triggered the first time the cluster fails down to one node.
* The queue MUST be configured durable when this mode is used.

Example:

#include "qpid/client/QueueOptions.h"

QueueOptions qo;
qo.setPersistLastNode();

session.queueDeclare(arg::queue=queue, arg::durable=true, arg::arguments=qo);
<formalpara id="form-Messaging_User_Guide-Queues-Enforcing_persistence_on_the_last_node_in_a_cluster">
  <title>Enforcing persistence on the last node in a cluster</title>
  <para>
    <command>Persist Last Node</command> is used if a cluster fails down to a single node. In this situation, the queue treats all transient messages as persistent until additional nodes in the cluster are restored.
  </para>
</formalpara>
<para>
  This mode will not be triggered if a cluster is started with only one node. It is only triggered when active nodes fail until a single node remains.
</para>
<para>
  If this mode is used, the queue must be configured to be durable; otherwise messages will not be persisted.
</para>
<example id="exam-Messaging_User_Guide-Queues-Using_Persist_Last_Node">
  <title>Using <command>Persist Last Node</command></title>
  <para>
    This example demonstrates the use of <command>Persist Last Node</command>.
    <programlisting>
#include "qpid/client/QueueOptions.h"

QueueOptions qo;
qo.setPersistLastNode();

session.queueDeclare(arg::queue=queue, arg::durable=true, arg::arguments=qo);
    </programlisting>
  </para>
</example>

Available for review on the stage shortly. LKB
Back to ASSIGNED; we can hopefully move it to VERIFIED, as that is what this bug's status should say. This bug has become a unified bug for developers, QE, and docs, and it is an errata bug.
Moving to the proper VERIFIED status.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-0035.html