Red Hat Bugzilla – Bug 491305
clustered qpidd - replicating non-acked messages is not made visible to management tools qpid-tool/cumin
Last modified: 2015-11-15 19:07:09 EST
Description of problem:
Following the instructions in https://issues.apache.org/jira/browse/QPID-1618, I found that non-acked messages are replicated correctly (bug 473308), but the management components (qpid-tool, and possibly also cumin) are not notified. As a result, qpid-tool displays inaccurate data. In this particular case, qpid-tool connected to the newly joined cluster member (the "newbie") does not show that some messages are enqueued on the queue.

Version-Release number of selected component (if applicable):
[root@dhcp-lab-200 bz473308]# rpm -qa | egrep '(rhm|qpid)'
rhm-docs-0.5.753238-1.el5
rhm-0.5.3153-1.el5
qpidd-devel-0.5.752581-1.el5
qpidd-0.5.752581-1.el5
qpid-java-client-0.5.751061-1.el5
qpidc-rdma-0.5.752581-1.el5
qpidc-perftest-0.5.752581-1.el5
qpidd-rdma-0.5.752581-1.el5
qpidd-xml-0.5.752581-1.el5
qpidd-acl-0.5.752581-1.el5
python-qpid-0.5.752581-1.el5
qpidc-devel-0.5.752581-1.el5
qpidd-ssl-0.5.752581-1.el5
qpidc-0.5.752581-1.el5
qpidc-ssl-0.5.752581-1.el5
qpid-java-common-0.5.751061-1.el5
qpidd-cluster-0.5.752581-1.el5

How reproducible:
100%

Steps to Reproduce:
Either follow the transcript under Additional info, or:
1. Follow steps 1-6 of https://issues.apache.org/jira/browse/QPID-1618.
2. Start qpid-tool localhost:5813 and run "show 111" to display the MY_QUEUE statistics. You should see 5 messages enqueued.
3. Perform step 7 of https://issues.apache.org/jira/browse/QPID-1618.
4. Start qpid-tool localhost:5814 and run "show 111" to display the MY_QUEUE statistics. The 5 enqueued messages are not shown there, although they are actually enqueued, as the Additional info shows.

Actual results:
Management data are not always accurate.

Expected results:
Management data should always be accurate.
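Step 4 amounts to comparing one queue statistic between the two brokers. A minimal sketch of that comparison, assuming the output of "show 111" from each broker has been saved to a text file (for example by copying it out of the qpid-tool session). get_stat and compare_stat are hypothetical helper names, not part of qpid-tool, and the parsing assumes the single-object "statistic <name> <value> [unit]" rows shown in the transcript.

```shell
# Hypothetical helper (not part of qpid-tool): print the value of one
# statistic from saved qpid-tool "show <id>" output for a single object.
# Assumes rows of the form: statistic <name> <value> [unit]
get_stat() {
    awk -v s="$1" '$1 == "statistic" && $2 == s { print $3; exit }' "$2"
}

# Hypothetical helper: compare one statistic between two saved outputs,
# e.g. from the broker on port 5813 and the newly joined broker on 5814.
# Prints a one-line verdict; returns 0 if the values agree, 1 otherwise.
compare_stat() {
    local name=$1 a b
    a=$(get_stat "$name" "$2")
    b=$(get_stat "$name" "$3")
    if [ "$a" = "$b" ]; then
        echo "$name consistent: $a"
        return 0
    fi
    echo "$name diverged: $2=$a $3=$b"
    return 1
}
```

With the figures from the transcript (msgDepth 5 on port 5813, 0 on the newly joined broker on port 5814), compare_stat msgDepth would report a divergence and return 1.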
Additional info:
(transcript)
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# # ###########################################################################
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# cname=$(hostname)
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# port1=5813
[root@dhcp-lab-200 bz473308]# port2=5814
[root@dhcp-lab-200 bz473308]# q_name="MY_QUEUE"
[root@dhcp-lab-200 bz473308]# e_name="amq.direct"
[root@dhcp-lab-200 bz473308]# rk_name="routing_key"
[root@dhcp-lab-200 bz473308]# # ###########################################################################
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# # broker datadir creation + logs
[root@dhcp-lab-200 bz473308]# echo "clean-up&init"
clean-up&init
[root@dhcp-lab-200 bz473308]# rm -rf ./data1 ./data2 *.log
[root@dhcp-lab-200 bz473308]# service openais restart
Stopping OpenAIS daemon (aisexec): [ OK ]
Starting OpenAIS daemon (aisexec): [ OK ]
[root@dhcp-lab-200 bz473308]# setenforce 0
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# # ###########################################################################
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# echo "start broker I"
start broker I
[root@dhcp-lab-200 bz473308]# qpidd --auth no --data-dir ./data1 --log-enable debug+ \
> -p ${port1} --cluster-name ${cname} >qpidd1.log 2>&1 &
[1] 16069
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# sleep 3
[root@dhcp-lab-200 bz473308]# netstat -nlp | grep qpidd | grep ${port1}
tcp 0 0 0.0.0.0:5813 0.0.0.0:* LISTEN 16069/qpidd
[root@dhcp-lab-200 bz473308]# pid1=$(netstat -nlp | grep qpidd | grep ${port1} | awk '{print $NF}' | \
> awk -F/ '{print $(NF-1)}')
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# # ###########################################################################
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# echo -n "Step 4: Declaring and binding queue: ecode:"
Step 4: Declaring and binding queue: ecode:[root@dhcp-lab-200 bz473308]# # ./declare_queues localhost ${port1}
[root@dhcp-lab-200 bz473308]# qpid-config -a localhost:${port1} add queue ${q_name} --durable
[root@dhcp-lab-200 bz473308]# ecode=$?
[root@dhcp-lab-200 bz473308]# qpid-config -a localhost:${port1} bind ${e_name} ${q_name} ${rk_name}
[root@dhcp-lab-200 bz473308]# echo ${ecode}$?
00
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# qpid-config -b -a localhost:${port1} queues
Queue 'MY_QUEUE'
    bind [MY_QUEUE] => ''
    bind [routing_key] => amq.direct
Queue 'reply-dhcp-lab-200.englab.brq.redhat.com.16105.1'
    bind [reply-dhcp-lab-200.englab.brq.redhat.com.16105.1] => ''
    bind [reply-dhcp-lab-200.englab.brq.redhat.com.16105.1] => amq.direct
Queue 'topic-dhcp-lab-200.englab.brq.redhat.com.16105.1'
    bind [topic-dhcp-lab-200.englab.brq.redhat.com.16105.1] => ''
    bind [schema.#] => qpid.management
    bind [console.obj.*.*.org.apache.qpid.broker.agent] => qpid.management
[root@dhcp-lab-200 bz473308]# qpid-config -b -a localhost:${port1} exchanges
Exchange '' (direct)
    bind [MY_QUEUE] => MY_QUEUE
    bind [reply-dhcp-lab-200.englab.brq.redhat.com.16112.1] => reply-dhcp-lab-200.englab.brq.redhat.com.16112.1
    bind [topic-dhcp-lab-200.englab.brq.redhat.com.16112.1] => topic-dhcp-lab-200.englab.brq.redhat.com.16112.1
Exchange 'qpid.management' (topic)
    bind [schema.#] => topic-dhcp-lab-200.englab.brq.redhat.com.16112.1
    bind [console.obj.*.*.org.apache.qpid.broker.agent] => topic-dhcp-lab-200.englab.brq.redhat.com.16112.1
Exchange 'amq.direct' (direct)
    bind [routing_key] => MY_QUEUE
    bind [reply-dhcp-lab-200.englab.brq.redhat.com.16112.1] => reply-dhcp-lab-200.englab.brq.redhat.com.16112.1
Exchange 'amq.topic' (topic)
Exchange 'amq.fanout' (fanout)
Exchange 'amq.match' (headers)
Exchange 'amq.failover' (amq.failover)
Exchange 'qpid.cluster-update' (fanout)
[root@dhcp-lab-200 bz473308]# echo -n "Step 5: Starting receiver..."
Step 5: Starting receiver...[root@dhcp-lab-200 bz473308]# ./receiver -p ${port1} --queue ${q_name} --messages 10 \
> --ack-frequency 10 > receiver.log 2>&1 &
[2] 16119
[root@dhcp-lab-200 bz473308]# sleep 1
[root@dhcp-lab-200 bz473308]# echo "done"
done
[root@dhcp-lab-200 bz473308]# echo -n "Publish only 5 messages, so the receiver will not yet ack. ecode:"
Publish only 5 messages, so the receiver will not yet ack. ecode:[root@dhcp-lab-200 bz473308]# ./publish -p ${port1} --count 5 --durable yes --destination ${e_name} \
> --routing-key ${rk_name} --log-enable debug+ >publish.log 2>&1
[root@dhcp-lab-200 bz473308]# echo $?
0
[root@dhcp-lab-200 bz473308]# qpid-tool loaclhost:$port1
Socket Error (loaclhost:5813): Name or service not known
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port1
Management Tool for QPID
qpid: schema
Classes in Schema:
Class                              Properties  Statistics  Methods
====================================================================
com.redhat.rhm.store:journal       12          29          1
com.redhat.rhm.store:store         12          10          0
org.apache.qpid.broker:agent       7           1           0
org.apache.qpid.broker:binding     6           2           0
org.apache.qpid.broker:bridge      13          1           1
org.apache.qpid.broker:broker      10          2           3
org.apache.qpid.broker:connection  10          6           1
org.apache.qpid.broker:exchange    6           13          0
org.apache.qpid.broker:link        6           3           2
org.apache.qpid.broker:queue       7           28          1
org.apache.qpid.broker:session     9           7           4
org.apache.qpid.broker:system      7           1           0
org.apache.qpid.broker:vhost       4           1           0
org.apache.qpid.cluster:cluster    10          1           2
qpid: show org.apache.qpid.broker:queue
Object of type org.apache.qpid.broker:queue: (last sample time: 09:56:44)
Type       Element                111  121  126  128
====================================================================================================================================================================================================================================
property   vhostRef               103  103  103  103
property   name                   MY_QUEUE  amq.failover2af4f5f6-1ef8-4e20-8197-b9f1e377ec4e  mgmt-dhcp-lab-200.englab.brq.redhat.com.16137  repl-dhcp-lab-200.englab.brq.redhat.com.16137
property   durable                True  False  False  False
property   autoDelete             False  True  True  True
property   exclusive              False  True  True  True
property   arguments              {u'qpid.file_size': 24L, u'qpid.file_count': 8L}  {}  {}  {}
statistic  msgTotalEnqueues       5 messages  1  50  49
statistic  msgTotalDequeues       0  1  38  49
statistic  msgTxnEnqueues         0  0  0  0
statistic  msgTxnDequeues         0  0  0  0
statistic  msgPersistEnqueues     5  0  0  0
statistic  msgPersistDequeues     0  0  0  0
statistic  msgDepth               5  0  12  0
statistic  byteDepth              1280 octets  0  1923  0
statistic  byteTotalEnqueues      1280  0  8164  23566
statistic  byteTotalDequeues      0  0  6241  23566
statistic  byteTxnEnqueues        0  0  0  0
statistic  byteTxnDequeues        0  0  0  0
statistic  bytePersistEnqueues    1280  0  0  0
statistic  bytePersistDequeues    0  0  0  0
statistic  consumerCount          1 consumer  1  1  1
statistic  consumerCountHigh      1  1  1  1
statistic  consumerCountLow       1  1  0  0
statistic  bindingCount           2 bindings  1  3  2
statistic  bindingCountHigh       2  1  3  2
statistic  bindingCountLow        2  1  0  0
statistic  unackedMessages        0 messages  0  0  0
statistic  unackedMessagesHigh    0  0  0  0
statistic  unackedMessagesLow     0  0  0  0
statistic  messageLatencySamples  0  0  0  0
statistic  messageLatencyMin      0  0  0  0
statistic  messageLatencyMax      0  0  0  0
statistic  messageLatencyAverage  0  0  0  0
qpid: show 111
Object of type org.apache.qpid.broker:queue: (last sample time: 09:56:04)
Type       Element                111
====================================================================================
property   vhostRef               103
property   name                   MY_QUEUE
property   durable                True
property   autoDelete             False
property   exclusive              False
property   arguments              {u'qpid.file_size': 24L, u'qpid.file_count': 8L}
statistic  msgTotalEnqueues       5 messages
statistic  msgTotalDequeues       0
statistic  msgTxnEnqueues         0
statistic  msgTxnDequeues         0
statistic  msgPersistEnqueues     5
statistic  msgPersistDequeues     0
statistic  msgDepth               5
statistic  byteDepth              1280 octets
statistic  byteTotalEnqueues      1280
statistic  byteTotalDequeues      0
statistic  byteTxnEnqueues        0
statistic  byteTxnDequeues        0
statistic  bytePersistEnqueues    1280
statistic  bytePersistDequeues    0
statistic  consumerCount          1 consumer
statistic  consumerCountHigh      1
statistic  consumerCountLow       1
statistic  bindingCount           2 bindings
statistic  bindingCountHigh       2
statistic  bindingCountLow        2
statistic  unackedMessages        0 messages
statistic  unackedMessagesHigh    0
statistic  unackedMessagesLow     0
statistic  messageLatencySamples  0
statistic  messageLatencyMin      0
statistic  messageLatencyMax      0
statistic  messageLatencyAverage  0
qpid: quit
Exiting...
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port2
Socket Error (localhost:5814): Connection refused
[root@dhcp-lab-200 bz473308]# echo "start broker II"
start broker II
[root@dhcp-lab-200 bz473308]# qpidd --auth no --data-dir ./data2 --log-enable debug+ \
> -p ${port2} --cluster-name ${cname} >qpidd2.log 2>&1 &
[3] 16151
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# sleep 3
[root@dhcp-lab-200 bz473308]# netstat -nlp | grep qpidd | grep ${port2}
tcp 0 0 0.0.0.0:5814 0.0.0.0:* LISTEN 16151/qpidd
[root@dhcp-lab-200 bz473308]# pid2=$(netstat -nlp | grep qpidd | grep ${port2} | awk '{print $NF}' | \
> awk -F/ '{print $(NF-1)}')
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# sleep 3
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port2
Management Tool for QPID
qpid: schema
Classes in Schema:
Class                              Properties  Statistics  Methods
====================================================================
com.redhat.rhm.store:journal       12          29          1
com.redhat.rhm.store:store         12          10          0
org.apache.qpid.broker:agent       7           1           0
org.apache.qpid.broker:binding     6           2           0
org.apache.qpid.broker:bridge      13          1           1
org.apache.qpid.broker:broker      10          2           3
org.apache.qpid.broker:connection  10          6           1
org.apache.qpid.broker:exchange    6           13          0
org.apache.qpid.broker:link        6           3           2
org.apache.qpid.broker:queue       7           28          1
org.apache.qpid.broker:session     9           7           4
org.apache.qpid.broker:system      7           1           0
org.apache.qpid.broker:vhost       4           1           0
org.apache.qpid.cluster:cluster    10          1           2
qpid: show 111
Object of type org.apache.qpid.broker:queue: (last sample time: 09:57:26)
Type       Element                111
====================================================================================
property   vhostRef               103
property   name                   MY_QUEUE
property   durable                True
property   autoDelete             False
property   exclusive              False
property   arguments              {u'qpid.file_size': 24L, u'qpid.file_count': 8L}
statistic  msgTotalEnqueues       0 messages
statistic  msgTotalDequeues       0
statistic  msgTxnEnqueues         0
statistic  msgTxnDequeues         0
statistic  msgPersistEnqueues     0
statistic  msgPersistDequeues     0
statistic  msgDepth               0
statistic  byteDepth              0 octets
statistic  byteTotalEnqueues      0
statistic  byteTotalDequeues      0
statistic  byteTxnEnqueues        0
statistic  byteTxnDequeues        0
statistic  bytePersistEnqueues    0
statistic  bytePersistDequeues    0
statistic  consumerCount          1 consumer
statistic  consumerCountHigh      1
statistic  consumerCountLow       1
statistic  bindingCount           2 bindings
statistic  bindingCountHigh       2
statistic  bindingCountLow        2
statistic  unackedMessages        0 messages
statistic  unackedMessagesHigh    0
statistic  unackedMessagesLow     0
statistic  messageLatencySamples  0
statistic  messageLatencyMin      0
statistic  messageLatencyMax      0
statistic  messageLatencyAverage  0
qpid: quit
Exiting...
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port1
Management Tool for QPID
qpid: show 111
Id not known: 111
qpid: show 111
Id not known: 111
qpid: show 111
Object of type org.apache.qpid.broker:queue: (last sample time: 09:56:04)
Type       Element                111
====================================================================================
property   vhostRef               103
property   name                   MY_QUEUE
property   durable                True
property   autoDelete             False
property   exclusive              False
property   arguments              {u'qpid.file_size': 24L, u'qpid.file_count': 8L}
statistic  msgTotalEnqueues       5 messages
statistic  msgTotalDequeues       0
statistic  msgTxnEnqueues         0
statistic  msgTxnDequeues         0
statistic  msgPersistEnqueues     5
statistic  msgPersistDequeues     0
statistic  msgDepth               5
statistic  byteDepth              1280 octets
statistic  byteTotalEnqueues      1280
statistic  byteTotalDequeues      0
statistic  byteTxnEnqueues        0
statistic  byteTxnDequeues        0
statistic  bytePersistEnqueues    1280
statistic  bytePersistDequeues    0
statistic  consumerCount          1 consumer
statistic  consumerCountHigh      1
statistic  consumerCountLow       1
statistic  bindingCount           2 bindings
statistic  bindingCountHigh       2
statistic  bindingCountLow        2
statistic  unackedMessages        0 messages
statistic  unackedMessagesHigh    0
statistic  unackedMessagesLow     0
statistic  messageLatencySamples  0
statistic  messageLatencyMin      0
statistic  messageLatencyMax      0
statistic  messageLatencyAverage  0
qpid: quit
Exiting...
[root@dhcp-lab-200 bz473308]# jobs
[1] Running qpidd --auth no --data-dir ./data1 --log-enable debug+ -p ${port1} --cluster-name ${cname} > qpidd1.log 2>&1 &
[2]- Running ./receiver -p ${port1} --queue ${q_name} --messages 10 --ack-frequency 10 > receiver.log 2>&1 &
[3]+ Running qpidd --auth no --data-dir ./data2 --log-enable debug+ -p ${port2} --cluster-name ${cname} > qpidd2.log 2>&1 &
[root@dhcp-lab-200 bz473308]# kill -2 ${pid1} ${pid2}
[root@dhcp-lab-200 bz473308]#
[1] Done qpidd --auth no --data-dir ./data1 --log-enable debug+ -p ${port1} --cluster-name ${cname} > qpidd1.log 2>&1
[2]- Exit 1 ./receiver -p ${port1} --queue ${q_name} --messages 10 --ack-frequency 10 > receiver.log 2>&1
[3]+ Done qpidd --auth no --data-dir ./data2 --log-enable debug+ -p ${port2} --cluster-name ${cname} > qpidd2.log 2>&1
[root@dhcp-lab-200 bz473308]# echo "restart broker II"
restart broker II
[root@dhcp-lab-200 bz473308]# qpidd --auth no --data-dir ./data2 --log-enable debug+ \
> -p ${port2} --cluster-name ${cname} >qpidd2b.log 2>&1 &
[1] 16201
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# sleep 3
[root@dhcp-lab-200 bz473308]# netstat -nlp | grep qpidd | grep ${port2}
tcp 0 0 0.0.0.0:5814 0.0.0.0:* LISTEN 16201/qpidd
[root@dhcp-lab-200 bz473308]# pid2=$(netstat -nlp | grep qpidd | grep ${port2} | awk '{print $NF}' | \
> awk -F/ '{print $(NF-1)}')
[root@dhcp-lab-200 bz473308]#
[root@dhcp-lab-200 bz473308]# sleep 3
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port1
Socket Error (localhost:5813): Connection refused
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port2
Management Tool for QPID
qpid: schema
Classes in Schema:
Class                              Properties  Statistics  Methods
====================================================================
com.redhat.rhm.store:journal       12          29          1
com.redhat.rhm.store:store         12          10          0
org.apache.qpid.broker:agent       7           1           0
org.apache.qpid.broker:binding     6           2           0
org.apache.qpid.broker:bridge      13          1           1
org.apache.qpid.broker:broker      10          2           3
org.apache.qpid.broker:connection  10          6           1
org.apache.qpid.broker:exchange    6           13          0
org.apache.qpid.broker:link        6           3           2
org.apache.qpid.broker:queue       7           28          1
org.apache.qpid.broker:session     9           7           4
org.apache.qpid.broker:system      7           1           0
org.apache.qpid.broker:vhost       4           1           0
org.apache.qpid.cluster:cluster    10          1           2
qpid: show 111
Object of type org.apache.qpid.broker:queue: (last sample time: 09:58:34)
Type       Element                111
====================================================================================
property   vhostRef               103
property   name                   MY_QUEUE
property   durable                True
property   autoDelete             False
property   exclusive              False
property   arguments              {u'qpid.file_size': 24L, u'qpid.file_count': 8L}
statistic  msgTotalEnqueues       5 messages
statistic  msgTotalDequeues       0
statistic  msgTxnEnqueues         0
statistic  msgTxnDequeues         0
statistic  msgPersistEnqueues     5
statistic  msgPersistDequeues     0
statistic  msgDepth               5
statistic  byteDepth              1280 octets
statistic  byteTotalEnqueues      1280
statistic  byteTotalDequeues      0
statistic  byteTxnEnqueues        0
statistic  byteTxnDequeues        0
statistic  bytePersistEnqueues    1280
statistic  bytePersistDequeues    0
statistic  consumerCount          0 consumers
statistic  consumerCountHigh      0
statistic  consumerCountLow       0
statistic  bindingCount           2 bindings
statistic  bindingCountHigh       2
statistic  bindingCountLow        2
statistic  unackedMessages        0 messages
statistic  unackedMessagesHigh    0
statistic  unackedMessagesLow     0
statistic  messageLatencySamples  0
statistic  messageLatencyMin      0
statistic  messageLatencyMax      0
statistic  messageLatencyAverage  0
qpid: quit
Exiting...
[root@dhcp-lab-200 bz473308]# ./receiver -p ${port2} --queue MY_QUEUE --messages 10 \
> --ack-frequency
[root@dhcp-lab-200 bz473308]# qpid-tool localhost:$port2
Management Tool for QPID
qpid: help
Exiting...
[root@dhcp-lab-200 bz473308]# jobs
[1]+ Running qpidd --auth no --data-dir ./data2 --log-enable debug+ -p ${port2} --cluster-name ${cname} > qpidd2b.log 2>&1 &
[root@dhcp-lab-200 bz473308]# fg
qpidd --auth no --data-dir ./data2 --log-enable debug+ -p ${port2} --cluster-name ${cname} > qpidd2b.log 2>&1
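The transcript extracts each broker's PID by piping netstat through grep and two awk calls. The same extraction can be done in a single awk pass; the sketch below is an alternative, not what the reporter ran. pid_on_port is a hypothetical helper name, and the sketch assumes the Linux netstat -nlp layout seen above (local address in column 4, PID/program in the last column).

```shell
# Hypothetical helper: print the PID of the process listening on the given
# TCP port, reading `netstat -nlp` output on stdin. Unlike the transcript's
# pipeline, it does not filter on the program name (qpidd).
pid_on_port() {
    awk -v port="$1" '
        # tcp or tcp6 rows whose local address ends in ":<port>"
        $1 ~ /^tcp/ && $4 ~ (":" port "$") {
            split($NF, a, "/")   # e.g. "16069/qpidd" -> a[1] = "16069"
            print a[1]
            exit
        }'
}
```

Usage: pid1=$(netstat -nlp | pid_on_port "$port1"). Anchoring the match to ":<port>" at the end of the local address is stricter than grep ${port1}, which also matches those digits anywhere else on the line, for example inside a PID or another port such as 15813.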
Fixed (primarily by r779183).
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Cause:
When a new broker joins a cluster, messages to the cluster that had been acquired but had not yet been either accepted or released were not being communicated to the newbie (the newly joined broker).

Consequence:
Several aspects of broker state (store state, policy state, management statistics) could be inaccurate in the newbie.

Fix:
Make sure that non-completed messages, as well as all queue properties, are replicated to newly joining cluster members.

Result:
The state of clustered brokers viewable in management tools no longer occasionally diverges.
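The cause and fix can be illustrated with a toy model. This is not qpidd code and does not reflect its data structures: it only counts messages, and it assumes, as the transcript suggests (msgDepth 5 on the original broker while the receiver holds all 5 messages unacknowledged), that messages which have been acquired but not yet accepted or released still count toward the queue depth. newbie_depth is a hypothetical name.

```shell
# Toy model only, not qpidd code. Each argument after the mode is the state
# of one message on the queue in the existing cluster member:
#   available - enqueued, not yet delivered to a consumer
#   acquired  - delivered to a consumer, not yet accepted or released
# Prints how many of those messages the newly joined member ends up holding.
#   mode "before": acquired messages are not sent to the newbie (the bug)
#   mode "after":  acquired (non-completed) messages are sent too (the fix)
newbie_depth() {
    local mode=$1 n=0 s
    shift
    for s in "$@"; do
        case $s in
            available) n=$((n + 1)) ;;
            acquired)
                if [ "$mode" = after ]; then
                    n=$((n + 1))
                fi
                ;;
        esac
    done
    echo "$n"
}
```

With five acquired messages, newbie_depth before prints 0, like the newly joined broker in the transcript, while newbie_depth after prints 5, like the original broker.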
Technical note updated.

Diffed Contents:
@@ -1,13 +1 @@
-Cause:
+When a new broker joined a cluster, messages to the cluster which had been acquired, but not yet either accepted or released, were not communicated to the new broker. As a result, the state of the newly-joined broker could have shown inaccuracies in the management tools. This update ensures that non-completed messages and queue properties are replicated to newly-joined cluster members, with the result that new broker state no longer occasionally diverges.
-When a new broker joins a cluster, messages to the cluster that had been acquired but had not yet been either accepted or released were not being communicated to the newbie.
-
-Consequence:
-Several aspects of broker state ( store state, policy state, mgmt stats ) could be inaccurate in the newbie.
-
-
-Fix:
-Make sure that non-completed messages -- as well as all queue properties -- are replicated to newly joining cluster members.
-
-
-Result:
-The state of clustered brokers viewable in management tools no longer occasionally diverges.
Technical note updated.

Diffed Contents:
@@ -1 +1 @@
-When a new broker joined a cluster, messages to the cluster which had been acquired, but not yet either accepted or released, were not communicated to the new broker. As a result, the state of the newly-joined broker could have shown inaccuracies in the management tools. This update ensures that non-completed messages and queue properties are replicated to newly-joined cluster members, with the result that new broker state no longer occasionally diverges.
+When a new broker joined a cluster, messages to the cluster which had been acquired, but not yet either accepted or released, were not communicated to the new broker. As a result, the graphical management tools could have shown state inaccuracies. This update ensures that non-completed messages and queue properties are replicated to newly-joined cluster members, with the result that new broker state no longer occasionally diverges.
An advisory has been issued which should address the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2010-0773.html