Hide Forgot
Description of problem: patch for bz1007398 introduced a regression that durable messages discarded due to LVQ limit policy are not removed from journal (and probably not from MessageMap). Version-Release number of selected component (if applicable): qpid 0.18-18 How reproducible: 100% Steps to Reproduce: rm -rf /var/lib/qpidd/*; service qpidd restart address="lvq; {create:always, node:{durable:True, x-declare:{arguments:{'qpid.flow_stop_count': 0, 'qpid.max_count': 0, 'qpid.last_value_queue': True, 'qpid.last_value_queue_key': 'qpid.LVQ_key', 'qpid.flow_resume_count': 0, 'qpid.flow_stop_size': 0, 'qpid.max_size': 1024000000, 'qpid.flow_resume_size': 0}}}}" qpid-send -a "$address" --group-key=qpid.LVQ_key --group-interleave=10 --group-size=3 -m 5000 --durable=yes --content-size=100 qpid-receive -a lvq -m 10000 --print-content=no qpid-stat -q service qpidd restart qpid-stat -q Actual results: 1st qpid-stat shows 0 msg in the queue 2nd qpid-stat, just after broker restart, shows 1.67k messages there Expected results: Both qpid-stats show 0 msg in the queue Additional info: 1) Really _discarded_ messages are kept in the journal. Check e.g. sn (sequence numbers) of messages received in qpid-receive before restrat and after it. Or, see that after restart, queue has 3.33k msgIn (originally had 5.0k). 2) The bug isn't present in MRG 3.0 pre-release or in upstream. Just in MRG 2.3.6.
FYI the bug is _not_ in store module, but in MessageMap <-> Queue call flow for LVQ queue. See below what happens for LVQ, when enqueueing a new message with an LVQ key of some already enqueued message (ideally, the original message should be removed from MessageMap and from store): 1) Queue::deliver calls Queue::push that: a) calls MessageMap::push that calls MessageMap::replace and _removes_ original message from the map (and adds there the new one) b) calls Queue::dequeue (as already stored message needs to be dequeued) 2) Queue::dequeue calls "deleted = messages->deleted(msg);" with return value false - as MessageMap already replaced the original message 3) Hence the method evaluates test: if ((msg.payload->isPersistent() || msg.payload->checkContentReleasable()) && store && deleted) to false and the original message is _not_ removed from the store. Crucial question: why value of "deleted" could determine removing the message from store? Is there some use case when messages->deleted(msg) is false and we should keep the message in store? (is it really flow-to-disk feature? if so, can't we add a nasty fix: deleted = deleted && policy_is_FTD ?)
Pavel, thanks for helping me direct this one to the right place.
(In reply to Pavel Moravec from comment #2) > Crucial question: why value of "deleted" could determine removing the > message from store? Is there some use case when messages->deleted(msg) is > false and we should keep the message in store? The test is required to determine whether the message is still on the queue, e.g. when a message on an LVQ is acked after it has been replaced. The QueueuPolicy::isEnqueued() only protects the ring-queue case, not LVQ. As you point out, in the case that messages are dequeued after being replaced by a new message in push(), this check is not needed and will give the wrong result since the message has already been removed from messages. I've pushed a branch with a proposed fix: http://git.app.eng.bos.redhat.com/rh-qpid.git/log/?h=0.18-mrg_BZ1036650
Tested on RHEL5.10 and RHEL6.5 (both i386 and x86_64). This issue has been fixed. Packages used for testing: RHEL5.10 python-qpid-0.18-8.el5_10 python-qpid-qmf-0.18-20.el5_10 qpid-cpp-client-0.18-20.el5_10 qpid-cpp-client-devel-0.18-20.el5_10 qpid-cpp-client-devel-docs-0.18-20.el5_10 qpid-cpp-client-rdma-0.18-20.el5_10 qpid-cpp-client-ssl-0.18-20.el5_10 qpid-cpp-server-0.18-20.el5_10 qpid-cpp-server-cluster-0.18-20.el5_10 qpid-cpp-server-devel-0.18-20.el5_10 qpid-cpp-server-ha-0.18-20.el5_10 qpid-cpp-server-rdma-0.18-20.el5_10 qpid-cpp-server-ssl-0.18-20.el5_10 qpid-cpp-server-store-0.18-20.el5_10 qpid-cpp-server-xml-0.18-20.el5_10 qpid-java-client-0.18-8.el5_9 qpid-java-common-0.18-8.el5_9 qpid-java-example-0.18-8.el5_9 qpid-jca-0.18-8.el5 qpid-jca-xarecovery-0.18-8.el5 qpid-jca-zip-0.18-8.el5 qpid-qmf-0.18-20.el5_10 qpid-qmf-devel-0.18-20.el5_10 qpid-tests-0.18-2.el5 qpid-tools-0.18-10.el5_9 rh-qpid-cpp-tests-0.18-20.el5_10 ruby-qpid-qmf-0.18-20.el5_10 RHEL6.5 python-qpid-0.18-8.el6 python-qpid-qmf-0.18-20.el6 qpid-cpp-client-0.18-20.el6 qpid-cpp-client-devel-0.18-20.el6 qpid-cpp-client-devel-docs-0.18-20.el6 qpid-cpp-client-rdma-0.18-20.el6 qpid-cpp-client-ssl-0.18-20.el6 qpid-cpp-server-0.18-20.el6 qpid-cpp-server-cluster-0.18-20.el6 qpid-cpp-server-devel-0.18-20.el6 qpid-cpp-server-ha-0.18-20.el6 qpid-cpp-server-rdma-0.18-20.el6 qpid-cpp-server-ssl-0.18-20.el6 qpid-cpp-server-store-0.18-20.el6 qpid-cpp-server-xml-0.18-20.el6 qpid-java-client-0.18-8.el6_4 qpid-java-common-0.18-8.el6_4 qpid-java-example-0.18-8.el6_4 qpid-jca-0.18-8.el6 qpid-jca-xarecovery-0.18-8.el6 qpid-jca-zip-0.18-8.el6 qpid-qmf-0.18-20.el6 qpid-qmf-devel-0.18-20.el6 qpid-tests-0.18-2.el6 qpid-tools-0.18-10.el6_4 rh-qpid-cpp-tests-0.18-20.el6 ruby-qpid-qmf-0.18-20.el6 -> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0130.html