Bug 1036650

Summary: Ensure that replaced LVQ messages are deleted from store
Product: Red Hat Enterprise MRG Reporter: Pavel Moravec <pmoravec>
Component: qpid-cppAssignee: Gordon Sim <gsim>
Status: CLOSED ERRATA QA Contact: Leonid Zhaldybin <lzhaldyb>
Severity: high Docs Contact:
Priority: high    
Version: 2.3CC: esammons, iboverma, jross, lzhaldyb, mcressma, sauchter
Target Milestone: 2.4.4Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: qpid-cpp-0.18-19 Doc Type: Bug Fix
Doc Text:
Cause: The logic used to determine whether a message needed to be removed from the store was incorrect. It did not account for the case where the message had already been removed from the in-memory queue, as is the case when an LVQ message is replaced. Consequence: Durable LVQ messages did not get removed from the store and the journal grew until it reached the limit. Fix: The logic was corrected to account for the case where the message had already been removed from the in-memory queue. Result: Durable LVQ messages are removed from the store when replaced.
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-11 08:29:04 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Pavel Moravec 2013-12-02 12:34:35 UTC
Description of problem:
patch for bz1007398 introduced a regression that durable messages discarded due to LVQ limit policy are not removed from journal (and probably not from MessageMap).


Version-Release number of selected component (if applicable):
qpid 0.18-18


How reproducible:
100%


Steps to Reproduce:
rm -rf /var/lib/qpidd/*; service qpidd restart
address="lvq; {create:always, node:{durable:True, x-declare:{arguments:{'qpid.flow_stop_count': 0, 'qpid.max_count': 0, 'qpid.last_value_queue': True, 'qpid.last_value_queue_key': 'qpid.LVQ_key', 'qpid.flow_resume_count': 0, 'qpid.flow_stop_size': 0, 'qpid.max_size': 1024000000, 'qpid.flow_resume_size': 0}}}}"
qpid-send -a "$address" --group-key=qpid.LVQ_key --group-interleave=10 --group-size=3 -m 5000 --durable=yes --content-size=100
qpid-receive -a lvq -m 10000 --print-content=no
qpid-stat -q
service qpidd restart
qpid-stat -q


Actual results:
1st qpid-stat shows 0 msg in the queue
2nd qpid-stat, just after broker restart, shows 1.67k messages there


Expected results:
Both qpid-stats show 0 msg in the queue


Additional info:
1) Really _discarded_ messages are kept in the journal. Check e.g. sn (sequence numbers) of messages received in qpid-receive before restrat and after it. Or, see that after restart, queue has 3.33k msgIn (originally had 5.0k).
2) The bug isn't present in MRG 3.0 pre-release or in upstream. Just in MRG 2.3.6.

Comment 2 Pavel Moravec 2013-12-06 12:08:08 UTC
FYI the bug is _not_ in store module, but in MessageMap <-> Queue call flow for LVQ queue.

See below what happens for LVQ, when enqueueing a new message with an LVQ key of some already enqueued message (ideally, the original message should be removed from MessageMap and from store):

1) Queue::deliver calls Queue::push that:
  a) calls MessageMap::push that calls MessageMap::replace and _removes_ original message from the map (and adds there the new one)
  b) calls Queue::dequeue (as already stored message needs to be dequeued)
2) Queue::dequeue calls "deleted = messages->deleted(msg);" with return value false - as MessageMap already replaced the original message
3) Hence the method evaluates test:

if ((msg.payload->isPersistent() || msg.payload->checkContentReleasable()) && store && deleted)

to false and the original message is _not_ removed from the store.


Crucial question: why value of "deleted" could determine removing the message from store? Is there some use case when messages->deleted(msg) is false and we should keep the message in store? (is it really flow-to-disk feature? if so, can't we add a nasty fix:

deleted = deleted && policy_is_FTD

?)

Comment 3 Justin Ross 2013-12-09 18:55:02 UTC
Pavel, thanks for helping me direct this one to the right place.

Comment 4 Gordon Sim 2013-12-17 17:23:58 UTC
(In reply to Pavel Moravec from comment #2)
> Crucial question: why value of "deleted" could determine removing the
> message from store? Is there some use case when messages->deleted(msg) is
> false and we should keep the message in store?

The test is required to determine whether the message is still on the queue, e.g. when a message on an LVQ is acked after it has been replaced. The QueueuPolicy::isEnqueued() only protects the ring-queue case, not LVQ.

As you point out, in the case that messages are dequeued after being replaced by a new message in push(), this check is not needed and will give the wrong result since the message has already been removed from messages.

I've pushed a branch with a proposed fix: http://git.app.eng.bos.redhat.com/rh-qpid.git/log/?h=0.18-mrg_BZ1036650

Comment 7 Leonid Zhaldybin 2014-01-27 15:36:48 UTC
Tested on RHEL5.10 and RHEL6.5 (both i386 and x86_64). This issue has been fixed.

Packages used for testing:

RHEL5.10
python-qpid-0.18-8.el5_10
python-qpid-qmf-0.18-20.el5_10
qpid-cpp-client-0.18-20.el5_10
qpid-cpp-client-devel-0.18-20.el5_10
qpid-cpp-client-devel-docs-0.18-20.el5_10
qpid-cpp-client-rdma-0.18-20.el5_10
qpid-cpp-client-ssl-0.18-20.el5_10
qpid-cpp-server-0.18-20.el5_10
qpid-cpp-server-cluster-0.18-20.el5_10
qpid-cpp-server-devel-0.18-20.el5_10
qpid-cpp-server-ha-0.18-20.el5_10
qpid-cpp-server-rdma-0.18-20.el5_10
qpid-cpp-server-ssl-0.18-20.el5_10
qpid-cpp-server-store-0.18-20.el5_10
qpid-cpp-server-xml-0.18-20.el5_10
qpid-java-client-0.18-8.el5_9
qpid-java-common-0.18-8.el5_9
qpid-java-example-0.18-8.el5_9
qpid-jca-0.18-8.el5
qpid-jca-xarecovery-0.18-8.el5
qpid-jca-zip-0.18-8.el5
qpid-qmf-0.18-20.el5_10
qpid-qmf-devel-0.18-20.el5_10
qpid-tests-0.18-2.el5
qpid-tools-0.18-10.el5_9
rh-qpid-cpp-tests-0.18-20.el5_10
ruby-qpid-qmf-0.18-20.el5_10

RHEL6.5
python-qpid-0.18-8.el6
python-qpid-qmf-0.18-20.el6
qpid-cpp-client-0.18-20.el6
qpid-cpp-client-devel-0.18-20.el6
qpid-cpp-client-devel-docs-0.18-20.el6
qpid-cpp-client-rdma-0.18-20.el6
qpid-cpp-client-ssl-0.18-20.el6
qpid-cpp-server-0.18-20.el6
qpid-cpp-server-cluster-0.18-20.el6
qpid-cpp-server-devel-0.18-20.el6
qpid-cpp-server-ha-0.18-20.el6
qpid-cpp-server-rdma-0.18-20.el6
qpid-cpp-server-ssl-0.18-20.el6
qpid-cpp-server-store-0.18-20.el6
qpid-cpp-server-xml-0.18-20.el6
qpid-java-client-0.18-8.el6_4
qpid-java-common-0.18-8.el6_4
qpid-java-example-0.18-8.el6_4
qpid-jca-0.18-8.el6
qpid-jca-xarecovery-0.18-8.el6
qpid-jca-zip-0.18-8.el6
qpid-qmf-0.18-20.el6
qpid-qmf-devel-0.18-20.el6
qpid-tests-0.18-2.el6
qpid-tools-0.18-10.el6_4
rh-qpid-cpp-tests-0.18-20.el6
ruby-qpid-qmf-0.18-20.el6

-> VERIFIED

Comment 9 errata-xmlrpc 2014-02-11 08:29:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0130.html