Bug 1479579 - qpidd memory accumulation after ListenOnCandlepinEvents forgot to accept a message
Status: CLOSED WONTFIX
Product: Red Hat Satellite 6
Classification: Red Hat
Component: Qpid
Version: 6.2.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Assigned To: Mike Cressman
QA Contact: Katello QA List
Reported: 2017-08-08 17:03 EDT by Pavel Moravec
Modified: 2017-08-14 09:59 EDT
Last Closed: 2017-08-10 10:18:02 EDT
Type: Bug
External Trackers:
Red Hat Knowledge Base (Solution) 3145381 (last updated 2017-08-09 04:50 EDT)

Description Pavel Moravec 2017-08-08 17:03:13 EDT
Description of problem:
Assume ListenOnCandlepinEvents forgets to accept some message it acquires (this has been seen several times, but there is no reproducer and thus no bugzilla for it at the moment), while subsequent messages are accepted. This triggers(*) memory accumulation in the qpidd broker in practically every version I tried (at least 0.34 and 1.36).

Basically, passing further messages properly through that queue (which holds acquired but unacknowledged messages) triggers the memory leak. A standalone reproducer outside Satellite is provided below.

(*) along with accumulation of journal files for katello_event_queue, which consumes disk space. That is due to the linearstore limitation "return to EFP just the latest journal file, if it is empty" and is just a side effect here.


Version-Release number of selected component (if applicable):
qpid-cpp-server 0.34-* and also 1.36-0.6


How reproducible:
100%


Steps to Reproduce:
1. Run the commands below; basically have 1 producer and 1 active consumer, plus one consumer that gets stuck with some acquired messages (see "kill -SIGSTOP").

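# create the test queue
qpid-config add queue q
# producer: 1,000,000 messages at 100 messages per second
qpid-send --send-rate=100 -a q -m1000000 &
sleep 1
# first consumer; remember its PID so it can be frozen later
qpid-receive -a q -f --print-content=no &
pid=$!
sleep 1
# second consumer, which keeps receiving and acknowledging
qpid-receive -a q -f --print-content=no &
sleep 1
# freeze the first consumer while it holds acquired, unacknowledged messages
kill -SIGSTOP $pid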

2. Monitor the RSS of qpidd.
3. Optionally re-run the sender if it finishes.
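
For step 2, one possible way to sample the broker's RSS is sketched below. It assumes a Linux host where /proc/<pid>/status exposes a VmRSS line; the function names (parse_rss_kb, read_rss_kb, monitor) are hypothetical helpers, not part of any qpid tooling.

```python
import re
import time


def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status content, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return parse_rss_kb(f.read())


def monitor(pid, interval=10, samples=6):
    """Print a timestamped RSS sample every `interval` seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_rss_kb(pid), "kB")
        time.sleep(interval)
```

Calling monitor() with the PID of qpidd (for example as reported by pidof qpidd) while the reproducer runs should show whether the value keeps climbing.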


Actual results:
RSS of qpidd grows over time


Expected results:
RSS to be stable


Additional info:
Note that the messages left acquired forever can be held by the same consumer that keeps accepting later messages (the Satellite 6 case) or by a different consumer. So the problem should not be consumer- or subscription-related.
Comment 3 Gordon Sim 2017-08-10 03:53:44 EDT
This is a result of the way queues are implemented. Basically they are deques, with the ability to define a cursor as an offset into that deque. When messages are deleted from the queue, they are marked as such, but are only physically removed from the front of the deque (deleting from the middle would mess up the index).

Therefore if a message remains unacknowledged, the deque will grow until it is acknowledged.
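
To illustrate the effect, here is a toy model of the behaviour described above. It is only a sketch of the idea and not qpidd's actual code: the ToyQueue class and the simulate function are hypothetical names, and a real broker queue tracks far more state.

```python
from collections import deque


class ToyQueue:
    """Toy model: messages live in a deque; removal only trims the front."""

    def __init__(self):
        self.messages = deque()  # each entry is [sequence_number, deleted_flag]
        self.next_seq = 0

    def publish(self):
        seq = self.next_seq
        self.messages.append([seq, False])
        self.next_seq += 1
        return seq

    def acknowledge(self, seq):
        # Locate the entry by its offset from the front and mark it deleted.
        front_seq = self.messages[0][0]
        self.messages[seq - front_seq][1] = True
        # Physically remove entries only while the front one is marked deleted.
        while self.messages and self.messages[0][1]:
            self.messages.popleft()

    def __len__(self):
        return len(self.messages)


def simulate(total, stuck_seq=None):
    """Publish `total` messages, acknowledging all of them except `stuck_seq`."""
    q = ToyQueue()
    for _ in range(total):
        seq = q.publish()
        if seq != stuck_seq:
            q.acknowledge(seq)
    return len(q)


if __name__ == "__main__":
    print(simulate(1000))                # every message acknowledged
    print(simulate(1000, stuck_seq=10))  # message 10 is never acknowledged
```

In this model the deque stays empty when everything is acknowledged, but once message 10 is left unacknowledged, every later entry stays allocated behind it even though it is marked deleted.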

We could introduce a new feature that prevents an acquired message from getting stuck or forgotten, e.g. by releasing it (or DLQing it, or even just deleting it) after some time, or when the gap in the sequence between acquired and available messages reaches some limit, which would let the message be consumed by another consumer.
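
A rough sketch of the gap-limit variant of that idea follows. Everything in it is hypothetical (the GapLimitedQueue class, the max_gap parameter, and the choice to check the gap on publish); it models the proposal on a toy queue and does not correspond to any existing qpidd feature.

```python
from collections import deque

AVAILABLE, ACQUIRED, DELETED = "available", "acquired", "deleted"


class GapLimitedQueue:
    """Toy queue that releases a stuck front message once the gap grows too big."""

    def __init__(self, max_gap):
        self.max_gap = max_gap
        self.entries = deque()  # each entry is [sequence_number, state]
        self.next_seq = 0

    def publish(self):
        self.entries.append([self.next_seq, AVAILABLE])
        self.next_seq += 1
        self._release_stuck_front()

    def acquire(self):
        """Hand the oldest available message to a consumer; return its number."""
        for entry in self.entries:
            if entry[1] == AVAILABLE:
                entry[1] = ACQUIRED
                return entry[0]
        return None

    def acknowledge(self, seq):
        self.entries[seq - self.entries[0][0]][1] = DELETED
        # As in the broker description, only the front is physically trimmed.
        while self.entries and self.entries[0][1] == DELETED:
            self.entries.popleft()

    def _release_stuck_front(self):
        # Assumed policy: if the front message is still acquired while the
        # newest message is more than max_gap ahead, make it available again.
        front_seq, state = self.entries[0]
        if state == ACQUIRED and self.next_seq - 1 - front_seq > self.max_gap:
            self.entries[0][1] = AVAILABLE


def simulate(total, max_gap):
    """Return (peak, final) deque length with one consumer that never acks."""
    q = GapLimitedQueue(max_gap)
    q.publish()
    q.acquire()  # a consumer takes message 0 and never acknowledges it
    peak = len(q.entries)
    for _ in range(total):
        q.publish()
        peak = max(peak, len(q.entries))
        seq = q.acquire()  # a healthy consumer acknowledges all it acquires
        while seq is not None:
            q.acknowledge(seq)
            seq = q.acquire()
    return peak, len(q.entries)


if __name__ == "__main__":
    print(simulate(1000, max_gap=5))
```

With such a policy the deque length in the model stays bounded by roughly max_gap instead of growing with the number of messages sent, at the cost of possibly delivering the released message twice.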

However, I don't think this is worth doing just as a solution to this bug. It would be better to just reject or release the messages in question (if rejecting, you can set up a DLQ to collect the messages if necessary, but of course there then needs to be some process in place for managing those messages so that they do not build up).
