Bug 1479579

Summary: qpidd memory accumulation after ListenOnCandlepinEvents forgot to accept a message
Product: Red Hat Satellite
Component: Qpid
Version: 6.2.10
Status: CLOSED WONTFIX
Severity: low
Priority: high
Reporter: Pavel Moravec <pmoravec>
Assignee: Mike Cressman <mcressma>
QA Contact: Katello QA List <katello-qa-list>
Docs Contact:
CC: andrew.schofield, cduryee, gsim, jross
Target Milestone: Unspecified   
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2017-08-10 14:18:02 UTC
Type: Bug

Description Pavel Moravec 2017-08-08 21:03:13 UTC
Description of problem:
Assume ListenOnCandlepinEvents fails to accept a message it has acquired, while it does accept subsequent messages. (This has been seen several times, but there is no reproducer and thus no bugzilla for it at the moment.) This triggers(*) memory accumulation in the qpidd broker in practically every version I tried (at least 0.34 and 1.36).

Basically, passing further messages properly through that queue (which holds acquired but unacknowledged messages) triggers the memory leak. A standalone reproducer outside Satellite is provided below.

(*) It also causes journal files for katello_event_queue to accumulate and consume disk space, due to the linearstore limitation "return to EFP just the latest journal file, if it is empty". That is just a side effect here.


Version-Release number of selected component (if applicable):
qpid-cpp-server 0.34-* and also 1.36-0.6


How reproducible:
100%


Steps to Reproduce:
1. Run the commands below. Basically, have 1 producer and 1 active consumer, plus one consumer that gets stuck holding some acquired messages (see "kill -SIGSTOP"):

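# create the test queue
qpid-config add queue q
# producer: 1,000,000 messages at 100 messages per second
qpid-send --send-rate=100 -a q -m1000000 &
sleep 1
# first consumer; this is the one that gets frozen below
qpid-receive -a q -f --print-content=no &
pid=$!
sleep 1
# second consumer; it stays active and keeps accepting messages
qpid-receive -a q -f --print-content=no &
sleep 1
# freeze the first consumer while it holds acquired but unaccepted messages
kill -SIGSTOP "$pid"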

2. Monitor the RSS of qpidd.
3. Optionally, re-run the sender if it finishes.
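
For step 2, one way to watch the broker's memory is to sample VmRSS from /proc. The following is a minimal sketch, assuming a Linux host; the helper names rss_kb and sample_rss are illustrative and not part of any qpid tooling. The demo samples the interpreter's own PID so that it runs anywhere; for the reproducer, pass the qpidd PID instead.

```python
import os
import time


def rss_kb(pid):
    """Return the resident set size of `pid` in kB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def sample_rss(pid, samples, interval_s):
    """Collect `samples` RSS readings, sleeping `interval_s` seconds between them."""
    readings = []
    for i in range(samples):
        if i:
            time.sleep(interval_s)
        readings.append(rss_kb(pid))
    return readings


# Demo on this interpreter's own PID. For the reproducer, pass the qpidd PID
# (for example, the output of `pidof qpidd`) and a longer interval.
print(sample_rss(os.getpid(), samples=3, interval_s=0.1))
```

With the reproducer running, a steadily increasing series of readings for the qpidd PID would match the reported behaviour.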


Actual results:
RSS of qpidd grows over time


Expected results:
RSS of qpidd should remain stable


Additional info:
Note that the acquired-forever messages can be held by the same consumer that keeps accepting later messages (as in the Satellite 6 case) or by a different consumer, so the problem should not be consumer or subscription related.

Comment 3 Gordon Sim 2017-08-10 07:53:44 UTC
This is a result of the way queues are implemented. Basically, they are deques, with the ability to define a cursor as an offset into that deque. When messages are deleted from the queue, they are marked as deleted, but entries are only physically removed from the front of the deque (removing from the middle would mess up the index).

Therefore if a message remains unacknowledged, the deque will grow until it is acknowledged.
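
The growth can be illustrated with a toy model. This is only a sketch of the behaviour described above, not qpidd's actual code: the ToyQueue class, its method names and the state labels are invented for illustration. One consumer acquires the first message and never accepts it, while a second consumer acquires and accepts everything that follows.

```python
from collections import deque

AVAILABLE, ACQUIRED, DELETED = "available", "acquired", "deleted"


class ToyQueue:
    """Toy deque-backed queue; entries are only removed from the front."""

    def __init__(self):
        self.slots = deque()
        self.head_seq = 0  # sequence number of slots[0]

    def publish(self):
        self.slots.append(AVAILABLE)
        return self.head_seq + len(self.slots) - 1

    def acquire(self):
        """Mark the first available message as acquired and return its sequence."""
        for i in range(len(self.slots)):
            if self.slots[i] == AVAILABLE:
                self.slots[i] = ACQUIRED
                return self.head_seq + i
        return None

    def accept(self, seq):
        """Mark a message deleted, then trim deleted entries from the front only."""
        self.slots[seq - self.head_seq] = DELETED
        while self.slots and self.slots[0] == DELETED:
            self.slots.popleft()
            self.head_seq += 1


def simulate(n):
    """Return the deque length before and after the stuck message is accepted."""
    q = ToyQueue()
    q.publish()
    stuck = q.acquire()          # acquired by the stuck consumer, never accepted
    for _ in range(n):
        q.publish()
        q.accept(q.acquire())    # the healthy consumer accepts promptly
    before = len(q.slots)
    q.accept(stuck)              # the forgotten message is finally accepted
    return before, len(q.slots)


print(simulate(1000))
```

In this model the deque holds every message published since the stuck one, even though all of them are already marked deleted, and it drains completely as soon as the stuck message is accepted.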

We could introduce a new feature that prevents an acquired message from getting stuck or forgotten, e.g. by releasing it (or DLQing it, or even just deleting it) after some time, or when the gap in the sequence between acquired and available messages reaches some limit. Releasing it would let the message be consumed by another consumer.
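
As a rough illustration of the gap-limit variant of that idea, the sketch below releases a stuck head message once too many entries have piled up behind it. It is purely hypothetical: no such feature exists in qpidd according to this report, and the function names, the max_gap parameter and the state labels are invented for the example.

```python
from collections import deque

AVAILABLE, ACQUIRED, DELETED = "available", "acquired", "deleted"


def trim_front(slots):
    """Drop deleted entries from the front, the only place the deque shrinks."""
    while slots and slots[0] == DELETED:
        slots.popleft()


def release_if_gap_exceeded(slots, max_gap):
    """Hypothetical policy: release a stuck head message once more than
    `max_gap` entries have piled up behind it. Returns True if it released."""
    if slots and slots[0] == ACQUIRED and len(slots) - 1 > max_gap:
        slots[0] = AVAILABLE
        return True
    return False


# One forgotten message at the head, followed by 500 already-deleted entries.
slots = deque([ACQUIRED] + [DELETED] * 500)
released = release_if_gap_exceeded(slots, max_gap=100)

# Another consumer can now acquire and accept the released message.
if released:
    slots[0] = DELETED
trim_front(slots)
print(released, len(slots))
```

A time-based variant would look the same, except that the condition would compare the acquisition time of the head message against a timeout instead of counting trailing entries.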

However, I don't think this is worth doing just as a solution to this bug. It would be better to just reject or release the messages in question. If rejecting, you can set up a DLQ to collect the messages if necessary, but of course there then needs to be some process in place for managing those messages so that they do not build up.