Bug 1479579 - qpidd memory accumulation after ListenOnCandlepinEvents forgot to accept a message
Summary: qpidd memory accumulation after ListenOnCandlepinEvents forgot to accept a message
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Qpid
Version: 6.2.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: Unspecified
Assignee: Mike Cressman
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-08-08 21:03 UTC by Pavel Moravec
Modified: 2020-09-10 11:10 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-10 14:18:02 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
- Red Hat Bugzilla 1440235 (high, CLOSED): candlepin event listener does not acknowledge every 100th message (last updated 2022-07-09 09:21:54 UTC)
- Red Hat Knowledge Base (Solution) 3145381 (last updated 2017-08-09 08:50:59 UTC)

Internal Links: 1440235

Description Pavel Moravec 2017-08-08 21:03:13 UTC
Description of problem:
Assume ListenOnCandlepinEvents forgets to accept some message it has acquired, while subsequent messages are accepted (this has been seen several times; there is no reproducer, and thus no separate bugzilla for it at the moment). This triggers(*) memory accumulation in the qpidd broker in practically any version I tried (at least 0.34 and 1.36).

Basically, properly passing further messages through that queue (one holding acquired but unacknowledged messages) triggers the memory leak. A standalone reproducer outside Satellite is provided below.

(*) along with accumulation of journal files for katello_event_queue, and thus disk space consumption, due to the linearstore limitation of returning to the EFP only the latest journal file, and only if it is empty; that is just a side effect here.
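
As a hedged illustration of that side effect, journal growth can be watched directly; the directory below is the usual Satellite 6 linearstore location and is an assumption here, not taken from this bug:

# Assumed journal path; adjust for your store configuration.
watch -n 60 'du -sh /var/lib/qpidd/.qpidd/qls/jrnl/katello_event_queue/'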


Version-Release number of selected component (if applicable):
qpid-cpp-server 0.34-* and also 1.36-0.6


How reproducible:
100%


Steps to Reproduce:
1. run the commands below; basically have one producer and one active consumer, plus one consumer that gets stuck with some acquired messages (see the "kill -SIGSTOP")

qpid-config add queue q                      # create the test queue
qpid-send --send-rate=100 -a q -m1000000 &   # producer: 1,000,000 messages at 100 msg/s
sleep 1
qpid-receive -a q -f --print-content=no &    # consumer 1 (will be frozen later)
pid=$!                                       # remember consumer 1's PID
sleep 1
qpid-receive -a q -f --print-content=no &    # consumer 2 (stays active)
sleep 1
kill -SIGSTOP $pid                           # freeze consumer 1, leaving its acquired messages unacknowledged

2. monitor RSS of qpidd (a monitoring sketch follows these steps)
3. optionally re-run the sender if it finishes
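
A minimal sketch for step 2, assuming a single qpidd process on the box (the 10-second interval is arbitrary):

# Print qpidd's resident set size (KiB) every 10 seconds.
while true; do ps -o rss= -C qpidd; sleep 10; done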


Actual results:
RSS of qpidd grows over time


Expected results:
RSS to be stable


Additional info:
Note that the forever-acquired messages can be acquired by the same consumer (as in the Satellite 6 case) or by a different one, so the problem should not be consumer- or subscription-related.

Comment 3 Gordon Sim 2017-08-10 07:53:44 UTC
This is a result of the way queues are implemented. Basically they are deques, with the ability to define a cursor as an offset into that deque. When messages are deleted from the queue, they are marked as such, but are only actually removed from the front of the deque (deleting from the middle would mess up the indexing).

Therefore, if a message remains unacknowledged, the deque will keep growing until that message is acknowledged.

We could introduce a new feature that prevents an acquired message from getting stuck or forgotten, e.g. by releasing it (or DLQing it, or even just deleting it) after some time, or once the gap in the sequence between acquired and available messages reaches some limit; releasing would let the message be consumed by another consumer.

However, I don't think this is worth doing just as a solution to this bug. It would be better to simply reject or release the messages in question (if rejecting, you can set up a DLQ to collect the messages if necessary, but then there needs to be some process in place for managing those messages so that they do not build up).
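
A minimal sketch of that DLQ setup via qpid-config's alternate-exchange mechanism; the names "dlx" and "dlq" are illustrative, not from this bug, and the alternate exchange must be set when the queue is created:

qpid-config add exchange fanout dlx                # dead-letter exchange
qpid-config add queue dlq                          # queue collecting rejected messages
qpid-config bind dlx dlq                           # everything routed to dlx lands in dlq
qpid-config add queue q --alternate-exchange dlx   # rejected messages on q get rerouted to dlx

With this in place, messages rejected by a consumer accumulate in dlq instead of pinning the original queue, though as noted above something still has to drain dlq.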

