Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets there. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. Failing that, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry; the e-mail creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and have "MigratedToJIRA" set in "Keywords". The link to the successor Jira issue will be found under "Links", will have a little "two-footprint" icon next to it, and will direct you to the "Satellite project" in Red Hat Jira (issue links are of the form "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). The same link will also appear in a blue banner at the top of the page informing you that the bug has been migrated.

Bug 1368718

Summary: qpid dispatch router on capsule leaking memory at scale
Product: Red Hat Satellite
Reporter: Pradeep Kumar Surisetty <psuriset>
Component: Performance
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: Pradeep Kumar Surisetty <psuriset>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.2.0
CC: alexandre.chanu, arcsharm, bkearney, cduryee, dcaplan, egolov, jentrena, jhutar, mcressma, mmccune, omaciel, pmoravec, psuriset, tross, zhunting
Target Milestone: Unspecified
Keywords: Performance, PrioBumpPM
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard: scale_lab
Fixed In Version: qpid-dispatch-0.4-21, qpid-cpp-0.34-25
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1463801 (view as bug list)
Environment:
Last Closed: 2017-08-10 17:02:29 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description                                    Flags
qpid dispatch router mem on bigger capsule     none
qpid dispatch router mem on smaller capsule    none

Description Pradeep Kumar Surisetty 2016-08-20 17:58:47 UTC
Created attachment 1192484 [details]
qpid dispatch router mem on bigger capsule

Description of problem:

Registered 5000 content hosts against 2 capsules:

1) with the recommended hardware (2 CPUs, 8 GB memory)
2) with a larger configuration (8 CPUs, 16 GB memory)

After registration completes, qpid-dispatch-router memory on the capsules is leaking:
it keeps gradually increasing.
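
As an illustration (not from the original report), the growth can be tracked by sampling the resident set size of the qdrouterd process at a fixed interval; the output path and interval below are arbitrary choices:

while true; do
    # append "epoch,RSS in KiB" for all qdrouterd processes on this capsule
    echo "$(date +%s),$(ps -C qdrouterd -o rss= | awk '{s+=$1} END {print s}')" >> /var/tmp/qdrouterd-rss.csv
    sleep 60
done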


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Register content hosts at scale against a capsule (a minimal sketch follows below)
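
A minimal sketch of step 1, run on each client that should register through the capsule. The capsule FQDN, organization, and activation key below are placeholder names, and a suitable activation key is assumed to already exist:

# fetch the capsule's CA/consumer configuration, then register through it (placeholder names)
rpm -Uvh http://capsule.example.com/pub/katello-ca-consumer-latest.noarch.rpm
subscription-manager register --org="Default_Organization" --activationkey="scale-ak"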

Actual results:

Memory leak: qpid-dispatch-router memory keeps growing.

Expected results:

Lower, stable qpid-dispatch-router memory consumption (no unbounded growth).

Additional info:

Comment 1 Pradeep Kumar Surisetty 2016-08-20 17:59:12 UTC
Created attachment 1192485 [details]
qpid dispatch router mem on smaller capsule

Comment 12 Chris Duryee 2016-09-16 13:14:01 UTC
A user was able to induce a qpidd/qrouterd OOM by registering/unregistering 100x, but that was with gofer 2.5. The user and I both tried this with later versions of gofer and did not see any issue. For example, I ran the following, which passed without an OOM or other problems:

for i in `seq 1 10000`; do subscription-manager register --username admin --password changeme --environment Library && subscription-manager unregister && sleep 2; done

I will retry with 5k clients.

Comment 13 Chris Duryee 2016-09-16 13:14:59 UTC
Also, the test in comment #12 was run directly against a Satellite without any capsules. The 5K test will be run against a capsule.

Comment 15 Pavel Moravec 2016-10-20 11:13:07 UTC
(In reply to Chris Duryee from comment #12)
> A user was able to induce a qpidd/qrouterd OOM by registering/unregistering
> 100x, but with gofer 2.5.

Did that test include a goferd (re)start? If so, then IMHO we hit bz1367735.

Comment 16 Chris Duryee 2016-10-20 13:29:21 UTC
Re comment #15: I don't know if goferd was restarted; it may have been. If so, that would explain the behavior.

Comment 29 Pradeep Kumar Surisetty 2017-03-21 11:07:55 UTC
I haven't noticed this lately, but I noticed it again today.

Root cause:

When the Satellite and its capsules run different versions of qpid-dispatch-router, the memory of qpid-dispatch-router shoots up.

For example, in my case:

satellite: qpid-dispatch-router-0.4-21.el7sat.x86_64

capsules: qpid-dispatch-router-0.6.1-5.el7.x86_64

We need to make sure the same version is installed on both. With matching versions we don't see this issue.
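
A quick way to verify this (the capsule host names below are hypothetical) is to compare the installed package on the Satellite and on every capsule:

rpm -q qpid-dispatch-router                      # run on the Satellite itself
for capsule in capsule1.example.com capsule2.example.com; do
    ssh "$capsule" rpm -q qpid-dispatch-router   # should report the same version as the Satellite
done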

Comment 33 Pradeep Kumar Surisetty 2017-08-02 14:01:47 UTC
The KCS article has been updated for this. That's good enough.

Comment 35 errata-xmlrpc 2017-08-10 17:02:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2466