Bug 1369835

Summary: qdrouterd memory usage too high - killed by oom-killer (some memory leak?)
Product: Red Hat Satellite
Reporter: Jan Hutař <jhutar>
Component: Other
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: jcallaha
Severity: high
Docs Contact:
Priority: urgent
Version: 6.2.0
CC: abalakht, aperotti, bbuckingham, bill.scherer, bkearney, cduryee, egolov, jcallaha, jentrena, jhutar, mcressma, mdekan, mmccune, parmstro, pmoravec, psuriset, rdixon, sauchter
Target Milestone: Unspecified
Keywords: FieldEngineering, Performance, PrioBumpField, Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: qpid-dispatch-0.4-21
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-02-21 16:54:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Jan Hutař 2016-08-24 14:07:14 UTC
Description of problem:
On 3 capsules, qdrouterd was killed by the oom-killer after errata were applied to the clients (2k clients on each capsule, 6k in total).


Version-Release number of selected component (if applicable):
Satellite 6.2.1


How reproducible:
always on our setup


Steps to Reproduce:
1. Register 6k clients via 3 capsules
2. Apply errata on them


Actual results:
After some time, qdrouterd on capsules gets killed


Expected results:
qdrouterd uses only the memory it requires
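
One way to observe the memory growth described above is to sample the resident set size (RSS) of the qdrouterd process over time. The following is a minimal sketch, assuming a Linux capsule with /proc mounted. The function names are hypothetical and are not part of Satellite or qpid-dispatch.

```python
import re
from typing import Optional


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None if the VmRSS line is absent (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a process (hypothetical helper, Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vmrss_kib(status_file.read())
```

Calling `read_rss_kib` periodically with the qdrouterd PID (for example, as reported by `pidof qdrouterd`) while errata are being applied would show whether memory keeps growing or levels off.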

Comment 19 Mike Cressman 2017-02-03 15:03:07 UTC
There were various leaks and multiple BZs that we believe are hitting the same issues. For the router, we think this BZ (1369835) and 1368718 have been covered by the fix for 1358948. This BZ was originally reported against 6.2.1, while the latest qpid-dispatch (0.4-21) was shipped in the December errata https://access.redhat.com/errata/RHBA-2016:2855. The last scale test showed some bad results, but the bad capsules had the wrong packages installed. We'd like to see testing with all the latest packages and proceed from there.

Comment 29 Pavel Moravec 2017-12-08 19:29:22 UTC
(In reply to Pavel Moravec from comment #28)
> ..

.. and that resulted in bz1523793.

Reviewing the setup for this BZ now, I tend to think both BZs describe the same underlying bug. In this BZ:

- different qdrouterd versions were used, so the routers were unable to communicate with each other or to route messages or link-routing requests
  - that is another way of triggering 'qd:no-route-to-dest' or another link error that the router on the Capsule sends to the client

bz1523793, on the other hand, is triggered by the inability to link-route properly towards qpidd while (goferd) clients repeatedly request those links.



Since we have no better reproducer or explanation here, I suggest closing one of the two BZs as a duplicate of the other.

Comment 30 jcallaha 2018-02-08 20:25:13 UTC
Verified in Satellite 6.3 Snap 35. 

Based on comment https://bugzilla.redhat.com/show_bug.cgi?id=1523793#c5, this has been fixed in qpid-dispatch-router-0.4-28 and later.

The version currently packaged in Sat 6.3 Snap 35 is qpid-dispatch-router-0.8.0-16.

Additionally, there have been no observed breakages during testing.
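
The version check in this comment (the packaged 0.8.0-16 is at least the fixed 0.4-28) can be sketched as below. This is a simplified comparison that assumes purely numeric, dot-separated version and release fields. It is not the full RPM version-comparison algorithm, and the function names are hypothetical.

```python
from typing import Tuple


def _to_ints(field: str) -> Tuple[int, ...]:
    """Convert '0.8.0' to (0, 8, 0); non-numeric parts such as 'el7sat' are dropped."""
    return tuple(int(part) for part in field.split(".") if part.isdigit())


def parse_version_release(vr: str) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Split 'VERSION-RELEASE' into two comparable tuples of integers."""
    version, _, release = vr.partition("-")
    return _to_ints(version), _to_ints(release)


def is_at_least(installed: str, fixed: str) -> bool:
    """True if the installed version-release is the same as or newer than the fixed one."""
    return parse_version_release(installed) >= parse_version_release(fixed)


# Simplification: real RPM comparison also handles epochs and alphabetic segments.
print(is_at_least("0.8.0-16", "0.4-28"))
print(is_at_least("0.4-21", "0.4-28"))
```

With these inputs the first check passes and the second does not, which matches the thread: 0.4-21 predates the fix referenced from bz1523793, while the 0.8.0-16 build in Snap 35 includes it.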

Comment 31 jcallaha 2018-02-08 20:26:12 UTC
*** Bug 1523793 has been marked as a duplicate of this bug. ***

Comment 32 Satellite Program 2018-02-21 16:54:17 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:0336