Bug 1754314
| Summary: | memory leak in qpid-proton 0.28.0-1 libraries used by goferd when connection to qdrouterd is bounced | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Pavel Moravec <pmoravec> | ||||
| Component: | Qpid | Assignee: | Mike Cressman <mcressma> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Radovan Drazny <rdrazny> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | high | ||||||
| Version: | 6.5.0 | CC: | aeladawy, bkearney, bvassova, christian.klier, cjansen, dsynk, fhirtz, gkadam, gmurthy, gpadholi, gpayelka, greartes, hmore, jalviso, kagarwal, ktordeur, kupadhya, mawerner, mcressma, mkalyat, mmccune, momran, mschibli, mvanderw, patalber, pcreech, pdwyer, rcavalca, sadas, saydas, shisingh, skudupud, spetrosi, sraut, vmeghana, wclark, whitedm | ||||
| Target Milestone: | 6.7.0 | Keywords: | Regression, Triaged | ||||
| Target Release: | Unused | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | hotfix_delivered | ||||||
| Fixed In Version: | qpid-proton-0.28.0-2.{el6,el7,el8} | Doc Type: | Known Issue | ||||
| Doc Text: |
Satellite hosts that use katello-agent might experience a memory leak caused by the qpid-proton package.
|
Story Points: | --- | ||||
| Clone Of: | |||||||
| : | 1769895 1774268 | Environment: | |||||
| Last Closed: | 2020-04-14 13:25:37 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
Reproducer script outside Satellite:
1) Have qdrouterd link-route everything (or at least the pulp.* prefix) to qpidd.
2) qpidd has the queue pulp.agent.TEST.2.
3) Scenarios:
- A: use SSL in both qdrouterd and the client program (its code is below), run the client and restart qdrouterd frequently. The client reconnects automatically.
- B: disable SSL in qdrouterd, leave it enabled in the client, and run the client; it fails to connect repeatedly, as qdrouterd rejects the "SSL rubbish" on a plain AMQP connection.
- C: disable SSL in the client as well (just set "SSL = False" in the client code), run the client and restart qdrouterd frequently. The client reconnects automatically.
In each of scenarios A, B and C:
- with the 0.26.0-3.el7 proton libraries on the client, there is no memory leak (only tiny memory growth, which sometimes stabilises after 15 minutes)
- with the 0.28.0-1.el7 proton libraries on the client, an evident memory leak is observed
The script itself:
```python
from time import sleep
from uuid import uuid4

from gofer.config import Config
from proton import SSLDomain, Timeout
from proton.utils import BlockingConnection

RHSM_CONFIG_PATH = '/etc/rhsm/rhsm.conf'
SSL = True  # set to False for scenario C
SSL_S = 'amqps' if SSL else 'proton+amqp'

# Client-side SSL: trust the Katello CA and present the host's consumer certificate.
domain = SSLDomain(SSLDomain.MODE_CLIENT)
domain.set_trusted_ca_db('/etc/rhsm/ca/katello-default-ca.pem')
domain.set_credentials('/etc/pki/consumer/bundle.pem', '/etc/pki/consumer/bundle.pem', None)
domain.set_peer_authentication(SSLDomain.ANONYMOUS_PEER)
rhsm_conf = Config(RHSM_CONFIG_PATH)

ROUTER_ADDRESS = '%s://pmoravec-sat65-on-rhev.gsslab.brq2.redhat.com:5647' % SSL_S
ADDRESS = "pulp.agent.TEST.2"
HEARTBEAT = 5
SLEEP = 5

recv = None
conn = None
while True:
    # Connect and subscribe, retrying every 0.5s until both succeed.
    subscribed = False
    while not subscribed:
        try:
            conn = BlockingConnection(ROUTER_ADDRESS, ssl_domain=domain if SSL else None, heartbeat=HEARTBEAT)
            recv = conn.create_receiver(ADDRESS, name=str(uuid4()), dynamic=False, options=None)
            subscribed = True
        except Exception as e:
            print("received exception %s on connect/subscribe, trying again in 0.5s" % e)
            sleep(0.5)
    print("connected => running")
    # Receive until the connection breaks (e.g. qdrouterd is restarted),
    # then close the receiver and the connection and start over.
    while subscribed:
        try:
            print(recv.receive(SLEEP))
        except Timeout:
            pass
        except Exception as e:
            print(e)
            try:
                recv.close()
                recv = None
            except Exception:
                pass
            try:
                conn.close()
                conn = None
            except Exception:
                pass
            subscribed = False
```
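The per-reconnect growth (2.5M-3M of RSS per bounce in the Satellite reproducer) can be tracked without extra tooling. A minimal sketch, assuming a Linux /proc filesystem; the helper names below are hypothetical and are not part of proton or gofer:

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t   12345 kB"
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def rss_kb(pid=None):
    """Read the current resident set size of a process (default: this process)."""
    path = "/proc/%s/status" % (pid if pid is not None else "self")
    with open(path) as f:
        return parse_vmrss_kb(f.read())


def growth_per_cycle(samples_kb):
    """Differences between consecutive RSS samples, one per reconnect cycle."""
    return [b - a for a, b in zip(samples_kb, samples_kb[1:])]
```

For instance, one could append rss_kb() to a list each time the reproducer prints "connected => running" (or call rss_kb(pid) with goferd's PID) and pass that list to growth_per_cycle(). A roughly constant positive delta per cycle matches the leak reported for 0.28.0-1, while deltas that shrink towards zero match the behaviour described for 0.26.0-3.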
(The A and C reproducer scenarios differ only in the use of SSL, which shows that the proton memory leak is not in the SSL part of the proton code.)

Good morning! How is progress going on a test build and/or candidate? Thanks again, Frank.

Created attachment 1632607: RHEL7 Hotfix RPMs

A hotfix is available for RHEL7. To install it:
1. Download the attached file qpid-proton-HF1754314-RHEL7.tar.gz and extract it.
2. Copy the two RPMs inside the archive to each affected RHEL7 gofer client.
3. On each client, run: # yum localinstall ./python-qpid-proton-0.28.0-2.el7.x86_64.rpm ./qpid-proton-c-0.28.0-2.el7.x86_64.rpm
4. On each client, run: # systemctl restart goferd
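The fix ships as an RPM release bump (qpid-proton 0.28.0-1 to 0.28.0-2) with the same upstream version 0.28.0, so checking whether a client has it means looking at the release number. A minimal sketch, assuming package strings in the name-version-release.arch form that `rpm -q qpid-proton-c python-qpid-proton` prints; the helper names are hypothetical, and the comparison is a simplification of RPM's own version ordering:

```python
import re

# "Fixed In Version" from this bug: qpid-proton-0.28.0-2.
FIXED = ((0, 28, 0), 2)

# Matches strings such as "qpid-proton-c-0.28.0-2.el7.x86_64"
# or "python-qpid-proton-0.28.0-2.el7.x86_64.rpm".
_NVRA = re.compile(r"^(?P<name>.+)-(?P<ver>\d+(?:\.\d+)*)-(?P<rel>\d+)[^-]*$")


def parse_nvra(nvra):
    """Split a package string into (name, version tuple, release number)."""
    m = _NVRA.match(nvra.strip())
    if not m:
        raise ValueError("unrecognised package string: %r" % nvra)
    version = tuple(int(x) for x in m.group("ver").split("."))
    return m.group("name"), version, int(m.group("rel"))


def at_least_fixed_version(nvra, fixed=FIXED):
    """True if the package is 0.28.0-2 or later (simplified comparison)."""
    _, version, release = parse_nvra(nvra)
    return (version, release) >= fixed
```

With this sketch, the 0.28.0-2 hotfix packages return True and the 0.28.0-1 packages return False. The older 0.26.0-3 packages also return False even though the report found no leak there, since the check only answers whether the fixed build is installed.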
Tested with python-qpid-proton-0.28.0-2.el7.x86_64 from the Sat 6.7 Snap 5 Satellite Tools, using the reproducer from the initial report (option 3) with lowered DELAY and MAX_DELAY values. After the initial memory ramp-up, memory usage settled and remained completely constant, even after a few hundred failed connection attempts.

VERIFIED

If you report a memory leak on version 0.28.0-2:
1) confirm what the symptoms are (was qdrouterd restarted? do the goferd logs match those described?);
2) check whether you are hitting https://bugzilla.redhat.com/show_bug.cgi?id=1810549 instead (a different scenario, present in any recent qpid-proton version).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1454
Description of problem:
Two scenarios show a memory leak in the goferd process that is not present after downgrading the qpid-proton libraries from 0.28.0-1.el7 to 0.26.0-3.el7. The leak is therefore expected to be a regression in the python-qpid-proton-0.28.0-1.el7 or qpid-proton-c-0.28.0-1.el7 package.

Reproducer using Satellite (a reproducer outside Satellite can be provided later, if needed): either restart qdrouterd regularly, or misconfigure the SSL certificates. Every connection bounce of goferd consumes an extra 2.5M-3M of RSS memory.

Version-Release number of selected component (if applicable):
Current Sat6.5 tools, in particular:
python-gofer-2.12.5-3.el7sat.noarch
gofer-2.12.5-3.el7sat.noarch
qpid-proton-c-0.28.0-1.el7.x86_64
python-gofer-proton-2.12.5-3.el7sat.noarch
python-qpid-proton-0.28.0-1.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Have a katello agent running.
2. Every 11s, restart the qdrouterd that goferd connects to.
3. As an alternative to step 2, disable SSL on the qdrouterd listening port and leave goferd trying (and failing) to connect over SSL. To do so, comment out the ssl-profile line in /etc/qpid-dispatch/qdrouterd.conf:

```
listener {
    port: 5647
    sasl-mechanisms: ANONYMOUS
    # ssl-profile: server   ### comment out this line
}
```

To speed up the reproducers, increase the reconnect frequency by updating /usr/lib/python2.7/site-packages/gofer/messaging/adapter/connect.py:

```python
DELAY = 2      # was 10
MAX_DELAY = 2  # was 90
```

Actual results:
Both scenario 2 and scenario 3 (which are fully independent) show the memory leak on every reconnection attempt.

Expected results:
Obviously, no memory leak. :)
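Step 2 of the Satellite reproducer can be scripted. A minimal sketch, assuming qdrouterd runs as the systemd unit `qdrouterd` on the host where the loop runs and that it has root privileges; `bounce_qdrouterd` and its `on_restart` hook are hypothetical, and `run`/`sleep` are injectable so the loop can be dry-run:

```python
import subprocess
import time


def bounce_qdrouterd(cycles, interval=11.0, run=subprocess.check_call,
                     sleep=time.sleep, on_restart=None):
    """Restart qdrouterd `cycles` times, waiting `interval` seconds after each restart.

    `on_restart`, if given, is called with the cycle number after each
    restart (for example, to record goferd's memory usage).
    """
    for i in range(cycles):
        run(["systemctl", "restart", "qdrouterd"])
        if on_restart is not None:
            on_restart(i)
        sleep(interval)
```

For example, bounce_qdrouterd(300) gives roughly an hour of bounces at the 11s interval. If goferd runs on the same host, on_restart can record its RSS; otherwise, sample goferd's memory on the client separately.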