Red Hat Satellite engineering is moving the tracking of its product development work on Satellite to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "Satellite project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs will be migrated starting at the end of May. If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "Satellite project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/SAT-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1816831 - Memory leak in python-proton library when goferd connections are timeouted due to TCP packets dropped
Summary: Memory leak in python-proton library when goferd connections are timeouted du...
Keywords:
Status: CLOSED DUPLICATE of bug 1810549
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: Qpid
Version: 6.6.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: Unspecified
Assignee: Mike Cressman
QA Contact: vijsingh
URL:
Whiteboard:
: 1816829 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-24 20:09 UTC by Amar Ganbat
Modified: 2023-10-06 19:29 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-04-14 08:01:38 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Amar Ganbat 2020-03-24 20:09:40 UTC
Description of problem:

The content host's memory consumption is happening and affecting the content host after a while (weeks or months) according to the usage.
Even with the updated the proton packages.  

~~~
qpid-proton-c-0.28.0-2.el7.x86_64
python-qpid-proton-0.28.0-2.el7.x86_64
~~~


How reproducible:
Install the latest version of katello-agent and monitor the usage.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Brad Buckingham 2020-03-27 14:14:59 UTC
*** Bug 1816829 has been marked as a duplicate of this bug. ***

Comment 4 Brad Buckingham 2020-03-27 14:17:40 UTC
Hello Amar,

Can you provide some more information to help with investigating the bugzilla?
Do you have reproducer steps or a reproducer environment?
Are there logs from the server or client that would help with investigation?  (e.g. foreman-debug)
Is there are there system statistics that can be shared? (e.g. memory usage, where did it start, client activities, where did it end...etc?)

Comment 8 Pavel Moravec 2020-04-13 15:33:06 UTC
Here is a reproducer inside Satellite:

1) Start goferd on a client, ensure everything is connected properly (this step is must)

2) Simulate a networking issue where packets towards Sat/Caps port 5647 are dropped:

iptables -I INPUT -p tcp --dport 5647 -j DROP

3) /var/log/messages on the client system notice it:

Apr 13 17:25:53 pmoravec-rhel76 goferd: [WARNING][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.proton.reliability:54 - Connection amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647 disconnected: Condition('amqp:resource-limit-exceeded', 'local-idle-timeout expired')

4) after a short time (probably in the 10s), a reconnect attempt happens, but it is timeouted:
Apr 13 17:26:03 pmoravec-rhel76 goferd: [INFO][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.proton.connection:131 - closed: proton+amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647
Apr 13 17:26:03 pmoravec-rhel76 goferd: [INFO][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647
Apr 13 17:26:03 pmoravec-rhel76 goferd: [INFO][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 - Connection amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647 timed out: Opening connection
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 - Traceback (most recent call last):
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py", line 40, in _fn
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     return fn(*args, **keywords)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py", line 611, in get
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     return self._impl.get(timeout)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/reliability.py", line 44, in _fn
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     repair()
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/consumer.py", line 81, in repair
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     self.connection.open()
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/connect.py", line 29, in inner
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     impl = fn(connection)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/connection.py", line 91, in open
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     ssl_domain=domain)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib64/python2.7/site-packages/proton/_utils.py", line 245, in __init__
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     msg="Opening connection")
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -   File "/usr/lib64/python2.7/site-packages/proton/_utils.py", line 314, in wait
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 -     raise Timeout(txt)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.adapter.model:44 - Timeout: Connection amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647 timed out: Opening connection
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 - pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 - Traceback (most recent call last):
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 98, in read
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -     message, document = reader.next(wait)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py", line 40, in _fn
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -     return fn(*args, **keywords)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py", line 645, in next
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -     message = self.get(timeout)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -   File "/usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py", line 45, in _fn
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 -     raise ModelError(*e.args)
Apr 13 17:26:04 pmoravec-rhel76 goferd: [ERROR][pulp.agent.32279d52-1393-4e0a-b19f-b0258f7d86c2] gofer.messaging.consumer:112 - ModelError: Connection amqps://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5647 timed out: Opening connection

5) after some time, the above logs re-occur

6) meantime, measure memory usage of goferd - it grows very linearly



TODO on me: prepare a reproducer outside Satellite / goferd, purely using proton (here I assume the leak is in proton, which does not need to be the case)

Comment 9 Pavel Moravec 2020-04-13 20:13:12 UTC
And here is a reproducer outside Satellite, against qpid-proton-c-0.28.0-2.el7.x86_64 :

1) Have qpidd running with queue pulp.TEST.0
2) Have qdrouterd link-routing everything to qpidd
3) modify in below script ROUTER_ADDRESS to the qdrouterd (I *think* the script below can connect directly to qpidd and bypass qdrouterd)

---------------8<---------

import random
import threading

from proton import Timeout
from proton.utils import BlockingConnection

from time import sleep
from uuid import uuid4

ROUTER_ADDRESS = 'proton+amqp://pmoravec-sat66-on-rhev.gsslab.brq2.redhat.com:5648'
ADDRESS = "pulp.TEST"
HEARTBEAT = 5
SLEEP = 2.0
THREADS = 1
ADDRESSES = 20
 
class ReceiverThread(threading.Thread):
    def __init__(self, address=ADDRESS):
        super(ReceiverThread, self).__init__()
        self.address = address
        self.running = True

    def subscribe(self):
	self.recv = self.conn.create_receiver('%s' %(self.address), name=str(uuid4()), dynamic=False, options=None)
 
    def connect(self):
        connected = False
        while not connected:
            try:
                self.conn = BlockingConnection(ROUTER_ADDRESS, ssl_domain=None, heartbeat=HEARTBEAT, timeout=3)
                connected = True
            except Exception as e:
                print("received exception %s on connect, trying again in 0.5s" % e)
                sleep(0.5)

	self.subscribe()
 
    def reconnect(self):
        try:
            self.recv.close()
	except:
	    pass
        try:
    	    self.conn.close()
	except:
            pass
        self.connect()
 
    def run(self):
        subscribed = False
        while not subscribed:
            try:
                self.connect()
                subscribed = True
            except:
                print "received exception on connect/subscribe, trying again in 0.5s"
                sleep(0.5)

	print "connected => running"
        while self.running:
	    try:
                msg = None
                print "receive with timeout %s" % SLEEP
		msg = self.recv.receive(timeout=SLEEP)
		if msg:
	            print "(%s): accepting msg %s" % (self.address, msg)
		    self.recv.accept()
		    sleep(timeout)
                else:
                    print "(%s): no msg received" % self.address
	    except KeyboardInterrupt:
                self.running = False
		pass
            except Timeout:
		pass
	    except Exception as e:
		print "(%s): receiver thread: exception %s" % (self.address, e)
		self.reconnect()
		pass
 
    def stop(self):
        self.running = False
 
threads = []
for i in range(THREADS):
  threads.append(ReceiverThread('%s.%s' %(ADDRESS, i%ADDRESSES)))
  threads[i].start()

running = True
while running:
    try:
	sleep(1)
    except KeyboardInterrupt:
        running = False
	pass
#_in = raw_input("Press Enter to exit:")
 
for i in range(THREADS):
    threads[i].stop()
for i in range(THREADS):
    threads[i].join()

---------------8<---------


4) ensure all ports are opened on firewall, and run the python client script
5) Once it connects and logs "receive with timeout 2.0", add iptables rule:

iptables -I INPUT -p tcp --dport 5648 -j DROP

on the system with qdrouterd/qpidd, to drop incoming traffic from the client

6) monitor memory usage of the client, trying to reconnect after a timeout exception - it will grow

7) delete the iptables rule and observe that memory usage of the script is stabilised

8) goto 5) :)

Comment 10 Pavel Moravec 2020-04-14 08:01:38 UTC
.. and a memory leak in my head hid me the fact the same reproducer for same circumstances applied already to https://bugzilla.redhat.com/show_bug.cgi?id=1810549

Closing this BZ as a DUPLICATE of https://bugzilla.redhat.com/show_bug.cgi?id=1810549 .

*** This bug has been marked as a duplicate of bug 1810549 ***


Note You need to log in before you can comment on or make changes to this bug.