Bug 1937824

Summary: memory leak in goferd / qpid proton 0.33 when dropping outgoing packets
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Qpid
Assignee: Mike Cressman <mcressma>
Status: CLOSED ERRATA
QA Contact: Jitendra Yejare <jyejare>
Severity: high
Priority: unspecified
Version: 6.9.0
CC: ahumbe, cjansen, osousa, patrick.andrieux, rdesouza
Target Milestone: 6.10.0
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-proton-0.33.0-6.el7_9
Last Closed: 2021-11-16 14:10:21 UTC
Type: Bug

Description Pavel Moravec 2021-03-11 15:50:02 UTC
Description of problem:
If 6.10 bumps the qpid packages to 0.33 for katello-agent/goferd, then we must fix and backport:

https://issues.apache.org/jira/browse/PROTON-2344

Scenario behind this: if the default heartbeats in qdrouterd or katello-agent are changed so that the agent has the lower value, and the network then drops packets until the client detects the timed-out connection, the connection is left in CLOSE_WAIT and memory is leaked.


Version-Release number of selected component (if applicable):
katello-agent running on:
    qpid-proton-c-0.33.0-1.el7.x86_64
    python2-qpid-proton-0.33.0-1.el7.x86_64
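
(Not part of the original report: a quick way to confirm which proton builds the client is actually running; the affected builds above are 0.33.0-1.el7, while the fix lands in qpid-proton-0.33.0-6.el7_9.)

rpm -q qpid-proton-c python2-qpid-proton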


How reproducible:
100%


Steps to Reproduce:
1. Lower goferd's heartbeat by adding

[messaging]
heartbeat=7

in both files:

/etc/gofer/plugins/katello.conf
/etc/gofer/agent.conf

2. Restart goferd
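
(For example, assuming the standard systemd unit name on EL7:)

systemctl restart goferd
journalctl -u goferd -n 20 --no-pager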
3. Mimic packet drops with a loop like:
a="-I"
while true; do
  echo "$(date): setting $a"
  iptables $a OUTPUT -p tcp --dport 5647 -j DROP
  if [ $a = "-I" ]; then
    a="-D"
  else
    a="-I"
  fi
  sleep 10
done
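
Note that if the loop is stopped while a DROP rule is in place, it has to be removed by hand, e.g.:

iptables -S OUTPUT | grep 5647                    # any leftover rule?
iptables -D OUTPUT -p tcp --dport 5647 -j DROP    # remove it if present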

4. Monitor memory usage and connections on port 5647 on the client (a monitoring sketch follows the steps)

(Optionally, run the reproducer from the JIRA directly.)
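
A monitoring loop along the following lines covers step 4 (a sketch, not from the original report; it assumes goferd runs as a systemd unit and that ss is available):

# Sample goferd's resident memory and the CLOSE_WAIT sockets on 5647 once a minute.
pid=$(systemctl show -p MainPID goferd | cut -d= -f2)
while true; do
  rss=$(ps -o rss= -p "$pid")
  cw=$(ss -tan state close-wait '( dport = :5647 or sport = :5647 )' | grep -c 5647)
  echo "$(date): goferd RSS=${rss} kB, CLOSE_WAIT on 5647=${cw}"
  sleep 60
done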


Actual results:
4. Memory usage grows and CLOSE_WAIT "connections" appear


Expected results:
4. No memory leak and no CLOSE_WAIT connections


Additional info:

Comment 3 Jitendra Yejare 2021-07-05 19:06:36 UTC
Are the repro steps in this bug summary enough for QE to verify this bug, or do you have any other repro steps?

Comment 4 Jitendra Yejare 2021-07-06 14:49:56 UTC
Verified!

@Satellite 6.10 snap 7

Steps:
----------
Steps mentioned in the bug description.


Behaviour:
-----------

After running the packet drop script mentioned in the description for one and a half hours on the client, I observed:

1. The top command doesn't show a memory leak. The following are the top memory consumers:


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2584 root      20   0  878544  33568   8940 S   0.0  0.2   0:22.49 python
  614 root      20   0  358748  29528   7056 S   0.0  0.1   0:00.97 firewalld
 1144 root      20   0  586428  22120   6728 S   0.0  0.1   0:01.89 tuned
  577 polkitd   20   0  613004  13992   4932 S   0.0  0.1   0:00.11 polkitd
  856 root      20   0  625932  11008   6852 S   0.0  0.1   0:00.51 NetworkManager
    1 root      20   0  193688   6852   4188 S   0.0  0.0   0:03.06 systemd
  446 root      20   0   39060   6168   5840 S   0.0  0.0   0:00.81 systemd-journal
13729 root      20   0  159076   5864   4408 S   0.0  0.0   0:00.85 sshd
  482 root      20   0   49156   5748   2880 S   0.0  0.0   0:01.38 systemd-udevd
 2550 root      20   0  158944   5740   4392 S   0.0  0.0   0:00.48 sshd


2. There were no 'CLOSE_WAIT' connections for port 5647 on the client.
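
(The exact command used is not recorded in the comment; a check along these lines lists any such sockets and should come back empty after the fix:)

ss -tan state close-wait '( dport = :5647 or sport = :5647 )'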

Comment 5 Pavel Moravec 2021-07-07 10:31:14 UTC
(In reply to Jitendra Yejare from comment #3)
> Are the repro steps in this bug summary enough for QE to verify this bug,
> or do you have any other repro steps?

Indeed, those steps are sufficient. Removing the needinfo on me.

Comment 8 errata-xmlrpc 2021-11-16 14:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702