Bug 1937824

Summary: memory leak in goferd / qpid proton 0.33 when dropping outgoing packets
Product: Red Hat Satellite
Reporter: Pavel Moravec <pmoravec>
Component: Qpid
Assignee: Mike Cressman <mcressma>
Status: CLOSED ERRATA
QA Contact: Jitendra Yejare <jyejare>
Severity: high
Priority: unspecified
Version: 6.9.0
CC: ahumbe, cjansen, osousa, patrick.andrieux, rdesouza
Target Milestone: 6.10.0
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Fixed In Version: qpid-proton-0.33.0-6.el7_9
Last Closed: 2021-11-16 14:10:21 UTC
Type: Bug

Description Pavel Moravec 2021-03-11 15:50:02 UTC
Description of problem:
If 6.10 bumps the qpid packages to 0.33 for katello-agent/goferd, then we must fix and backport:

https://issues.apache.org/jira/browse/PROTON-2344

Scenario behind this: if the default heartbeats in qdrouterd or katello-agent are changed so that the agent has the lower value, and the network then drops packets until the client detects the timed-out connection, the connection is left in CLOSE_WAIT and memory is leaked.


Version-Release number of selected component (if applicable):
katello-agent running on:
    qpid-proton-c-0.33.0-1.el7.x86_64
    python2-qpid-proton-0.33.0-1.el7.x86_64
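
(Not part of the original report: a quick way to confirm which proton builds the client is actually running; the affected builds above are 0.33.0-1.el7, while the fix lands in qpid-proton-0.33.0-6.el7_9.)

rpm -q qpid-proton-c python2-qpid-proton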


How reproducible:
100%


Steps to Reproduce:
1. Lower goferd's heartbeat by adding

[messaging]
heartbeat=7

in both files:

/etc/gofer/plugins/katello.conf
/etc/gofer/agent.conf

2. Restart goferd
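
(For example, assuming the standard systemd unit name on EL7:)

systemctl restart goferd
journalctl -u goferd -n 20 --no-pager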
3. Mimic packet drops with a loop like:
a="-I"
while true; do
  echo "$(date): setting $a"
  iptables $a OUTPUT -p tcp --dport 5647 -j DROP
  if [ $a = "-I" ]; then
    a="-D"
  else
    a="-I"
  fi
  sleep 10
done
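
Note that if the loop is stopped while a DROP rule is in place, it has to be removed by hand, e.g.:

iptables -S OUTPUT | grep 5647                    # any leftover rule?
iptables -D OUTPUT -p tcp --dport 5647 -j DROP    # remove it if present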

4. Monitor memory usage and connections on port 5647 on the client (a monitoring sketch follows the steps)

(Optionally, run the reproducer from the JIRA directly.)
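
A monitoring loop along the following lines covers step 4 (a sketch, not from the original report; it assumes goferd runs as a systemd unit and that ss is available):

# Sample goferd's resident memory and the CLOSE_WAIT sockets on 5647 once a minute.
pid=$(systemctl show -p MainPID goferd | cut -d= -f2)
while true; do
  rss=$(ps -o rss= -p "$pid")
  cw=$(ss -tan state close-wait '( dport = :5647 or sport = :5647 )' | grep -c 5647)
  echo "$(date): goferd RSS=${rss} kB, CLOSE_WAIT on 5647=${cw}"
  sleep 60
done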


Actual results:
4. Memory usage grows and CLOSE_WAIT "connections" appear


Expected results:
4. No memory leak and no CLOSE_WAIT connections


Additional info:

Comment 3 Jitendra Yejare 2021-07-05 19:06:36 UTC
Are the repro steps in this bug summary enough for QE to verify this bug, or do you have any other repro steps?

Comment 4 Jitendra Yejare 2021-07-06 14:49:56 UTC
Verified!

@Satellite 6.10 snap 7

Steps:
----------
Steps mentioned in the bug description.


Behaviour:
-----------

After running the packet drop script mentioned in the description for one and a half hours on the client, I observed:

1. The top command doesn't show a memory leak. The following are the top memory consumers:


  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2584 root      20   0  878544  33568   8940 S   0.0  0.2   0:22.49 python
  614 root      20   0  358748  29528   7056 S   0.0  0.1   0:00.97 firewalld
 1144 root      20   0  586428  22120   6728 S   0.0  0.1   0:01.89 tuned
  577 polkitd   20   0  613004  13992   4932 S   0.0  0.1   0:00.11 polkitd
  856 root      20   0  625932  11008   6852 S   0.0  0.1   0:00.51 NetworkManager
    1 root      20   0  193688   6852   4188 S   0.0  0.0   0:03.06 systemd
  446 root      20   0   39060   6168   5840 S   0.0  0.0   0:00.81 systemd-journal
13729 root      20   0  159076   5864   4408 S   0.0  0.0   0:00.85 sshd
  482 root      20   0   49156   5748   2880 S   0.0  0.0   0:01.38 systemd-udevd
 2550 root      20   0  158944   5740   4392 S   0.0  0.0   0:00.48 sshd


2. There were no 'CLOSE_WAIT' connections for port 5647 on the client.
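
(The exact command used is not recorded in the comment; a check along these lines lists any such sockets and should come back empty after the fix:)

ss -tan state close-wait '( dport = :5647 or sport = :5647 )'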

Comment 5 Pavel Moravec 2021-07-07 10:31:14 UTC
(In reply to Jitendra Yejare from comment #3)
> Are the repro steps in this bug summary enough for QE to verify this bug,
> or do you have any other repro steps?

Indeed, those steps are sufficient. Removing the needinfo on me.

Comment 8 errata-xmlrpc 2021-11-16 14:10:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.10 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4702