Bug 1264461 - goferd connection leak when using heartbeats
Status: CLOSED ERRATA
Product: Red Hat Satellite 6
Classification: Red Hat
Component: katello-agent
Version: 6.1.1
Hardware: x86_64 Linux
Priority: high
Severity: urgent
Target Milestone: 6.1.3
Assigned To: Justin Sherrill
QA Contact: jcallaha
Keywords: Triaged
Reported: 2015-09-18 10:01 EDT by Pavel Moravec
Modified: 2017-02-23 14:46 EST
CC List: 10 users

Doc Type: Bug Fix
Last Closed: 2015-10-15 14:40:02 EDT
Type: Bug

Attachments: None

Description Pavel Moravec 2015-09-18 10:01:22 EDT
Description of problem:
With heartbeats enabled per bz1249890 and some traffic passing through the qdrouterd->goferd connection, connection timeouts can occur as an expected side effect - see

https://access.redhat.com/solutions/1614693

for the reasoning.

BUT goferd does _not_ drop the timed-out TCP connections.


Version-Release number of selected component (if applicable):
gofer-2.6.2-2.el7sat.noarch with fix per bz1249890


How reproducible:
100%


Steps to Reproduce:
1. Have an external capsule (a content host should be fine as well) and ensure heartbeats are enabled per bz1249890.
2. On the capsule / content host, run "netstat -anp | grep 5647".
3. Synchronize the capsule (or install/remove some package on a content host via hammer/WebUI).
4. Confirm that goferd on the capsule / content host detects some timed-out connections by checking /var/log/messages for entries like:

Sep 18 15:47:54 my-capsule goferd: [INFO][worker-0] root:525 - Disconnected

5. Repeat steps 2-4 as many times as you wish (a small monitoring sketch follows this list).
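
To make step 5 easier to repeat, here is a small monitoring sketch (not part of the original report) that counts TCP connections toward the qdrouterd port and flags the ones stuck in CLOSE_WAIT. It assumes the psutil Python module is available, that qdrouterd listens on the default port 5647, and that it runs as root (like "netstat -anp"):

#!/usr/bin/env python
# Hypothetical helper: watch connection counts toward qdrouterd (port 5647).
import time
import psutil

PORT = 5647  # qdrouterd AMQP(S) port used by goferd, per this report

while True:
    conns = [c for c in psutil.net_connections(kind='tcp')
             if c.raddr and c.raddr[1] == PORT]
    close_wait = [c for c in conns if c.status == psutil.CONN_CLOSE_WAIT]
    print('connections to :%d  total=%d  close_wait=%d'
          % (PORT, len(conns), len(close_wait)))
    time.sleep(10)

On a healthy system the total stays flat; with this bug it grows by one for every "Disconnected" message.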


Actual results:
- netstat will show an increasing number of TCP connections from goferd (on the capsule) to qdrouterd (on the Satellite): exactly one new connection for each "Disconnected" log entry.
- old TCP connections will have a (sometimes increasing) Recv-Q (2nd column in "netstat -anp" output); this is because qdrouterd keeps sending a heartbeat every second while goferd no longer reads frames from that socket (a Recv-Q check sketch follows below).
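
The Recv-Q behaviour can also be checked without netstat; this hypothetical sketch (not from the original report) reads the kernel receive-queue size for sockets connected to port 5647 straight from /proc/net/tcp (addresses are printed in the kernel's hex form):

#!/usr/bin/env python
# Hypothetical Recv-Q check for sockets talking to qdrouterd (port 5647).
PORT = 5647

with open('/proc/net/tcp') as f:
    next(f)                                     # skip the header line
    for line in f:
        fields = line.split()
        local, remote, queues = fields[1], fields[2], fields[4]
        if int(remote.split(':')[1], 16) != PORT:
            continue
        recv_q = int(queues.split(':')[1], 16)  # field is tx_queue:rx_queue, in hex
        print('%s -> %s  Recv-Q=%d bytes' % (local, remote, recv_q))

Leaked connections show a Recv-Q that keeps growing, because qdrouterd's per-second heartbeat frames are queued in the kernel but never read by goferd.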


Expected results:
- no connection leak
- Recv-Q should be zero almost every time


Additional info:
Comment 2 Pavel Moravec 2015-09-21 03:39:56 EDT
With SSL disabled, qdrouterd drops the connection, but goferd keeps its side in the CLOSE_WAIT state.
Comment 5 Gordon Sim 2015-09-22 07:48:21 EDT
(In reply to Pavel Moravec from comment #4)
> Reproduced the bug outside gofer:
> 
> https://issues.apache.org/jira/browse/PROTON-1000?focusedCommentId=14901368&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14901368

I've committed a fix for this: https://git-wip-us.apache.org/repos/asf?p=qpid-proton.git;h=b8b2b34.
Comment 7 Jeff Ortel 2015-09-28 16:21:43 EDT
This is a general bug in the proton BlockingConnection: https://issues.apache.org/jira/browse/PROTON-1010.
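
For context, here is a minimal, hypothetical sketch (not gofer's actual code) of the BlockingConnection pattern that is affected; the broker URL, address, and heartbeat value are made up for illustration. PROTON-1010 describes how the underlying TCP socket could remain open after a heartbeat timeout, even with an explicit close() as below:

# Illustrative only - simplified send/retry pattern, not gofer's implementation.
from proton import ConnectionException, Message
from proton.utils import BlockingConnection

URL = 'amqps://satellite.example.com:5647'   # hypothetical qdrouterd endpoint
ADDRESS = 'pulp.agent.demo'                  # hypothetical queue name

def send_with_retry(body, attempts=3):
    """Open a heartbeat-enabled connection, send one message, always close."""
    for _ in range(attempts):
        conn = None
        try:
            conn = BlockingConnection(URL, heartbeat=10)
            conn.create_sender(ADDRESS).send(Message(body=body))
            return True
        except ConnectionException:
            # Heartbeat timeout or broker drop: fall through and retry.
            pass
        finally:
            if conn is not None:
                # With PROTON-1010, the TCP socket could stay open even
                # after this close(), leaking one connection per timeout.
                conn.close()
    return False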
Comment 10 Bryan Kearney 2015-09-29 09:51:32 EDT
An erratum has been created; moving back to NEW.
Comment 12 jcallaha 2015-10-07 12:00:44 EDT
Verified on Satellite-6.1.0-20151006.0

Installed and removed multiple packages on two content hosts with no errors.

[root@ibm-x3550m3-13 ~]# netstat -anp | grep 5647
tcp        0      0 10.34.36.141:54696      10.16.4.111:5647        ESTABLISHED 29378/python        
tcp        1      0 10.34.36.141:54695      10.16.4.111:5647        CLOSE_WAIT  29378/python        
[root@ibm-x3550m3-13 ~]# netstat -anp | grep 5647
tcp        0      0 10.34.36.141:54696      10.16.4.111:5647        ESTABLISHED 29378/python        
tcp        0      1 10.34.36.141:54814      10.16.4.111:5647        SYN_SENT    29378/python        
tcp        1      0 10.34.36.141:54695      10.16.4.111:5647        CLOSE_WAIT  29378/python        
[root@ibm-x3550m3-13 ~]# netstat -anp | grep 5647
tcp        0      0 10.34.36.141:54696      10.16.4.111:5647        ESTABLISHED 29378/python        
tcp       37      0 10.34.36.141:54814      10.16.4.111:5647        ESTABLISHED 29378/python        
tcp        1      0 10.34.36.141:54695      10.16.4.111:5647        CLOSE_WAIT  29378/python        


/var/log/messages output
------------------------
gofer.messaging.adapter.proton.connection:100 - connecting: URL: amqps://rhsm-qe-1.rhq.lab.eng.bos.redhat.com:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
root:485 - connecting to rhsm-qe-1.rhq.lab.eng.bos.redhat.com:5647...
gofer.messaging.adapter.proton.connection:103 - connected: amqps://rhsm-qe-1.rhq.lab.eng.bos.redhat.com:5647
root:505 - connected to rhsm-qe-1.rhq.lab.eng.bos.redhat.com:5647
gofer.rmi.dispatcher:600 - call: Content.install() sn=f73d1f92-c55c-495e-846a-709f99a87a4b data={u'task_id': u'97c5dae5-5eff-4f53-8c2f-a55deb746e84', u'consumer_id': u'4d010848-3728-4c2a-a0a7-d0cecf0f6ea4'}
pulp_rpm.handlers.rpmtools:164 - Installed: firefox-38.3.0-2.el7_1.x86_64
gofer.agent.rmi:129 - sn=f73d1f92-c55c-495e-846a-709f99a87a4b processed in: 1.126 (minutes)
root:525 - Disconnected
Comment 14 errata-xmlrpc 2015-10-15 14:40:02 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2015:1910
