Bug 1292026

Summary: goferd stuck in connection close, so no connection to qdrouterd is made
Product: Red Hat Satellite Reporter: Pavel Moravec <pmoravec>
Component: katello-agent Assignee: Katello Bug Bin <katello-bugs>
Status: CLOSED NOTABUG QA Contact: Katello QA List <katello-qa-list>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.1.4   
Target Milestone: Unspecified   
Target Release: Unused   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-12-16 11:25:05 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Pavel Moravec 2015-12-16 09:25:46 UTC
Description of problem:
Under unclear circumstances (only a partially working reproducer so far), goferd fails to close TCP connections to qdrouterd and gets stuck in "connection.close()".

As a result, no TCP connection to qdrouterd is established, hence katello-agent functionality is lost.

Relevant backtraces:

 for file /usr/lib64/python2.7/site-packages/proton/__init__.py, line 2458
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 219
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 234
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 220
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/connection.py, line 152
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/consumer.py, line 80
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/reliability.py, line 43
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 614
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 39
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 648
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 39
 for file /usr/lib/python2.7/site-packages/gofer/messaging/consumer.py, line 88
 for file /usr/lib/python2.7/site-packages/gofer/messaging/consumer.py, line 58
 for file /usr/lib/python2.7/site-packages/gofer/common.py, line 267
 for file /usr/lib64/python2.7/threading.py, line 811
 for file /usr/lib64/python2.7/threading.py, line 784


 for file /usr/lib64/python2.7/site-packages/proton/__init__.py, line 2458
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 219
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 234
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 220
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/connection.py, line 152
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/producer.py, line 79
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/reliability.py, line 43
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 842
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 39
 for file /usr/lib/python2.7/site-packages/gofer/agent/rmi.py, line 266
 for file /usr/lib/gofer/plugins/katelloplugin.py, line 228
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpm.py, line 61
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpmtools.py, line 341
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpmtools.py, line 324
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpmtools.py, line 397
 for file /usr/lib/python2.7/site-packages/yum/__init__.py, line 6472
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpmtools.py, line 602
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpmtools.py, line 159
 for file /usr/lib/python2.7/site-packages/pulp_rpm/handlers/rpm.py, line 100
 for file /usr/lib/python2.7/site-packages/pulp/agent/lib/dispatcher.py, line 76
 for file /usr/lib/gofer/plugins/katelloplugin.py, line 372
 for file /usr/lib/python2.7/site-packages/gofer/rmi/dispatcher.py, line 454
 for file /usr/lib/python2.7/site-packages/gofer/rmi/dispatcher.py, line 634
 for file /usr/lib/python2.7/site-packages/gofer/agent/plugin.py, line 381
 for file /usr/lib/python2.7/site-packages/gofer/agent/rmi.py, line 85
 for file /usr/lib/python2.7/site-packages/gofer/threadpool.py, line 138
 for file /usr/lib/python2.7/site-packages/gofer/threadpool.py, line 65
 for file /usr/lib/python2.7/site-packages/gofer/common.py, line 267
 for file /usr/lib64/python2.7/threading.py, line 811
 for file /usr/lib64/python2.7/threading.py, line 784


Version-Release number of selected component (if applicable):
gofer-2.6.6-2.el7sat.noarch
python-qpid-proton-0.9-7.el7.x86_64


How reproducible:
???


Steps to Reproduce:
(the reproducer below might not work every time - let me know if a better one is needed)
1. In a loop, install and uninstall a package from Satellite to a content host - just to keep goferd busy
2. Try to "freeze" the qdrouterd that goferd connects to, per bz1281947
3. From time to time, check the TCP connections with netstat
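
Step 3 can be scripted. The sketch below is only an illustration: it assumes output in the usual `netstat -tn` column layout, and the addresses in SAMPLE are made up. Port 5648 is taken from the goferd log line in this report; the helper name established_to_port is hypothetical.

```python
def established_to_port(netstat_output, port):
    """Return foreign addresses of ESTABLISHED TCP connections to `port`."""
    peers = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed `netstat -tn` columns:
        # Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips the header line and non-TCP rows
        foreign, state = fields[4], fields[5]
        if state == "ESTABLISHED" and foreign.rsplit(":", 1)[-1] == str(port):
            peers.append(foreign)
    return peers


# Made-up sample output; the addresses are placeholders, not from this report.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:40112          10.0.0.7:5648           ESTABLISHED
tcp        0      0 10.0.0.5:40110          10.0.0.7:5648           CLOSE_WAIT
"""

if __name__ == "__main__":
    print(established_to_port(SAMPLE, 5648))
```

An empty list while qdrouterd is running and goferd is not retrying would match the symptom described in this report.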


Actual results:
At a random time, the last message goferd logged was:

Dec 16 09:45:33 pmoravec-rhel7 goferd: [INFO][worker-0] root:505 - connected to toledo-capsule.gsslab.brq.redhat.com:5648

but there is no TCP connection to that host, and no "root:525 - Disconnected" or "closed: .." log entry appears.
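
A rough way to spot this state from the logs is to find the last connection-related message. The sketch below assumes the two message fragments quoted in this report ("root:505 - connected to" and "root:525 - Disconnected") are stable enough to match on; the full format of the Disconnected line is not shown here, so that pattern is a guess, and the function name is hypothetical.

```python
import re

# Patterns built only from the fragments quoted in this report.
CONNECTED = re.compile(r"root:505 - connected to (\S+)")
DISCONNECTED = re.compile(r"root:525 - Disconnected")


def last_connection_state(log_lines):
    """Return ("connected", target), ("disconnected", None) or (None, None)."""
    state = (None, None)
    for line in log_lines:
        match = CONNECTED.search(line)
        if match:
            state = ("connected", match.group(1))
        elif DISCONNECTED.search(line):
            state = ("disconnected", None)
    return state


LOG = [
    "Dec 16 09:45:33 pmoravec-rhel7 goferd: [INFO][worker-0] root:505 - "
    "connected to toledo-capsule.gsslab.brq.redhat.com:5648",
]

if __name__ == "__main__":
    print(last_connection_state(LOG))
```

If the last state is "connected" but netstat shows no established connection to that target, goferd is likely in the stuck state described here.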


Expected results:
goferd keeps an active connection whenever qdrouterd is running (and goferd is not in the middle of reconnect attempts)


Additional info:
Will provide coredumps and a tcpdump of the non-SSL communication shortly

Comment 1 Pavel Moravec 2015-12-16 09:37:53 UTC
Another backtrace of gofer (or rather, the proton reactor) stuck in connection.close():

(reactor.py line 142 or 143)
 for file /usr/lib64/python2.7/site-packages/proton/reactor.py, line 142
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 235
 for file /usr/lib64/python2.7/site-packages/proton/utils.py, line 220
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/connection.py, line 152
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/model.py, line 156
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/reliability.py, line 43
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/proton/model.py, line 293
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 365
 for file /usr/lib/python2.7/site-packages/gofer/messaging/adapter/model.py, line 39
 for file /usr/lib/python2.7/site-packages/gofer/agent/plugin.py, line 488
 for file /usr/lib/python2.7/site-packages/gofer/common.py, line 267
 for file /usr/lib/python2.7/site-packages/gofer/agent/plugin.py, line 316
 for file /usr/lib/python2.7/site-packages/gofer/common.py, line 228
 for file /usr/lib/python2.7/site-packages/gofer/agent/plugin.py, line 51
 for file /usr/lib/python2.7/site-packages/gofer/threadpool.py, line 138
 for file /usr/lib/python2.7/site-packages/gofer/threadpool.py, line 65
 for file /usr/lib/python2.7/site-packages/gofer/common.py, line 267
 for file /usr/lib64/python2.7/threading.py, line 811
 for file /usr/lib64/python2.7/threading.py, line 784

Comment 4 Pavel Moravec 2015-12-16 11:19:26 UTC
Judging from the tcpdump, this is expected behaviour of goferd / the qpid proton reactor. If qdrouterd does not respond to the AMQP close frame, the gofer/proton reactor keeps waiting for that response.

This BZ is solely caused by bz1281947, so I am closing it.
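
The waiting behaviour can be illustrated with a toy model. This is not the proton or gofer API: SimulatedConnection, peer_sends_close and the timeout parameter are hypothetical names used only to show why a close that waits for the peer's reply never returns when the peer is frozen.

```python
import threading


class SimulatedConnection(object):
    """Toy stand-in for an AMQP connection; NOT the proton API."""

    def __init__(self):
        self._peer_closed = threading.Event()

    def peer_sends_close(self):
        # Called when the remote side answers with its own close frame.
        self._peer_closed.set()

    def close(self, timeout=None):
        # Sending our own close frame is omitted here; we only model the wait.
        # With timeout=None this blocks forever if the peer never answers.
        return self._peer_closed.wait(timeout)


if __name__ == "__main__":
    frozen = SimulatedConnection()
    print(frozen.close(timeout=0.1))    # peer never answers: False

    healthy = SimulatedConnection()
    threading.Timer(0.05, healthy.peer_sends_close).start()
    print(healthy.close(timeout=1.0))   # peer answers in time: True
```

Calling close() with no timeout on the frozen connection would hang, which corresponds to the threads seen stuck in connection.close() in the backtraces.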