Bug 1265657 - goferd reconnect attempts too infrequent, causing katello-agent timeout
Summary: goferd reconnect attempts too infrequent, causing katello-agent timeout
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Satellite
Classification: Red Hat
Component: katello-agent
Version: 6.1.1
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: Unspecified
Assignee: Chris Roberts
QA Contact: Katello QA List
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-23 12:33 UTC by Pavel Moravec
Modified: 2019-11-14 06:59 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-06 16:06:10 UTC
Target Upstream Version:
Embargoed:




Links
Foreman Issue Tracker 16707 (last updated 2016-09-27 14:49:54 UTC)

Description Pavel Moravec 2015-09-23 12:33:59 UTC
Description of problem:
If the connection between goferd and qdrouterd is lost (network issue, qdrouterd restart, whatever), goferd attempts to reconnect according to a retry scheme:

 retry in 10 seconds
 retry in 12 seconds
 retry in 14 seconds
 retry in 17 seconds
..
 retry in 106 seconds

If everything is running again but goferd is still waiting, say, 35 seconds before its next connection attempt, and one tries to install a package on this host, the install fails with a 20-second timeout ("is katello-agent running?"). That isn't very robust.

Could the goferd retry scheme use smaller delays between successive reconnect attempts (i.e. starting from a 2 s delay, then 3 s, then 4 s and so on, capped at 10 s)? Or is there some particular reason for the current delays?
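
As an illustration only (not gofer's actual code), the sketch below models the reported delays and compares them to the roughly 20-second dispatch timeout; the 1.2x growth factor is inferred from the series above and all names and values here are assumptions:

# Illustrative sketch: model the reported reconnect delays (10 s growing to
# ~106 s) and flag the ones that exceed the ~20 s dispatch timeout.
# The 1.2x growth factor is inferred from the series above, not gofer source.
def reconnect_delays(initial=10.0, factor=1.2, cap=106.0):
    delay = initial
    while True:
        yield min(delay, cap)
        delay = delay * factor

DISPATCH_TIMEOUT = 20  # seconds the install request appears to wait for katello-agent

delays = reconnect_delays()
for attempt in range(1, 15):
    delay = next(delays)
    note = " -> exceeds dispatch timeout" if delay > DISPATCH_TIMEOUT else ""
    print("retry %2d: wait %5.1f s%s" % (attempt, delay, note))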


Version-Release number of selected component (if applicable):
gofer-2.6.2-2.el7sat.noarch
python-gofer-2.6.2-2.el7sat.noarch
python-gofer-qpid-2.6.2-2.el7sat.noarch


How reproducible:
100%


Steps to Reproduce:
All on Satellite:
1. service qdrouterd stop
2. sleep 73
3. service qdrouterd start
4. sleep 1
5. hammer content-host package install --content-host-id .. --organization-id 1 --packages sysstat

(choose the sleep timing so that the host has just logged "gofer.messaging.adapter.proton.connection:108 - retry in 20 seconds" before qdrouterd is started)
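
For convenience, the same steps can be scripted; this is a rough sketch, assuming it runs as root on the Satellite server, that hammer credentials are already configured, and that the placeholder content host ID is replaced (all assumptions):

#!/usr/bin/env python
# Rough reproduction sketch of steps 1-5 above; run on the Satellite server.
import subprocess
import time

CONTENT_HOST_ID = "<content-host-id>"  # placeholder, as in step 5 above

subprocess.check_call(["service", "qdrouterd", "stop"])
time.sleep(73)   # long enough that goferd has backed off past the 20 s timeout
subprocess.check_call(["service", "qdrouterd", "start"])
time.sleep(1)
subprocess.check_call([
    "hammer", "content-host", "package", "install",
    "--content-host-id", CONTENT_HOST_ID,
    "--organization-id", "1",
    "--packages", "sysstat",
])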


Actual results:
package install fails with 20s timeout


Expected results:
package install succeeds


Additional info:

Comment 4 Bryan Kearney 2016-07-08 20:39:19 UTC
Per 6.3 planning, moving non-acked bugs out to the backlog.

Comment 6 Chris Duryee 2016-09-27 14:46:02 UTC
It looks like in the most recent version of gofer, max_delay is 90 seconds.

This is configurable in katello via content_action_accept_timeout, but the default should be >90 sec (say, 100 sec) instead of 20 sec.
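
As a quick sanity check of that suggestion (illustrative arithmetic only; the 90-second maximum comes from the comment above, the 10-second margin is an arbitrary assumption):

# The accept timeout should exceed goferd's maximum reconnect delay,
# otherwise a request can time out while goferd is still waiting to reconnect.
GOFER_MAX_DELAY = 90   # seconds, per the comment above
MARGIN = 10            # arbitrary safety margin (assumption)

print("content_action_accept_timeout should be >= %d s (current default: 20 s)"
      % (GOFER_MAX_DELAY + MARGIN))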

Comment 8 Chris Roberts 2017-01-06 16:06:10 UTC
After talking to Justin about this: by default, the current timeout should be enough, or we would have seen more customer issues around this. Going to close this out as WONTFIX.

- Chris Roberts

Comment 9 Justin Sherrill 2017-02-13 03:29:00 UTC
Clearing need info

