1169397 – gofer takes 100% CPU and does not reconnect after AMQP connection bounced

Bug 1169397 - gofer takes 100% CPU and does not reconnect after AMQP connection bounced

Keywords:
Status:	CLOSED DUPLICATE of bug 1169416
Alias:	None
Product:	Red Hat Satellite
Classification:	Red Hat
Component:	katello-agent
Sub Component:
Version:	6.0.4
Hardware:	All
OS:	All
Priority:	medium
Severity:	medium
Target Milestone:	Unspecified
Assignee:	Katello Bug Bin
QA Contact:	Katello QA List
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2014-12-01 14:50 UTC by Pavel Moravec
Modified:	2021-08-30 12:31 UTC
CC List:	1 user
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-03-13 14:51:28 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	1295583	0	None	None	None	Never

Internal Links: 1159281

Description Pavel Moravec 2014-12-01 14:50:40 UTC

Description of problem:
When the AMQP (qpid) broker rejects a gofer connection attempt at the AMQP level, for whatever (temporary) reason, gofer tries to reconnect just once after 1 second and then gives up. Meanwhile, its CPU usage grows over time until it reaches 100%.

When an AMQP connection is rejected like this, gofer should keep reconnecting with increasing delays, and it should not consume significant resources in the meantime.

Note that the connection has to be rejected at the AMQP level. If the connection is rejected at the TCP level (i.e. RST packets are received in response to the SYN packets sent by gofer), gofer works properly, and the same behaviour should be seen for rejections at the AMQP level.
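
The reconnect behaviour asked for above can be sketched roughly as follows. This is only an illustration of retrying with increasing delays, not gofer's or qpid.messaging's actual code: the names backoff_delays and reconnect, the doubling factor, and the 60-second cap are all assumptions made for the example.

```python
import itertools


def backoff_delays(initial=1, factor=2, maximum=60):
    """Yield reconnect delays in seconds, growing by `factor` up to `maximum`.

    The defaults are illustrative assumptions, not gofer settings.
    """
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)


def reconnect(connect, sleep, delays, max_attempts=None):
    """Call connect() until it succeeds, sleeping between failed attempts.

    `connect` and `sleep` are passed in so the loop can be exercised without
    a real broker. Blocking in sleep() between attempts, instead of spinning,
    is what keeps CPU usage low while the broker is unavailable.
    """
    attempts = 0
    for delay in delays:
        try:
            return connect()
        except ConnectionError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(delay)


# First few delays of the hypothetical schedule.
first_delays = list(itertools.islice(backoff_delays(), 8))
```

In real use, sleep would be time.sleep and connect would open the AMQP connection. The important property is that a rejection at the AMQP level takes the same retry path as a refusal at the TCP level.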


Version-Release number of selected component (if applicable):
gofer-1.3.0-1.el6sat.noarch


How reproducible:
100%


Steps to Reproduce:
1. Artificially force the qpid broker to deny AMQP connections. Either have gofer use a wrong SSL certificate, or lower the max-connections qpid option. Add to /etc/qpid/qpidd.conf:
max-connections=5

and restart the qpidd service to apply the change. The broker will now accept at most 5 AMQP connections (too few for even 1 content host).

2. Start a content host and restart the goferd service there.
3. Observe its connection attempts in /var/log/messages and monitor CPU usage.


Actual results:
gofer logs only the following, at the beginning:

Nov 30 13:26:33 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:407 - reconnect succeeded: pmoravec-rhel7-sat6.gsslab.brq.redhat.com:5671
Nov 30 13:33:46 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:453 - recoverable error[attempt 0]: connection aborted
Nov 30 13:33:46 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 1 seconds

and does not try further. CPU usage of the goferd process grows over time until it reaches 100%.


Expected results:
gofer logs something similar to:
Dec  1 13:01:45 pmoravec-rhel6-3 goferd: [WARNING][MainThread] qpid.messaging:455 - sleeping 1 seconds
Dec  1 13:01:46 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:537 - trying: pmoravec-rhel7-sat6.gsslab.brq.redhat.com:5671
Dec  1 13:01:47 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:453 - recoverable error[attempt 2]: [Errno 111] Connection refused
Dec  1 13:01:47 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 2 seconds
Dec  1 13:01:49 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:537 - trying: pmoravec-rhel7-sat6.gsslab.brq.redhat.com:5671
Dec  1 13:01:50 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:453 - recoverable error[attempt 3]: [Errno 111] Connection refused
Dec  1 13:01:50 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 4 seconds
Dec  1 13:01:54 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:537 - trying: pmoravec-rhel7-sat6.gsslab.brq.redhat.com:5671

and CPU usage of the process stays low and stable.
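
To tell the two log patterns apart when reproducing, a small script can pull the "sleeping N seconds" delays out of the goferd lines and check that they keep growing. This is a rough sketch only: the regular expression is derived from the log lines quoted in this report, and the helper names sleep_delays and is_backing_off are hypothetical.

```python
import re

# Matches goferd lines such as:
#   ... goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 2 seconds
SLEEP_RE = re.compile(
    r"goferd: \[WARNING\]\[[^\]]+\] qpid\.messaging:\d+ - sleeping (\d+) seconds"
)


def sleep_delays(lines):
    """Return the reconnect delays (in seconds) logged by goferd, in order."""
    delays = []
    for line in lines:
        match = SLEEP_RE.search(line)
        if match:
            delays.append(int(match.group(1)))
    return delays


def is_backing_off(delays):
    """True if at least two delays were logged and they never decrease."""
    return len(delays) >= 2 and all(a <= b for a, b in zip(delays, delays[1:]))


# Sample lines copied from the "Actual results" and "Expected results" above.
actual = [
    "Nov 30 13:33:46 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:453 - recoverable error[attempt 0]: connection aborted",
    "Nov 30 13:33:46 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 1 seconds",
]
expected = [
    "Dec  1 13:01:45 pmoravec-rhel6-3 goferd: [WARNING][MainThread] qpid.messaging:455 - sleeping 1 seconds",
    "Dec  1 13:01:47 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 2 seconds",
    "Dec  1 13:01:50 pmoravec-rhel6-3 goferd: [WARNING][Thread-2] qpid.messaging:455 - sleeping 4 seconds",
]

actual_delays = sleep_delays(actual)
expected_delays = sleep_delays(expected)
```

On a real content host the lines would be read from /var/log/messages. A single logged delay followed by silence corresponds to the faulty behaviour reported here.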


Additional info:
Although this use case results from a wrong configuration (most probably of the qpid broker), gofer should be robust enough to handle it. It should not make the content host unusable by consuming all CPU resources, and it should try to recover from the situation on its own.

Comment 1 RHEL Program Management 2014-12-01 14:53:43 UTC

Since this issue was entered in Red Hat Bugzilla, the release flag has been
set to ? to ensure that it is properly evaluated for this release.

Comment 3 Pavel Moravec 2015-03-13 14:51:28 UTC

This is in fact a duplicate of bug 1169416.

*** This bug has been marked as a duplicate of bug 1169416 ***
