Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1334845

Summary:	A backend service [ Candlepin ] is unreachable
Product:	Red Hat Satellite	Reporter:	Roman Plevka <rplevka>
Component:	Subscription Management	Assignee:	Tom McKay <tomckay>
Status:	CLOSED ERRATA	QA Contact:	Katello QA List <katello-qa-list>
Severity:	high	Docs Contact:
Priority:	unspecified
Version:	6.2.0	CC:	bbuckingham, bcourt, bkearney, rplevka, tomckay
Target Milestone:	Unspecified	Keywords:	Triaged
Target Release:	Unused
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2016-07-27 11:24:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Roman Plevka 2016-05-10 15:56:22 UTC

Description of problem:
The Candlepin service seems to die on its own after some time. This is the second occurrence on Sat 6.2.0 Beta (GA10.1).

I don't have clear reproduction steps, so I'll attach the foreman-debug tarball.
This happened on a CI machine a long time (days) after the automation finished.


Version-Release number of selected component (if applicable):
6.2.0 Beta (GA10.1)


Actual results:
"Oops, we're sorry but something went wrong A backend service [ Candlepin ] is unreachable"

# hammer -u admin -p changeme ping
candlepin:      
    Status:          FAIL
    Server Response:
candlepin_auth: 
    Status:          FAIL
    Server Response:
pulp:           
    Status:          ok
    Server Response: Duration: 380ms
foreman_tasks:  
    Status:          ok
    Server Response: Duration: 12ms
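
When a check like this runs unattended on a CI machine, it can help to turn the `hammer ping` text into a list of failing services. The following is a minimal sketch, assuming the output keeps the layout shown above (an unindented `service:` line followed by indented `Key: value` lines). The function names `parse_hammer_ping` and `failing_services` are hypothetical and are not part of hammer.

```python
# Sample text copied from the report above; a real run would capture the
# command's stdout instead.
SAMPLE = """\
candlepin:
    Status:          FAIL
    Server Response:
candlepin_auth:
    Status:          FAIL
    Server Response:
pulp:
    Status:          ok
    Server Response: Duration: 380ms
foreman_tasks:
    Status:          ok
    Server Response: Duration: 12ms
"""


def parse_hammer_ping(text):
    """Map each service name to its fields, e.g. {"pulp": {"Status": "ok", ...}}."""
    services = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        key, _, value = raw.partition(":")
        if not raw[0].isspace():
            # An unindented line such as "candlepin:" starts a new service.
            current = key.strip()
            services[current] = {}
        elif current is not None:
            # Indented lines carry the fields of the current service.
            services[current][key.strip()] = value.strip()
    return services


def failing_services(text):
    """Return the names of services whose Status is anything other than "ok"."""
    return [
        name
        for name, fields in parse_hammer_ping(text).items()
        if fields.get("Status", "").lower() != "ok"
    ]


if __name__ == "__main__":
    print(failing_services(SAMPLE))
```

For the output in this report, the sketch would flag candlepin and candlepin_auth and leave pulp and foreman_tasks alone.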

production.log:
2016-05-10 11:53:08 [app] [W] Connection refused - connect(2) for "localhost" port 8443
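
The "Connection refused" warning means nothing was accepting TCP connections on localhost port 8443 at that moment. A quick way to confirm this independently of the application is a plain TCP connect. This is a minimal sketch; the helper name `port_is_open` is hypothetical, and the host and port are taken from the log line above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port as seen in production.log.
    print("8443 reachable:", port_is_open("localhost", 8443))
```

A successful connect only shows that some process is listening on the port; it does not prove that Candlepin itself is healthy.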

Comment 2 Brad Buckingham 2016-05-11 15:16:21 UTC

Hi Roman, can you attach the foreman-debug tarball?  Thanks!

Comment 5 Roman Plevka 2016-05-11 15:23:05 UTC

Accidentally cleared the needinfo flag; setting it back.

Comment 6 Barnaby Court 2016-05-11 15:30:01 UTC

I don't see any specific errors in Candlepin; however, it looks like the katello_event_queue is not being drained of the messages that Candlepin is sending.

According to the qpid_stat_queues file in the foreman-debug info, there are 2.4k messages in the katello_event_queue and no connections draining them.

Brad, how should that be routed?

Comment 7 Tom McKay 2016-05-25 17:36:48 UTC

Perhaps related (fixed?) by https://bugzilla.redhat.com/show_bug.cgi?id=1283582

Can the upstream patch be applied and tested?
https://github.com/Katello/katello/pull/6065

Comment 9 Tom McKay 2016-06-06 14:38:19 UTC

Moving to ON_QA to allow retest with the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1283582 .

Comment 10 Roman Plevka 2016-06-21 08:37:19 UTC

VERIFIED

on sat6.2.0 beta (GA16.0).

I no longer see the issue.
After collecting the dynflow_executor PID data for 6.5 hours, it seems the memory leak is gone.

VmPeak:	 2977916 kB
VmSize:	 2912376 kB
VmLck:	       0 kB
VmHWM:	 1312892 kB
VmRSS:	 1310272 kB
VmData:	 2540460 kB
VmStk:	   10244 kB
VmExe:	       4 kB
VmLib:	   30008 kB
VmPTE:	    3316 kB
VmSwap:	       8 kB
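
The fields above have the format of the `Vm*` lines in `/proc/<pid>/status` on Linux. The comment does not say how the data was collected, so the following is only a sketch of one way to sample such values for a process, assuming that file was the source. The names `parse_vm_fields` and `read_vm_fields` are hypothetical.

```python
def parse_vm_fields(status_text):
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if not name.startswith("Vm"):
            continue
        parts = rest.split()
        # Values look like "1310272 kB"; skip anything in another shape.
        if len(parts) == 2 and parts[1] == "kB":
            fields[name] = int(parts[0])
    return fields


def read_vm_fields(pid):
    """Read and parse the Vm* fields for a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


if __name__ == "__main__":
    import os

    # Sample this script's own process; for the scenario in this report the
    # PID of the dynflow_executor process would be used instead.
    print(read_vm_fields(os.getpid()).get("VmRSS"))
```

Sampling `VmRSS` at intervals and comparing the first and last readings gives a rough indication of whether resident memory keeps growing.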

# uptime
 04:36:45 up 3 days, 18:17,  1 user,  load average: 1.26, 1.04, 0.90
# hammer ping
candlepin:      
    Status:          ok
    Server Response: Duration: 30ms
candlepin_auth: 
    Status:          ok
    Server Response: Duration: 36ms
pulp:           
    Status:          ok
    Server Response: Duration: 46ms
foreman_tasks:  
    Status:          ok
    Server Response: Duration: 14ms

Comment 11 Bryan Kearney 2016-07-27 11:24:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1501