Bug 1334845
| Summary: | A backend service [ Candlepin ] is unreachable | | |
|---|---|---|---|
| Product: | Red Hat Satellite | Reporter: | Roman Plevka <rplevka> |
| Component: | Subscription Management | Assignee: | Tom McKay <tomckay> |
| Status: | CLOSED ERRATA | QA Contact: | Katello QA List <katello-qa-list> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 6.2.0 | CC: | bbuckingham, bcourt, bkearney, rplevka, tomckay |
| Target Milestone: | Unspecified | Keywords: | Triaged |
| Target Release: | Unused | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-27 11:24:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
Hi Roman, can you attach the debug? Thanks!

Accidentally cleared the needinfo flag; setting it back.

I don't see any specific errors in Candlepin; however, it looks like the katello_event_queue is not being drained of the messages that Candlepin is sending. From the qpid_stat_queues file in the foreman debug info: 2.4k messages in the katello_event_queue and no connections draining them. Brad, how should that be routed?

Perhaps related to (fixed by?) https://bugzilla.redhat.com/show_bug.cgi?id=1283582
Can the upstream patch be applied and tested? https://github.com/Katello/katello/pull/6065

Moving to ON_QA to allow retest with the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1283582.

VERIFIED on sat6.2.0 beta (GA16.0).
I no longer see the issue.
After collecting the dynflow_executor PID data for 6.5 hours, it seems the memory leak is gone.
VmPeak: 2977916 kB
VmSize: 2912376 kB
VmLck: 0 kB
VmHWM: 1312892 kB
VmRSS: 1310272 kB
VmData: 2540460 kB
VmStk: 10244 kB
VmExe: 4 kB
VmLib: 30008 kB
VmPTE: 3316 kB
VmSwap: 8 kB
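
The Vm* values above look like the memory fields of /proc/<pid>/status for the dynflow_executor process. A minimal Python sketch of how such samples could be parsed and compared over time is given here. The function names and the first-versus-last VmRSS comparison are illustrative assumptions, not the tooling actually used for this verification.

```python
import re

# The values reported in this bug for the dynflow_executor PID.
REPORTED_SAMPLE = """\
VmPeak: 2977916 kB
VmSize: 2912376 kB
VmLck: 0 kB
VmHWM: 1312892 kB
VmRSS: 1310272 kB
VmData: 2540460 kB
VmStk: 10244 kB
VmExe: 4 kB
VmLib: 30008 kB
VmPTE: 3316 kB
VmSwap: 8 kB
"""


def parse_vm_fields(status_text):
    """Return a dict such as {'VmRSS': 1310272} (values in kB)."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(Vm\w+):\s+(\d+)\s+kB$", line.strip())
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_status(pid):
    """Read and parse the Vm* fields of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


def rss_growth_kb(samples):
    """Change in resident memory between the first and last sample."""
    return samples[-1]["VmRSS"] - samples[0]["VmRSS"]
```

To watch a live process, read_status(pid) could be called at a fixed interval and the collected dicts passed to rss_growth_kb; a value that keeps climbing over several hours would hint at a leak.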
# uptime
04:36:45 up 3 days, 18:17, 1 user, load average: 1.26, 1.04, 0.90
# hammer ping
candlepin:
Status: ok
Server Response: Duration: 30ms
candlepin_auth:
Status: ok
Server Response: Duration: 36ms
pulp:
Status: ok
Server Response: Duration: 46ms
foreman_tasks:
Status: ok
Server Response: Duration: 14ms
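
For unattended machines such as the CI host in this report, the hammer ping output could be checked by a script instead of by eye. The following Python sketch parses text in the shape shown above and lists any service whose status is not ok. The function names are hypothetical, and the parser assumes the "name:" / "Status:" / "Server Response:" layout seen in this bug, which may differ in other Satellite versions.

```python
# hammer ping output as pasted in the verification comment above.
REPORTED_OK_OUTPUT = """\
candlepin:
Status: ok
Server Response: Duration: 30ms
candlepin_auth:
Status: ok
Server Response: Duration: 36ms
pulp:
Status: ok
Server Response: Duration: 46ms
foreman_tasks:
Status: ok
Server Response: Duration: 14ms
"""


def parse_hammer_ping(output):
    """Map each service name to the status string reported for it."""
    statuses = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Status:"):
            if current is not None:
                statuses[current] = line.split(":", 1)[1].strip()
        elif line.startswith("Server Response:"):
            # May be empty when the service is down; not needed here.
            continue
        elif line.endswith(":"):
            # A bare "name:" line starts a new service section.
            current = line[:-1]
    return statuses


def failing_services(output):
    """Sorted names of services whose status is anything other than ok."""
    statuses = parse_hammer_ping(output)
    return sorted(name for name, status in statuses.items()
                  if status.lower() != "ok")
```

With the output above, failing_services returns an empty list; with the output from the original description it would name candlepin and candlepin_auth.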
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2016:1501
Description of problem:
The Candlepin service seems to die on its own after some time. This is the second occurrence on sat 6.2.0 Beta (GA10.1). I don't have clear reproduction steps, so I'll attach the foreman-debug tarball. This happened on a CI machine a long time (days) after the automation finished.

Version-Release number of selected component (if applicable):
6.2.0 Beta (GA10.1)

Actual results:
"Oops, we're sorry but something went wrong
A backend service [ Candlepin ] is unreachable"

# hammer -u admin -p changeme ping
candlepin:
Status: FAIL
Server Response:
candlepin_auth:
Status: FAIL
Server Response:
pulp:
Status: ok
Server Response: Duration: 380ms
foreman_tasks:
Status: ok
Server Response: Duration: 12ms

production.log:
2016-05-10 11:53:08 [app] [W] Connection refused - connect(2) for "localhost" port 8443
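
The "Connection refused" entry in production.log suggests that nothing was listening on localhost port 8443 at the time. A quick way to confirm that from the Satellite host is a plain TCP probe, sketched below in Python. The helper name and the 3-second timeout are assumptions; the host and port are taken from the log line above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def describe_backend(host="localhost", port=8443):
    """One-line summary of whether the backend port accepts connections."""
    state = "reachable" if port_is_open(host, port) else "unreachable"
    return f"{host}:{port} is {state}"
```

This only shows whether the port accepts connections. A reachable port does not prove the service behind it is healthy, so hammer ping remains the more meaningful check.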