
Bug 1890487

Summary: Candlepin services go down after upgrade
Product: Red Hat Satellite
Reporter: Devendra Singh <desingh>
Component: Infrastructure
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED NOTABUG
QA Contact: Lukas Pramuk <lpramuk>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.8.0
CC: bcourt, ehelms, inecas, smallamp, zhunting
Target Milestone: 6.9.0
Keywords: Regression, Triaged, Upgrades
Target Release: Unused
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-18 15:43:00 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Devendra Singh 2020-10-22 11:39:11 UTC
Description of problem: Candlepin services go down after the upgrade.


Version-Release number of selected component (if applicable):
6.8 Snap20

How reproducible:
1/1

Steps to Reproduce:
1. Upgrade Satellite and Capsule from 6.7.4 to 6.8.0 Snap20.
2. Execute the test suite to validate the components.
3. During execution of the test cases, the Candlepin services go into a down state.

candlepin:        
    Status:          FAIL
    Server Response: Message: Failed to open TCP connection to localhost:8443 (Connection refused - connect(2) for "localhost" port 8443)
candlepin_events:
    Status:          FAIL
    message:         Not running
    Server Response: Duration: 6ms
candlepin_auth:  
    Status:          FAIL
    Server Response: Message: A backend service [ Candlepin ] is unreachable
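
A minimal sketch of how a test suite could pick the failing services out of a status report shaped like the one above. The function name `failed_services` is hypothetical, and the sketch assumes that a healthy service reports `Status: ok`, which the output above does not show.

```python
def failed_services(report):
    """Return the names of services whose 'Status:' line is not 'ok'.

    Assumption: a healthy service reports 'Status: ok'; the report above
    only shows the 'FAIL' case.
    """
    failed = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line:
            continue
        # An unindented line ending in ':' starts a new service block.
        if not raw[0].isspace() and line.endswith(":"):
            current = line[:-1]
        elif line.startswith("Status:") and current is not None:
            status = line.split(":", 1)[1].strip()
            if status.lower() != "ok":
                failed.append(current)
    return failed


# The status report from this bug description.
sample_report = """
candlepin:
    Status:          FAIL
    Server Response: Message: Failed to open TCP connection to localhost:8443 (Connection refused - connect(2) for "localhost" port 8443)
candlepin_events:
    Status:          FAIL
    message:         Not running
    Server Response: Duration: 6ms
candlepin_auth:
    Status:          FAIL
    Server Response: Message: A backend service [ Candlepin ] is unreachable
"""

print(failed_services(sample_report))
```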


Actual results:
Candlepin services go down after the upgrade.

Expected results:
Candlepin services should not go down after the upgrade. 

Additional info:

Comment 2 Sudhir Mallamprabhakara 2020-10-22 12:22:42 UTC
Adding the Regression keyword. Snap 19 looked good.

Comment 4 Zach Huntington-Meath 2020-11-02 15:44:47 UTC
Was this fixed in any candlepin version?

Comment 7 Eric Helms 2020-11-18 02:59:11 UTC
The most common cause of this is the system hitting an 'Out of memory' condition; Tomcat is the first service to be killed when that happens on a Satellite. Looking through the sosreport, I see that condition:

var/log/messages-20201115:Nov 14 05:01:27 qe-sat6-upgrade-rhel7 kernel: Out of memory: Kill process 30358 (java) score 84 or sacrifice child


Devendra, can you re-test this and, if you run into the same Candlepin connection failure, check whether an OOM condition was encountered? If so, I would opt to close this as not a bug.
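
The OOM check requested above can be sketched as a small log scan. This is only an illustration: the function name `find_oom_kills` and the regular expression are assumptions modelled on the single kernel line quoted above, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the kernel line quoted above, e.g.
# "kernel: Out of memory: Kill process 30358 (java) score 84 or sacrifice child"
OOM_RE = re.compile(
    r"kernel: Out of memory: Kill process (?P<pid>\d+) "
    r"\((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def find_oom_kills(lines):
    """Return (pid, process name, score) for every OOM-killer line found."""
    hits = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            hits.append((int(match["pid"]), match["name"], int(match["score"])))
    return hits


sample = [
    # Line taken from the sosreport excerpt above.
    "Nov 14 05:01:27 qe-sat6-upgrade-rhel7 kernel: Out of memory: "
    "Kill process 30358 (java) score 84 or sacrifice child",
    # Hypothetical unrelated line, included only to show it is ignored.
    "Nov 14 05:01:28 qe-sat6-upgrade-rhel7 example: unrelated message",
]

print(find_oom_kills(sample))  # [(30358, 'java', 84)]
```

To scan a real log, pass an open file instead of the sample list, for example `find_oom_kills(open("/var/log/messages"))`. A `java` hit shortly before the Candlepin connection failures would point to the OOM killer rather than to a product defect.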

Comment 8 Devendra Singh 2020-11-23 05:25:26 UTC
(In reply to Eric Helms from comment #7)
> The most common cause of this is the system hitting an 'Out of memory'
> exception, tomcat is the first service to be killed when that happens on a
> Satellite. Looking through the sosreport I see that condition:
> 
> var/log/messages-20201115:Nov 14 05:01:27 qe-sat6-upgrade-rhel7 kernel: Out
> of memory: Kill process 30358 (java) score 84 or sacrifice child
> 
> 
> Devendra, can you re-test this and if you run into the same Candlepin
> connection failure check to see if an OOM condition was encountered? If so,
> then I would opt to close this not a bug.

I re-tested but did not see the OOM problem. The last time I saw it was in 6.8.0 Snap20.

Comment 9 Zach Huntington-Meath 2020-11-23 15:13:34 UTC
If this issue has not been seen since then, is it still an issue, and can it still be reproduced?

Comment 10 Devendra Singh 2020-11-23 15:36:21 UTC
(In reply to Zach Huntington-Meath from comment #9)
> If this issue is not seen since then is this still an issue or can it be
> reproduced still?

No. The issue is intermittent: I saw it in 6.8 Snap20 and 6.8.1 Snap3, but not in 6.8.1 Snap1, Snap2, or Snap4.
I tested the upgrade with a recent 6.8.1 snap (6.7.4 --> 6.8.1 Snap4 and 6.8.0 --> 6.8.1 Snap4) but did not see this issue.