Bug 2070620

Summary: After upgrading to 6.11 ping check fails with "Some components are failing: katello_agent"
Product: Red Hat Satellite Reporter: Lukas Pramuk <lpramuk>
Component: Installer Assignee: Evgeni Golov <egolov>
Status: CLOSED ERRATA QA Contact: Lukas Pramuk <lpramuk>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.11.0 CC: egolov, ehelms, gtalreja
Target Milestone: 6.11.0 Keywords: Triaged
Target Release: Unused   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-katello-21.4.0 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-07-05 14:34:52 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---

Description Lukas Pramuk 2022-03-31 14:56:15 UTC
This bug was initially created as a copy of Bug #2053395

I am copying this bug because the original bug fixes only the error message, changing it from:
 
"Couldn't connect to the server: undefined method `to_sym' for nil:NilClass"

to 

"Some components are failing: katello_agent"


Description of problem: After upgrading to 6.11 ping check fails with "Some components are failing: katello_agent"

Version-Release number of selected component (if applicable):
6.11.0 Snap14

How reproducible:
Deterministic with a certain DB backup

Steps to Reproduce:
1. Restore a certain customer DB backup to 6.10.z
2. Check Satellite status before upgrade
# hammer ping
...
katello_agent:    
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms

3. Upgrade to 6.11.0
# satellite-maintain upgrade run --target-version 6.11 -w repositories-validate,repositories-setup -y
...
Check whether all services are running using the ping call:           [FAIL]
Some components are failing: katello_agent
...

4. Check Satellite status after upgrade
# hammer ping
...
katello_agent:    
    Status:          FAIL
    message:         Not running
    Server Response: Duration: 1ms
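
To spot this state without reading the whole ping output, the failing components can be filtered out of `hammer ping`. This is a minimal sketch, assuming the output layout shown above (an unindented `name:` line followed by an indented `Status:` line). The function name `failing_components` is made up for illustration and is not part of hammer.

```shell
# Hypothetical helper: read "hammer ping" output on stdin and print the name
# of every component whose Status is anything other than "ok".
failing_components() {
  awk '
    # An unindented line ending in ":" starts a new component block.
    /^[^ \t].*:[ \t]*$/ { sub(/:[ \t]*$/, ""); name = $0; next }
    # Inside a block, report the component if its status is not ok.
    $1 == "Status:" && tolower($2) != "ok" { print name }
  '
}

# Usage (on the Satellite server):
#   hammer ping | failing_components
```

Fed the post-upgrade output above, this would print `katello_agent`. Fed the pre-upgrade output, it would print nothing.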

Comment 3 Evgeni Golov 2022-03-31 15:29:23 UTC
Okay, after a great mystery hunt with Justin and Eric, here is what we found out:

* Katello in 6.11+ does verify the certificate presented by qpidd, while in 6.10 and earlier it did not (the change came in via https://projects.theforeman.org/issues/33496)
* When doing so, Katello currently uses /etc/foreman/proxy_ca.pem (ssl_ca_file from foreman's settings.yaml) which is the "server ca" (aka Custom CA)
* The qpidd certificate is signed by the "default ca" (aka Katello CA)
* Obviously, using the wrong CA for verification doesn't work, and things explode.

The issue is not tied to a specific customer backup, but to the fact that this backup is using a custom certificate (as supported and documented in https://access.redhat.com/documentation/en-us/red_hat_satellite/6.10/html/installing_satellite_server_from_a_connected_network/performing-additional-configuration#configuring-satellite-custom-server-certificate_satellite).

The correct fix is to make Katello use the right CA file (/etc/pki/katello/certs/katello-default-ca.crt) for verifying the qpidd certificate.
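
The CA mismatch can be checked by hand with `openssl verify`. This is a rough sketch, not the check Katello itself performs. The helper name `cert_signed_by` is invented here, and the path to the qpidd certificate is not given in this report, so it appears only as a placeholder.

```shell
# Hypothetical helper: succeed (exit 0) if CERT_FILE verifies against CA_FILE.
cert_signed_by() {
  ca_file=$1
  cert_file=$2
  # "openssl verify" prints "<file>: OK" on success; match on that text
  # instead of relying on the exit status alone.
  openssl verify -CAfile "$ca_file" "$cert_file" 2>/dev/null | grep -q ': OK$'
}

# Usage (QPIDD_CERT is a placeholder; point it at the certificate qpidd presents):
#   cert_signed_by /etc/pki/katello/certs/katello-default-ca.crt "$QPIDD_CERT" && echo "default CA: ok"
#   cert_signed_by /etc/foreman/proxy_ca.pem "$QPIDD_CERT" || echo "server CA: mismatch"
```

On a system with a custom server certificate, the first check would be expected to pass and the second to fail, which matches the analysis above.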

Comment 4 Evgeni Golov 2022-03-31 15:30:46 UTC
I'd argue this is a bug in the installer, which should configure the "agent" section of /etc/foreman/plugins/katello.yaml to look more like the "candlepin_events" section, explicitly setting the right cert files:

  :candlepin_events:
    :ssl_cert_file: /etc/foreman/client_cert.pem
    :ssl_key_file: /etc/foreman/client_key.pem
    :ssl_ca_file: /etc/pki/katello/certs/katello-default-ca.crt

Comment 5 Evgeni Golov 2022-03-31 15:42:48 UTC
In my reproducer, the working settings look like this:

  :agent:
    :enabled: true
    :broker_url: amqps://localhost:5671
    :event_queue_name: katello.agent
    :broker_ssl_cert_file: /etc/foreman/client_cert.pem
    :broker_ssl_key_file: /etc/foreman/client_key.pem
    :broker_ssl_ca_file: /etc/pki/katello/certs/katello-default-ca.crt
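
To confirm what a given system has configured, the relevant key can be read straight out of the file. The sketch below assumes the indentation-based layout shown in these comments (a `:section:` line followed by more deeply indented `:key: value` lines). It is a line-oriented lookup, not a real YAML parser, and `section_value` is a name made up for this example.

```shell
# Hypothetical helper: print the value of :KEY: inside the :SECTION: block
# of a Ruby-style YAML file. Prints nothing if the key is absent.
section_value() {
  file=$1
  section=$2
  key=$3
  awk -v section=":${section}:" -v key=":${key}:" '
    # A line holding only the section name opens the block; remember its indent.
    $1 == section && NF == 1 { in_section = 1; indent = match($0, /[^ ]/); next }
    in_section {
      cur = match($0, /[^ ]/)
      if (NF > 0 && cur <= indent) { in_section = 0 }  # dedent: block ended
      else if ($1 == key) { print $2; exit }
    }
  ' "$file"
}

# Usage:
#   section_value /etc/foreman/plugins/katello.yaml agent broker_ssl_ca_file
#   section_value /etc/foreman/plugins/katello.yaml candlepin_events ssl_ca_file
```

With the working settings above, both calls would print /etc/pki/katello/certs/katello-default-ca.crt. An empty result for the agent section means the key is not set explicitly.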

Comment 6 Evgeni Golov 2022-03-31 18:10:39 UTC
Created redmine issue https://projects.theforeman.org/issues/34708 from this bug

Comment 7 Bryan Kearney 2022-03-31 20:05:55 UTC
Upstream bug assigned to egolov

Comment 9 Bryan Kearney 2022-04-01 20:04:53 UTC
Moving this bug to POST for triage into Satellite since the upstream issue https://projects.theforeman.org/issues/34708 has been resolved.

Comment 10 Lukas Pramuk 2022-04-18 20:36:53 UTC
VERIFIED.

@Satellite 6.11.0 Snap16
foreman-installer-3.1.2.2-2.el7sat.noarch

by the following reproducer:

1) Restore a certain customer DB backup to 6.10.z

2) Check Satellite status before upgrade

# hammer ping
...
katello_agent:    
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 0ms

3) Upgrade to 6.11.0

# satellite-maintain upgrade run --target-version 6.11 -w repositories-validate,repositories-setup -y

>>> successful upgrade

4) Check Satellite status after upgrade

# hammer ping
...
katello_agent:
    Status:          ok
    message:         0 Processed, 0 Failed
    Server Response: Duration: 2ms
 
>>> katello_agent status after upgrade is OK

Comment 13 errata-xmlrpc 2022-07-05 14:34:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Satellite 6.11 Release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5498