Bug 1985999

Summary: Octavia client certs are not updated uniformly across all nodes on update/upgrade.
Product: Red Hat OpenStack Reporter: Brian J. Atkisson <batkisso>
Component: tripleo-ansible Assignee: Brent Eagles <beagles>
Status: CLOSED ERRATA QA Contact: Bruna Bonguardo <bbonguar>
Severity: high Docs Contact:
Priority: urgent    
Version: 16.1 (Train) CC: ahasson, averi, beagles, cmuresan, gthiemon, ihrachys, lpeer, majopela, mchappel, michjohn, mturner, njohnston, oschwart, pveiga, scohen, ykaul
Target Milestone: z8 Keywords: Triaged
Target Release: 16.1 (Train on RHEL 8.2)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: tripleo-ansible-0.5.1-1.20211124153402.902c3c8.el8ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2017829 Environment:
Last Closed: 2022-03-24 11:00:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2017829    

Description Brian J. Atkisson 2021-07-26 13:20:01 UTC
Description of problem:

We had an outage during which no user could create a new Octavia load balancer. Existing load balancers continued to function correctly.

The root cause lies in how Octavia carries out load balancer resource updates. The Octavia worker (running on controllers and networkers) sends calls to the Octavia Amphora API (running on each hosted Amphora) over a mutually authenticated (two-way) TLS connection (see https://docs.openstack.org/octavia/latest/admin/guides/certificates.html#two-way-tls-authentication-in-octavia and subsequent points). The problem is how the PKI behind that authentication is managed. Specifically, the entire PKI (from a product perspective) is managed through a self-signed CA that comes with no automation or monitoring to rotate the required certificates. During this service degradation, the client-side certificate that the Octavia worker uses to authenticate to the Amphoras expired, leaving the workers with no way to communicate securely with the Amphoras. The expiration date was set to the 16th of July 2021, which tells us the service degradation started at that time.


Version-Release number of selected component (if applicable):
16.1.6

How reproducible:
Install an OpenStack cluster with TLS-everywhere IPA integration (NovaJoin) and wait a year.


Steps to Reproduce:
1. Install an OpenStack cluster with TLS-everywhere IPA integration (NovaJoin) 
2. Wait a year
3. Watch /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/client.pem expire on the controllers.

Actual results:

[2021-07-21 13:15:36 -0400] [1248] [DEBUG] Failed to send error message.
[2021-07-21 13:15:41 -0400] [1248] [DEBUG] Error processing SSL request.
[2021-07-21 13:15:41 -0400] [1248] [DEBUG] Invalid request from ip=::ffff:172.24.3.245: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:2354)

[root@controller-2 certs]# cat client.pem.backup
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 1 (0x1)
        Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, ST=Denial, L=Springfield, O=Dis, CN=www.example.com
        Validity
            Not Before: Jul 16 12:22:00 2020 GMT
            Not After : Jul 16 12:22:00 2021 GMT
        Subject: C=US, ST=Denial, O=Dis, CN=www.example.com
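
As a rough cross-check of the dates in the dump above, the following Python sketch parses the validity window and compares it with the timestamp of the first error in the log excerpt. DUMP and parse_validity are illustrative names introduced here, not part of any Octavia or TripleO tooling; only the Python standard library is used.

```python
import re
import ssl
from datetime import datetime, timezone

# Validity lines copied from the certificate dump in this report.
DUMP = """\
        Validity
            Not Before: Jul 16 12:22:00 2020 GMT
            Not After : Jul 16 12:22:00 2021 GMT
"""


def parse_validity(dump_text):
    """Return (not_before, not_after) as timezone-aware UTC datetimes."""
    fields = dict(
        re.findall(r"Not (Before|After)\s*:\s*(.+?)\s*$", dump_text, re.MULTILINE)
    )

    def to_datetime(cert_time):
        # ssl.cert_time_to_seconds understands the "Jul 16 12:22:00 2021 GMT" form.
        return datetime.fromtimestamp(
            ssl.cert_time_to_seconds(cert_time), tz=timezone.utc
        )

    return to_datetime(fields["Before"]), to_datetime(fields["After"])


not_before, not_after = parse_validity(DUMP)
lifetime_days = (not_after - not_before).days

# Timestamp of the first error line in the log excerpt above.
log_time = datetime.strptime("2021-07-21 13:15:36 -0400", "%Y-%m-%d %H:%M:%S %z")
expired_at_log_time = log_time > not_after

print("certificate lifetime in days:", lifetime_days)
print("expired when the error was logged:", expired_at_log_time)
```

With these dates the lifetime works out to 365 days, and the first logged error falls five days after the Not After time, which is consistent with the "wait a year" reproduction steps.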


Expected results:

Either:
IPA issues these client certs (which I think might still be a roadmap item with IPA) and renews them automatically
-or-
Some mechanism auto-renews these client certs prior to expiration.
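
To make the second option concrete, here is a minimal Python sketch of a renew-before-expiry check. The needs_renewal helper and the 30-day window are hypothetical choices for illustration only; they do not describe how the eventual fix in tripleo-ansible works.

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(not_after, now, renew_before=timedelta(days=30)):
    """Return True once `now` is inside the renewal window before expiry.

    The 30-day default is an assumption made for this example.
    """
    return now >= not_after - renew_before


# Not After time taken from the expired client.pem shown in this report.
NOT_AFTER = datetime(2021, 7, 16, 12, 22, 0, tzinfo=timezone.utc)

early = datetime(2021, 6, 1, tzinfo=timezone.utc)   # about 45 days before expiry
late = datetime(2021, 6, 20, tzinfo=timezone.utc)   # about 26 days before expiry

print("renew on 2021-06-01:", needs_renewal(NOT_AFTER, early))
print("renew on 2021-06-20:", needs_renewal(NOT_AFTER, late))
```

A periodic job running a check like this on every controller would flag the certificate weeks before the outage date instead of after it.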


Additional info:
https://docs.google.com/document/d/1Jeok-VWayejYJnAW_z5lDmwm8MK-Bf1aTa-JrjXLuPk/edit#

Comment 4 Michael Johnson 2021-07-28 20:49:37 UTC
Here is a little background information on how the mutual-authentication TLS works in Octavia and OSP.

Communication between the control plane and the amphorae (load balancing service VMs) is over a TLS connection using mutual authentication. This means that the control plane authenticates certificates issued to the amphorae, and the amphorae authenticate certificates provided by the control plane.
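
The mutual authentication described above can be sketched with Python's standard ssl module. This is only an illustration of the general technique, not Octavia's actual code; the function names and file-path parameters are hypothetical.

```python
import ssl


def control_plane_context(ca_file=None, client_cert=None, client_key=None):
    """Client side: verify the amphora's certificate and present our own."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verify_mode is CERT_REQUIRED by default
    if ca_file:
        # CA that issued the amphora (server) certificates.
        ctx.load_verify_locations(cafile=ca_file)
    if client_cert:
        # Certificate the control plane presents to the amphora.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx


def amphora_context(ca_file=None, server_cert=None, server_key=None):
    """Server side: present a certificate and require a valid client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Without this line the server would accept clients that send no certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA that issued the control plane (client) certificate.
        ctx.load_verify_locations(cafile=ca_file)
    if server_cert:
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    return ctx


client_ctx = control_plane_context()
server_ctx = amphora_context()
print(client_ctx.verify_mode == ssl.CERT_REQUIRED)
print(server_ctx.verify_mode == ssl.CERT_REQUIRED)
```

Because both sides require a valid peer certificate, an expired client certificate is rejected during the handshake on the server side, which matches the CERTIFICATE_VERIFY_FAILED entry in the log excerpt in the description.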

These certificates are used only for service-to-service communication.

The amphora certificates are issued at boot time, and the Octavia housekeeping process rotates them as necessary based on the configuration settings.

On the control plane side, in the case of RHOSP, the certificates are created and managed by tripleo/director.

We are looking into why that "client" certificate has incorrect information on it.

Comment 6 Ashraf Hasson 2021-07-29 17:03:43 UTC
Brent, please find the needed info in https://access.redhat.com/support/cases/#/case/02996461

Comment 29 errata-xmlrpc 2022-03-24 11:00:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986