Bug 1985999 - Octavia client certs are not updated uniformly across all nodes on update/upgrade.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: tripleo-ansible
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: z8
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Brent Eagles
QA Contact: Bruna Bonguardo
URL:
Whiteboard:
Depends On:
Blocks: 2017829
Reported: 2021-07-26 13:20 UTC by Brian J. Atkisson
Modified: 2024-10-01 19:04 UTC (History)

Fixed In Version: tripleo-ansible-0.5.1-1.20211124153402.902c3c8.el8ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2017829
Environment:
Last Closed: 2022-03-24 11:00:12 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1939445 0 None None None 2021-08-10 19:21:19 UTC
OpenStack gerrit 804132 0 None MERGED Use different name to distinguish between facts and vars in octavia 2022-01-18 08:05:25 UTC
Red Hat Issue Tracker OSP-6436 0 None None None 2021-11-15 13:03:10 UTC
Red Hat Knowledge Base (Solution) 6572321 0 None None None 2021-12-08 20:21:23 UTC
Red Hat Product Errata RHBA-2022:0986 0 None None None 2022-03-24 11:00:37 UTC

Description Brian J. Atkisson 2021-07-26 13:20:01 UTC
Description of problem:

We had an outage in which no user could create a new Octavia load balancer. Existing load balancers continued to function correctly.

The root cause relates to how Octavia load balancer resource updates are designed. The Octavia worker (running on controllers and networkers) sends calls to the Octavia Amphora API (running on each hosted Amphora) over a mutually authenticated (two-way) TLS connection (see https://docs.openstack.org/octavia/latest/admin/guides/certificates.html#two-way-tls-authentication-in-octavia and subsequent points). The culprit is how the PKI is managed for that authentication. Specifically, the entire PKI (from a product perspective) is managed through a self-signed CA, with no automation or monitoring to rotate the required certificates. During this service degradation, the client-side certificate that the Octavia worker uses to authenticate to the Amphoras expired, leaving the workers no way to communicate securely with the Amphoras. The certificate expired on the 16th of July 2021, which tells us the service degradation started at that time.


Version-Release number of selected component (if applicable):
16.1.6

How reproducible:
Install an OpenStack cluster with TLS-everywhere IPA integration (NovaJoin) and wait a year.


Steps to Reproduce:
1. Install an OpenStack cluster with TLS-everywhere IPA integration (NovaJoin) 
2. Wait a year
3. Watch /var/lib/config-data/puppet-generated/octavia/etc/octavia/certs/client.pem expire on the controllers.

Actual results:

[2021-07-21 13:15:36 -0400] [1248] [DEBUG] Failed to send error message.                                                                                 
[2021-07-21 13:15:41 -0400] [1248] [DEBUG] Error processing SSL request.                                                                                  
[2021-07-21 13:15:41 -0400] [1248] [DEBUG] Invalid request from ip=::ffff:172.24.3.245: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:2354) 

[root@controller-2 certs]# cat client.pem.backup                                                                                                              
Certificate:                                                                                                                                                  
    Data:                                                                                                                                                     
        Version: 3 (0x2)                                                                                                                                      
        Serial Number: 1 (0x1)                                                                                                                                
        Signature Algorithm: sha256WithRSAEncryption                                                                                                          
        Issuer: C=US, ST=Denial, L=Springfield, O=Dis, CN=www.example.com                                                                                     
        Validity                                                                                                                                              
            Not Before: Jul 16 12:22:00 2020 GMT                                                                                                              
            Not After : Jul 16 12:22:00 2021 GMT                                                                                                              
        Subject: C=US, ST=Denial, O=Dis, CN=www.example.com        
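
For reference, the validity window in a dump like the one above can be checked with a few lines of Python. This is only a sketch: it assumes the text format shown here (a "Not After" line in OpenSSL's date format), and the helper names not_after_utc and is_expired are made up for illustration. It uses only the standard library.

```python
import re
import ssl
from datetime import datetime, timezone

# Matches the "Not After" line of a textual certificate dump.
NOT_AFTER = re.compile(r"Not After\s*:\s*(.+?)\s*$", re.MULTILINE)


def not_after_utc(dump: str) -> datetime:
    """Return the certificate expiry from a text dump as an aware UTC datetime."""
    match = NOT_AFTER.search(dump)
    if match is None:
        raise ValueError("no 'Not After' line found in dump")
    # ssl.cert_time_to_seconds parses dates such as "Jul 16 12:22:00 2021 GMT".
    seconds = ssl.cert_time_to_seconds(match.group(1))
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def is_expired(dump: str, now: datetime = None) -> bool:
    """True if the certificate in the dump has expired at time `now`."""
    now = now or datetime.now(timezone.utc)
    return now >= not_after_utc(dump)


# Validity lines copied from the dump in this report.
SAMPLE = """\
        Validity
            Not Before: Jul 16 12:22:00 2020 GMT
            Not After : Jul 16 12:22:00 2021 GMT
"""

# Timestamp of the CERTIFICATE_VERIFY_FAILED log line (13:15:41 -0400), in UTC.
log_time = datetime(2021, 7, 21, 17, 15, 41, tzinfo=timezone.utc)

print(not_after_utc(SAMPLE).isoformat(), is_expired(SAMPLE, log_time))
```

Run against the dump in this report, the check shows the certificate was already expired when the verification failures were logged.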


Expected results:

Either:
IPA issues these client certs (which I think might still be an IPA roadmap item) and renews them automatically
-or-
Some mechanism auto-renews these client certs prior to expiration.
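
As a rough illustration of the second option, the core of a pre-expiry check is a comparison between the time remaining and a renewal lead time. The sketch below is not an existing tripleo or Octavia mechanism; the function name needs_renewal and the 30-day lead time are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: renew once fewer than 30 days of validity remain.
DEFAULT_LEAD_TIME = timedelta(days=30)


def needs_renewal(not_after: datetime,
                  now: datetime = None,
                  lead_time: timedelta = DEFAULT_LEAD_TIME) -> bool:
    """True if the certificate expires within `lead_time` (or already has)."""
    now = now or datetime.now(timezone.utc)
    return not_after - now <= lead_time


# Expiry of the client certificate from this report.
expiry = datetime(2021, 7, 16, 12, 22, 0, tzinfo=timezone.utc)

for check_day in (datetime(2021, 6, 1, 12, 22, tzinfo=timezone.utc),
                  datetime(2021, 6, 20, 12, 22, tzinfo=timezone.utc),
                  datetime(2021, 7, 21, 12, 22, tzinfo=timezone.utc)):
    print(check_day.date(), needs_renewal(expiry, check_day))
```

A periodic job running a check like this would have flagged the certificate weeks before the outage; how the renewal itself is carried out is a separate question.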


Additional info:
https://docs.google.com/document/d/1Jeok-VWayejYJnAW_z5lDmwm8MK-Bf1aTa-JrjXLuPk/edit#

Comment 4 Michael Johnson 2021-07-28 20:49:37 UTC
Here is a little background information on how the mutual-authentication TLS works in Octavia and OSP.

Communication between the control plane and the amphorae (the load balancing service VMs) is over a TLS connection using mutual authentication. This means that the control plane authenticates the certificates issued to the amphorae, and the amphorae authenticate the certificates presented by the control plane.

These certificates are used only for service-to-service communication.
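
To make "mutual authentication" concrete, here is a generic sketch using Python's standard ssl module. It is not Octavia's code; the function names and all file-path parameters are placeholders for illustration. The point is that the amphora side must require and verify a client certificate, while the control plane side must present one and verify the server certificate in turn.

```python
import ssl


def amphora_side_context(server_cert=None, server_key=None, client_ca=None):
    """Server-side context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Without this, a TLS server does not ask the client for a certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if server_cert:
        ctx.load_cert_chain(certfile=server_cert, keyfile=server_key)
    if client_ca:
        # CA that signed the control plane's client certificate.
        ctx.load_verify_locations(cafile=client_ca)
    return ctx


def control_plane_side_context(client_cert=None, client_key=None, server_ca=None):
    """Client-side context that presents a certificate and verifies the server."""
    # PROTOCOL_TLS_CLIENT enables CERT_REQUIRED and hostname checking by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if client_cert:
        # The certificate the server will validate; if it has expired,
        # the handshake fails on the server side.
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    if server_ca:
        ctx.load_verify_locations(cafile=server_ca)
    return ctx


# Built without files here, only to show the verification settings.
server_ctx = amphora_side_context()
client_ctx = control_plane_side_context()
print(server_ctx.verify_mode, client_ctx.verify_mode)
```

With both sides set to require verification, an expired certificate on either side is enough to break the connection.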

The amphora certificates are issued at boot time, and the Octavia housekeeping process rotates them as needed, based on the configuration settings.

On the control plane side, in RHOSP, the certificates are created and managed by tripleo/director.

We are looking into why that "client" certificate has incorrect information on it.

Comment 6 Ashraf Hasson 2021-07-29 17:03:43 UTC
Brent, please find the needed info in https://access.redhat.com/support/cases/#/case/02996461

Comment 29 errata-xmlrpc 2022-03-24 11:00:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986

