Bug 2115398 - TLS-E certificates are not properly applied when certmonger renews them
Summary: TLS-E certificates are not properly applied when certmonger renews them
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.2 (Train)
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: z2
Assignee: Grzegorz Grasza
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks: 2115405 2130994
 
Reported: 2022-08-04 15:17 UTC by Nicolas Bourgeois
Modified: 2023-08-07 13:56 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2130994
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System                 ID         Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker  OSP-18031  0        None      None    None     2022-08-04 15:19:41 UTC

Description Nicolas Bourgeois 2022-08-04 15:17:27 UTC
Description of problem:
The certificates managed by certmonger are renewed but not properly applied to the running services.
- some certificates have no "post-save command:" to notify the service that uses the certificate to reload its configuration; for others, the command tries to restart the service with systemctl although the service is now containerized
- some certificates are mapped to (container)/var/lib/kolla and no mechanism copies them to /etc/pki/certs so that the new certificate can be used
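
To find the requests affected by the first point, the output of "getcert list" can be filtered for empty post-save fields. This is a minimal sketch, not a supported tool: the function name list_missing_post_save is made up for illustration, and it assumes the usual "Request ID '<name>':" and "post-save command:" lines in the getcert output.

```shell
#!/bin/sh
# Print the request ID of every tracked certificate whose
# "post-save command:" field is empty.
# Reads `getcert list` output on stdin. (Hypothetical helper name.)
list_missing_post_save() {
    awk -v q="'" '
        # A new request block starts: the ID is the text between the quotes.
        /^Request ID / { split($0, parts, q); id = parts[2] }
        # The field is empty when only whitespace follows the colon.
        /^[ \t]*post-save command:[ \t]*$/ { print id }
    '
}

# Usage on a node, as root:
#   getcert list | list_missing_post_save
```

A request that shows up here is not necessarily broken (some services may re-read the file on every connection), so the result is only a list of candidates to check.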

The renewed certificates are present in the host's /etc/pki/tls/certs directory. So, if a customer updates OpenStack, new containers are spawned and the renewed certificates are used.
Users with third-party constraints (e.g. Juniper Contrail) who are tied to specific OpenStack versions may not be able to update their platform within 2 years, and then run into this issue (a customer had 2 of their OpenStack 13 infrastructures unavailable due to this bug, and discovered that it would not work on OSP16 either).

Version-Release number of selected component (if applicable):
Red Hat OpenStack Platform release 16.2.3 (Train)


How reproducible:
Run "getcert resubmit -i <certificate>" on a certificate chosen from the list below.
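
To confirm that the resubmit actually produced a new certificate, the "expires:" field can be compared before and after. A small sketch, assuming the standard "expires:" line in the getcert output; cert_expiry is a hypothetical helper name.

```shell
#!/bin/sh
# Print the value of the "expires:" field found in `getcert list`
# output read from stdin. (Hypothetical helper name.)
cert_expiry() {
    sed -n 's/^[[:space:]]*expires:[[:space:]]*//p'
}

# Usage, with $id set to a request ID taken from `getcert list`:
#   before=$(getcert list -i "$id" | cert_expiry)
#   getcert resubmit -i "$id"
#   # wait until the request is back in status MONITORING, then:
#   after=$(getcert list -i "$id" | cert_expiry)
#   [ "$before" != "$after" ] && echo "certificate was renewed"
```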



Actual results:

- On the controllers

  - Galera: 
     - certificates are bind-mounted at /var/lib/kolla/config_files/src-tls/etc/pki/tls/certs/
     - no "post-save" command to copy the certs to /etc/pki/tls/certs and reload the service

  - RabbitMQ: seems OK. I don't know how to check

  - novnc-proxy
     - certificate renewed in /etc/pki/tls/...
     - no post-save command. Is it required to reload a service?

  - ovsdb-server, ovn_controller
     - certificate renewed in /etc/pki/tls/...
     - no post-save command. Is it required to reload a service?

  - haproxy
     - certificates renewed in /etc/pki/tls/...
     - Although the post-save command ($container_cli kill --signal HUP "$haproxy_container_name") is launched, the old certificate is still presented. Even after restarting the container, the old certificate is still used.
          
       (container)# ls -l /etc/pki/tls/certs/haproxy/     <= the certificates are renewed
       total 52
       -rw-------. 1 haproxy haproxy 2000 Aug  4 14:04 overcloud-haproxy-ctlplane.crt
       -rw-------. 1 haproxy haproxy 5343 Aug  4 14:04 overcloud-haproxy-ctlplane.pem
       -rw-------. 1 haproxy haproxy 1891 Aug  4 14:00 overcloud-haproxy-external.crt
       -rw-------. 1 haproxy haproxy 2021 Aug  4 14:04 overcloud-haproxy-internal_api.crt
       -rw-r-----. 1 haproxy haproxy 5364 Aug  4 14:55 overcloud-haproxy-internal_api.pem
       -rw-------. 1 haproxy haproxy 1996 Aug  4 13:58 overcloud-haproxy-storage.crt
       -rw-------. 1 haproxy haproxy 2021 Aug  4 13:58 overcloud-haproxy-storage_mgmt.crt
       -rw-------. 1 haproxy haproxy 5364 Aug  4 13:58 overcloud-haproxy-storage_mgmt.pem
       -rw-------. 1 haproxy haproxy 5339 Aug  4 13:58 overcloud-haproxy-storage.pem

       (container haproxy)[root@controller-1 /]# openssl x509 -enddate -noout -in  /etc/pki/tls/certs/haproxy/overcloud-haproxy-internal_api.pem
        notAfter=Aug  4 14:04:46 2024 GMT                 <= we observe their expiration date

       (host)openssl s_client -connect 172.17.1.150:9292 2>/dev/null | openssl x509 -noout -dates
       notBefore=Aug  2 15:07:17 2022 GMT
       notAfter=Aug  2 15:07:17 2024 GMT                  <= the old certificate is still used

  - libvirt-vnc-client-cert
    - certificates renewed in /etc/pki/
    - wrong post-save command: systemctl reload libvirtd    

  - httpd-ctlplane, httpd-external, httpd-internal_api, httpd-storage, httpd-storage_mgmt, libvirt-vnc-client-cert
    - I can't find exactly which containers use those certificates; please check
    - post-save command: pkill -USR1 httpd  => need to check whether it works
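
For the entries above whose post-save command calls systemctl on the host, a container-aware command could follow the same pattern as the haproxy one quoted earlier. The following is only a sketch: signal_container is a hypothetical name, and whether a given service really reloads its certificate on SIGHUP is an assumption that has to be checked per service.

```shell
#!/bin/sh
# Send SIGHUP to a container through the container CLI instead of
# calling systemctl on the host. (Hypothetical helper name.)
#   $1: container CLI to use, e.g. podman
#   $2: name of the container running the service
signal_container() {
    container_cli=$1
    container_name=$2
    "$container_cli" kill --signal HUP "$container_name"
}

# Usage, reusing the variable names from the haproxy post-save command:
#   signal_container "$container_cli" "$haproxy_container_name"
```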


- On the compute nodes

  - ovn_controller, ovn_metadata
     - certificate renewed in /etc/pki/tls/...
     - no post-save command. Is it required to reload a service?

  - libvirt-client-cert, libvirt-server-cert
    - certificates renewed in /etc/pki/libvirt
    - wrong post-save command: systemctl reload libvirtd 

  - qemu-nbd-client-cert
    - certificates renewed in /etc/pki/libvirt-nbd
    - wrong post-save command: systemctl reload libvirtd 

  - libvirt-vnc-server-cert, qemu-server-cert
    - the certificate files are bind-mounted into the container, not the directories that contain them. The files are not updated until the container is restarted
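
One possible explanation for the last item, offered as an assumption rather than a confirmed diagnosis: if the renewed certificate is written to a new file that is then renamed over the old one, the path gets a new inode, while a bind mount of the single file keeps pointing at the old inode. The sketch below only demonstrates the inode change on the host side; the file names and the function name are made up for illustration.

```shell
#!/bin/sh
# Return 0 if replacing a file by rename gives the path a new inode.
# (Hypothetical helper; works in a throwaway temporary directory.)
inode_changes_on_rename() {
    dir=$(mktemp -d) || return 2
    printf 'old certificate\n' > "$dir/cert.pem"
    before=$(ls -i "$dir/cert.pem" | awk '{print $1}')
    # Write the new content next to the file, then rename it over the old one.
    printf 'new certificate\n' > "$dir/cert.pem.tmp"
    mv "$dir/cert.pem.tmp" "$dir/cert.pem"
    after=$(ls -i "$dir/cert.pem" | awk '{print $1}')
    rm -rf "$dir"
    [ "$before" != "$after" ]
}

# Usage:
#   inode_changes_on_rename && echo "path now refers to a different inode"
```

If this is what happens, bind-mounting the containing directory instead of the individual file would let the container see the replacement, which matches the observation that a directory mount is missing here.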
  


Expected results:
Certificates refreshed

Comment 1 Cédric Jeanneret 2022-08-05 09:41:53 UTC
While I'm pretty sure it's not only on Security, they are the right ones to actually organize and lead this effort imho.

Comment 2 Grzegorz Grasza 2022-08-09 10:24:42 UTC
I'll address the individual services inline:

(In reply to Nicolas Bourgeois from comment #0)
> 
> Actual results:
> 
> - On the controllers
> 
>   - Galera: 
>      - certificates are binded on 
> /var/lib/kolla/config_files/src-tls/etc/pki/tls/certs/
>      - no "post-save" command for copying the certs to /etc/pki/tls/certs
> and reload the service

There is an RFE to implement this:
https://bugzilla.redhat.com/show_bug.cgi?id=2071582

A rolling restart is needed so that there is no disruption of service.

> 
>   - RabbitMQ: seems OK. I don't know how to check
> 
>   - novnc-proxy
>      - certificate renewed in /etc/pki/tls/...
>      - no post-save command. Is it required to reload a service?

It is not required: the certificate is not cached, so the new one is read directly from disk.

> 
>   - ovsdb-server, ovn_controller
>      - certificate renewed in /etc/pki/tls/...
>      - no post-save command. Is it required to reload a service?

Yes, it should be covered by the previously mentioned RFE
https://bugzilla.redhat.com/show_bug.cgi?id=2071582

> 
>   - haproxy
>      - certificates renewed in /etc/pki/tls/...
>      - Although the post-save command, "$container_cli kill --signal HUP
> "$haproxy_container_name"" is launched, the old certificate is still
> presented. Even after restarting the container, the old certificate is still
> used
>           
>        (container)# ls -l /etc/pki/tls/certs/haproxy/     <= the
> certificates are renewed
>        total 52
>        -rw-------. 1 haproxy haproxy 2000 Aug  4 14:04
> overcloud-haproxy-ctlplane.crt
>        -rw-------. 1 haproxy haproxy 5343 Aug  4 14:04
> overcloud-haproxy-ctlplane.pem
>        -rw-------. 1 haproxy haproxy 1891 Aug  4 14:00
> overcloud-haproxy-external.crt
>        -rw-------. 1 haproxy haproxy 2021 Aug  4 14:04
> overcloud-haproxy-internal_api.crt
>        -rw-r-----. 1 haproxy haproxy 5364 Aug  4 14:55
> overcloud-haproxy-internal_api.pem
>        -rw-------. 1 haproxy haproxy 1996 Aug  4 13:58
> overcloud-haproxy-storage.crt
>        -rw-------. 1 haproxy haproxy 2021 Aug  4 13:58
> overcloud-haproxy-storage_mgmt.crt
>        -rw-------. 1 haproxy haproxy 5364 Aug  4 13:58
> overcloud-haproxy-storage_mgmt.pem
>        -rw-------. 1 haproxy haproxy 5339 Aug  4 13:58
> overcloud-haproxy-storage.pem
> 
>        (container haproxy)[root@controller-1 /]# openssl x509 -enddate
> -noout -in  /etc/pki/tls/certs/haproxy/overcloud-haproxy-internal_api.pem
>         notAfter=Aug  4 14:04:46 2024 GMT                 <= we observe
> their expiration date
> 
>        (host)openssl s_client -connect 172.17.1.150:9292 2>/dev/null |
> openssl x509 -noout -dates
>        notBefore=Aug  2 15:07:17 2022 GMT
>        notAfter=Aug  2 15:07:17 2024 GMT                  <= the old
> certificate is still used

This should work; I need to investigate it.
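
To make the comparison from the report repeatable while investigating, the two openssl outputs can be fed to a small helper. This is a sketch only; not_after and compare_enddates are hypothetical names, and the helper just compares the notAfter strings without parsing the dates.

```shell
#!/bin/sh
# Extract the notAfter value from `openssl x509 -noout -enddate` or
# `openssl x509 -noout -dates` output read from stdin.
not_after() {
    sed -n 's/^[[:space:]]*notAfter=//p'
}

# Compare the end date of the certificate on disk ($1) with the end date
# of the certificate presented on the network ($2). Both arguments are
# raw openssl output. (Hypothetical helper name.)
compare_enddates() {
    disk=$(printf '%s\n' "$1" | not_after)
    served=$(printf '%s\n' "$2" | not_after)
    if [ "$disk" = "$served" ]; then
        echo "match: $served"
    else
        echo "stale: serving $served, disk has $disk"
    fi
}

# Usage, with $pem, $vip and $port chosen by the caller:
#   disk=$(openssl x509 -enddate -noout -in "$pem")
#   served=$(openssl s_client -connect "$vip:$port" </dev/null 2>/dev/null |
#            openssl x509 -noout -dates)
#   compare_enddates "$disk" "$served"
```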

> 
>   - libvirt-vnc-client-cert
>     - certificates renewed in /etc/pki/
>     - wrong post-save command: systemctl reload libvirtd    

I think the service name was changed to tripleo_nova_libvirt.

> 
>   - httpd-ctlplane, httpd-external, httpd-internal_api, httpd-storage,
> httpd-storage_mgmt, libvirt-vnc-client-cert
>     - I can't find exactly which container use those certificate, I let you
> check
>     - post-save command: pkill -USR1 httpd  => to check if it's working
> 
> 
> - On the compute nodes
> 
>   - ovn_controller, ovn_metadata
>      - certificate renewed in /etc/pki/tls/...
>      - no post-save command. Is it required to reload a service?

Yes, see the previously mentioned RFE.

> 
>   - libvirt-client-cert, libvirt-server-cert
>     - certificates renewed in /etc/pki/libvirt
>     - wrong post-save command: systemctl reload libvirtd 
> 
>   - qemu-nbd-client-cert
>     - certificates renewed in /etc/pki/libvirt-nbd
>     - wrong post-save command: systemctl reload libvirtd 
> 
>   - libvirt-vnc-server-cert, qemu-server-cert
>     - the certificate files are binded into the container; not the
> directories that contain them. The files are not updated until we restart
> the container
>   

There is an additional RFE to support reloading of the certificates, but only for the
libvirtd daemon certs. QEMU is still not supported.
https://bugzilla.redhat.com/show_bug.cgi?id=2071584

Comment 6 Grzegorz Grasza 2022-12-23 14:01:30 UTC
Hi, this looks correct, thanks for preparing this!

