
Bug 1765839

Summary: openstack undercloud install don't update the Certificate
Product: Red Hat OpenStack
Reporter: Luigi Tamagnone <ltamagno>
Component: openstack-tripleo
Assignee: Roger Heslop <rheslop>
Status: CLOSED CURRENTRELEASE
QA Contact: Jeremy Agee <jagee>
Severity: medium
Priority: medium
Version: 13.0 (Queens)
CC: alee, dwilde, ggrasza, jslagle, mburns, moguimar, rmascena
Target Milestone: z13
Keywords: Documentation, Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Last Closed: 2020-07-28 17:11:35 UTC
Type: Bug

Description Luigi Tamagnone 2019-10-26 15:16:27 UTC
Description of problem:
During an undercloud restore, openstack undercloud install failed with:

2019-10-22 06:33:28,166 INFO: Failed to discover available identity versions when contacting https://x.x.x.x:13000/. Attempting to parse version from URL.
2019-10-22 06:33:28,166 INFO: Could not determine a suitable URL for the plugin
2019-10-22 06:33:28,218 INFO: [2019-10-22 06:33:28,217] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returned non-zero exit status 1]
2019-10-22 06:33:28,218 INFO: 
2019-10-22 06:33:28,218 INFO: [2019-10-22 06:33:28,217] (os-refresh-config) [ERROR] Aborting...
2019-10-22 06:33:28,227 DEBUG: An exception occurred
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 2432, in install
    _run_orc(instack_env)
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 1610, in _run_orc
    _run_live_command(args, instack_env, 'os-refresh-config')
  File "/usr/lib/python2.7/site-packages/instack_undercloud/undercloud.py", line 673, in _run_live_command
    raise RuntimeError('%s failed. See log for details.' % name)
RuntimeError: os-refresh-config failed. See log for details.
2019-10-22 06:33:28,228 ERROR: 
#############################################################################
Undercloud install failed.

Version-Release number of selected component (if applicable):
RHOSP13:
ansible-tripleo-ipsec-8.1.1-0.20190513184007.7eb892c.el7ost.noarch Mon Oct 21 10:33:22 2019
openstack-tripleo-common-8.6.8-11.el7ost.noarch             Mon Oct 21 10:33:27 2019
openstack-tripleo-common-containers-8.6.8-11.el7ost.noarch  Mon Oct 21 10:33:24 2019
openstack-tripleo-heat-templates-8.3.1-54.el7ost.noarch     Mon Oct 21 10:33:27 2019
openstack-tripleo-image-elements-8.0.2-1.el7ost.noarch      Mon Oct 21 10:33:29 2019
openstack-tripleo-puppet-elements-8.0.2-2.el7ost.noarch     Mon Oct 21 10:33:03 2019
openstack-tripleo-ui-8.3.2-3.el7ost.noarch                  Mon Oct 21 10:41:24 2019
openstack-tripleo-validations-8.4.5-1.el7ost.noarch         Mon Oct 21 10:33:04 2019
puppet-tripleo-8.4.1-20.el7ost.noarch                       Mon Oct 21 10:33:25 2019
python-tripleoclient-9.2.7-9.el7ost.noarch                  Mon Oct 21 10:33:29 2019

How reproducible:
I also reproduced this in our environment. I tried to change the certificate following our guide[1]. I set a wrong CN and received a similar error. I then regenerated the certificate with the right CN, and nothing changed during undercloud install. However, if I create a certificate automatically with certificate_generation_ca and generate_service_certificate, the certificate changes.

Steps to Reproduce:
1. On a pre-deployed environment, generate[1] a wrong certificate and run:
  openstack undercloud install
  The command fails with:
  Could not determine a suitable URL for the plugin
2. Check the certificate with:
  openssl s_client -connect <IP>:13000
  The certificate has a wrong CN, AltName, or some other incorrect field.
3. Generate a new, correct certificate and run:
  openstack undercloud install
  The command fails with:
  Could not determine a suitable URL for the plugin
4. Check the certificate with:
  openssl s_client -connect <IP>:13000
  The certificate has not changed.
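
The checks in steps 2 and 4 can also be scripted. The following is a minimal sketch, not part of the original report, that compares the SHA-256 fingerprint of the certificate served on the endpoint with the first certificate in a PEM file on disk. The function names are hypothetical, and it assumes the leaf certificate is the first certificate block in the file.

```python
import base64
import hashlib
import re
import ssl

# Matches one PEM certificate block; a bundle may also contain a private key.
_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def first_cert_fingerprint(pem_text: str) -> str:
    """Return the SHA-256 hex digest of the first certificate in pem_text."""
    match = _CERT_RE.search(pem_text)
    if match is None:
        raise ValueError("no certificate block found")
    # Strip line breaks from the base64 body and decode it to DER bytes.
    der = base64.b64decode("".join(match.group(1).split()))
    return hashlib.sha256(der).hexdigest()


def served_matches_file(host: str, port: int, pem_path: str) -> bool:
    """Compare the certificate served at host:port with the one on disk."""
    # Fetches the peer certificate without validating it against any CA.
    served_pem = ssl.get_server_certificate((host, port))
    with open(pem_path, encoding="ascii") as handle:
        on_disk_pem = handle.read()
    return first_cert_fingerprint(served_pem) == first_cert_fingerprint(on_disk_pem)
```

A call such as served_matches_file("<IP>", 13000, "/path/to/new-cert.pem") returning False after step 3 would indicate that the endpoint is still serving the old certificate. Reading the PEM file may require root privileges.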

Actual results:
The served certificate does not change when a new one is generated for the same CA.

Expected results:
The served certificate changes when a new one is generated.

Additional info:
[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/appe-ssltls_certificate_configuration

Comment 1 Dougal Matthews 2019-10-28 12:24:40 UTC
I'm not sure about this one, but I think we could do with input from DFG:Security. Let me know if this is wrong!

Comment 8 Dave Wilde 2020-04-16 19:56:25 UTC
I've done some more research into this issue and I believe I have isolated the root cause. My notes from the investigation are here[1] for the curious, but the TL;DR is that I was mistaken in my original assumption that haproxy was not being restarted. The issue lies in the bind-mounted certificate file that haproxy is using. The PEM file is bind mounted into the haproxy container, and due to the nature of bind mounts this essentially acts as a hard link, linking the inode of the original file to the target inside the container. Even though the original file is correctly replaced with the new certificate, the container still maintains a link to the old inode:

(undercloud) [stack@undercloud ~]$ sudo stat /etc/pki/tls/private/overcloud_endpoint.pem
  File: ‘/etc/pki/tls/private/overcloud_endpoint.pem’
  Size: 3381            Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 71375612    Links: 1
Access: (0440/-r--r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:cert_t:s0
Access: 2020-04-16 18:03:57.123790389 +0000
Modify: 2020-04-16 18:03:57.011785706 +0000
Change: 2020-04-16 18:03:57.329799001 +0000
 Birth: -
(undercloud) [stack@undercloud ~]$ sudo podman exec -it $(sudo podman ps | awk '/haproxy/ { print $1}') stat /var/lib/kolla/config_files/src-tls//etc/pki/tls/private/overcloud_endpoint.pem
  File: ‘/var/lib/kolla/config_files/src-tls//etc/pki/tls/private/overcloud_endpoint.pem’
  Size: 7236            Blocks: 16         IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 46261016    Links: 0
Access: (0440/-r--r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2020-04-16 18:03:54.787692730 +0000
Modify: 2020-04-15 21:31:43.557632361 +0000
Change: 2020-04-16 18:03:57.123790389 +0000
 Birth: -
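
The inode behavior shown above can be reproduced without containers. The following is a minimal sketch on a POSIX system, using an open file handle as a stand-in for the bind mount. This is an analogy for a reference pinned to an inode, not the container runtime's actual mechanism, and the file name and function name are hypothetical.

```python
import os
import tempfile


def replace_vs_rewrite() -> dict:
    """Show how a held inode reference behaves under two update styles."""
    results = {}
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "endpoint.pem")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("old certificate")

        # Stand-in for the bind mount: a reference pinned to the current inode.
        held = open(path)
        try:
            # Update style 1: write a new file, then rename it over the old one.
            tmp_path = path + ".new"
            with open(tmp_path, "w") as handle:
                handle.write("new certificate")
            os.replace(tmp_path, path)

            results["inode_differs_after_replace"] = (
                os.stat(path).st_ino != os.fstat(held.fileno()).st_ino
            )
            # Link count of the old inode, which no longer has a directory entry.
            results["links_on_held_inode"] = os.fstat(held.fileno()).st_nlink
            held.seek(0)
            results["held_sees_after_replace"] = held.read()
        finally:
            held.close()

        # Update style 2: rewrite the existing file in place (same inode).
        held = open(path)
        try:
            with open(path, "w") as handle:
                handle.write("newer certificate")
            results["inode_differs_after_rewrite"] = (
                os.stat(path).st_ino != os.fstat(held.fileno()).st_ino
            )
            held.seek(0)
            results["held_sees_after_rewrite"] = held.read()
        finally:
            held.close()
    return results
```

After the rename, the path resolves to a new inode while the held reference still reads the old content, which mirrors the differing Inode values and the stale Size and Modify fields in the two stat outputs. An in-place rewrite keeps the inode, so the held reference sees the new content.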

From what I can tell, the only way to update that bind-mounted link is to restart the container (which is why we don't see this on master, as it fully restarts the haproxy container). I can see two possible fixes. The first is to change the postsave_cmd in haproxy-public-tls-certmonger.yaml[2] from the reload action to the restart action. The second is to bind mount a directory for the certificates rather than the file itself. The /etc/pki/tls/private directory can be problematic for bind mounting, as it can contain other keys that could be exposed should haproxy be compromised. However, an /etc/pki/tls/private/haproxy directory is already created, and we could possibly bind mount that directory and store the certificate in it.

[1]: https://hackmd.io/U56lZfNMRZmbO3P3eCMouw
[2]: https://github.com/openstack/tripleo-heat-templates/blob/stable/train/deployment/haproxy/haproxy-public-tls-certmonger.yaml#L80
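
The second fix proposed in comment 8 (bind mounting a directory rather than the file) relies on name lookups being resolved through the directory each time. The following is a minimal sketch of that idea on a POSIX system, using an open directory descriptor as a stand-in for a directory bind mount. The names are hypothetical and this is an analogy, not the tripleo-heat-templates change itself.

```python
import os
import tempfile


def read_via_directory(dir_fd: int, name: str) -> str:
    """Resolve name relative to an already-open directory, then read it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd) as handle:
        return handle.read()


def directory_reference_demo() -> tuple:
    """Replace a file by rename and read it through a held directory."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "endpoint.pem")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("old certificate")

        # Stand-in for bind mounting the directory instead of the file.
        dir_fd = os.open(workdir, os.O_RDONLY)
        try:
            before = read_via_directory(dir_fd, "endpoint.pem")
            tmp_path = path + ".new"
            with open(tmp_path, "w") as handle:
                handle.write("new certificate")
            os.replace(tmp_path, path)  # the path now points at a new inode
            after = read_via_directory(dir_fd, "endpoint.pem")
        finally:
            os.close(dir_fd)
    return before, after
```

Because the directory reference stays valid and the file name is looked up again on each read, the replaced file is visible without re-creating the reference. The consuming process would still need to re-read the file, for example through the existing reload action.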

Comment 15 Roger Heslop 2020-07-28 17:11:35 UTC
Changed the running text to include restarting HAProxy. Changes published: https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html/director_installation_and_usage/appe-ssltls_certificate_configuration