Bug 1722089
Summary: With TLS everywhere, unable to connect to VNC consoles of VMs.

| Field | Value | Field | Value |
|---|---|---|---|
| Product | Red Hat OpenStack | Reporter | David Hill <dhill> |
| Component | openstack-tripleo-heat-templates | Assignee | Douglas Mendizábal <dmendiza> |
| Status | CLOSED WONTFIX | QA Contact | Jeremy Agee <jagee> |
| Severity | low | Docs Contact | |
| Priority | medium | | |
| Version | 13.0 (Queens) | CC | aschultz, asimonel, dwilde, hrybacki, jhardee, jthomas, lyarwood, mburns, moguimar, mschuppe, mwitt, nkinder, pkesavar, pmannidi, pmorey, sputhenp |
| Target Milestone | zstream | Keywords | ZStream |
| Target Release | --- | | |
| Hardware | x86_64 | | |
| OS | Linux | | |
| Whiteboard | | | |
| Fixed In Version | openstack-tripleo-heat-templates-8.3.1-64.el7ost | Doc Type | If docs needed, set a value |
| Doc Text | | Story Points | --- |
| Clone Of | | | |
| | 1736672 | Environment | |
| Last Closed | 2021-05-26 18:09:41 UTC | Type | Bug |
| Regression | --- | Mount Type | --- |
| Documentation | --- | CRM | |
| Verified Versions | | Category | --- |
| oVirt Team | --- | RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- | Target Upstream Version | |
| Embargoed | | | |
Description
David Hill
2019-06-19 13:32:57 UTC
Additional info:

If we look at the configuration, it's as if the following code wasn't applied on the computes:

```yaml
nova::vncproxy::allow_vencrypt: true
nova::vncproxy::allow_noauth: {if: [allow_noauth, true, false]}
nova::vncproxy::vencrypt_key: /etc/pki/libvirt-vnc/client-key.pem
nova::vncproxy::vencrypt_cert: /etc/pki/libvirt-vnc/client-cert.pem
nova::vncproxy::vencrypt_ca: /etc/pki/libvirt-vnc/ca-cert.pem
nova::ssl_only: true
nova::cert: /etc/pki/tls/certs/novnc_proxy.crt
nova::key: /etc/pki/tls/private/novnc_proxy.key
generate_service_certificates: true
```

as the controllers do have those values. We do have the following configured in qemu.conf:

```ini
vnc_tls = 1
vnc_tls_x509_verify = 1
```

but if I'm not mistaken, we don't have any certificates on the compute in /var/lib/config-data/puppet-generated/nova_libvirt/etc/pki/libvirt-vnc ...

Looking at this, it might be as simple as the following bind:

```json
"HostConfig": {
    "Binds": [
        "/etc/pki/libvirt/:/var/lib/kolla/config_files/src-tls/etc/pki/libvirt/:ro",
```

Shouldn't it be in /etc/pki/libvirt-vnc? It's there too, finally:

```json
"/etc/pki/libvirt-vnc:/var/lib/kolla/config_files/src-libvirt-vnc-pki:ro",
```

I turned on debugging as per this chunk of code:

```python
try:
    compute_sock = scheme.security_handshake(compute_sock)
except exception.RFBAuthHandshakeFailed as e:
    # Intentionally don't tell client what really failed
    # as that's information leakage
    self._fail(tenant_sock, None,
               _("Unable to negotiate security with server"))
    LOG.debug("Auth failed %s", six.text_type(e))
    raise exception.SecurityProxyNegotiationFailed(
        reason=_("Auth handshake failed"))
```

and got this extra logging output, which tells us exactly what the problem is:

```
2019-06-20 18:20:41.321 56 DEBUG nova.console.rfb.authvencrypt [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Server accepted the requested sub-auth type security_handshake /usr/lib/python2.7/site-packages/nova/console/rfb/authvencrypt.py:126
2019-06-20 18:20:41.328 56 DEBUG nova.console.securityproxy.rfb [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Auth failed Failed to complete auth handshake: Error establishing TLS connection to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618) connect /usr/lib/python2.7/site-packages/nova/console/securityproxy/rfb.py:190
2019-06-20 18:20:41.328 56 ERROR nova.console.websocketproxy [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Unable to perform security proxying, shutting down connection: SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
```

If we look at the certificate itself, it contains the hostname rather than the IP address, so when the client, which used the IP address to establish the connection, runs SSL validation, it notices that the certificate name belongs to another host and fails (unless I'm totally wrong here). So either we generate certificates containing both IPs and names, or we establish the SSL connection using the hostname instead of the IP.
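One way to confirm that diagnosis is to test the presented certificate against both the hostname and the IP address with `openssl x509 -checkhost`/`-checkip`. A minimal sketch, assuming the compute's VNC server certificate lives at /etc/pki/libvirt-vnc/server-cert.pem (path and names below are assumptions for illustration):

```shell
# Does the certificate match the compute's hostname? (expected: "does match")
# NOTE: the certificate path and hostname are illustrative assumptions.
openssl x509 -in /etc/pki/libvirt-vnc/server-cert.pem -noout -checkhost compute-0.localdomain

# Does it match the IP address the proxy actually connects to?
# (expected: "does NOT match" if the cert only carries the hostname)
openssl x509 -in /etc/pki/libvirt-vnc/server-cert.pem -noout -checkip 10.10.10.10
```

Note that this only checks name matching; the same CERTIFICATE_VERIFY_FAILED error can also come from an incomplete CA chain, which is explored later in this bug.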
Perhaps 'host' should be the hostname here instead of an IP:

```
2019-06-20 18:31:19.555 385 INFO nova.console.websocketproxy [req-d8706654-f41a-4f5a-9316-b31b02b7d435 - - - - -] 35: connect info: {u'instance_uuid': u'31ac4e68-4f07-4e65-837b-43c1f614fb70', u'internal_access_path': None, u'last_activity_at': 1561055251.62715, u'console_type': u'novnc', u'host': u'10.10.10.10', u'token': u'26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'access_url': u'https://horizon.local:13080/vnc_auto.html?token=26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'port': u'5901'}
```

---

Thanks for providing all the helpful debug logs and info. The SSL cert validation between the console proxy host and the guest is failing, and I think you're probably correct in what you said in comment 5: the proxy cannot match the IP address it's connecting to against the hostname in the cert (this failure is expected behavior). I think the IP address you're seeing comes from the nova configuration option [vnc]server_proxyclient_address on the compute host. It can be configured as either an IP address or a hostname, and I think it should be set to a hostname that will match the SSL cert. Can you verify whether it's set to a hostname in your environment? Otherwise, the alternative is, as you suggested, to add an IP address to the SAN of the SSL cert, which would require regenerating the cert any time the IP address changes in the future. Using a hostname keeps things working across IP address changes.
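For reference, that setting would look like the excerpt below on the compute host. This is an illustrative sketch, not the environment's actual value: the FQDN is a placeholder and must match a name the SSL cert actually carries.

```ini
# /etc/nova/nova.conf on the compute host (illustrative excerpt;
# the hostname is an assumption and must match the cert's CN/SAN)
[vnc]
server_proxyclient_address = compute-0.localdomain
```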
---

If we copy /etc/ipa/ca.crt to /etc/pki/libvirt-vnc/ca-cert.pem, it solves this issue. It appears that /etc/pki/libvirt-vnc/ca-cert.pem is missing the IPA issuer certificate. This issue looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1661621 ... I'm wondering if they hit this issue with the consoles or simply didn't notice it.

---

You can set LibvirtVncCACert to /etc/ipa/ca.crt to override the default CA file linked to /etc/pki/libvirt-vnc/ca-cert.pem [1]. With this, /etc/ipa/ca.crt gets added to the kolla config files and is placed at /etc/pki/libvirt-vnc/ca-cert.pem on the next container restart. Since there is no config change, you'd need to restart the nova_libvirt container manually on existing computes.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/nova-libvirt.yaml#L356

---

Hi Dave, can you clarify what the certificate structure looks like? In my env, /etc/ipa/ca.crt and /etc/pki/CA/certs/vnc.crt are the same:

```
[root@compute-0]# md5sum /etc/ipa/ca.crt
5e8dd534ead12311ebd2203f6bd41ac3  /etc/ipa/ca.crt
[root@compute-0]# md5sum /etc/pki/CA/certs/vnc.crt
5e8dd534ead12311ebd2203f6bd41ac3  /etc/pki/CA/certs/vnc.crt
```

Can we get the output of:

```
openssl x509 -in /etc/ipa/ca.crt -noout -text
openssl x509 -in /etc/pki/CA/certs/vnc.crt -noout -text
```

---

Hey Martin,

In /etc/ipa/ca.crt we have 2 certificates, and in /etc/pki/CA/certs/vnc.crt we only have 1. It really looks like we're missing an intermediate certificate in the chain, and that's why it's failing to validate. Is it possible that the customer signed the IPA certificate with another authority and we need that certificate as well? If so, I can ask the customer.

Thank you very much,

David Hill

---

(In reply to David Hill from comment #23)
> In /etc/ipa/ca.crt, we have 2 certificates, and in /etc/pki/CA/certs/vnc.crt,
> we only have 1. It really looks like we're missing an intermediate
> certificate in the chain, and that's why it's failing to validate. Is it
> possible that the customer signed the IPA certificate with another authority
> and we need that certificate as well? If this is needed, I can ask the
> customer.

Yes, this is possible. Maybe we could just get those two files, which should help to answer our questions.

Cheers,
Martin

---

We were able to reproduce the issue today. Setting up FreeIPA as a sub-CA did the trick. Marking the bug as triaged and requesting ACKs. Our next step is tracing how/where that certificate is being generated to determine why the root is being excluded.

---

We have traced the issue back to an issue with certmonger [1]. tl;dr: there is a known issue with certmonger that stops it from pulling down all CA certificates. Once that RHBZ is resolved, this issue will go away. However, we do not know how long it will take before that RHBZ is closed out. After discussing options with Nate, we've opted to follow Martin's note in comment #27 -- for TLS-E deployments, we will point the InternalTLSVncCAFile at /etc/ipa/ca.crt. Martin, I'm going to grab this RHBZ and start working on a fix. I will add you to reviews as they come out.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1710632

---

Preemptive backport proposed downstream.

---

Downstream backport has merged. Proposed build created. Updating FIV and moving RHBZ to MODIFIED.

---

Attaching upstream revert. We believe that this change likely induced a regression breaking TripleO TLS-E deployments [1]. Moving RHBZ back to ASSIGNED while we investigate further.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1743485
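For reference, the workaround Martin described earlier in this bug (overriding the default CA file via LibvirtVncCACert) can be expressed as a small custom environment file. A minimal sketch, assuming a TLS-everywhere overcloud; the file name is hypothetical:

```yaml
# vnc-ca-workaround.yaml (hypothetical file name)
# Point the libvirt VNC CA bundle at the full IPA chain so the noVNC
# proxy can build the complete chain when validating the compute's cert.
parameter_defaults:
  LibvirtVncCACert: /etc/ipa/ca.crt
```

This would be passed with `-e vnc-ca-workaround.yaml` on the next `openstack overcloud deploy` run; per Martin's note, existing computes would still need a manual restart of the nova_libvirt container, since the change produces no nova config delta.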