Description of problem:
With TLS everywhere enabled, we are unable to connect to the VNC consoles of VMs. Trying to access a console gives a 1006 error in the console, and nova-novncproxy.log states that the handshake failed (see below). Vencrypt is enabled and, if I understand correctly, the backend connection to the compute nodes shouldn't be using SSL?
2019-05-29 13:27:36.150 55 INFO nova.console.websocketproxy [-] 10.10.10.10 - - [29/May/2019 13:27:36] 10.10.10.10: Path: '/websockify?token=c24dc66f-8ad4-4186-ac88-5ea4a0b44c50'
2019-05-29 13:27:36.155 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_port" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url). Its value may be silently ignored in the future.
2019-05-29 13:27:36.156 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_userid" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url). Its value may be silently ignored in the future.
2019-05-29 13:27:36.156 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_password" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url). Its value may be silently ignored in the future.
2019-05-29 13:27:36.455 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] 4: connect info: {u'instance_uuid': u'b33bbaa5-9044-483f-887e-2cbf17d28630', u'internal_access_path': None, u'last_activity_at': 1559136455.572782, u'console_type': u'novnc', u'host': u'10.10.10.11', u'token': u'c24dc66f-8ad4-4186-ac88-5ea4a0b44c50', u'access_url': u'https://horizon.local:13080/vnc_auto.html?token=c24dc66f-8ad4-4186-ac88-5ea4a0b44c50', u'port': u'5904'}
2019-05-29 13:27:36.456 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] 4: connecting to: 10.10.10.11:5904
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Unable to perform security proxying, shutting down connection: SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy Traceback (most recent call last):
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/nova/console/websocketproxy.py", line 215, in new_websocket_client
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy tsock = self.server.security_proxy.connect(tenant_sock, tsock)
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy File "/usr/lib/python2.7/site-packages/nova/console/securityproxy/rfb.py", line 192, in connect
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy reason=_("Auth handshake failed"))
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy
2019-05-29 13:27:36.477 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] handler exception: Failed to negotiate security type with server: Auth handshake failed
Version-Release number of selected component (if applicable):
latest container releases with fixes from this BZ https://bugzilla.redhat.com/1613161
How reproducible:
Always
Steps to Reproduce:
1. Try to connect to the VNC console
Actual results:
Fails with error above
Expected results:
Succeeds
Additional info:
If we look at the configuration, it's as if the following code wasn't applied:
nova::vncproxy::allow_vencrypt: true
nova::vncproxy::allow_noauth: {if: [allow_noauth, true, false]}
nova::vncproxy::vencrypt_key: /etc/pki/libvirt-vnc/client-key.pem
nova::vncproxy::vencrypt_cert: /etc/pki/libvirt-vnc/client-cert.pem
nova::vncproxy::vencrypt_ca: /etc/pki/libvirt-vnc/ca-cert.pem
nova::ssl_only: true
nova::cert: /etc/pki/tls/certs/novnc_proxy.crt
nova::key: /etc/pki/tls/private/novnc_proxy.key
generate_service_certificates: true
Additional info:
If we look at the configuration, it's as if the following code wasn't applied on the computes (the controllers do have these values):
nova::vncproxy::allow_vencrypt: true
nova::vncproxy::allow_noauth: {if: [allow_noauth, true, false]}
nova::vncproxy::vencrypt_key: /etc/pki/libvirt-vnc/client-key.pem
nova::vncproxy::vencrypt_cert: /etc/pki/libvirt-vnc/client-cert.pem
nova::vncproxy::vencrypt_ca: /etc/pki/libvirt-vnc/ca-cert.pem
nova::ssl_only: true
nova::cert: /etc/pki/tls/certs/novnc_proxy.crt
nova::key: /etc/pki/tls/private/novnc_proxy.key
generate_service_certificates: true
And we do have the following configured in qemu.conf:
vnc_tls = 1
vnc_tls_x509_verify = 1
but, if I'm not mistaken, we don't have any certificates on the compute in /var/lib/config-data/puppet-generated/nova_libvirt/etc/pki/libvirt-vnc ...
And looking at this, it might be as simple as the following bind:
"HostConfig": {
    "Binds": [
        "/etc/pki/libvirt/:/var/lib/kolla/config_files/src-tls/etc/pki/libvirt/:ro",
Shouldn't it be in /etc/pki/libvirt-vnc?
In the end, it's there too:
"/etc/pki/libvirt-vnc:/var/lib/kolla/config_files/src-libvirt-vnc-pki:ro",
I turned on debugging as per this chunk of code:
try:
    compute_sock = scheme.security_handshake(compute_sock)
except exception.RFBAuthHandshakeFailed as e:
    # Intentionally don't tell client what really failed
    # as that's information leakage
    self._fail(tenant_sock, None,
               _("Unable to negotiate security with server"))
    LOG.debug("Auth failed %s", six.text_type(e))
    raise exception.SecurityProxyNegotiationFailed(
        reason=_("Auth handshake failed"))
and got this extra logging output, which tells us exactly what the problem is:
2019-06-20 18:20:41.321 56 DEBUG nova.console.rfb.authvencrypt [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Server accepted the requested sub-auth type security_handshake /usr/lib/python2.7/site-packages/nova/console/rfb/authvencrypt.py:126
2019-06-20 18:20:41.328 56 DEBUG nova.console.securityproxy.rfb [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Auth failed Failed to complete auth handshake: Error establishing TLS connection to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618) connect /usr/lib/python2.7/site-packages/nova/console/securityproxy/rfb.py:190
2019-06-20 18:20:41.328 56 ERROR nova.console.websocketproxy [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Unable to perform security proxying, shutting down connection: SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
If we look at the certificate itself, it looks like it contains the hostname instead of the IP address. So when SSL validation runs in the client, which used the IP address to establish the connection, it will notice that the certificate name is for another host (unless I'm totally wrong here). So either we generate certificates containing both IPs and names, or we establish the SSL connection using the hostname instead of the IP.
Perhaps 'host' should be the hostname here instead of an IP:
2019-06-20 18:31:19.555 385 INFO nova.console.websocketproxy [req-d8706654-f41a-4f5a-9316-b31b02b7d435 - - - - -] 35: connect info: {u'instance_uuid': u'31ac4e68-4f07-4e65-837b-43c1f614fb70', u'internal_access_path': None, u'last_activity_at': 1561055251.62715, u'console_type': u'novnc', u'host': u'10.10.10.10', u'token': u'26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'access_url': u'https://horizon.local:13080/vnc_auto.html?token=26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'port': u'5901'}
Thanks for providing all the helpful debug logs and info. The SSL cert validation between the console proxy host and the guest is failing. And I think you're probably correct in what you said in comment 5, that the issue is that it's unable to match the IP address that's being connected to with the hostname that's in the cert (this is expected). I think the IP address you're seeing being connected to comes from the nova configuration option on the compute host: [vnc]server_proxyclient_address. This can be configured as an IP address or a hostname, and I think it should be set to a hostname [that will match the SSL cert]. Can you verify whether it's set to a hostname in your environment? I think [vnc]server_proxyclient_address should be set to a hostname, otherwise the alternative is as you suggested, to add an IP address in the SAN of the SSL cert, which would require regeneration of the cert any time the IP address changes in the future. Use of a hostname keeps things working across IP address changes that may occur.
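As a rough illustration of the name mismatch discussed in this comment, here is a minimal Python sketch. It assumes a certificate dict in the layout returned by ssl.SSLSocket.getpeercert(); the helper target_matches_cert and the name compute-0.example.com are hypothetical and not part of nova, and wildcard names are deliberately ignored.

```python
import ipaddress


def target_matches_cert(target, cert):
    """Return True if `target` (hostname or IP string) is listed in the SAN.

    `cert` follows the dict layout of ssl.SSLSocket.getpeercert().
    Simplification: wildcard DNS names are not handled.
    """
    try:
        wanted_ip = ipaddress.ip_address(target)
    except ValueError:
        wanted_ip = None  # not an IP literal, treat it as a hostname

    for kind, value in cert.get("subjectAltName", ()):
        if wanted_ip is not None:
            # An IP target may only match "IP Address" SAN entries.
            if kind == "IP Address" and ipaddress.ip_address(value) == wanted_ip:
                return True
        elif kind == "DNS" and value.lower() == target.lower():
            return True
    return False


# Hypothetical certificate that names the compute host by DNS name only.
cert = {"subjectAltName": (("DNS", "compute-0.example.com"),)}
print(target_matches_cert("10.10.10.11", cert))
print(target_matches_cert("compute-0.example.com", cert))
```

Under these assumptions, connecting by IP fails the name check while connecting by hostname passes, which is why a hostname in [vnc]server_proxyclient_address keeps working across IP address changes.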
Copying /etc/ipa/ca.crt to /etc/pki/libvirt-vnc/ca-cert.pem solves this issue.
It appears that /etc/pki/libvirt-vnc/ca-cert.pem is lacking the IPA issuer certificate.
This issue looks similar to this one https://bugzilla.redhat.com/show_bug.cgi?id=1661621 ... I'm wondering if they hit this issue with the consoles or didn't notice it.
You can set LibvirtVncCACert to /etc/ipa/ca.crt to override the default CA file linked to /etc/pki/libvirt-vnc/ca-cert.pem [1]. With this, /etc/ipa/ca.crt is added to the kolla config files and placed at /etc/pki/libvirt-vnc/ca-cert.pem on the next container restart. Since there is no config change, you'd need to restart the nova_libvirt container manually on existing computes.
[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/nova-libvirt.yaml#L356
Hi Dave,
can you clarify what the certificate structure looks like? In my env /etc/ipa/ca.crt and /etc/pki/CA/certs/vnc.crt are the same:
[root@compute-0]# md5sum /etc/ipa/ca.crt
5e8dd534ead12311ebd2203f6bd41ac3 /etc/ipa/ca.crt
[root@compute-0]# md5sum /etc/pki/CA/certs/vnc.crt
5e8dd534ead12311ebd2203f6bd41ac3 /etc/pki/CA/certs/vnc.crt
Can we get the output of:
openssl x509 -in /etc/ipa/ca.crt -noout -text
openssl x509 -in /etc/pki/CA/certs/vnc.crt -noout -text
Hey Martin,
In /etc/ipa/ca.crt we have 2 certificates, and in /etc/pki/CA/certs/vnc.crt we only have 1. It really looks like we're missing an intermediate certificate in the chain, and that's why it's failing to validate. Is it possible that the customer signed the IPA certificate with another authority and we need that certificate as well? If this is needed, I can ask the customer.
Thank you very much,
David Hill
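The 2-versus-1 comparison above can be checked quickly by counting PEM certificate blocks in each file. The following is a minimal Python sketch; the helper names count_pem_certificates and count_in_file are made up for illustration, and the demo bundle is a placeholder string, not real certificate data.

```python
import re

# Matches one PEM-encoded certificate block, including its BEGIN/END markers.
PEM_CERT = re.compile(
    r"-----BEGIN CERTIFICATE-----.*?-----END CERTIFICATE-----", re.DOTALL
)


def count_pem_certificates(pem_text):
    """Count the certificate blocks found in a PEM bundle given as text."""
    return len(PEM_CERT.findall(pem_text))


def count_in_file(path):
    """Count the certificate blocks in the PEM file at `path`."""
    with open(path) as handle:
        return count_pem_certificates(handle.read())


# Placeholder bundle with two (fake) certificate bodies.
bundle = (
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"
)
print(count_pem_certificates(bundle))
```

On a compute node, calling count_in_file on the IPA CA file and on the VNC CA file would show whether one of them carries fewer certificates than the other.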
(In reply to David Hill from comment #23)
> Hey Martin,
>
> In /etc/ipa/ca.crt we have 2 certificates, and in /etc/pki/CA/certs/vnc.crt
> we only have 1. It really looks like we're missing an intermediate
> certificate in the chain, and that's why it's failing to validate. Is it
> possible that the customer signed the IPA certificate with another authority
> and we need that certificate as well? If this is needed, I can ask the
> customer.
Yes, this is possible. Maybe we could just get those two files, which should help to answer our questions.
Cheers,
Martin
We were able to reproduce the issue today. Setting up FreeIPA as a sub-CA did the trick. Marking the bug as triaged and requesting ACKs. Our next step is tracing how/where that certificate is being generated to determine why the root is being excluded.
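To make the sub-CA failure mode concrete, here is a toy Python model of chain building. It is only a sketch under stated assumptions: certificates are reduced to subject/issuer pairs with no signature checking, all names are invented, and the verifier is assumed to require the chain to end at a self-signed root present in the CA file (OpenSSL's behaviour when partial chains are not enabled).

```python
def chain_is_trusted(leaf, ca_file):
    """Follow issuer links from `leaf` until a self-signed root in `ca_file`.

    Toy model: each cert is a dict with "subject" and "issuer" keys only.
    """
    by_subject = {cert["subject"]: cert for cert in ca_file}
    issuer = leaf["issuer"]
    seen = set()  # guards against issuer loops in malformed input
    while issuer in by_subject and issuer not in seen:
        seen.add(issuer)
        cert = by_subject[issuer]
        if cert["issuer"] == cert["subject"]:
            return True  # reached a self-signed root
        issuer = cert["issuer"]
    return False  # an issuer is missing from the CA file


# Hypothetical three-level hierarchy: root -> IPA sub-CA -> VNC server cert.
root = {"subject": "Example Root CA", "issuer": "Example Root CA"}
ipa = {"subject": "Example IPA Sub-CA", "issuer": "Example Root CA"}
vnc = {"subject": "compute-0.example.com", "issuer": "Example IPA Sub-CA"}

print(chain_is_trusted(vnc, [ipa]))        # CA file holds only the sub-CA
print(chain_is_trusted(vnc, [ipa, root]))  # CA file holds the full chain
```

In this model a CA file containing only the issuing sub-CA fails, while one that also contains the root succeeds, which matches the observation that the failure only shows up when FreeIPA is itself a sub-CA.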
We have traced the issue back to a problem with certmonger[1]. tl;dr: there is a known issue with certmonger that stops it from pulling down all CA certificates. Once that RHBZ is resolved, this issue will go away. However, we do not know how long it will take before that RHBZ is closed out. After discussing options with Nate, we've opted to follow Martin's note in comment#27 -- we will (for TLS-E deployments) point the InternalTLSVncCAFile at /etc/ipa/ca.crt. Martin, I'm going to grab this RHBZ and start working on a fix. I will add you to reviews as they come out.
[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1710632
Preemptive backport proposed downstream.
Downstream backport has merged. Proposed build created. Updating FIV and moving RHBZ to MODIFIED.
Attaching upstream revert. We believe that this change likely induced a regression breaking tripleo TLS-E deployments[1]. Moving RHBZ back to ASSIGNED while we investigate further. [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1743485