Bug 1722089 - With TLS everywhere, unable to connect to VNC consoles of VMs.
Summary: With TLS everywhere, unable to connect to VNC consoles of VMs.
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
medium
low
Target Milestone: zstream
: ---
Assignee: Douglas Mendizábal
QA Contact: Jeremy Agee
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-06-19 13:32 UTC by David Hill
Modified: 2023-10-06 18:22 UTC
16 users

Fixed In Version: openstack-tripleo-heat-templates-8.3.1-64.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1736672
Environment:
Last Closed: 2021-05-26 18:09:41 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
OpenStack gerrit 673890 0 'None' MERGED Point InternalTLSVncCAFile to /etc/ipa/ca.crt 2021-02-15 15:28:46 UTC
OpenStack gerrit 677551 0 'None' MERGED Revert "Point InternalTLSVncCAFile to /etc/ipa/ca.crt" 2021-02-15 15:28:46 UTC
Red Hat Issue Tracker OSP-150 0 None None None 2022-10-03 16:38:42 UTC
Red Hat Knowledge Base (Solution) 4180891 0 Troubleshoot None Horizon console fails to connect when TLS everywhere is enabled 2019-07-09 23:08:25 UTC

Description David Hill 2019-06-19 13:32:57 UTC
Description of problem:

With TLS everywhere, unable to connect to VNC consoles of VMs.

When we try to access the console with TLS everywhere enabled, the console shows a 1006 error and nova-novncproxy.log reports that the handshake failed (see below). VeNCrypt is enabled. If I understand correctly, shouldn't the backend connection to the compute nodes not be using SSL?

2019-05-29 13:27:36.150 55 INFO nova.console.websocketproxy [-] 10.10.10.10 - - [29/May/2019 13:27:36] 10.10.10.10: Path: '/websockify?token=c24dc66f-8ad4-4186-ac88-5ea4a0b44c50'
2019-05-29 13:27:36.155 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_port" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url).  Its value may be silently ignored in the future.
2019-05-29 13:27:36.156 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_userid" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url).  Its value may be silently ignored in the future.
2019-05-29 13:27:36.156 55 WARNING oslo_config.cfg [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Option "rabbit_password" from group "oslo_messaging_rabbit" is deprecated for removal (Replaced by [DEFAULT]/transport_url).  Its value may be silently ignored in the future.
2019-05-29 13:27:36.455 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -]   4: connect info: {u'instance_uuid': u'b33bbaa5-9044-483f-887e-2cbf17d28630', u'internal_access_path': None, u'last_activity_at': 1559136455.572782, u'console_type': u'novnc', u'host': u'10.10.10.11', u'token': u'c24dc66f-8ad4-4186-ac88-5ea4a0b44c50', u'access_url': u'https://horizon.local:13080/vnc_auto.html?token=c24dc66f-8ad4-4186-ac88-5ea4a0b44c50', u'port': u'5904'}
2019-05-29 13:27:36.456 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -]   4: connecting to: 10.10.10.11:5904
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] Unable to perform security proxying, shutting down connection: SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy Traceback (most recent call last):
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/nova/console/websocketproxy.py", line 215, in new_websocket_client
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy     tsock = self.server.security_proxy.connect(tenant_sock, tsock)
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy   File "/usr/lib/python2.7/site-packages/nova/console/securityproxy/rfb.py", line 192, in connect
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy     reason=_("Auth handshake failed"))
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed
2019-05-29 13:27:36.475 55 ERROR nova.console.websocketproxy
2019-05-29 13:27:36.477 55 INFO nova.console.websocketproxy [req-114edc96-7e5e-4f01-b1c6-569249727861 - - - - -] handler exception: Failed to negotiate security type with server: Auth handshake failed

Version-Release number of selected component (if applicable):
latest container releases with fixes from this BZ https://bugzilla.redhat.com/1613161

How reproducible:
Always

Steps to Reproduce:
1. Try to connect to the VNC console

Actual results:
Fails with error above

Expected results:
Succeeds

Additional info:
If we look at the configuration it's as if the following code wasn't applied:

                nova::vncproxy::allow_vencrypt: true
                nova::vncproxy::allow_noauth: {if: [allow_noauth, true, false]}
                nova::vncproxy::vencrypt_key: /etc/pki/libvirt-vnc/client-key.pem
                nova::vncproxy::vencrypt_cert: /etc/pki/libvirt-vnc/client-cert.pem
                nova::vncproxy::vencrypt_ca: /etc/pki/libvirt-vnc/ca-cert.pem
                nova::ssl_only: true
                nova::cert: /etc/pki/tls/certs/novnc_proxy.crt
                nova::key: /etc/pki/tls/private/novnc_proxy.key
                generate_service_certificates: true

Comment 1 David Hill 2019-06-19 13:39:36 UTC
Additional info:
If we look at the configuration it's as if the following code wasn't applied on the computes:

                nova::vncproxy::allow_vencrypt: true
                nova::vncproxy::allow_noauth: {if: [allow_noauth, true, false]}
                nova::vncproxy::vencrypt_key: /etc/pki/libvirt-vnc/client-key.pem
                nova::vncproxy::vencrypt_cert: /etc/pki/libvirt-vnc/client-cert.pem
                nova::vncproxy::vencrypt_ca: /etc/pki/libvirt-vnc/ca-cert.pem
                nova::ssl_only: true
                nova::cert: /etc/pki/tls/certs/novnc_proxy.crt
                nova::key: /etc/pki/tls/private/novnc_proxy.key
                generate_service_certificates: true

as the controllers do have those values above.
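The following minimal Python sketch can help compare what actually landed on a node with the hieradata above. It reads nova.conf and reports the VeNCrypt-related [vnc] options. It assumes that the nova::vncproxy::* keys end up as the nova options auth_schemes, vencrypt_client_key, vencrypt_client_cert and vencrypt_ca_certs. Verify that mapping against the puppet-nova version in use. The function names are hypothetical. In a containerized deployment, run the file-existence check wherever those paths are valid, such as inside the relevant container.

```python
import configparser
import os

# Assumed nova.conf [vnc] options that point at VeNCrypt client files.
VENCRYPT_FILE_OPTIONS = ("vencrypt_client_key", "vencrypt_client_cert", "vencrypt_ca_certs")


def vencrypt_settings(nova_conf_text):
    """Return auth_schemes and the VeNCrypt file options found in [vnc]."""
    # interpolation=None: values may contain '%'; strict=False: tolerate duplicates.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    if not parser.has_section("vnc"):
        return {}
    wanted = ("auth_schemes",) + VENCRYPT_FILE_OPTIONS
    return {key: parser.get("vnc", key) for key in wanted if parser.has_option("vnc", key)}


def missing_files(settings):
    """List configured VeNCrypt files that do not exist where this runs."""
    return [settings[key] for key in VENCRYPT_FILE_OPTIONS
            if key in settings and not os.path.exists(settings[key])]
```

An empty result from `vencrypt_settings` on the proxy host would indicate that the hieradata never reached nova.conf there.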

Comment 2 David Hill 2019-06-19 13:44:01 UTC
And we do have the following configured in qemu.conf:

vnc_tls = 1
vnc_tls_x509_verify = 1


But if I'm not mistaken, there are no certificates on the compute in /var/lib/config-data/puppet-generated/nova_libvirt/etc/pki/libvirt-vnc ...
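The following minimal Python sketch checks for the certificates described above. It assumes libvirt's default file names inside vnc_tls_x509_cert_dir: ca-cert.pem, server-cert.pem and server-key.pem, with /etc/pki/libvirt-vnc as the default directory. Comment 4 shows that this host directory is bind-mounted into the nova_libvirt container, so the directory is a parameter. Point it at either the host path or the config-data path. The function name is hypothetical.

```python
import os

# Assumed libvirt default file names for VNC TLS (server side).
LIBVIRT_VNC_FILES = ("ca-cert.pem", "server-cert.pem", "server-key.pem")


def missing_vnc_tls_files(cert_dir="/etc/pki/libvirt-vnc"):
    """Return the expected VNC TLS files that are absent or empty in cert_dir."""
    missing = []
    for name in LIBVIRT_VNC_FILES:
        path = os.path.join(cert_dir, name)
        # An empty file is treated the same as a missing one.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```

With vnc_tls_x509_verify = 1, the proxy's client certificate must also be signed by the CA in ca-cert.pem. A file being present is therefore necessary but not sufficient.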

Comment 3 David Hill 2019-06-19 13:47:10 UTC
And looking at this, it might be as simple as the following bind:

        "HostConfig": {
            "Binds": [
                "/etc/pki/libvirt/:/var/lib/kolla/config_files/src-tls/etc/pki/libvirt/:ro",


Shouldn't it be in /etc/pki/libvirt-vnc ?

Comment 4 David Hill 2019-06-19 13:48:49 UTC
It's there too, after all:
                "/etc/pki/libvirt-vnc:/var/lib/kolla/config_files/src-libvirt-vnc-pki:ro",

Comment 5 David Hill 2019-06-20 18:27:10 UTC
I turned on debugging as per this chunk of code:
        try:
            compute_sock = scheme.security_handshake(compute_sock)
        except exception.RFBAuthHandshakeFailed as e:
            # Intentionally don't tell client what really failed
            # as that's information leakage
            self._fail(tenant_sock, None,
                       _("Unable to negotiate security with server"))
            LOG.debug("Auth failed %s", six.text_type(e))
            raise exception.SecurityProxyNegotiationFailed(
                reason=_("Auth handshake failed"))

and got this extra logging output which tells us exactly what the problem is:

2019-06-20 18:20:41.321 56 DEBUG nova.console.rfb.authvencrypt [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Server accepted the requested sub-auth type security_handshake /usr/lib/python2.7/site-packages/nova/console/rfb/authvencrypt.py:126
2019-06-20 18:20:41.328 56 DEBUG nova.console.securityproxy.rfb [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Auth failed Failed to complete auth handshake: Error establishing TLS connection to server: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:618) connect /usr/lib/python2.7/site-packages/nova/console/securityproxy/rfb.py:190
2019-06-20 18:20:41.328 56 ERROR nova.console.websocketproxy [req-8194d8b1-360a-4cae-a9b3-94620ec1bf34 - - - - -] Unable to perform security proxying, shutting down connection: SecurityProxyNegotiationFailed: Failed to negotiate security type with server: Auth handshake failed


If we look at the certificate itself, it contains the hostname rather than the IP address. The client uses the IP address to establish the connection, so when it performs SSL validation it will notice that the certificate name is for another host (unless I'm totally wrong here).

So either we generate certificates containing both the IPs and the hostname, or we establish the SSL connection using the hostname instead of the IP.
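The hypothesis above can be illustrated with a simplified Python sketch of hostname-versus-SAN matching. It uses the (type, value) tuples that Python's `SSLSocket.getpeercert()` reports under subjectAltName. The sketch ignores wildcards and the full RFC 6125 rules, and the hostnames are hypothetical. This thread does not establish whether nova's VeNCrypt client checks the hostname at all. Later comments trace the failure to a missing issuer certificate in the CA file instead.

```python
import ipaddress


def _as_ip(value):
    """Return an ip_address object, or None if value is not a literal IP."""
    try:
        return ipaddress.ip_address(value)
    except ValueError:
        return None


def san_matches(host, san_entries):
    """Simplified check: does `host` match any subjectAltName entry?

    A literal IP only matches "IP Address" entries; a name only matches
    "DNS" entries (exact, case-insensitive, no wildcards).
    """
    host_ip = _as_ip(host)
    for kind, value in san_entries:
        if host_ip is not None and kind == "IP Address":
            if _as_ip(value) == host_ip:
                return True
        elif host_ip is None and kind == "DNS":
            if value.lower().rstrip(".") == host.lower().rstrip("."):
                return True
    return False
```

A certificate carrying only `("DNS", "compute-0.internalapi.example.com")` would not match a connection to `10.10.10.11`. This is the mismatch described in comment 5.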

Comment 6 David Hill 2019-06-20 18:38:22 UTC
Perhaps 'host' should be the hostname here instead of an IP:

2019-06-20 18:31:19.555 385 INFO nova.console.websocketproxy [req-d8706654-f41a-4f5a-9316-b31b02b7d435 - - - - -]  35: connect info: {u'instance_uuid': u'31ac4e68-4f07-4e65-837b-43c1f614fb70', u'internal_access_path': None, u'last_activity_at': 1561055251.62715, u'console_type': u'novnc', u'host': u'10.10.10.10', u'token': u'26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'access_url': u'https://horizon.local:13080/vnc_auto.html?token=26b06dcd-6a43-411f-bac2-5b30f4432ef0', u'port': u'5901'}

Comment 10 melanie witt 2019-06-21 01:30:51 UTC
Thanks for providing all the helpful debug logs and info.

The SSL cert validation between the console proxy host and the guest is failing. And I think you're probably correct in what you said in comment 5, that the issue is that it's unable to match the IP address that's being connected to with the hostname that's in the cert (this is expected).

I think the IP address you're seeing being connected to comes from the nova configuration option on the compute host: [vnc]server_proxyclient_address. This can be configured as an IP address or a hostname, and I think it should be set to a hostname [that will match the SSL cert]. Can you verify whether it's set to a hostname in your environment?

I think [vnc]server_proxyclient_address should be set to a hostname, otherwise the alternative is as you suggested, to add an IP address in the SAN of the SSL cert, which would require regeneration of the cert any time the IP address changes in the future. Use of a hostname keeps things working across IP address changes that may occur.
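The following minimal Python sketch verifies whether [vnc]server_proxyclient_address on a compute node is a hostname or a literal IP. It parses the compute's nova.conf text with the standard library. The function name is hypothetical, and the config path to read depends on the deployment.

```python
import configparser
import ipaddress


def proxyclient_address(nova_conf_text):
    """Return (server_proxyclient_address, is_ip_literal), or (None, False) if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    # has_option returns False when the [vnc] section itself is missing.
    if not parser.has_option("vnc", "server_proxyclient_address"):
        return None, False
    value = parser.get("vnc", "server_proxyclient_address").strip()
    try:
        ipaddress.ip_address(value)
        return value, True
    except ValueError:
        return value, False
```

A result such as `("10.10.10.11", True)` means the proxy will connect by IP, which a hostname-only certificate cannot match.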

Comment 18 David Hill 2019-07-09 21:36:26 UTC
If we copy /etc/ipa/ca.crt to /etc/pki/libvirt-vnc/ca-cert.pem , it solves this issue.

Comment 19 David Hill 2019-07-09 23:03:03 UTC
It appears that /etc/pki/libvirt-vnc/ca-cert.pem is missing the IPA issuer certificate.

Comment 20 David Hill 2019-07-09 23:41:10 UTC
This issue looks similar to this one https://bugzilla.redhat.com/show_bug.cgi?id=1661621 ... I'm wondering if they hit this issue with the consoles or didn't notice it.

Comment 21 Martin Schuppert 2019-07-10 12:35:09 UTC
You can set LibvirtVncCACert to /etc/ipa/ca.crt to overwrite the default CA file linked to /etc/pki/libvirt-vnc/ca-cert.pem [1].
With this, /etc/ipa/ca.crt is added to the kolla config files and is placed at /etc/pki/libvirt-vnc/ca-cert.pem on the next container restart.
Since there is no config change, you'd need to restart nova_libvirt container manually on existing computes.

[1] https://github.com/openstack/tripleo-heat-templates/blob/stable/queens/docker/services/nova-libvirt.yaml#L356

Comment 22 Martin Schuppert 2019-07-11 12:29:25 UTC
Hi Dave, 

can you clarify what the certificate structure looks like? In my env, /etc/ipa/ca.crt and /etc/pki/CA/certs/vnc.crt are the same:

[root@compute-0]# md5sum /etc/ipa/ca.crt
5e8dd534ead12311ebd2203f6bd41ac3  /etc/ipa/ca.crt

[root@compute-0]# md5sum /etc/pki/CA/certs/vnc.crt
5e8dd534ead12311ebd2203f6bd41ac3  /etc/pki/CA/certs/vnc.crt


Can we get the output of:
openssl x509 -in /etc/ipa/ca.crt -noout -text
openssl x509 -in /etc/pki/CA/certs/vnc.crt -noout -text
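The following is a rough Python equivalent of the md5sum comparison above. It also counts the PEM certificate blocks in each file. A bundle that carries an intermediate CA together with its issuer will contain more than one. Note that `openssl x509 -text` only prints the first certificate in a file. The function name is hypothetical.

```python
import hashlib

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def describe_pem_bundle(path):
    """Return (md5 hex digest, number of PEM certificates) for a CA file."""
    with open(path, "rb") as handle:
        data = handle.read()
    # Count certificate headers; the digest matches what md5sum prints.
    count = data.decode("ascii", errors="replace").count(PEM_BEGIN)
    return hashlib.md5(data).hexdigest(), count
```

For example, calling it on /etc/ipa/ca.crt and /etc/pki/CA/certs/vnc.crt shows at a glance whether the two files differ in content or in the number of certificates.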

Comment 23 David Hill 2019-07-11 12:40:54 UTC
Hey Martin,

   In /etc/ipa/ca.crt we have 2 certificates, and in /etc/pki/CA/certs/vnc.crt we only have 1.
It really looks like we're missing an intermediate certificate in the chain, and that's why it's
failing to validate. Is it possible that the customer signed the IPA certificate with another
authority and we need that certificate as well? If this is needed, I can ask the customer.

Thank you very much,

David Hill

Comment 24 Martin Schuppert 2019-07-11 12:52:01 UTC
(In reply to David Hill from comment #23)
> Hey Martin,
> 
>    In /etc/ipa/ca.crt , we have 2 certificiates and in
> /etc/pki/CA/certs/vnc/crt, we only have 1.
> It really looks like we're missing an intermediate certificate in the chain
> and that's why it's
> failing to validate.   Is it possible that the customer signed the IPA
> certificate with another
> authority and we need that certificate as well ?   If this is needed, I can
> ask the customer.

Yes, this is possible. Maybe we could just get those two files, which should help
answer our questions.

Cheers,
Martin

Comment 28 Harry Rybacki 2019-07-24 21:26:00 UTC
We were able to reproduce the issue today. Setting up FreeIPA as a sub-CA did the trick. Marking the bug as triaged and requesting ACKs.

Our next step is tracing how/where that certificate is being generated to determine why the root is being excluded.

Comment 30 Harry Rybacki 2019-07-25 17:02:09 UTC
We have traced the issue back to a problem with certmonger[1]. tl;dr: there is a known issue in certmonger that stops it from pulling down all CA certificates.

Once that RHBZ is resolved, this issue will go away. However, we do not know how long it will take before that RHBZ is closed out.

After discussing options with Nate, we've opted to follow Martin's note in comment#27 -- we will (for TLS-E deployments) point the InternalTLSVncCAFile at /etc/ipa/ca.crt.

Martin, I'm going to grab this RHBZ and start working on a fix. I will add you to reviews as they come out.


[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1710632

Comment 32 Harry Rybacki 2019-08-01 17:30:21 UTC
Preemptive backport proposed downstream.

Comment 33 Harry Rybacki 2019-08-01 19:47:32 UTC
Downstream backport has merged. Proposed build created. Updating FIV and moving RHBZ to MODIFIED.

Comment 40 Harry Rybacki 2019-08-20 19:26:34 UTC
Attaching upstream revert. We believe that this change likely induced a regression breaking tripleo TLS-E deployments[1].

Moving RHBZ back to ASSIGNED while we investigate further.


[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1743485

