Bug 1945760
| Field | Value |
|---|---|
| Summary | [OSP16.2] TLS-e live volume back migration failure, with error 'blockdev-add Cannot Read from TLS channel' |
| Product | Red Hat OpenStack |
| Component | openstack-tripleo-heat-templates |
| Version | 16.2 (Train) |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Reporter | James Parker <jparker> |
| Assignee | David Vallee Delisle <dvd> |
| QA Contact | James Parker <jparker> |
| CC | berrange, dasmith, dvd, eglynn, jhakimra, kchamart, mburns, mschuppe, sbauza, sgordon, vromanso |
| Keywords | Patch, Triaged, UpgradeBlocker |
| Target Milestone | beta |
| Target Release | 16.2 (Train on RHEL 8.4) |
| Target Upstream Version | Train |
| Hardware | Unspecified |
| OS | Unspecified |
| Fixed In Version | openstack-tripleo-heat-templates-11.5.1-2.20210430004816.cbef0f2.el8ost, puppet-nova-15.7.1-2.20210423004733.43cd2b4.el8ost |
| Type | Bug |
| Last Closed | 2021-09-15 07:13:42 UTC |
| Bug Depends On | 1965124 |
Created attachment 1770071
Destination target node's libvirtd logs with debug level enabled
Something is off in the TLS setup. Can you please test with the following?
On both of your compute nodes, the config attributes below are missing
from /etc/libvirt/qemu.conf:
default_tls_x509_cert_dir = "/etc/pki/qemu"
default_tls_x509_verify = 1
(See: https://docs.openstack.org/nova/latest/admin/secure-live-migration-with-qemu-native-tls.html)
And this is what you have on your source and destination 'nova_libvirt'
containers:
[root@compute-1 /]# egrep -v '^$|^#' /etc/libvirt/qemu.conf
max_files = 32768
max_processes = 131072
vnc_tls = 1
vnc_tls_x509_verify = 1
nbd_tls = 1
migration_port_min = 61152
migration_port_max = 61215
Note the missing default_tls_* attributes.
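The check for the missing default_tls_* attributes can be scripted. What follows is a minimal Python sketch, not part of the original report: the helper names `parse_conf` and `missing_keys` are hypothetical, and the parser only handles simple uncommented `key = value` lines like the ones shown above.

```python
# Hypothetical helper: report which QEMU TLS attributes are absent from qemu.conf text.
REQUIRED_KEYS = ("default_tls_x509_cert_dir", "default_tls_x509_verify")


def parse_conf(text):
    """Return {key: raw_value} for every uncommented 'key = value' line."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_keys(text, required=REQUIRED_KEYS):
    """List the required keys that are not set in the given qemu.conf text."""
    settings = parse_conf(text)
    return [key for key in required if key not in settings]


# Settings as reported from the 'nova_libvirt' containers in this bug.
sample = """\
max_files = 32768
max_processes = 131072
vnc_tls = 1
vnc_tls_x509_verify = 1
nbd_tls = 1
migration_port_min = 61152
migration_port_max = 61215
"""

print(missing_keys(sample))
# ['default_tls_x509_cert_dir', 'default_tls_x509_verify']
```

On a real host, the text passed to `missing_keys` would be the contents of /etc/libvirt/qemu.conf read from inside the 'nova_libvirt' container.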
- - -
Also, running `virt-pki-validate` fails on both source and destination
'nova_libvirt' containers:
[root@compute-0 qemu]# virt-pki-validate
Found /usr/bin/certtool
Found CA certificate /etc/pki/CA/cacert.pem for Certificate Authority
The CA certificate and the client certificate do not match
CA organization: Certificate Authority
Client organization: REDHAT.LOCAL
Found client certificate /etc/pki/libvirt/clientcert.pem for compute-0.ctlplane.redhat.local
Found client private key /etc/pki/libvirt/private/clientkey.pem
The client private key need to be read by client tools
as root do: chmod 644 /etc/pki/libvirt/private/clientkey.pem
The CA certificate and the server certificate do not match
CA organization: Certificate Authority
Server organization: REDHAT.LOCAL
The server certificate does not seem to match the host name
hostname: "compute-0.redhat.local"
Server certificate CN: "compute-0.ctlplane.redhat.local"
Found server certificate /etc/pki/libvirt/servercert.pem for compute-0.ctlplane.redhat.local
Found server private key /etc/pki/libvirt/private/serverkey.pem
Make sure /etc/sysconfig/libvirtd is setup to listen to
TCP/IP connections and restart the libvirtd service
Make sure /etc/sysconfig/iptables is setup to allow
incoming TCP/IP connections on port 16514 and
restart the iptables service
[root@compute-0 qemu]#
The output is similar on the other compute node.
@David: Please see my comment#6. It seems TripleO is not setting the required config attribute 'default_tls_x509_cert_dir'?
Dan, any thoughts on this 'blockdev-add' failure that seems to be coming
from QEMU's I/O channels TLS driver?
Context: The OpenStack test that is failing here live-migrates an
instance with an attached disk on non-shared storage, with TLS enabled
(possibly misconfigured). So NBD is involved here.
And here's how 'blockdev-add' is erroring out:
-----------------------------------------------------------------------
...
2021-04-07 23:43:51.427+0000: 23837: debug : qemuMonitorJSONCheckErrorFull:404 : unable to execute QEMU command {"execute":"blockdev-add","arguments":{"driver":"nbd","server":{"type":"inet","host":"compute-0.ctlplane.redhat.local","port":"61153"},"export":"drive-virtio-disk0","tls-creds":"objlibvirt_migrate_tls0","node-name":"migration-vda-storage","read-only":false,"discard":"unmap"},"id":"libvirt-388"}: {"id":"libvirt-388","error":{"class":"GenericError","desc":"Failed to read option reply: Cannot read from TLS channel: Software caused connection abort"}}
2021-04-07 23:43:51.427+0000: 23837: error : qemuMonitorJSONCheckErrorFull:418 : internal error: unable to execute QEMU command 'blockdev-add': Failed to read option reply: Cannot read from TLS channel: Software caused connection abort
...
-----------------------------------------------------------------------
I see the "check TLS authorization" part from the QEMU I/O test
233.out matches the above signature:
https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=tests/qemu-iotests/233.out#l61
Based on that, I deduce that the TLS setup in this QE environment is
broken. The `virt-pki-validate` output in comment#6 seems to
indicate that too.
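To pull the failing QMP command and QEMU's reply out of a libvirtd debug log like the excerpt above, something along these lines could be used. This is a Python sketch under assumptions: the function name `parse_qmp_failure` is hypothetical, and it relies on the debug line having the shape seen in this log, with the command JSON, a colon, and then the reply JSON.

```python
import json

MARKER = "unable to execute QEMU command "


def parse_qmp_failure(line):
    """Return (command, reply) dicts from a libvirtd debug line, or None if absent."""
    start = line.find(MARKER + "{")
    if start < 0:
        return None
    decoder = json.JSONDecoder()
    try:
        # raw_decode parses one JSON value and reports where it ended.
        command, pos = decoder.raw_decode(line, start + len(MARKER))
        reply_start = line.find("{", pos)
        if reply_start < 0:
            return None
        reply, _ = decoder.raw_decode(line, reply_start)
    except ValueError:
        return None
    return command, reply


# The debug line quoted in this bug report.
sample_line = (
    '2021-04-07 23:43:51.427+0000: 23837: debug : qemuMonitorJSONCheckErrorFull:404 : '
    'unable to execute QEMU command {"execute":"blockdev-add","arguments":{"driver":"nbd",'
    '"server":{"type":"inet","host":"compute-0.ctlplane.redhat.local","port":"61153"},'
    '"export":"drive-virtio-disk0","tls-creds":"objlibvirt_migrate_tls0",'
    '"node-name":"migration-vda-storage","read-only":false,"discard":"unmap"},'
    '"id":"libvirt-388"}: {"id":"libvirt-388","error":{"class":"GenericError",'
    '"desc":"Failed to read option reply: Cannot read from TLS channel: '
    'Software caused connection abort"}}'
)

command, reply = parse_qmp_failure(sample_line)
print(command["execute"], "->", reply["error"]["desc"])
```

The "error" line that follows in the log quotes the command name in single quotes rather than embedding JSON, so the sketch deliberately returns None for it.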
- - -
Meanwhile, here are the config settings in qemu.conf on both source and
destination compute nodes:
$> egrep -v '^$|^#' /etc/libvirt/qemu.conf
max_files = 32768
max_processes = 131072
vnc_tls = 1
vnc_tls_x509_verify = 1
nbd_tls = 1
migration_port_min = 61152
migration_port_max = 61215
# these two were added in a subsequent test; but even without these
# the migration fails the same
default_tls_x509_cert_dir = "/etc/pki/qemu"
default_tls_x509_verify = 1
And libvirtd.conf from both source and destination:
$> egrep -v '^$|^#' /etc/libvirt/libvirtd.conf
listen_tls=1
listen_tcp=0
listen_addr="192.168.24.37"
unix_sock_group="libvirt"
unix_sock_ro_perms="0777"
unix_sock_rw_perms="0770"
auth_unix_ro="none"
auth_unix_rw="none"
auth_tls="sasl"
tls_priority="NORMAL:-VERS-SSL3.0:-VERS-TLS-ALL:+VERS-TLS1.2"
Note virt-pki-validate is validating *libvirt's* TLS setup. The problem here is with *QEMU's* TLS setup, which uses the files in /etc/pki/qemu. There is only ca-cert.pem and server-cert.pem in /etc/pki/qemu.
The QMP command shown here is an NBD client connection failing. I can't see how the NBD server is configured, but if the server is attempting to do client certificate validation, then it will fail, because there's no client-cert.pem for the client to send. This is a plausible reason why you'd see this error message on the client, as the server will ungracefully drop the connection after the TLS handshake when it finds no client cert was sent.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2021:3483
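The analysis of /etc/pki/qemu above, where the client has no client-cert.pem to send, can be turned into a quick check. The Python sketch below is illustrative only: the helper names are hypothetical, and the key file names (server-key.pem, client-key.pem) are an assumption based on the layout libvirt documents for default_tls_x509_cert_dir, since this report only names ca-cert.pem, server-cert.pem and client-cert.pem.

```python
import os

# Assumed per-role file layout for the QEMU TLS directory; verify against the
# comments in the qemu.conf shipped on your hosts before relying on it.
ROLE_FILES = {
    "server": ("ca-cert.pem", "server-cert.pem", "server-key.pem"),
    "client": ("ca-cert.pem", "client-cert.pem", "client-key.pem"),
}


def missing_tls_files(present, role):
    """List the files a given role needs that are not among the present names."""
    present = set(present)
    return [name for name in ROLE_FILES[role] if name not in present]


def check_cert_dir(path="/etc/pki/qemu"):
    """Return {role: [missing files]} for the directory at path."""
    present = os.listdir(path)
    return {role: missing_tls_files(present, role) for role in ROLE_FILES}


# The only two files reported in /etc/pki/qemu in this bug.
reported = ["ca-cert.pem", "server-cert.pem"]
print(missing_tls_files(reported, "client"))
# ['client-cert.pem', 'client-key.pem']
```

During a live migration with non-shared storage, the source host acts as the NBD client, so each compute node would plausibly need both the server and the client file sets.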
Created attachment 1770070
Source Migration libvirtd logs with debug level enabled