Created attachment 1770070: Source migration libvirtd logs with debug level enabled
Created attachment 1770071: Destination target node's libvirtd logs with debug level enabled
Something is off in the TLS setup. Can you please test with the following?

On both of your compute nodes, the following config attributes are missing from /etc/libvirt/qemu.conf:

    default_tls_x509_cert_dir = "/etc/pki/qemu"
    default_tls_x509_verify = 1

(See: https://docs.openstack.org/nova/latest/admin/secure-live-migration-with-qemu-native-tls.html)

And this is what you have in your source and destination 'nova_libvirt' containers:

    [root@compute-1 /]# egrep -v '^$|^#' /etc/libvirt/qemu.conf
    max_files = 32768
    max_processes = 131072
    vnc_tls = 1
    vnc_tls_x509_verify = 1
    nbd_tls = 1
    migration_port_min = 61152
    migration_port_max = 61215

Note the missing default_tls_* attributes.

- - -

Also, running `virt-pki-validate` fails in both source and destination 'nova_libvirt' containers:

    [root@compute-0 qemu]# virt-pki-validate
    Found /usr/bin/certtool
    Found CA certificate /etc/pki/CA/cacert.pem for Certificate Authority
    The CA certificate and the client certificate do not match
      CA organization: Certificate Authority
      Client organization: REDHAT.LOCAL
    Found client certificate /etc/pki/libvirt/clientcert.pem for compute-0.ctlplane.redhat.local
    Found client private key /etc/pki/libvirt/private/clientkey.pem
    The client private key need to be read by client tools
    as root do: chmod 644 /etc/pki/libvirt/private/clientkey.pem
    The CA certificate and the server certificate do not match
      CA organization: Certificate Authority
      Server organization: REDHAT.LOCAL
    The server certificate does not seem to match the host name
      hostname: "compute-0.redhat.local"
      Server certificate CN: "compute-0.ctlplane.redhat.local"
    Found server certificate /etc/pki/libvirt/servercert.pem for compute-0.ctlplane.redhat.local
    Found server private key /etc/pki/libvirt/private/serverkey.pem
    Make sure /etc/sysconfig/libvirtd is setup to listen to TCP/IP connections and restart the libvirtd service
    Make sure /etc/sysconfig/iptables is setup to allow incoming TCP/IP connections on port 16514 and restart the iptables service
    [root@compute-0 qemu]#

And similarly on the other compute node.
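In case it helps to repeat the qemu.conf check on other nodes, here is a minimal Python sketch that reports which of the two default_tls_* attributes are absent from a config dump. The helper names (`parse_conf`, `missing_tls_keys`) are hypothetical, and the parser assumes simple single-line `key = value` settings, which is a simplification of what qemu.conf allows.

```python
# Attributes named in this comment as required for QEMU-native TLS.
REQUIRED_TLS_KEYS = ("default_tls_x509_cert_dir", "default_tls_x509_verify")


def parse_conf(text):
    """Parse single-line 'key = value' settings, skipping blanks and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_tls_keys(text, required=REQUIRED_TLS_KEYS):
    """Return the required attributes that are not set in the config text."""
    settings = parse_conf(text)
    return [key for key in required if key not in settings]


# The egrep output from compute-1, copied from this comment.
SAMPLE = """\
max_files = 32768
max_processes = 131072
vnc_tls = 1
vnc_tls_x509_verify = 1
nbd_tls = 1
migration_port_min = 61152
migration_port_max = 61215
"""

print(missing_tls_keys(SAMPLE))
# -> ['default_tls_x509_cert_dir', 'default_tls_x509_verify']
```

On a real node the text would come from reading /etc/libvirt/qemu.conf inside the 'nova_libvirt' container; an empty list only means the attributes are present, not that the certificates they point to are valid.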
@David: Please see my comment#6. It seems TripleO is not setting the required config attribute 'default_tls_x509_cert_dir'?
Dan, any thoughts on this 'blockdev-add' failure that seems to be coming from QEMU's I/O channels TLS driver?

Context: The failing OpenStack test live-migrates an instance with a disk attached on non-shared storage, with TLS enabled (misconfigured?). So NBD is involved here. And here's how 'blockdev-add' is erroring out:

-----------------------------------------------------------------------
...
2021-04-07 23:43:51.427+0000: 23837: debug : qemuMonitorJSONCheckErrorFull:404 : unable to execute QEMU command {"execute":"blockdev-add","arguments":{"driver":"nbd","server":{"type":"inet","host":"compute-0.ctlplane.redhat.local","port":"61153"},"export":"drive-virtio-disk0","tls-creds":"objlibvirt_migrate_tls0","node-name":"migration-vda-storage","read-only":false,"discard":"unmap"},"id":"libvirt-388"}: {"id":"libvirt-388","error":{"class":"GenericError","desc":"Failed to read option reply: Cannot read from TLS channel: Software caused connection abort"}}
2021-04-07 23:43:51.427+0000: 23837: error : qemuMonitorJSONCheckErrorFull:418 : internal error: unable to execute QEMU command 'blockdev-add': Failed to read option reply: Cannot read from TLS channel: Software caused connection abort
...
-----------------------------------------------------------------------

I see the "check TLS authorization" part from the QEMU I/O test 233.out matches the above signature:

https://git.qemu.org/gitweb.cgi?p=qemu.git;a=blob;f=tests/qemu-iotests/233.out#l61

Based on that, I deduce that the TLS setup in this QE env is broken. The `virt-pki-validate` output in comment#6 seems to indicate that too.
- - -

Meanwhile, here are the config settings in qemu.conf on both source and destination compute nodes:

    $> egrep -v '^$|^#' /etc/libvirt/qemu.conf
    max_files = 32768
    max_processes = 131072
    vnc_tls = 1
    vnc_tls_x509_verify = 1
    nbd_tls = 1
    migration_port_min = 61152
    migration_port_max = 61215
    # these two were added in a subsequent test; but even without these
    # the migration fails the same
    default_tls_x509_cert_dir = "/etc/pki/qemu"
    default_tls_x509_verify = 1

And libvirtd.conf from both source and destination:

    $> egrep -v '^$|^#' /etc/libvirt/libvirtd.conf
    listen_tls=1
    listen_tcp=0
    listen_addr="192.168.24.37"
    unix_sock_group="libvirt"
    unix_sock_ro_perms="0777"
    unix_sock_rw_perms="0770"
    auth_unix_ro="none"
    auth_unix_rw="none"
    auth_tls="sasl"
    tls_priority="NORMAL:-VERS-SSL3.0:-VERS-TLS-ALL:+VERS-TLS1.2"
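For anyone grepping the attached debug logs for similar failures, the 'blockdev-add' debug line quoted earlier in this comment carries two JSON objects: the QMP command and QEMU's reply. A rough Python sketch for pulling them apart follows; the function name `split_qmp_failure` is hypothetical, and it assumes the "command JSON, then reply JSON" layout seen in that one line, which may not hold for other libvirt log messages.

```python
import json

# The debug line from the log excerpt in this comment, reproduced verbatim.
LOG_LINE = (
    '2021-04-07 23:43:51.427+0000: 23837: debug : qemuMonitorJSONCheckErrorFull:404 : '
    'unable to execute QEMU command '
    '{"execute":"blockdev-add","arguments":{"driver":"nbd","server":{"type":"inet",'
    '"host":"compute-0.ctlplane.redhat.local","port":"61153"},'
    '"export":"drive-virtio-disk0","tls-creds":"objlibvirt_migrate_tls0",'
    '"node-name":"migration-vda-storage","read-only":false,"discard":"unmap"},'
    '"id":"libvirt-388"}: '
    '{"id":"libvirt-388","error":{"class":"GenericError","desc":"Failed to read option '
    'reply: Cannot read from TLS channel: Software caused connection abort"}}'
)


def split_qmp_failure(line):
    """Return (command, reply) dicts from a line holding two JSON objects in sequence."""
    decoder = json.JSONDecoder()
    # raw_decode parses one JSON value starting at the given index and
    # returns the value plus the index just past it.
    command, end = decoder.raw_decode(line, line.index("{"))
    reply, _ = decoder.raw_decode(line, line.index("{", end))
    return command, reply


command, reply = split_qmp_failure(LOG_LINE)
server = command["arguments"]["server"]
print(command["execute"], server["host"], server["port"])
print(reply["error"]["class"], "-", reply["error"]["desc"])
```

The port (61153) falls inside the migration_port_min/migration_port_max range from qemu.conf, which is consistent with this being the NBD connection set up for the storage migration.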
Note virt-pki-validate is validating *libvirt's* TLS setup. The problem here is with *QEMU's* TLS setup, i.e. the files in /etc/pki/qemu. There are only ca-cert.pem and server-cert.pem in /etc/pki/qemu.

The QMP command shown here is an NBD client connection failing. I can't see how the NBD server is configured, but if the server is attempting to do client certificate validation, then it will fail, because there's no client-cert.pem for the client to send. This is a plausible reason why you'd see this error message on the client, as the server will ungracefully drop the connection after the TLS handshake when it finds no client cert was sent.
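To make the missing-file point concrete, here is a small Python sketch that lists which PEM files a QEMU TLS client or server role would be lacking in a credentials directory. Only ca-cert.pem, server-cert.pem and client-cert.pem are named in this comment; the *-key.pem names are an assumption based on QEMU's usual x509 credential file naming, and the helper names are hypothetical.

```python
from pathlib import Path

# File names per role. The *-key.pem entries are an assumption; the comment
# above only mentions ca-cert.pem, server-cert.pem and client-cert.pem.
CLIENT_FILES = ("ca-cert.pem", "client-cert.pem", "client-key.pem")
SERVER_FILES = ("ca-cert.pem", "server-cert.pem", "server-key.pem")


def missing_files(present, expected):
    """Return the expected file names that are not in the 'present' listing."""
    present = set(present)
    return [name for name in expected if name not in present]


def check_cert_dir(cert_dir):
    """Report missing files per role for a directory such as /etc/pki/qemu."""
    path = Path(cert_dir)
    present = [p.name for p in path.iterdir()] if path.is_dir() else []
    return {
        "client": missing_files(present, CLIENT_FILES),
        "server": missing_files(present, SERVER_FILES),
    }


# Directory contents as reported in this comment.
reported = ["ca-cert.pem", "server-cert.pem"]
print(missing_files(reported, CLIENT_FILES))
# -> ['client-cert.pem', 'client-key.pem']
```

Running `check_cert_dir("/etc/pki/qemu")` on each compute node would give the same kind of report from the live directory. It checks file presence only, not whether the certificates chain to the CA or match the host name.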
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform (RHOSP) 16.2 enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2021:3483