Description of problem:

VMs fail to boot with:

2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [req-3ce380fc-fe8d-45fc-8744-a7b2b225a55e a2b591bd57684f4d83120e71d8894249 7b80a25744704e20bc15d55e762706b0 - default default] [instance: 4304c399-2309-4b59-88fe-e0265763591e] Failed to build and run instance: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2022-04-27T10:32:48.596278Z qemu-kvm: -object tls-creds-x509,id=vnc-tls-creds0,dir=/etc/pki/libvirt-vnc,endpoint=server,verify-peer=yes: Cannot load certificate '/etc/pki/libvirt-vnc/server-cert.pem' & key '/etc/pki/libvirt-vnc/server-key.pem': Error while reading file.
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e] Traceback (most recent call last):
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2485, in _build_and_run_instance
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     block_device_info=block_device_info)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3780, in spawn
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     cleanup_instance_disks=created_disks)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6683, in _create_domain_and_network
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     cleanup_instance_disks=cleanup_instance_disks)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     self.force_reraise()
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     six.reraise(self.type_, self.value, self.tb)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     raise value
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6649, in _create_domain_and_network
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     post_xml_callback=post_xml_callback)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6578, in _create_domain
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     guest.launch(pause=pause)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 149, in launch
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     self._encoded_xml, errors='ignore')
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     self.force_reraise()
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     six.reraise(self.type_, self.value, self.tb)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     raise value
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 144, in launch
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     return self._domain.createWithFlags(flags)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     rv = execute(f, *args, **kwargs)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     six.reraise(c, e, tb)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     raise value
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     rv = meth(*args, **kwargs)
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1385, in createWithFlags
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e]     raise libvirtError('virDomainCreateWithFlags() failed')
2022-04-27 10:32:51.600 7 ERROR nova.compute.manager [instance: 4304c399-2309-4b59-88fe-e0265763591e] libvirt.libvirtError: internal error: process exited while connecting to monitor: 2022-04-27T10:32:48.596278Z qemu-kvm: -object tls-creds-x509,id=vnc-tls-creds0,dir=/etc/pki/libvirt-vnc,endpoint=server,verify-peer=yes: Cannot load certificate '/etc/pki/libvirt-vnc/server-cert.pem' & key '/etc/pki/libvirt-vnc/server-key.pem': Error while reading file.

/etc/pki/libvirt-vnc/server-key.pem on the host is 0600 root:root. Most likely it should be 0640 root:qemu or 0600 qemu:qemu.

NOTE: libvirtd is running as uid=0 in the container, but it looks like libvirtd has a child with effective uid/gid qemu:qemu.
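To make the permission reasoning above concrete, here is a minimal sketch (not part of nova or libvirt; the helper name and the uid 107 for qemu are assumptions, 107 being the conventional qemu uid on RHEL) of why 0600 root:root is unreadable for the qemu child process while either suggested ownership works:

```python
import stat

def can_read(mode, file_uid, file_gid, proc_uid, proc_gid):
    """Illustrative only: decide whether a process can read a file from its
    mode bits and ownership, ignoring ACLs, capabilities and SELinux."""
    if proc_uid == 0:
        return True                       # root bypasses mode bits
    if proc_uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner read bit
    if proc_gid == file_gid:
        return bool(mode & stat.S_IRGRP)  # group read bit
    return bool(mode & stat.S_IROTH)      # other read bit

QEMU = 107  # assumed qemu uid/gid

# server-key.pem as found on the host: 0600 root:root -> qemu cannot read it
print(can_read(0o600, 0, 0, QEMU, QEMU))        # False
# suggested 0640 root:qemu -> readable through the group bit
print(can_read(0o640, 0, QEMU, QEMU, QEMU))     # True
# suggested 0600 qemu:qemu -> readable as the owner
print(can_read(0o600, QEMU, QEMU, QEMU, QEMU))  # True
```

This also shows why the error only appears at guest start: libvirtd itself runs as uid 0 and is unaffected, while the qemu-kvm child it spawns drops to qemu:qemu before loading the TLS credentials.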
Version-Release number of selected component (if applicable):
16.2_20220418.1
openstack-tripleo-heat-templates-11.6.1-2.20220409014848.7c89b16.el8ost.noarch

How reproducible:
always

Similar issue from the past: https://bugzilla.redhat.com/show_bug.cgi?id=1917443
We have uid 107 (qemu) in /etc/passwd now.
On the testing env that James had prepared yesterday, I ran tempest tests to see how the proposed fix behaves. There were no signs of "Cannot load certificate '/etc/pki/libvirt-vnc/server-cert.pem'" anymore, but there are now similar errors for the qemu certs (on both computes):

[heat-admin@compute-1 ~]$ sudo grep -rI 'Cannot load certificate' /var/log/containers/
/var/log/containers/libvirt/libvirtd.log.1:2022-05-04 18:08:06.381+0000: 28953: error : virNetClientProgramDispatchError:172 : internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.
/var/log/containers/libvirt/libvirtd.log.1:2022-05-04 18:08:42.907+0000: 28951: error : virNetClientProgramDispatchError:172 : internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.
/var/log/containers/stdouts/nova_compute.log.1:2022-05-04T18:08:06.411077834+00:00 stderr F libvirt.libvirtError: internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.
/var/log/containers/stdouts/nova_compute.log.1:2022-05-04T18:08:42.918997327+00:00 stderr F libvirt.libvirtError: internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.

[heat-admin@compute-0 ~]$ sudo grep -rI 'Cannot load certificate' /var/log/containers/
/var/log/containers/libvirt/libvirtd.log.1:2022-05-04 18:08:05.947+0000: 29043: error : qemuMonitorJSONCheckErrorFull:418 : internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.
/var/log/containers/libvirt/libvirtd.log.1:2022-05-04 18:08:42.644+0000: 29043: error : qemuMonitorJSONCheckErrorFull:418 : internal error: unable to execute QEMU command 'object-add': Cannot load certificate '/etc/pki/qemu/server-cert.pem' & key '/etc/pki/qemu/server-key.pem': Error while reading file.

So I presume the fix needs to be extended to cover the qemu case as well, and *maybe* other services touched by https://review.opendev.org/c/openstack/puppet-tripleo/+/822244 ?
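Since the same read failure now shows up under /etc/pki/qemu, a quick scan over both cert directories named in the logs can flag private keys the qemu group cannot read. A sketch, assuming GNU stat and that a made-up `check_keys` helper name is acceptable; the two directory paths are the ones from the logs:

```shell
# Hypothetical helper: flag TLS private keys whose mode denies group read
# (the bit qemu needs while the file stays owned root:<group>).
check_keys() {
    for d in "$@"; do
        for f in "$d"/*-key.pem; do
            [ -e "$f" ] || continue
            mode=$(stat -c '%a' "$f")          # e.g. 600
            owner=$(stat -c '%U:%G' "$f")      # e.g. root:root
            # prepend 0 so the shell treats the mode as octal; 040 = group-read bit
            if [ $(( 0$mode & 040 )) -eq 0 ]; then
                echo "group-unreadable: $f ($mode $owner)"
            fi
        done
    done
}

# On a compute host one would run it against the directories from the logs:
# check_keys /etc/pki/libvirt-vnc /etc/pki/qemu
```

This only inspects mode bits, so it would not catch SELinux or ACL denials, but it is enough to spot the 0600 root:root keys reported here.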
Testing is complete and the fix works as expected; let's have it merged upstream first. If I break the permissions on the key file and rerun tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops, it fails as reported. If I then manually apply puppet with the fixed manifest, the same test passes. I think we can consider this a PASS.
*** Bug 2083485 has been marked as a duplicate of this bug. ***
*** Bug 2093108 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.3 (Train)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:4793