Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1856461

Summary: [OSP16.1] After a complete OC reboot it is impossible to spawn new VMs (qemu-kvm permission denied)
Product: Red Hat OpenStack Reporter: Michele Baldessari <michele>
Component: openstack-novaAssignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED DUPLICATE QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 16.1 (Train)CC: cjeanner, dasmith, eglynn, jhakimra, kchamart, sbauza, sgordon, vromanso
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-07-15 06:32:12 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Michele Baldessari 2020-07-13 17:29:51 UTC
Description of problem:
compose: RHOS-16.1-RHEL-8-20200710.n.0

After rebooting the whole overcloud with infrared on a largish composable HA overcloud (3xctrl, 3xdb, 3xnetworker, 3xmessaging, 2xcomputes) we observe that spinning up a new VM is failing with:
nova-conductor.log:2020-07-13 11:25:56.482 22 ERROR nova.scheduler.utils [req-f0bc5b86-155e-4bb4-b4f6-a57525896804 e2a8ef60be7e41e7ad3bc071842df17d 66d436466a8043aa8aac6fcc85a35ca2 - default default] [instance: fd15ea72-95d9-4f62-b982-29dec5895750] Error from last host: compute-0.redhat.local (node compute-0.redhat.local): ['Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance\n    block_device_info=block_device_info)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3647, in spawn\n    cleanup_instance_disks=created_disks)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6473, in _create_domain_and_network\n    cleanup_instance_disks=cleanup_instance_disks)\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6439, in _create_domain_and_network\n    post_xml_callback=post_xml_callback)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6368, in _create_domain\n    guest.launch(pause=pause)\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 143, in launch\n    self._encoded_xml, errors=\'ignore\')\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__\n    self.force_reraise()\n', '  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 138, in launch\n    return self._domain.createWithFlags(flags)\n', '  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit\n    result = proxy_call(self._autowrap, f, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call\n    rv = execute(f, *args, **kwargs)\n', '  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute\n    six.reraise(c, e, tb)\n', '  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise\n    raise value\n', '  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker\n    rv = meth(*args, **kwargs)\n', '  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1265, in createWithFlags\n    if ret == -1: raise libvirtError (\'virDomainCreateWithFlags() failed\', dom=self)\n', 'libvirt.libvirtError: internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied\n', '\nDuring handling of the above exception, another exception occurred:\n\n', 'Traceback (most recent call last):\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance\n    filter_properties, request_spec)\n', '  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', 'nova.exception.RescheduledException: Build of instance fd15ea72-95d9-4f62-b982-29dec5895750 was re-scheduled: internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied\

And on compute-0 I see:
2020-07-13 17:14:07.630+0000: 3595: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2020-07-13 17:14:07.631+0000: 3595: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 17:14:07.631+0000: 3730: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied


It does seem to me that selinux is hating us:
[root@compute-0 libvirt]# perl -pe 's/(\d+)/localtime/e' /var/log/audit/audit.log|grep denied
type=AVC msg=audit(Mon Jul 13 17:23:51 2020.703:18435): avc:  denied  { entrypoint } for  pid=101351 comm="libvirtd" path="/usr/libexec/qemu-kvm" dev="overlay" ino=68136 scontext=system_u:system_r:svirt_t:s0:c174,c324 tcontext=system_u:object_r:container
_file_t:s0:c358,c374 tclass=file permissive=0
type=AVC msg=audit(Mon Jul 13 17:23:51 2020.337:28226): avc:  denied  { entrypoint } for  pid=145579 comm="libvirtd" path="/usr/libexec/qemu-kvm" dev="overlay" ino=68136 scontext=system_u:system_r:svirt_t:s0:c98,c1009 tcontext=system_u:object_r:container
_file_t:s0:c358,c374 tclass=file permissive=0


What I don't understand is that on my node, I do see the same permission denied in libvirt.log before the problem:
2020-07-13 11:25:54.181+0000: 57688: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 11:25:54.184+0000: 57695: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 12:10:25.318+0000: 3595: info : libvirt version: 6.0.0, package: 17.2.module+el8.2.0+6629+3fc0f2c2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-05-14-11:03:02, )
2020-07-13 12:10:25.318+0000: 3595: info : hostname: compute-0.redhat.local
2020-07-13 12:10:25.318+0000: 3595: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2020-07-13 12:10:25.321+0000: 3595: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 12:10:25.322+0000: 3727: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied

Yet at those times (11:24 and 12:10) tempest ran just fine *and* I can't find any corresponding selinux denied entries.

I'll leave a NEEDINFO on me while I narrow the scope of this issue. I raise it now mainly for awareness and to see if anyone else is experiencing this.