Bug 1856461 - [OSP16.1] After a complete OC reboot it is impossible to spawn new VMs (qemu-kvm permission denied)
Keywords:
Status: CLOSED DUPLICATE of bug 1851843
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: Unspecified
OS: Unspecified
Importance: unspecified / medium
Target Milestone: ---
Assignee: OSP DFG:Compute
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-13 17:29 UTC by Michele Baldessari
Modified: 2023-03-21 19:36 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-07-15 06:32:12 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-23546 0 None None None 2023-03-21 19:36:11 UTC

Description Michele Baldessari 2020-07-13 17:29:51 UTC
Description of problem:
compose: RHOS-16.1-RHEL-8-20200710.n.0

After rebooting the whole overcloud with infrared on a largish composable HA overcloud (3x controller, 3x database, 3x networker, 3x messaging, 2x compute), spawning a new VM fails with:
nova-conductor.log:2020-07-13 11:25:56.482 22 ERROR nova.scheduler.utils [req-f0bc5b86-155e-4bb4-b4f6-a57525896804 e2a8ef60be7e41e7ad3bc071842df17d 66d436466a8043aa8aac6fcc85a35ca2 - default default] [instance: fd15ea72-95d9-4f62-b982-29dec5895750] Error from last host: compute-0.redhat.local (node compute-0.redhat.local):

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
    block_device_info=block_device_info)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3647, in spawn
    cleanup_instance_disks=created_disks)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6473, in _create_domain_and_network
    cleanup_instance_disks=cleanup_instance_disks)
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6439, in _create_domain_and_network
    post_xml_callback=post_xml_callback)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6368, in _create_domain
    guest.launch(pause=pause)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 143, in launch
    self._encoded_xml, errors='ignore')
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 138, in launch
    return self._domain.createWithFlags(flags)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1265, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance
    filter_properties, request_spec)
  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2537, in _build_and_run_instance
    instance_uuid=instance.uuid, reason=six.text_type(e))
nova.exception.RescheduledException: Build of instance fd15ea72-95d9-4f62-b982-29dec5895750 was re-scheduled: internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied

And on compute-0 I see:
2020-07-13 17:14:07.630+0000: 3595: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2020-07-13 17:14:07.631+0000: 3595: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 17:14:07.631+0000: 3730: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied


It does seem to me that selinux is hating us:
[root@compute-0 libvirt]# perl -pe 's/(\d+)/localtime/e' /var/log/audit/audit.log|grep denied
type=AVC msg=audit(Mon Jul 13 17:23:51 2020.703:18435): avc:  denied  { entrypoint } for  pid=101351 comm="libvirtd" path="/usr/libexec/qemu-kvm" dev="overlay" ino=68136 scontext=system_u:system_r:svirt_t:s0:c174,c324 tcontext=system_u:object_r:container_file_t:s0:c358,c374 tclass=file permissive=0
type=AVC msg=audit(Mon Jul 13 17:23:51 2020.337:28226): avc:  denied  { entrypoint } for  pid=145579 comm="libvirtd" path="/usr/libexec/qemu-kvm" dev="overlay" ino=68136 scontext=system_u:system_r:svirt_t:s0:c98,c1009 tcontext=system_u:object_r:container_file_t:s0:c358,c374 tclass=file permissive=0
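For anyone triaging similar logs: the interesting fields in an AVC record can be pulled out programmatically. A minimal, self-contained Python sketch (the sample line and the field layout come from the denial above; the helper name is mine, not part of any tool) that makes the context mismatch explicit:

```python
import re

# Sample AVC record from the audit log above (timestamp already decoded).
AVC_LINE = (
    'type=AVC msg=audit(Mon Jul 13 17:23:51 2020.703:18435): avc:  denied '
    '{ entrypoint } for  pid=101351 comm="libvirtd" '
    'path="/usr/libexec/qemu-kvm" dev="overlay" ino=68136 '
    'scontext=system_u:system_r:svirt_t:s0:c174,c324 '
    'tcontext=system_u:object_r:container_file_t:s0:c358,c374 '
    'tclass=file permissive=0'
)

def parse_avc(line):
    """Extract the denied permission and key=value fields from one AVC record."""
    perm = re.search(r'denied\s+\{ ([^}]+) \}', line)
    fields = dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))
    return {
        'permission': perm.group(1).strip() if perm else None,
        'comm': fields.get('comm', '').strip('"'),
        'path': fields.get('path', '').strip('"'),
        # The type component is the third colon-separated field of a context.
        'source_type': fields['scontext'].split(':')[2],
        'target_type': fields['tcontext'].split(':')[2],
    }

info = parse_avc(AVC_LINE)
# The source_type / target_type pair is the bug: the svirt_t domain is
# denied 'entrypoint' on a binary labelled container_file_t.
print(info)
```

This just restates what the raw records already say, but the source/target type pair (svirt_t vs container_file_t) is easier to spot this way across many denials.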


What I don't understand is that on my node I also see the same permission-denied errors in libvirt.log from *before* the problem started:
2020-07-13 11:25:54.181+0000: 57688: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 11:25:54.184+0000: 57695: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 12:10:25.318+0000: 3595: info : libvirt version: 6.0.0, package: 17.2.module+el8.2.0+6629+3fc0f2c2 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-05-14-11:03:02, )
2020-07-13 12:10:25.318+0000: 3595: info : hostname: compute-0.redhat.local
2020-07-13 12:10:25.318+0000: 3595: error : qemuMonitorIORead:511 : Unable to read from monitor: Connection reset by peer
2020-07-13 12:10:25.321+0000: 3595: error : qemuProcessReportLogError:2064 : internal error: qemu unexpectedly closed the monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied
2020-07-13 12:10:25.322+0000: 3727: error : qemuProcessReportLogError:2064 : internal error: process exited while connecting to monitor: libvirt:  error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied

Yet at those times (11:24 and 12:10) tempest ran just fine *and* I can't find any corresponding selinux denied entries.
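Correlating the audit log with libvirt's UTC timestamps is what the perl one-liner above does; the same decoding in Python, for reference. This is a sketch only: the raw epoch value below is reconstructed from the decoded time of the first denial for illustration, not copied from the actual audit.log.

```python
from datetime import datetime, timezone
import re

# Raw (un-decoded) audit records carry msg=audit(EPOCH.MILLIS:SERIAL).
# Epoch reconstructed here from "Mon Jul 13 17:23:51 2020" for illustration.
RAW = 'type=AVC msg=audit(1594661031.703:18435): avc:  denied  { entrypoint } ...'

m = re.search(r'msg=audit\((\d+)\.(\d+):(\d+)\)', RAW)
epoch, millis, serial = int(m.group(1)), m.group(2), m.group(3)

# Decode as UTC so it lines up with libvirtd's +0000 log timestamps.
ts = datetime.fromtimestamp(epoch, tz=timezone.utc)
print(ts.strftime('%Y-%m-%d %H:%M:%S'), f'(event serial {serial})')
```

Note the perl one-liner uses localtime, so on a node whose clock is not UTC the decoded audit times will be offset from the libvirt log times; decoding both to UTC avoids that trap.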

I'll leave a NEEDINFO on myself while I narrow the scope of this issue. I'm raising it now mainly for awareness and to see whether anyone else is hitting it.

