Description of problem:
When attempting to spawn an instance with a DPDK interface attached, the creation fails and the following error appears in nova-compute.log on the compute node:

2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [req-99508106-719a-4ca0-b23b-190a9b66c8a1 604e614dde284a959f6834a07cf15040 a6e139dcad2943a38a178635c74abd45 - default default] [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d] Instance failed to spawn: libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-07-16T16:19:32.662500Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhuca5e1e91-5e,server: Failed to bind socket to /var/lib/vhost_sockets/vhuca5e1e91-5e: Permission denied
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d] Traceback (most recent call last):
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2663, in _build_resources
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     yield resources
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     block_device_info=block_device_info)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3647, in spawn
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     cleanup_instance_disks=created_disks)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6473, in _create_domain_and_network
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     cleanup_instance_disks=cleanup_instance_disks)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     self.force_reraise()
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     six.reraise(self.type_, self.value, self.tb)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     raise value
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6439, in _create_domain_and_network
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     post_xml_callback=post_xml_callback)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 6368, in _create_domain
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     guest.launch(pause=pause)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 143, in launch
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     self._encoded_xml, errors='ignore')
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     self.force_reraise()
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     six.reraise(self.type_, self.value, self.tb)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     raise value
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 138, in launch
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     return self._domain.createWithFlags(flags)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     rv = execute(f, *args, **kwargs)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     six.reraise(c, e, tb)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     raise value
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     rv = meth(*args, **kwargs)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 1265, in createWithFlags
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]     if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d] libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-07-16T16:19:32.662500Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhuca5e1e91-5e,server: Failed to bind socket to /var/lib/vhost_sockets/vhuca5e1e91-5e: Permission denied
2020-07-16 16:19:33.700 7 ERROR nova.compute.manager [instance: a3fafcd5-99a2-4f8a-b6b7-768c4631411d]

This is a regression and was working on previous 16.1 composes.

Version-Release number of selected component (if applicable):
Compose: RHOS-16.1-RHEL-8-20200714.n.0
network-scripts-openvswitch2.13-2.13.0-25.el8fdp.1.x86_64
openvswitch-selinux-extra-policy-1.0-22.el8fdp.noarch
openvswitch2.13-2.13.0-25.el8fdp.1.x86_64
rhosp-openvswitch-2.13-8.el8ost.noarch
dpdk-19.11-4.el8.x86_64
puppet-nova-15.5.1-0.20200608173427.c15e37c.el8ost.noarch
python3-novaclient-15.1.0-0.20200310162625.cd396b8.el8ost.noarch
puppet-neutron-15.5.1-0.20200514103419.0a45ec7.el8ost.noarch
python3-neutronclient-6.14.0-0.20200310192910.115f60f.el8ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy a cloud with DPDK capabilities
2. Attempt to spawn an instance

Actual results:
Creation of instances with DPDK ports fails.

Expected results:
Instances are created successfully.

Additional info:
Will attach sos reports in comments.
Created attachment 1701677 [details]
selinux_audit_1858553.log

This is an SELinux issue. Once I set SELinux to permissive mode, I was able to spawn an instance. Attaching the audit.log from the compute node; search for 'vhufb58f8d2-e0' to see the failing messages.
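For anyone who wants to double-check the same thing on their compute node, a minimal diagnostic sketch (run as root; switching to permissive is only to confirm the denial, not a fix):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Check current mode and temporarily go permissive
getenforce
setenforce 0

# Retry the instance spawn, then look for the vhost-user related AVC denials
ausearch -m AVC -ts recent | grep vhost

# Switch back to enforcing afterwards
setenforce 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~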
Sorry, in regard to comment #2, please grep for 'vhost' rather than 'vhufb58f8d2-e0'.
audit.log for denial:
---------------------
type=AVC msg=audit(1595173011.584:8090): avc: denied { search } for pid=8069 comm="vhost_reconn" name="vhost_sockets" dev="sda2" ino=1342177472 scontext=system_u:system_r:openvswitch_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=dir permissive=0

openstack-selinux commit for this specific directory:
-----------------------------------------------------
https://github.com/redhat-openstack/openstack-selinux/commit/b09ec67785aa3f3a8751b15ac695effa17151876

Expected tcontext - system_u:object_r:virt_cache_t:s0
Actual tcontext - system_u:object_r:container_ro_file_t:s0

As per this code in THT, the directory's "setype" should be "virt_cache_t", which is the expected value. But either it is not applied, or it is changed to "container_ro_file_t" by some other operation, which is causing this issue.

@Vadim, do you have the cluster to take a look?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- name: create directory for vhost-user sockets with qemu ownership
  file:
    path: /var/lib/vhost_sockets
    state: directory
    owner: qemu
    group: {get_attr: [RoleParametersValue, value, vhostuser_socket_group]}
    setype: virt_cache_t
    seuser: system_u
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
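A quick way to compare the label that is actually on disk (the tcontext in the AVC) against what the loaded policy expects is the following sketch, run as root on the compute node:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Label currently applied to the directory (what the AVC reports as tcontext)
ls -ldZ /var/lib/vhost_sockets

# Label the active policy would assign on a relabel (the expected context)
matchpathcon /var/lib/vhost_sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~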
Using 'setype' in THT only applies the label once; something like a restorecon would revert it. Ideally this kind of label should be applied with sefcontext [1] to make the changes persistent.

There have been a number of recent changes around libvirt to resolve other issues (e.g. bug 1841822). However, looking at the related patch it doesn't look like /var/lib/vhost_sockets was directly affected, so I'm not sure why the label would be different now.

Cedric, does anything ring a bell that would cause that directory to go from virt_cache_t to container_ro_file_t?

[1] https://docs.ansible.com/ansible/latest/modules/sefcontext_module.html
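For illustration, the CLI equivalent of what the sefcontext module does would look roughly like this (a sketch only, not a proposed fix; the fcontext rule may already be shipped by the openstack-selinux policy):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Persistent file-context rule for the socket directory
semanage fcontext -a -t virt_cache_t "/var/lib/vhost_sockets(/.*)?"

# Apply the rule to the existing directory
restorecon -Rv /var/lib/vhost_sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~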
Note that the logs provided were captured in enforcing mode. If we decide to go with adding the equivalent rules for openvswitch_t + container_ro_file_t rather than dropping ":z", the issue should be reproduced in permissive mode first to ensure we have all the denials covered. Thanks!

The commit linked to in comment 4 indicates there's probably at least one more...
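If we go that route, something along these lines should collect the complete set of denials for review (a sketch; the 'os_vhost' module name is just a placeholder):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Permissive mode logs every denial instead of stopping at the first one hit
setenforce 0

# Reproduce: spawn an instance with a DPDK port attached, then review the AVCs
ausearch -m AVC -ts recent | grep vhost

# Optionally turn the collected denials into a draft policy module for review
ausearch -m AVC -ts recent | audit2allow -M os_vhost

# Back to enforcing once done
setenforce 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~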
(In reply to Cédric Jeanneret from comment #9)
> @Saravanan, @Vadim: would you be able to deploy with a small edit in the
> tripleo-heat-templates?
> If so, please drop the ":z" here:
> https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L722

Yes, it works. I have removed ":z", restarted the container and run restorecon, and I am able to create the VM even in enforcing mode.
Now I am re-deploying the node with ":z" removed in the templates to confirm the fix. Will update once confirmed.

Do you see any root cause for why this change is causing an issue only on the recent puddles? Looking back at the history, ":z" was added more than a year ago. I am wondering what triggered this issue with the recent puddle.
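For anyone applying the same workaround by hand on an already-deployed compute, the on-node steps were roughly the following sketch (nova_libvirt is assumed to be the libvirt container name on the compute; adjust if yours differs):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Relabel the socket directory back to what the policy expects
restorecon -Rv /var/lib/vhost_sockets
ls -ldZ /var/lib/vhost_sockets   # should now show virt_cache_t

# Restart the libvirt container so it picks up the directory without the :z relabel
podman restart nova_libvirt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~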
(In reply to Saravanan KR from comment #11)
> (In reply to Cédric Jeanneret from comment #9)
> > @Saravanan, @Vadim: would you be able to deploy with a small edit in the
> > tripleo-heat-templates?
> > If so, please drop the ":z" here:
> > https://opendev.org/openstack/tripleo-heat-templates/src/branch/stable/train/deployment/nova/nova-libvirt-container-puppet.yaml#L722
>
> Yes, it works. I have removed ":z", restarted the container and run restorecon,
> and I am able to create the VM even in enforcing mode.
> Now I am re-deploying the node with ":z" removed in the templates to confirm
> the fix. Will update once confirmed.

That's great news already - lemme know how it goes once you get the full deploy :). But I'm pretty sure it will be OK, seeing the usage of this location.

Julie is preparing the policy in parallel, "just in case". The good news is, we probably won't need to block GA for that issue.

Once I get your go, I can prepare the upstream patch and the downstream backport, in order to be ready once we get the accepted blocker flag.

Cheers,

C.
(In reply to Julie Pichon from comment #6)
> Using 'setype' in THT only applies the label once; something like a
> restorecon would revert it. Ideally this kind of label should be applied
> with sefcontext [1] to make the changes persistent.

Scratch this comment, we are already setting it in policy: https://github.com/redhat-openstack/openstack-selinux/blob/master/local_settings.sh.in#L80
Saravanan has redeployed with ":z" removed from the tripleo templates. The deployment completed successfully and we were able to spawn instances with DPDK ports attached while the compute nodes were enforcing SELinux. Thanks everyone for the help!
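For completeness, a minimal verification pass on a redeployed compute could look like the sketch below (flavor, image and port are placeholders, not the names used in this environment):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# SELinux stays enforcing on the compute node
getenforce

# The socket directory keeps the expected label after the redeploy
ls -ldZ /var/lib/vhost_sockets

# An instance with a vhost-user (DPDK) port boots successfully
openstack server create --flavor <dpdk-flavor> --image <image> \
    --nic port-id=<vhost-user-port-id> test-dpdk-vm
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~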
Blocker requested in parallel - we should hopefully get things sorted out today. The upstream patch is already done, and there's a Launchpad bug in order to make the backport easier upstream. The downstream backport will start as soon as we get the missing flags.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3148