Description of problem:

This is a DPDK environment. When attempting to live-migrate an instance (using shared block storage) between compute nodes, the following errors are logged:

/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.752 2 ERROR nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Live Migration failure: operation failed: domain is no longer running: libvirt.libvirtError: operation failed: domain is no longer running
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.753 2 DEBUG nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Migration operation thread notification thread_finished /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9958
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.914 2 DEBUG nova.virt.libvirt.migration [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] VM running on src, migration failed _log /usr/lib/python3.9/site-packages/nova/virt/libvirt/migration.py:432
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.914 2 DEBUG nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9772
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.915 2 ERROR nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Migration operation has aborted

Version-Release number of selected component (if applicable):

Compose: RHOS-17.0-RHEL-9-20220811.n.0
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch
python3-nova-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-common-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-compute-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-migration-23.2.2-0.20220720130412.7074ac0.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy the environment (with DPDK enabled).
2. Spawn a virtual machine.
3. Attempt to migrate the instance.

Actual results:
Live migration fails.

Expected results:
Live migration succeeds.

Additional info:
Will share sosreports.
from reviewing the logs, the failure is definitely within qemu/libvirt and not within nova. from the qemu instance logs on the source i see:

2022-08-17 05:52:52.492+0000: initiating migration
2022-08-17T05:52:52.505293Z qemu-kvm: Unable to write to socket: Bad file descriptor
2022-08-17T05:54:21.120530Z qemu-kvm: terminating on signal 15 from pid 25107 (/usr/sbin/virtqemud)
2022-08-17 05:54:21.320+0000: shutting down, reason=destroyed

and on the destination we see:

2022-08-17T05:52:49.353093Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhu4931fbc4-5e,server=on: info: QEMU waiting for connection on: disconnected:unix:/var/lib/vhost_sockets/vhu4931fbc4-5e,server=on
char device redirected to /dev/pts/0 (label charserial0)
2022-08-17T05:52:52.499931Z qemu-kvm: vhost_user_postcopy_advise: Failed to get ufd
2022-08-17T05:52:52.500398Z qemu-kvm: load of migration failed: Operation not permitted
2022-08-17 05:52:52.706+0000: shutting down, reason=crashed

so this looks like, even though we have now enabled vm.unprivileged_userfaultfd (https://bugzilla.redhat.com/show_bug.cgi?id=1945420), the destination host is getting a permission error.

~/Downloads/bugs/bz-2118908/sosreport-computeovsdpdksriov-0-2022-08-17-hvhcncm [12:26:34]➜ cat proc/sys/vm/unprivileged_userfaultfd
1
~/Downloads/bugs/bz-2118908/sosreport-computeovsdpdksriov-1-2022-08-17-drkmsfr [12:25:39]➜ cat proc/sys/vm/unprivileged_userfaultfd
1

i can confirm that it is enabled correctly on both hosts.
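for anyone re-checking this on a live node rather than a sosreport, the setting can be read and persisted along these lines (a sketch; the drop-in file name is illustrative, and persisting it requires root):

```shell
# Verify the current value at runtime:
sysctl vm.unprivileged_userfaultfd

# Persist it across reboots via a sysctl.d drop-in (file name illustrative):
echo 'vm.unprivileged_userfaultfd = 1' > /etc/sysctl.d/99-userfaultfd.conf
sysctl -p /etc/sysctl.d/99-userfaultfd.conf
```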
we also enable postcopy in ovs (https://bugzilla.redhat.com/show_bug.cgi?id=1986567), and that is indeed enabled:

[12:28:44]➜ cat ovs-vsctl_-t_5_list_Open_vSwitch
_uuid               : 9a867504-61fa-4cd5-bc5f-a894b0742f40
bridges             : [8342be05-c132-43e6-a3d9-f07821204d5e, 91bff8f4-a4ea-4f90-8c29-ffc04f36569b, 9a2ba1b0-584d-4f46-af88-03d0eab38705, eebb5203-5d44-43ba-b2e8-871ebef1b180]
cur_cfg             : 173
datapath_types      : [netdev, system]
datapaths           : {netdev=dea29e9c-36c0-422b-9bf4-d7cb75b87a9c}
db_version          : "8.3.0"
dpdk_initialized    : true
dpdk_version        : "DPDK 21.11.0"
external_ids        : {hostname=computeovsdpdksriov-0.localdomain, ovn-bridge=br-int, ovn-bridge-datapath-type=netdev, ovn-bridge-mappings="dpdk-mgmt:br-link0,dpdk-data0:br-dpdk0,dpdk-data1:br-dpdk1", ovn-chassis-mac-mappings="dpdk-data0:fa:16:3e:71:0a:43,dpdk-data1:fa:16:3e:a3:8f:eb,dpdk-mgmt:fa:16:3e:31:d0:1a", ovn-encap-ip="10.10.121.135", ovn-encap-type=geneve, ovn-match-northd-version="true", ovn-monitor-all="true", ovn-openflow-probe-interval="60", ovn-remote="tcp:10.10.120.122:6642,tcp:10.10.120.133:6642,tcp:10.10.120.142:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="1234734c-826a-47ab-bb43-3eb54c04d7c3"}
iface_types         : [bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [1d0a5ebf-090a-45b3-9c8e-2b13d4a53c7c]
next_cfg            : 173
other_config        : {dpdk-extra=" -n 4", dpdk-init="true", dpdk-lcore-mask="300003", dpdk-socket-mem="4096,1024", pmd-cpu-mask=c, vhost-postcopy-support="true", vlan-limit="0"}
ovs_version         : "2.17.3"
ssl                 : []
statistics          : {}
system_type         : rhel
system_version      : "9.0"

bugs/bz-2118908/sosreport-computeovsdpdksriov-1-2022-08-17-drkmsfr/sos_commands/openvswitch [12:30:01]➜ cat ovs-vsctl_-t_5_list_Open_vSwitch
_uuid               : 6bacfbb2-cc8d-456f-879b-77efcce8ef55
bridges             : [3a061001-e15b-4a2c-a9b7-681bf4733905, 3c04f9a7-8067-4645-877d-76465e43cefc, bba16127-3b94-496d-a6c5-84fc667a1290, e7f1dcad-65bd-41ab-a4ab-7493ed394a99]
cur_cfg             : 126
datapath_types      : [netdev, system]
datapaths           : {netdev=93057a4c-6474-4ed0-b4a4-e9a556a11e65}
db_version          : "8.3.0"
dpdk_initialized    : true
dpdk_version        : "DPDK 21.11.0"
external_ids        : {hostname=computeovsdpdksriov-1.localdomain, ovn-bridge=br-int, ovn-bridge-datapath-type=netdev, ovn-bridge-mappings="dpdk-mgmt:br-link0,dpdk-data0:br-dpdk0,dpdk-data1:br-dpdk1", ovn-chassis-mac-mappings="dpdk-data0:fa:16:3e:db:75:08,dpdk-data1:fa:16:3e:a2:4c:f8,dpdk-mgmt:fa:16:3e:de:99:23", ovn-encap-ip="10.10.121.101", ovn-encap-type=geneve, ovn-match-northd-version="true", ovn-monitor-all="true", ovn-openflow-probe-interval="60", ovn-remote="tcp:10.10.120.122:6642,tcp:10.10.120.133:6642,tcp:10.10.120.142:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="dda99cc3-35c1-4e16-be41-ea60cb3f0189"}
iface_types         : [bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [03f63b03-c3ec-4e37-84a5-dad7a83d777e]
next_cfg            : 126
other_config        : {dpdk-extra=" -n 4", dpdk-init="true", dpdk-lcore-mask="300003", dpdk-socket-mem="4096,1024", pmd-cpu-mask=c, vhost-postcopy-support="true", vlan-limit="0"}
ovs_version         : "2.17.3"
ssl                 : []
statistics          : {}
system_type         : rhel
system_version      : "9.0"

this looks to me like an selinux issue:

type=AVC msg=audit(1660677036.342:18684): avc: denied { read write } for pid=102328 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=4815860 scontext=system_u:system_r:svirt_t:s0:c375,c668 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=0

so I'm going to update the component. i do not think this rises to the level of a blocker for the 17.0 GA release, as the simple workaround is to disable post-copy live migration via the THT parameter, so we can record this as a known issue with DPDK and fix it in the openstack_selinux package.
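for reference, the workaround would be applied via a custom environment file passed to the overcloud deploy. a minimal sketch, assuming NovaLiveMigrationPermitPostCopy is the relevant THT parameter (please verify the name against the deployed tripleo-heat-templates version):

```yaml
# disable-postcopy.yaml (file name illustrative)
# Assumes NovaLiveMigrationPermitPostCopy is the applicable THT knob;
# check your tripleo-heat-templates release before using.
parameter_defaults:
  NovaLiveMigrationPermitPostCopy: false
```

included with something like `openstack overcloud deploy ... -e disable-postcopy.yaml`.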
i will however request this as an exception, as if it can be fixed quickly we should. i don't think this is something the compute dfg is really able to help with, as we do not have the ability to deploy ovs-dpdk in general, so changing the DFG to NFV.
can we retest with selinux disabled to verify 100% that this is an selinux issue
Once it's confirmed whether disabling SELinux resolves the problem, permissive-mode audit logs to check the extent of the denials would also be helpful.
Instead of "disabling", please just set it to "permissive" - that way we'll get the logs directly :).
Hi all, I will redeploy an environment and will test with SELinux set to permissive.
Hi, This is indeed an SELinux issue, setting `permissive` on compute nodes resolved this and migration is working. Thanks a lot for looking into this.
Could you attach the permissive audit logs from around the time the issue was reproduced? Thank you.
Thank you for the logs! Looking at audit2allow, there are two types of denials in the logs:

allow init_t container_ro_file_t:filesystem remount;
allow svirt_t openvswitch_t:anon_inode { read write };

However, the init_t one is unrelated to the current issue; it looks to be due to attempting to write coredumps on a read-only container filesystem:

type=AVC msg=audit(1661250704.013:38889): avc: denied { remount } for pid=319633 comm="(coredump)" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=filesystem permissive=1

So we can ignore it here. The main issue is the following denial, as previously reported, and it doesn't seem to hide any additional denials:

type=AVC msg=audit(1661252189.728:38998): avc: denied { read write } for pid=328264 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=11082987 scontext=system_u:system_r:svirt_t:s0:c277,c570 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=1

Interestingly, it seems to actually be allowed on my Fedora system (libselinux-3.3-4, selinux-policy-36.14-1):

#!!!! This avc is allowed in the current policy
allow svirt_t openvswitch_t:anon_inode { read write };

It's allowed thanks to this rule:

$ sesearch -A -s svirt_t -t openvswitch_t -c anon_inode
allow domain domain:anon_inode { create getattr ioctl map read write };

Added in https://github.com/fedora-selinux/selinux-policy/commit/8a1746 (userfaultfd_anon_inode_perms: https://github.com/fedora-selinux/selinux-policy/blob/8a1746/policy/support/obj_perm_sets.spt#L283)

Because of this, I don't think we need to hide the new rule behind a boolean, because it will come in via the main policy given time anyway. That main policy rule is way too broad for us to carry in openstack-selinux; however, I think it's fine for us to add an "allow svirt_t openvswitch_t:anon_inode { read write };" rule to resolve the current issue. I will prepare a patch.
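As a side note, the mapping from an AVC record to the corresponding allow rule is mechanical. This toy shell sketch (purely illustrative; real workflows should use audit2allow itself) re-derives the rule from the denial quoted above:

```shell
#!/bin/sh
# Toy re-derivation of an audit2allow-style rule from a raw AVC record.
# Not a replacement for audit2allow; it only illustrates the field mapping.
avc='type=AVC msg=audit(1661252189.728:38998): avc: denied { read write } for pid=328264 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=11082987 scontext=system_u:system_r:svirt_t:s0:c277,c570 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=1'

# Permissions are the brace-enclosed list after "denied".
perms=$(printf '%s\n' "$avc" | sed -n 's/.*denied { \([^}]*\)}.*/\1/p')
# Source/target types are the third field of the security contexts.
src=$(printf '%s\n' "$avc" | grep -o 'scontext=[^ ]*' | cut -d= -f2 | cut -d: -f3)
tgt=$(printf '%s\n' "$avc" | grep -o 'tcontext=[^ ]*' | cut -d= -f2 | cut -d: -f3)
cls=$(printf '%s\n' "$avc" | grep -o 'tclass=[^ ]*' | cut -d= -f2)

rule="allow $src $tgt:$cls { $perms};"
echo "$rule"   # allow svirt_t openvswitch_t:anon_inode { read write };
```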
The patch works fine on 9; however, the new policy fails to build on rhel8, as the anon_inode class was added recently in 34.22 (https://github.com/fedora-selinux/selinux-policy/commit/86327c) and doesn't exist on 8. Looking into the current options. We may be able to propose the patch downstream to unblock this while figuring out a better long-term solution for the upstream repo separately.

os-ovs.te:134:ERROR 'unknown class anon_inode' at token ';' on line 4523:
allow svirt_t openvswitch_t:anon_inode { read write };
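For context, a minimal sketch of the shape such a rule takes in a policy module (illustrative only, not the actual os-ovs.te contents): on rhel8 it is the `class anon_inode` declaration in the require block that the toolchain cannot resolve, which is why the whole module fails to build.

```
# Illustrative policy-module fragment; not the actual os-ovs.te patch.
require {
    type svirt_t;
    type openvswitch_t;
    class anon_inode { read write };   # unknown class on rhel8 -> build error
}

# Let qemu processes (svirt_t) read/write userfaultfd anon inodes
# labeled openvswitch_t during vhost-user post-copy migration.
allow svirt_t openvswitch_t:anon_inode { read write };
```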
Still working on a solution for the single-branch build to avoid breaking on rhel8. However, I was wrong in comment 13: the domain rule is also present on the 9 system, it just works differently. On Fedora there is only one rule for anon_inode classes:

$ sesearch -A -c anon_inode
allow domain domain:anon_inode { create getattr ioctl map read write };

While on 9 they are more specific:

[...]
allow openvswitch_t openvswitch_t:anon_inode { create getattr ioctl read write };
[...]
allow svirt_t svirt_t:anon_inode { create getattr ioctl read write };
[...]

So if we want svirt_t to work with openvswitch anon inodes, we'll have to keep the rule in our package, which isn't unusual looking at os-ovs.te.
Vadim, would you be able to verify that this works in Enforcing mode now when using the new package? Thank you.
Hi Julie, I can confirm that on compose `RHOS-17.0-RHEL-9-20220825.n.1` live migration indeed works with SELinux enabled. Marking this as `VERIFIED`.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2022:6543