Bug 2118908 - [RHOSP17.0] Live Migration Fails With Live Migration failure: operation failed: domain is no longer running: libvirt.libvirtError: operation
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-selinux
Version: 17.0 (Wallaby)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ga
Target Release: 17.0
Assignee: Julie Pichon
QA Contact: nlevinki
Blocks: 2121117
 
Reported: 2022-08-17 06:03 UTC by Vadim Khitrin
Modified: 2022-09-21 12:25 UTC
CC List: 18 users

Fixed In Version: openstack-selinux-0.8.34-0.20220711150342.a82a63a.el9ost
Doc Type: No Doc Update
Doc Text:
Cloned to: 2121117
Last Closed: 2022-09-21 12:24:47 UTC




Links
Red Hat Issue Tracker NFV-2605 (last updated 2022-08-18 11:42:43 UTC)
Red Hat Issue Tracker OSP-18225 (last updated 2022-08-17 06:17:03 UTC)
Red Hat Product Errata RHEA-2022:6543 (last updated 2022-09-21 12:25:01 UTC)

Description Vadim Khitrin 2022-08-17 06:03:12 UTC
Description of problem:
This is a DPDK environment.
When attempting to migrate an instance (using shared block storage) between compute nodes, the following errors are logged:
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.752 2 ERROR nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Live Migration failure: operation failed: domain is no longer running: libvirt.libvirtError: operation failed: domain is no longer running
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.753 2 DEBUG nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Migration operation thread notification thread_finished /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9958
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.914 2 DEBUG nova.virt.libvirt.migration [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] VM running on src, migration failed _log /usr/lib/python3.9/site-packages/nova/virt/libvirt/migration.py:432
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.914 2 DEBUG nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Fixed incorrect job type to be 4 _live_migration_monitor /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:9772
/var/log/containers/nova/nova-compute.log:2022-08-17 05:52:52.915 2 ERROR nova.virt.libvirt.driver [-] [instance: 59073e3a-6fe9-41a6-98a7-a11c68ebced6] Migration operation has aborted

Version-Release number of selected component (if applicable):
Compose: RHOS-17.0-RHEL-9-20220811.n.0
python3-novaclient-17.4.0-0.20210812172018.54d4da1.el9ost.noarch
python3-nova-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-common-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-compute-23.2.2-0.20220720130412.7074ac0.el9ost.noarch
openstack-nova-migration-23.2.2-0.20220720130412.7074ac0.el9ost.noarch

How reproducible: Always


Steps to Reproduce:
1. Deploy environment (with DPDK enabled).
2. Spawn virtual machine.
3. Attempt to migrate instance.

Actual results:
Live migration fails.

Expected results:
Live migration succeeds.

Additional info:
Will share sosreports.

Comment 2 smooney 2022-08-18 11:38:15 UTC
From reviewing the logs, the failure is definitely within qemu/libvirt and not within nova.

From the qemu instance logs on the source I see:

2022-08-17 05:52:52.492+0000: initiating migration
2022-08-17T05:52:52.505293Z qemu-kvm: Unable to write to socket: Bad file descriptor
2022-08-17T05:54:21.120530Z qemu-kvm: terminating on signal 15 from pid 25107 (/usr/sbin/virtqemud)
2022-08-17 05:54:21.320+0000: shutting down, reason=destroyed


and on the destination we see:

2022-08-17T05:52:49.353093Z qemu-kvm: -chardev socket,id=charnet0,path=/var/lib/vhost_sockets/vhu4931fbc4-5e,server=on: info: QEMU waiting for connection on: disconnected:unix:/var/lib/vhost_sockets/vhu4931fbc4-5e,server=on
char device redirected to /dev/pts/0 (label charserial0)
2022-08-17T05:52:52.499931Z qemu-kvm: vhost_user_postcopy_advise: Failed to get ufd
2022-08-17T05:52:52.500398Z qemu-kvm: load of migration failed: Operation not permitted
2022-08-17 05:52:52.706+0000: shutting down, reason=crashed


So it looks like, even though we have now enabled vm.unprivileged_userfaultfd, the destination host is still getting a permission error: https://bugzilla.redhat.com/show_bug.cgi?id=1945420

~/Downloads/bugs/bz-2118908/sosreport-computeovsdpdksriov-0-2022-08-17-hvhcncm 
[12:26:34]➜ cat proc/sys/vm/unprivileged_userfaultfd 
1

~/Downloads/bugs/bz-2118908/sosreport-computeovsdpdksriov-1-2022-08-17-drkmsfr 
[12:25:39]➜ cat proc/sys/vm/unprivileged_userfaultfd 
1

I can confirm that it is enabled correctly on both hosts.

We also enable post-copy in OVS: https://bugzilla.redhat.com/show_bug.cgi?id=1986567

That is indeed enabled:

[12:28:44]➜ cat ovs-vsctl_-t_5_list_Open_vSwitch
_uuid               : 9a867504-61fa-4cd5-bc5f-a894b0742f40
bridges             : [8342be05-c132-43e6-a3d9-f07821204d5e, 91bff8f4-a4ea-4f90-8c29-ffc04f36569b, 9a2ba1b0-584d-4f46-af88-03d0eab38705, eebb5203-5d44-43ba-b2e8-871ebef1b180]
cur_cfg             : 173
datapath_types      : [netdev, system]
datapaths           : {netdev=dea29e9c-36c0-422b-9bf4-d7cb75b87a9c}
db_version          : "8.3.0"
dpdk_initialized    : true
dpdk_version        : "DPDK 21.11.0"
external_ids        : {hostname=computeovsdpdksriov-0.localdomain, ovn-bridge=br-int, ovn-bridge-datapath-type=netdev, ovn-bridge-mappings="dpdk-mgmt:br-link0,dpdk-data0:br-dpdk0,dpdk-data1:br-dpdk1", ovn-chassis-mac-mappings="dpdk-data0:fa:16:3e:71:0a:43,dpdk-data1:fa:16:3e:a3:8f:eb,dpdk-mgmt:fa:16:3e:31:d0:1a", ovn-encap-ip="10.10.121.135", ovn-encap-type=geneve, ovn-match-northd-version="true", ovn-monitor-all="true", ovn-openflow-probe-interval="60", ovn-remote="tcp:10.10.120.122:6642,tcp:10.10.120.133:6642,tcp:10.10.120.142:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="1234734c-826a-47ab-bb43-3eb54c04d7c3"}
iface_types         : [bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [1d0a5ebf-090a-45b3-9c8e-2b13d4a53c7c]
next_cfg            : 173
other_config        : {dpdk-extra=" -n 4", dpdk-init="true", dpdk-lcore-mask="300003", dpdk-socket-mem="4096,1024", pmd-cpu-mask=c, vhost-postcopy-support="true", vlan-limit="0"}
ovs_version         : "2.17.3"
ssl                 : []
statistics          : {}
system_type         : rhel
system_version      : "9.0"

bugs/bz-2118908/sosreport-computeovsdpdksriov-1-2022-08-17-drkmsfr/sos_commands/openvswitch 
[12:30:01]➜ cat ovs-vsctl_-t_5_list_Open_vSwitch
_uuid               : 6bacfbb2-cc8d-456f-879b-77efcce8ef55
bridges             : [3a061001-e15b-4a2c-a9b7-681bf4733905, 3c04f9a7-8067-4645-877d-76465e43cefc, bba16127-3b94-496d-a6c5-84fc667a1290, e7f1dcad-65bd-41ab-a4ab-7493ed394a99]
cur_cfg             : 126
datapath_types      : [netdev, system]
datapaths           : {netdev=93057a4c-6474-4ed0-b4a4-e9a556a11e65}
db_version          : "8.3.0"
dpdk_initialized    : true
dpdk_version        : "DPDK 21.11.0"
external_ids        : {hostname=computeovsdpdksriov-1.localdomain, ovn-bridge=br-int, ovn-bridge-datapath-type=netdev, ovn-bridge-mappings="dpdk-mgmt:br-link0,dpdk-data0:br-dpdk0,dpdk-data1:br-dpdk1", ovn-chassis-mac-mappings="dpdk-data0:fa:16:3e:db:75:08,dpdk-data1:fa:16:3e:a2:4c:f8,dpdk-mgmt:fa:16:3e:de:99:23", ovn-encap-ip="10.10.121.101", ovn-encap-type=geneve, ovn-match-northd-version="true", ovn-monitor-all="true", ovn-openflow-probe-interval="60", ovn-remote="tcp:10.10.120.122:6642,tcp:10.10.120.133:6642,tcp:10.10.120.142:6642", ovn-remote-probe-interval="60000", rundir="/var/run/openvswitch", system-id="dda99cc3-35c1-4e16-be41-ea60cb3f0189"}
iface_types         : [bareudp, dpdk, dpdkvhostuser, dpdkvhostuserclient, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : [03f63b03-c3ec-4e37-84a5-dad7a83d777e]
next_cfg            : 126
other_config        : {dpdk-extra=" -n 4", dpdk-init="true", dpdk-lcore-mask="300003", dpdk-socket-mem="4096,1024", pmd-cpu-mask=c, vhost-postcopy-support="true", vlan-limit="0"}
ovs_version         : "2.17.3"
ssl                 : []
statistics          : {}
system_type         : rhel
system_version      : "9.0"



This looks to me like an SELinux issue:

type=AVC msg=audit(1660677036.342:18684): avc:  denied  { read write } for  pid=102328 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=4815860 scontext=system_u:system_r:svirt_t:s0:c375,c668 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=0 
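The fields that matter in an AVC record like this (source context, target context, object class, enforcing mode) can be pulled out with standard text tools; a minimal sketch run against a saved copy of the denial, not the live audit log:

```shell
# Saved copy of the AVC denial from above.
avc='type=AVC msg=audit(1660677036.342:18684): avc:  denied  { read write } for  pid=102328 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=4815860 scontext=system_u:system_r:svirt_t:s0:c375,c668 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=0'

# Extract field=value pairs: qemu-kvm (svirt_t) was denied read/write
# on an anon_inode labelled openvswitch_t, in enforcing mode.
for field in scontext tcontext tclass permissive; do
  printf '%s\n' "$avc" | sed -E "s/.* ${field}=([^ ]+).*/${field}=\1/"
done
```

On a live system, `ausearch -m avc -c qemu-kvm` piped into `audit2allow` (as done later in comment 13) turns such records into candidate policy rules.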


So I'm going to update the component.

I do not think this rises to the level of a blocker for the 17.0 GA release, as the simple workaround is to disable post-copy live migration via the THT parameter. We can record this as a known issue with DPDK and fix it in the openstack-selinux package.

I will, however, request this as an exception: if it can be fixed quickly, we should.

I don't think this is something the compute DFG is really able to help with, as we do not have the ability to deploy OVS-DPDK in general, so changing the DFG to NFV.
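The workaround could be sketched as a small TripleO Heat Templates environment file; this assumes the standard `NovaLiveMigrationPermitPostCopy` parameter, and the file name is illustrative:

```yaml
# disable-postcopy.yaml (illustrative file name)
# Workaround sketch: disable post-copy live migration so qemu-kvm never
# needs the userfaultfd shared with OVS, sidestepping the
# svirt_t -> openvswitch_t anon_inode denial. Migrations fall back to
# pre-copy behaviour.
parameter_defaults:
  NovaLiveMigrationPermitPostCopy: false
```

Passed with `-e disable-postcopy.yaml` at deploy/update time.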

Comment 4 nlevinki 2022-08-18 12:16:50 UTC
Can we retest with SELinux disabled, to verify 100% that this is an SELinux issue?

Comment 5 Julie Pichon 2022-08-18 12:45:12 UTC
Once it's confirmed whether disabling SELinux resolves the problem, permissive audit logs to check the full extent of the denials would also be helpful.

Comment 6 Cédric Jeanneret 2022-08-18 13:06:41 UTC
Instead of "disabling", please just pass it to "permissive" - that way we'll get logs directly :).

Comment 7 Vadim Khitrin 2022-08-21 04:07:08 UTC
Hi all,

I will redeploy an environment and will test with SELinux set to permissive.

Comment 8 Vadim Khitrin 2022-08-22 20:55:59 UTC
Hi,

This is indeed an SELinux issue: setting `permissive` on the compute nodes resolved it, and migration is working.
Thanks a lot for looking into this.

Comment 9 Julie Pichon 2022-08-23 08:24:39 UTC
Could you attach the permissive audit logs from around the time the issue was reproduced? Thank you.

Comment 13 Julie Pichon 2022-08-23 12:58:45 UTC
Thank you for the logs!

Looking at audit2allow, there are two types of denials in the logs:

allow init_t container_ro_file_t:filesystem remount;
allow svirt_t openvswitch_t:anon_inode { read write };

However, the init_t one is unrelated to the current issue; it looks to be due to attempting to write coredumps to a read-only container filesystem:

type=AVC msg=audit(1661250704.013:38889): avc:  denied  { remount } for  pid=319633 comm="(coredump)" scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:container_ro_file_t:s0 tclass=filesystem permissive=1

So we can ignore it here.

The main issue is the following denial, as previously reported, and it doesn't seem to hide any additional denials:

type=AVC msg=audit(1661252189.728:38998): avc:  denied  { read write } for  pid=328264 comm="qemu-kvm" path="anon_inode:[userfaultfd]" dev="anon_inodefs" ino=11082987 scontext=system_u:system_r:svirt_t:s0:c277,c570 tcontext=system_u:object_r:openvswitch_t:s0 tclass=anon_inode permissive=1

Interestingly, it seems to actually be allowed on my Fedora system (libselinux-3.3-4, selinux-policy-36.14-1):

#!!!! This avc is allowed in the current policy
allow svirt_t openvswitch_t:anon_inode { read write };

It's allowed thanks to this rule:

$ sesearch -A -s svirt_t -t openvswitch_t -c anon_inode 
allow domain domain:anon_inode { create getattr ioctl map read write };

Added in https://github.com/fedora-selinux/selinux-policy/commit/8a1746 (userfaultfd_anon_inode_perms: https://github.com/fedora-selinux/selinux-policy/blob/8a1746/policy/support/obj_perm_sets.spt#L283)

Because of this, I don't think we need to hide the new rule behind a boolean, because it will come in via the main policy in time anyway. That main policy rule is way too broad for us to carry in openstack-selinux; however, I think it's fine for us to add an "allow svirt_t openvswitch_t:anon_inode { read write };" rule to resolve the current issue. I will prepare a patch.
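For reference, that rule as a standalone local policy module (the module name is illustrative; the real fix ships inside openstack-selinux rather than as a local module, and the anon_inode class requires a recent selinux-policy, so this will not compile everywhere):

```
# osp_vhost_userfaultfd.te -- illustrative standalone module sketch.
module osp_vhost_userfaultfd 1.0;

require {
    type svirt_t;
    type openvswitch_t;
    class anon_inode { read write };
}

# Let qemu-kvm guests (svirt_t) read/write the userfaultfd anon inode
# owned by Open vSwitch during vhost-user post-copy live migration.
allow svirt_t openvswitch_t:anon_inode { read write };
```

Such a module would be built and loaded with `checkmodule -M -m -o osp_vhost_userfaultfd.mod osp_vhost_userfaultfd.te`, `semodule_package -o osp_vhost_userfaultfd.pp -m osp_vhost_userfaultfd.mod`, then `semodule -i osp_vhost_userfaultfd.pp`.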

Comment 14 Julie Pichon 2022-08-23 13:30:02 UTC
The patch works fine on 9; however, the new policy fails to build on rhel8, as the anon_inode class was added recently in 34.22 ( https://github.com/fedora-selinux/selinux-policy/commit/86327c ) and doesn't exist on 8. Looking into the current options. We may be able to propose the patch downstream to unblock this while figuring out a better long-term solution for the upstream repo separately.

os-ovs.te:134:ERROR 'unknown class anon_inode' at token ';' on line 4523:
allow svirt_t openvswitch_t:anon_inode { read write };

Comment 15 Julie Pichon 2022-08-23 15:55:23 UTC
Still working on a solution for the single-branch build to avoid breaking on rhel8. However, I was wrong in comment 13: the anon_inode rules are also present on the 9 system, they just work differently. On Fedora there is only one rule for anon_inode classes:

$ sesearch -A -c anon_inode
allow domain domain:anon_inode { create getattr ioctl map read write };

While on 9 they are more specific:
[...]
allow openvswitch_t openvswitch_t:anon_inode { create getattr ioctl read write };
[...]
allow svirt_t svirt_t:anon_inode { create getattr ioctl read write };
[...]

So if we want svirt_t to work with openvswitch anon inodes, we'll have to keep the rule in our package, which isn't unusual looking at os-ovs.te.

Comment 25 Julie Pichon 2022-08-26 12:44:04 UTC
Vadim, would you be able to verify that this works in Enforcing mode now when using the new package? Thank you.

Comment 26 Vadim Khitrin 2022-08-30 15:11:41 UTC
Hi Julie,

I can confirm that on compose `RHOS-17.0-RHEL-9-20220825.n.1` live migration indeed works with SELinux enabled.
Marking this as `VERIFIED`.

Comment 30 errata-xmlrpc 2022-09-21 12:24:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

