Description of problem:
VirtualDomain live migration does not work and always ends up with the VM being rebooted. It seems that the qemu command for live migration fails and causes this:

Oct 05 10:19:14 VirtualDomain(pool-10-37-167-241)[1104724]: INFO: pool-10-37-167-241: Starting live migration to picard-02 (using: virsh --connect=qemu:///system --quiet migrate --live pool-10-37-167-241 qemu+ssh://picard-02/system ).
Oct 05 10:19:21 VirtualDomain(pool-10-37-167-241)[1104724]: ERROR: pool-10-37-167-241: live migration to picard-02 failed: 1
Oct 05 10:19:21 VirtualDomain(pool-10-37-167-241)[1104977]: INFO: Issuing graceful shutdown request for domain pool-10-37-167-241.

Version-Release number of selected component (if applicable):
pacemaker-2.1.4-5.el9.x86_64
resource-agents-4.10.0-23.el9.x86_64
libvirt-8.5.0-7.el9_1.x86_64

How reproducible:
always

Steps to Reproduce:
1. pcs resource move pool-10-37-167-241

Actual results:
offline migration

Expected results:
live migration

Additional info:
Running "virsh --connect=qemu:///system --quiet migrate --live pool-10-37-167-241 qemu+ssh://picard-02/system" manually just ends with an error:

[root@riker-01 ~]# virsh --connect=qemu:///system migrate --live pool-10-37-167-242 qemu+ssh://10.37.165.56/system --verbose
Migration: [ 94 %]error: operation failed: job 'migration out' unexpectedly failed
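For context, Pacemaker only attempts a live migration of a VirtualDomain resource when its allow-migrate meta attribute is set, e.g. (illustrative command, not taken from this cluster's configuration):

# pcs resource update pool-10-37-167-241 meta allow-migrate=true

The agent log above already shows "Starting live migration", so that part is evidently configured; the failure is in the underlying virsh migrate step.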
I tried to reproduce this without libvirt, but it works:

on riker:

/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}'

on picard:

/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}' -incoming tcp::4444

Let the VM boot on the source, then ctrl-a c:

(qemu) migrate_set_parameter max-bandwidth 10G
(qemu) migrate -d tcp:picard-02:4444

Wait a few seconds and the VM is happy on the destination; I bounced it back in the other direction as well. So the basics seem OK for that stripped-down setup.
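(If the manual run ever does fail, the usual status command on the source monitor shows how far it got and any error:

(qemu) info migrate
)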
Seen on the source:

2022-11-10 14:00:44.189+0000: 127919: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}]
2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
2022-11-10 14:00:44.234+0000: 127919: info : qemuMonitorIOWrite:382 : QEMU_MONITOR_IO_WRITE: mon=0x7f0988083460 buf={"execute":"query-migrate","id":"libvirt-427"}^M
2022-11-10 14:00:44.303+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 303047}, "event": "MIGRATION", "data": {"status": "failed"}}

so the source thinks something failed in those few ms.

On the destination we see:

2022-11-10 14:00:39.232+0000: 120615: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f08fc083460 event={"timestamp": {"seconds": 1668088839, "microseconds": 232939}, "event": "MIGRATION", "data": {"status": "active"}}
2022-11-10 14:00:44.338+0000: 120543: debug : virThreadJobSet:93 : Thread 120543 (rpc-virtqemud) is now running job remoteDispatchDomainMigrateFinish3Params
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjStartJobPhase:765 : Starting phase 'finish3' of 'migration in' job
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjSetJobPhase:734 : Setting 'migration in' phase to 'finish3'
2022-11-10 14:00:44.339+0000: 120615: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}]
2022-11-10 14:00:44.339+0000: 120615: info : qemuMonitorJSONIOProcessLine:214 : QEMU_MONITOR_RECV_REPLY: mon=0x7f08fc083460 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}

so the source seems to have seen a failure, but the destination was apparently happy - I *think* the destination thinks it finished.
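For reference, the same migration status can be polled through libvirt on the source host while the job is running (domain name as used later in this report):

# virsh domjobinfo pool-10-37-167-238
# virsh qemu-monitor-command pool-10-37-167-238 --pretty '{"execute":"query-migrate"}'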
Apparently the migration is almost finished, as the reported failure happens very shortly after

2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189440}, "event": "MIGRATION", "data": {"status": "device"}}

(this event is accompanied by the MIGRATION_PASS event mentioned in comment #3), that is, QEMU is migrating the device state after transferring all memory. And interestingly, only 4 seconds after guest CPUs were resumed (as a recovery from the failed migration), the domain crashes:

2022-11-10 14:00:48.756+0000: 127919: debug : qemuMonitorIO:556 : Error on monitor <null> mon=0x7f0988083460 vm=0x55d5f04ca6a0 name=pool-10-37-167-238
2022-11-10 14:00:48.958+0000: 127981: debug : processMonitorEOFEvent:4113 : Monitor connection to 'pool-10-37-167-238' closed without SHUTDOWN event; assuming the domain crashed

That finish3 phase from comment #3 is started by

2022-11-10 14:00:44.338+0000: 120543: debug : virDomainMigrateFinish3Params:5425 : dconn=0x7f08fc015190, params=0x7f09380030f0, nparams=3, cookiein=(nil), cookieinlen=0, cookieout=0x7f0942ee78d0, cookieoutlen=0x7f0942ee78c4, flags=0x1, cancelled=1

where cancelled=1 means this API is called to tell the destination that migration failed on the source and the domain on the destination host needs to be killed.

So it looks like something bad happened when the device state was supposed to be transferred. But there's no info about what happened, as even "query-migrate" called just after the failure only says "status": "failed" without any error message. Perhaps the QEMU log for the domain on the source host could help?
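For reference, libvirt keeps a per-domain QEMU log on the source host under /var/log/libvirt/qemu/, so something like:

# tail -n 50 /var/log/libvirt/qemu/pool-10-37-167-238.log

should show whether QEMU itself printed an error around the time of the failure.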
(In reply to Jiri Denemark from comment #6)
> So it looks like something bad happened when the device state was supposed to
> be transferred. But there's no info about what happened, as even
> "query-migrate" called just after the failure only says "status": "failed"
> without any error message. Perhaps the QEMU log for the domain on the source
> host could help?

The source host has:

2022-11-10 14:00:39.232+0000: initiating migration
2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-11-10T14:00:44.384781Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384815Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384825Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-11-10 14:00:48.958+0000: shutting down, reason=crashed

but I'm assuming that's the recovery failing rather than the migration failing.
Hmm, still not sure what's going on here; the:

2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

is on the source near the end of the migration - so that's bad and suggests it's a block problem; but it doesn't always happen, which makes me suspicious. Still, if that is happening, I wonder if either libvirt or qemu doesn't understand that this is shared storage?
BTW, the migration was started with flags 0x101, that is VIR_MIGRATE_LIVE and VIR_MIGRATE_CHANGE_PROTECTION. There should be no storage migration of any kind involved.
Good news, I've been able to replicate this in my environment (a nested host):

L0: RHEL 8.x-ish
L1: RHEL 9.1
L2: RHEL 9.1

L0 provides iSCSI and virt fencing to the L1s.

virsh migrate --live L2rhel9.1 qemu+ssh://dgilbertbz2132247n2/system --verbose

on the source we see:

char device redirected to /dev/pts/0 (label charserial0)
2022-12-13 19:47:07.526+0000: initiating migration
2022-12-13T19:47:07.912528Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-12-13T19:47:07.998369Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998385Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998390Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-12-13 19:47:21.241+0000: shutting down, reason=crashed
Downgraded the L1 qemus to 6.2.0-11.el9_0.6.x86_64 and it still fails.
Back on the 7.0.0 qemu.
Rolled the kernel back to 5.14.0-70.39.1.el9_0.x86_64 - still fails.
Still fails with libvirt-8.4.0-3.

Hmm, so having rolled pretty much everything interesting back (individually), that's not helping. I'll go back to attaching gdb and tracing it.
Still fails on upstream 7.2 qemu
There's some selinuxism going on here, I'm not yet understanding what, but on the source during migrate:

Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2. For complete SELinux messages run: sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.

[root@dgilbertbz2132247n1 ~]# sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.

*****  Plugin restorecon_source (62.6 confidence) suggests  *****************

If you want to fix the label.
/opt/bin/qemu-system-x86_64 default label should be bin_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /opt/bin/qemu-system-x86_64

*****  Plugin qemu_file_image (37.7 confidence) suggests  *******************

If rhel-guest-image-forl2.qcow2 is a virtualization target
Then you need to change the label on rhel-guest-image-forl2.qcow2'
Do
# semanage fcontext -a -t virt_image_t 'rhel-guest-image-forl2.qcow2'
# restorecon -v 'rhel-guest-image-forl2.qcow2'

*****  Plugin catchall (1.12 confidence) suggests  **************************

If you believe that qemu-system-x86_64 should be allowed read access on the rhel-guest-image-forl2.qcow2 file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'live_migration' --raw | audit2allow -M my-livemigration
# semodule -X 300 -i my-livemigration.pp

Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c227,c943
Target Context                system_u:object_r:svirt_image_t:s0:c179,c732
Target Objects                rhel-guest-image-forl2.qcow2 [ file ]
Source                        live_migration
Source Path                   /opt/bin/qemu-system-x86_64
Port                          <Unknown>
Host                          dgilbertbz2132247n1
Source RPM Packages
Target RPM Packages
SELinux Policy RPM            selinux-policy-targeted-34.1.43-1.el9.noarch
Local Policy RPM              selinux-policy-targeted-34.1.43-1.el9.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     dgilbertbz2132247n1
Platform                      Linux dgilbertbz2132247n1 5.14.0-162.11.1.el9_1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 8 05:28:54 EST 2022 x86_64 x86_64
Alert Count                   1
First Seen                    2022-12-14 13:10:05 EST
Last Seen                     2022-12-14 13:10:05 EST
Local ID                      0c080ecf-c3c5-4d71-a8b3-875bd2770b49

Raw Audit Messages
type=AVC msg=audit(1671041405.277:1638): avc: denied { read } for pid=293735 comm="live_migration" name="rhel-guest-image-forl2.qcow2" dev="dm-0" ino=97994 scontext=system_u:system_r:svirt_t:s0:c227,c943 tcontext=system_u:object_r:svirt_image_t:s0:c179,c732 tclass=file permissive=0

type=SYSCALL msg=audit(1671041405.277:1638): arch=x86_64 syscall=openat success=no exit=EACCES a0=ffffff9c a1=56367c2e06d1 a2=80000 a3=0 items=0 ppid=1 pid=293735 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm=live_migration exe=/opt/bin/qemu-system-x86_64 subj=system_u:system_r:svirt_t:s0:c227,c943 key=(null)

Hash: live_migration,svirt_t,svirt_image_t,file,read
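Note the denial is between svirt_t with MCS pair c227,c943 (the migrating QEMU process) and svirt_image_t with a different pair c179,c732 on the image - i.e. a category mismatch rather than a wrong type. A quick way to see both sides on a host (illustrative commands, not taken from the report):

# ps -eZ | grep qemu
# ls -Z /mnt/shared/rhel-guest-image-forl2.qcow2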
With some debug added, I'm seeing:

2022-12-14 18:10:04.902+0000: initiating migration
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: ENTRY driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: bdrv_inactivate: 0
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: ENTRY driver: file filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: perms cumulative_perms=b
bdrv_inactivate_recurse: bdrv_inactivate_recurse ret=-1
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2 recurse=-1
2022-12-14T18:10:05.361844Z qemu-system-x86_64: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

so the failure there is:

bdrv_inactivate_recurse: perms cumulative_perms=b

which is the cumulative_perms check added in qemu commit a13de40a054:

a13de40 block: bdrv_inactivate_recurse(): check for permissions and fix crash
Vladimir Sementsov-Ogievskiy <vsementsov> Sat Sep 11 15:00:27 2021 +0300

'b' corresponds to 0xb = 11 = 8 + 2 + 1 = BLK_PERM_RESIZE + BLK_PERM_WRITE + BLK_PERM_CONSISTENT_READ, and I think that check is saying we don't expect BLK_PERM_WRITE at that stage - something should have dropped it by then.
Seen on the destination side only:

Dec 14 13:45:40 dgilbertbz2132247n2 virtqemud[296716]: 2022-12-14 18:45:40.987+0000: 296716: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=selinux
Dec 14 13:45:41 dgilbertbz2132247n2 virtqemud[296717]: 2022-12-14 18:45:41.014+0000: 296717: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=dac
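Those warnings refer to the security labels libvirt remembers in extended attributes on the image. They can be dumped for inspection with something like this (the exact attribute names may vary by libvirt version, so treat this as approximate):

# getfattr -m trusted.libvirt.security -d /mnt/shared/rhel-guest-image-forl2.qcow2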
If I hack the check from a13de40 out, I get further, but the destination fails with:

Dec 14 14:43:42 dgilbertbz2132247n2 virtqemud[336909]: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock

I noticed in the reporter's logs that they have:

2022-11-10 14:00:44.541+0000: 120543: debug : virFileIsSharedFSType:3442 : Check if path /var/lib/libvirt/images/pool-10-37-167-238-delta with FS magic 18225520 is shared

I'm not seeing that in my setup.

Jiri: Is there some magic that should mean libvirt notices it's shared? Do those XATTR timestamp things in comment 19 mean anything to you?
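A quick way to compare what the two setups look like at that level is to check the filesystem type/magic under the image paths (illustrative commands):

# stat -f -c '%T (magic 0x%t)' /var/lib/libvirt/images
# stat -f -c '%T (magic 0x%t)' /mnt/shared

The reporter's magic 18225520 is 0x01161970, which matches GFS2.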
For me, disabling SELinux makes this work.
Hmm, it might be a labeling issue of some kind. It would be nice to see the domain XML, libvirt logs and the mount table. I see /var/lib/libvirt/images/pool-10-37-167-238-delta is on GFS2, but no idea what filesystem /mnt/shared/rhel-guest-image-forl2.qcow2 is on.
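For completeness, the usual way to gather those (domain name as in the nested reproducer):

# virsh dumpxml L2rhel9.1
# findmnt
# journalctl -u virtqemud

(libvirt debug logging has to be enabled separately, e.g. via log_filters/log_outputs in /etc/libvirt/virtqemud.conf.)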
The mount entry in my nested reproducer setup is: /dev/mapper/shared_vg-shared_lv /mnt/shared gfs2 rw,seclabel,noatime,rgrplvb 0 0
jdenemar pointed me at:

https://listman.redhat.com/archives/libvir-list/2022-December/236418.html

which was asked in the context of ceph, but is the same:

''For shared filesytems, which DO have SELinux labelling, then label assignment needs to be tracked across all hosts which can use that filesystem for VM storage. Libvirt is NOT capable of doing this tracking itself. So mgmt apps need to tell libvirt to use a static label which they've figured out uniqueness for globally. Libvirt can still do relabelling of files. IOW, we'd recommend this:

<seclabel type='static' model='selinux' relabel='yes'>''
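Applied to this setup, that would presumably mean giving the domain a static label in its XML on every host. A minimal sketch of the element is below; the MCS pair is made up here and would have to be chosen by the management layer so that it is unique across all hosts sharing the filesystem:

<seclabel type='static' model='selinux' relabel='yes'>
  <label>system_u:system_r:svirt_t:s0:c392,c662</label>
</seclabel>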