Bug 2132247 - pacemaker VirtualDomain live migration fails
Summary: pacemaker VirtualDomain live migration fails
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: resource-agents
Version: 9.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe
URL:
Whiteboard:
Depends On:
Blocks: 2152186
 
Reported: 2022-10-05 08:46 UTC by michal novacek
Modified: 2023-08-10 15:39 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2152186
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:


Links:
Red Hat Issue Tracker RHELPLAN-135632 (last updated 2022-10-05 10:05:07 UTC)

Description michal novacek 2022-10-05 08:46:48 UTC
Description of problem:
VirtualDomain live migration does not work and always ends up with the VM being rebooted.

It seems to me that the migration command issued by the resource agent fails, which causes this:

Oct 05 10:19:14  VirtualDomain(pool-10-37-167-241)[1104724]:    INFO: pool-10-37-167-241: Starting live migration to picard-02 (using: virsh --connect=qemu:///system --quiet migrate --live  pool-10-37-167-241 qemu+ssh://picard-02/system ).
Oct 05 10:19:21  VirtualDomain(pool-10-37-167-241)[1104724]:    ERROR: pool-10-37-167-241: live migration to picard-02 failed: 1
Oct 05 10:19:21  VirtualDomain(pool-10-37-167-241)[1104977]:    INFO: Issuing graceful shutdown request for domain pool-10-37-167-241.


Version-Release number of selected component (if applicable): 
pacemaker-2.1.4-5.el9.x86_64
resource-agents-4.10.0-23.el9.x86_64
libvirt-8.5.0-7.el9_1.x86_64

How reproducible: always


Steps to Reproduce:
1. pcs resource move pool-10-37-167-241

Actual results: offline migration

Expected results: live migration

Additional info:

running "virsh --connect=qemu:///system --quiet migrate --live  pool-10-37-167-241 qemu+ssh://picard-02/system" manually will just end up with error:

[root@riker-01 ~]# virsh --connect=qemu:///system migrate --live  pool-10-37-167-242 qemu+ssh://10.37.165.56/system --verbose 
Migration: [ 94 %]error: operation failed: job 'migration out' unexpectedly failed

Comment 2 Dr. David Alan Gilbert 2022-11-10 11:26:12 UTC
I tried to reproduce this without libvirt, but done that way it works:

on riker:
/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}'

on picard:
/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}' -incoming tcp::4444

Let the VM boot on the source, then use ctrl-a c to switch to the monitor:
(qemu)   migrate_set_parameter max-bandwidth 10G
(qemu)   migrate -d tcp:picard-02:4444

After a few seconds the VM is happily running on the destination; I bounced it back in the other direction as well.
So the basics seem OK for that stripped-down setup.
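
For reference, the progress of a migration started with "migrate -d" can be watched from the same monitor with the standard HMP status command:

(qemu)   info migrate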

Comment 3 Dr. David Alan Gilbert 2022-11-10 14:18:31 UTC
Seen on the source:

2022-11-10 14:00:44.189+0000: 127919: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}]
2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}

2022-11-10 14:00:44.234+0000: 127919: info : qemuMonitorIOWrite:382 : QEMU_MONITOR_IO_WRITE: mon=0x7f0988083460 buf={"execute":"query-migrate","id":"libvirt-427"}^M
2022-11-10 14:00:44.303+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 303047}, "event": "MIGRATION", "data": {"status": "failed"}}

So the source thinks something failed within those few milliseconds.

On the destination we see:
2022-11-10 14:00:39.232+0000: 120615: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f08fc083460 event={"timestamp": {"seconds": 1668088839, "microseconds": 232939}, "event": "MIGRATION", "data": {"status": "active"}}
2022-11-10 14:00:44.338+0000: 120543: debug : virThreadJobSet:93 : Thread 120543 (rpc-virtqemud) is now running job remoteDispatchDomainMigrateFinish3Params
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjStartJobPhase:765 : Starting phase 'finish3' of 'migration in' job
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjSetJobPhase:734 : Setting 'migration in' phase to 'finish3'

2022-11-10 14:00:44.339+0000: 120615: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}]
2022-11-10 14:00:44.339+0000: 120615: info : qemuMonitorJSONIOProcessLine:214 : QEMU_MONITOR_RECV_REPLY: mon=0x7f08fc083460 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}

So the source seems to have seen a failure, but the destination was apparently happy - I *think* the destination believes the migration finished.

Comment 6 Jiri Denemark 2022-11-10 15:48:23 UTC
Apparently the migration is almost finished as the reported failure happens
very shortly after

    2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189440}, "event": "MIGRATION", "data": {"status": "device"}}

(this event is accompanied by the MIGRATION_PASS event mentioned in comment
#3), that is, QEMU is migrating the device state after transferring all memory.
Interestingly, only 4 seconds after the guest CPUs were resumed (as a recovery
from the failed migration) the domain crashes:

    2022-11-10 14:00:48.756+0000: 127919: debug : qemuMonitorIO:556 : Error on monitor <null> mon=0x7f0988083460 vm=0x55d5f04ca6a0 name=pool-10-37-167-238
    2022-11-10 14:00:48.958+0000: 127981: debug : processMonitorEOFEvent:4113 : Monitor connection to 'pool-10-37-167-238' closed without SHUTDOWN event; assuming the domain crashed

That finish3 phase from comment #3 is started by

    2022-11-10 14:00:44.338+0000: 120543: debug : virDomainMigrateFinish3Params:5425 : dconn=0x7f08fc015190, params=0x7f09380030f0, nparams=3, cookiein=(nil), cookieinlen=0, cookieout=0x7f0942ee78d0, cookieoutlen=0x7f0942ee78c4, flags=0x1, cancelled=1

where cancelled=1 means this API is called to tell the destination that the
migration failed on the source and that the domain on the destination host
needs to be killed.

So it looks like something bad happened when the device state was supposed to
be transferred. But there's no info about what happened, as even "query-migrate"
called just after the failure only says "status": "failed" without any error
message. Perhaps the QEMU log for the domain on the source host could help?
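
For anyone retracing this, the per-domain QEMU log and the raw query-migrate reply can be pulled with standard libvirt tooling (default log path and the domain name used earlier in this bug; adjust as needed):

    # QEMU log for the domain on the source host
    less /var/log/libvirt/qemu/pool-10-37-167-238.log

    # ask QEMU directly what it reports for the migration job
    virsh --connect=qemu:///system qemu-monitor-command pool-10-37-167-238 \
        '{"execute":"query-migrate"}'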

Comment 7 Dr. David Alan Gilbert 2022-11-10 15:58:09 UTC
(In reply to Jiri Denemark from comment #6)
> [...]
> So it looks like something bad happened when the device state was supposed to
> be transferred. But there's no info about what happened, as even
> "query-migrate" called just after the failure only says "status": "failed"
> without any error message. Perhaps the QEMU log for the domain on the source
> host could help?

The source host has:

2022-11-10 14:00:39.232+0000: initiating migration
2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-11-10T14:00:44.384781Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384815Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384825Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-11-10 14:00:48.958+0000: shutting down, reason=crashed

but I'm assuming that's the recovery failing rather than the migration failing

Comment 8 Dr. David Alan Gilbert 2022-11-10 20:09:32 UTC
Hmm, still not sure what's going on here; this line:

2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

is on the source near the end of the migration - so that's bad and suggests it's a block-layer problem; but it doesn't
always happen, which makes me suspicious. Still, if that is happening, I wonder whether either libvirt or qemu
isn't understanding that this is shared storage?

Comment 9 Jiri Denemark 2022-11-11 09:13:05 UTC
BTW, the migration was started with flags 0x101, that is VIR_MIGRATE_LIVE and
VIR_MIGRATE_CHANGE_PROTECTION. There should be no storage migration of any
kind involved.
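
For reference, that flags value decodes cleanly against libvirt's virDomainMigrateFlags constants (a worked check, not output from this bug):

    VIR_MIGRATE_LIVE              = 1 << 0 = 0x001
    VIR_MIGRATE_CHANGE_PROTECTION = 1 << 8 = 0x100
    0x001 | 0x100                          = 0x101

    # neither VIR_MIGRATE_NON_SHARED_DISK (1 << 6) nor
    # VIR_MIGRATE_NON_SHARED_INC (1 << 7) is set, so no storage
    # migration was requested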

Comment 12 Dr. David Alan Gilbert 2022-12-13 19:53:26 UTC
Good news: I've been able to replicate this in my environment (a nested setup):
  L0 RHEL 8.x-ish
  L1 RHEL 9.1
  L2 RHEL 9.1

L0 provides iSCSI and virt fencing to the L1s.

virsh migrate --live L2rhel9.1 qemu+ssh://dgilbertbz2132247n2/system --verbose

on the source we see:

char device redirected to /dev/pts/0 (label charserial0)
2022-12-13 19:47:07.526+0000: initiating migration
2022-12-13T19:47:07.912528Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-12-13T19:47:07.998369Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998385Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998390Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-12-13 19:47:21.241+0000: shutting down, reason=crashed

Comment 13 Dr. David Alan Gilbert 2022-12-14 10:08:49 UTC
Downgraded the L1 hosts' qemu to 6.2.0-11.el9_0.6.x86_64 and it still fails.

Comment 14 Dr. David Alan Gilbert 2022-12-14 11:30:33 UTC
Back on the 7.0.0 qemu
Rolled kernel back to 5.14.0-70.39.1.el9_0.x86_64 - still fails

Comment 15 Dr. David Alan Gilbert 2022-12-14 12:04:28 UTC
Still fails with libvirt-8.4.0-3

Hmm, so having rolled pretty much everything interesting back (individually),
that's not helping.
I'll go back to attaching gdb and tracing it.

Comment 16 Dr. David Alan Gilbert 2022-12-14 16:42:37 UTC
Still fails on upstream 7.2 qemu

Comment 17 Dr. David Alan Gilbert 2022-12-14 18:12:56 UTC
There's some SELinux issue going on here - I'm not yet sure what - but on the source during the migration I see:

Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2. For complete SELinux messages run: sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.
[root@dgilbertbz2132247n1 ~]# sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.

*****  Plugin restorecon_source (62.6 confidence) suggests   *****************

If you want to fix the label. 
/opt/bin/qemu-system-x86_64 default label should be bin_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /opt/bin/qemu-system-x86_64

*****  Plugin qemu_file_image (37.7 confidence) suggests   *******************

If rhel-guest-image-forl2.qcow2 is a virtualization target
Then you need to change the label on rhel-guest-image-forl2.qcow2'
Do
# semanage fcontext -a -t virt_image_t 'rhel-guest-image-forl2.qcow2'
# restorecon -v 'rhel-guest-image-forl2.qcow2'

*****  Plugin catchall (1.12 confidence) suggests   **************************

If you believe that qemu-system-x86_64 should be allowed read access on the rhel-guest-image-forl2.qcow2 file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'live_migration' --raw | audit2allow -M my-livemigration
# semodule -X 300 -i my-livemigration.pp


Additional Information:
Source Context                system_u:system_r:svirt_t:s0:c227,c943
Target Context                system_u:object_r:svirt_image_t:s0:c179,c732
Target Objects                rhel-guest-image-forl2.qcow2 [ file ]
Source                        live_migration
Source Path                   /opt/bin/qemu-system-x86_64
Port                          <Unknown>
Host                          dgilbertbz2132247n1
Source RPM Packages           
Target RPM Packages           
SELinux Policy RPM            selinux-policy-targeted-34.1.43-1.el9.noarch
Local Policy RPM              selinux-policy-targeted-34.1.43-1.el9.noarch
Selinux Enabled               True
Policy Type                   targeted
Enforcing Mode                Enforcing
Host Name                     dgilbertbz2132247n1
Platform                      Linux dgilbertbz2132247n1
                              5.14.0-162.11.1.el9_1.x86_64 #1 SMP
                              PREEMPT_DYNAMIC Thu Dec 8 05:28:54 EST 2022 x86_64
                              x86_64
Alert Count                   1
First Seen                    2022-12-14 13:10:05 EST
Last Seen                     2022-12-14 13:10:05 EST
Local ID                      0c080ecf-c3c5-4d71-a8b3-875bd2770b49

Raw Audit Messages
type=AVC msg=audit(1671041405.277:1638): avc:  denied  { read } for  pid=293735 comm="live_migration" name="rhel-guest-image-forl2.qcow2" dev="dm-0" ino=97994 scontext=system_u:system_r:svirt_t:s0:c227,c943 tcontext=system_u:object_r:svirt_image_t:s0:c179,c732 tclass=file permissive=0


type=SYSCALL msg=audit(1671041405.277:1638): arch=x86_64 syscall=openat success=no exit=EACCES a0=ffffff9c a1=56367c2e06d1 a2=80000 a3=0 items=0 ppid=1 pid=293735 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm=live_migration exe=/opt/bin/qemu-system-x86_64 subj=system_u:system_r:svirt_t:s0:c227,c943 key=(null)

Hash: live_migration,svirt_t,svirt_image_t,file,read

Comment 18 Dr. David Alan Gilbert 2022-12-14 18:22:33 UTC
With some debug added, I'm seeing:

2022-12-14 18:10:04.902+0000: initiating migration
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: ENTRY driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: bdrv_inactivate: 0
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: ENTRY driver: file filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: perms cumulative_perms=b
bdrv_inactivate_recurse: bdrv_inactivate_recurse ret=-1
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2 recurse=-1
2022-12-14T18:10:05.361844Z qemu-system-x86_64: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

so the failure there is:

    bdrv_inactivate_recurse: perms cumulative_perms=b

which is the cumulative_perms check added in qemu commit a13de40a054:

    a13de40 block: bdrv_inactivate_recurse(): check for permissions and fix crash
    Vladimir Sementsov-Ogievskiy <vsementsov> Sat Sep 11 15:00:27 2021 +0300

'b' corresponds to 0xb = 11 = 8 + 2 + 1 = BLK_PERM_RESIZE + BLK_PERM_WRITE + BLK_PERM_CONSISTENT_READ,
and I think that check is saying we don't expect BLK_PERM_WRITE at that stage - something should have
dropped it by then.

Comment 19 Dr. David Alan Gilbert 2022-12-14 18:59:38 UTC
Seen on the destination side only:

Dec 14 13:45:40 dgilbertbz2132247n2 virtqemud[296716]: 2022-12-14 18:45:40.987+0000: 296716: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=selinux
Dec 14 13:45:41 dgilbertbz2132247n2 virtqemud[296717]: 2022-12-14 18:45:41.014+0000: 296717: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=dac

Comment 20 Dr. David Alan Gilbert 2022-12-14 19:58:21 UTC
If I hack the check from a13de40 out, I get further but the destination fails with:

Dec 14 14:43:42 dgilbertbz2132247n2 virtqemud[336909]: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock

I noticed in the reporter's logs that they have:

2022-11-10 14:00:44.541+0000: 120543: debug : virFileIsSharedFSType:3442 : Check if path /var/lib/libvirt/images/pool-10-37-167-238-delta with FS magic 18225520 is shared

I'm not seeing that in my setup

Jiri: Is there some magic that should mean libvirt notices it's shared?
      Do those Xattr timestamp things in comment 19 mean anything to you?
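
An aside on that virFileIsSharedFSType line: FS magic 18225520 is 0x01161970, i.e. the GFS2 superblock magic, so the reporter's delta image sits on GFS2. The filesystem behind an image can be checked on a host with plain coreutils stat, e.g. (path taken from the reporter's logs):

    stat -f -c 'fstype=%T magic=0x%t' /var/lib/libvirt/images/pool-10-37-167-238-delta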

Comment 21 Dr. David Alan Gilbert 2022-12-15 13:41:01 UTC
For me, disabling SELinux makes this work.

Comment 22 Jiri Denemark 2022-12-15 15:22:36 UTC
Hmm, it might be a labeling issue of some kind. It would be nice to see the
domain XML, libvirt logs and the mount table. I see
/var/lib/libvirt/images/pool-10-37-167-238-delta is on GFS2, but no idea what
filesystem /mnt/shared/rhel-guest-image-forl2.qcow2 is on.

Comment 23 Dr. David Alan Gilbert 2023-01-05 11:02:55 UTC
The mount entry in my nested reproducer setup is:

/dev/mapper/shared_vg-shared_lv /mnt/shared gfs2 rw,seclabel,noatime,rgrplvb 0 0

Comment 25 Dr. David Alan Gilbert 2023-01-05 14:41:45 UTC
jdenemar pointed me at:
https://listman.redhat.com/archives/libvir-list/2022-December/236418.html

which was asked in the context of Ceph, but the situation is the same:

''For shared filesytems, which DO have SELinux labelling, then label
assignment needs to be tracked across all hosts which can use that
filesystem for VM storage.

Libvirt is NOT capable of doing this tracking itself. So mgmt apps
need to tell libvirt to use a static label which they've figured
out uniqueness for globally. Libvirt can still do relabelling of
files. IOW, we'd recommend this:

  <seclabel type='static' model='selinux' relabel='yes'>
''
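
A minimal sketch of what that recommendation looks like in the domain XML (the MCS category pair below is purely illustrative - the management layer has to pick one that is unique across every host sharing the storage):

  <seclabel type='static' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c392,c662</label>
  </seclabel>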

