Bug 2132247
| Summary: | pacemaker VirtualDomain live migration fails | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | michal novacek <mnovacek> | |
| Component: | resource-agents | Assignee: | Oyvind Albrigtsen <oalbrigt> | |
| Status: | ASSIGNED --- | QA Contact: | cluster-qe <cluster-qe> | |
| Severity: | unspecified | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 9.1 | CC: | agk, cluster-maint, fdinitto, jdenemar, peterx | |
| Target Milestone: | rc | |||
| Target Release: | --- | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2152186 | Environment: ||
| Last Closed: | Type: | Bug | ||
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2152186 | |||
|
Description
michal novacek
2022-10-05 08:46:48 UTC
I tried to reproduce this without libvirt, but it works:
on riker:
/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}'
on picard:
/usr/libexec/qemu-kvm -m 2048 -cpu host -smp 2 -machine pc-q35-rhel9.0.0 -accel kvm -nographic -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/rhel9.1","node-name":"libvirt-3-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-3-format","read-only":true,"driver":"qcow2","file":"libvirt-3-storage","backing":null}' -blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/pool-10-37-167-238-delta","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"}' -blockdev '{"node-name":"libvirt-2-format","read-only":false,"driver":"qcow2","file":"libvirt-2-storage","backing":"libvirt-3-format"}' -device '{"driver":"virtio-blk-pci","drive":"libvirt-2-format","id":"virtio-disk0","bootindex":1}' -incoming tcp::4444
Let the VM boot on the source, then press Ctrl-a c to get to the monitor:
(qemu) migrate_set_parameter max-bandwidth 10G
(qemu) migrate -d tcp:picard-02:4444
Wait a few seconds and the VM is happy on the destination; I bounced it back in the other direction as well.
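To double-check from the monitors (plain HMP commands, noted here only as an illustration of how the result can be verified):
on the source:
(qemu) info migrate
should end up showing a "Migration status: completed" line, and on the destination:
(qemu) info status
should report "VM status: running".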
So the basics seem OK for that stripped-down setup.
Seen on the source:
2022-11-10 14:00:44.189+0000: 127919: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}]
2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189656}, "event": "MIGRATION_PASS", "data": {"pass": 3}}
2022-11-10 14:00:44.234+0000: 127919: info : qemuMonitorIOWrite:382 : QEMU_MONITOR_IO_WRITE: mon=0x7f0988083460 buf={"execute":"query-migrate","id":"libvirt-427"}^M
2022-11-10 14:00:44.303+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 303047}, "event": "MIGRATION", "data": {"status": "failed"}}
So the source thinks something failed in those few ms.
On the destination we see:
2022-11-10 14:00:39.232+0000: 120615: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f08fc083460 event={"timestamp": {"seconds": 1668088839, "microseconds": 232939}, "event": "MIGRATION", "data": {"status": "active"}}
2022-11-10 14:00:44.338+0000: 120543: debug : virThreadJobSet:93 : Thread 120543 (rpc-virtqemud) is now running job remoteDispatchDomainMigrateFinish3Params
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjStartJobPhase:765 : Starting phase 'finish3' of 'migration in' job
2022-11-10 14:00:44.338+0000: 120543: debug : qemuDomainObjSetJobPhase:734 : Setting 'migration in' phase to 'finish3'
2022-11-10 14:00:44.339+0000: 120615: debug : qemuMonitorJSONIOProcessLine:195 : Line [{"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}]
2022-11-10 14:00:44.339+0000: 120615: info : qemuMonitorJSONIOProcessLine:214 : QEMU_MONITOR_RECV_REPLY: mon=0x7f08fc083460 reply={"return": {"status": "inmigrate", "singlestep": false, "running": false}, "id": "libvirt-408"}
So the source seems to have seen a failure, but the destination was apparently happy; I *think* the destination thinks it finished.
Apparently the migration is almost finished as the reported failure happens
very shortly after
2022-11-10 14:00:44.189+0000: 127919: info : qemuMonitorJSONIOProcessLine:209 : QEMU_MONITOR_RECV_EVENT: mon=0x7f0988083460 event={"timestamp": {"seconds": 1668088844, "microseconds": 189440}, "event": "MIGRATION", "data": {"status": "device"}}
(this event is accompanied by the MIGRATION_PASS event mentioned in comment
#3), that is, QEMU is migrating the device state after transferring all memory.
And interestingly, only 4 seconds after the guest CPUs were resumed (as a recovery
from the failed migration), the domain crashes:
2022-11-10 14:00:48.756+0000: 127919: debug : qemuMonitorIO:556 : Error on monitor <null> mon=0x7f0988083460 vm=0x55d5f04ca6a0 name=pool-10-37-167-238
2022-11-10 14:00:48.958+0000: 127981: debug : processMonitorEOFEvent:4113 : Monitor connection to 'pool-10-37-167-238' closed without SHUTDOWN event; assuming the domain crashed
That finish3 phase from comment #3 is started by
2022-11-10 14:00:44.338+0000: 120543: debug : virDomainMigrateFinish3Params:5425 : dconn=0x7f08fc015190, params=0x7f09380030f0, nparams=3, cookiein=(nil), cookieinlen=0, cookieout=0x7f0942ee78d0, cookieoutlen=0x7f0942ee78c4, flags=0x1, cancelled=1
where cancelled=1 means this API is called to tell the destination that the
migration failed on the source and the domain on the destination host needs to
be killed.
So it looks like something bad happened when the device state was supposed to be
transferred. But there's no info about what happened, as even "query-migrate"
called just after the failure only says "status": "failed" without any error
message. Perhaps the QEMU log for the domain on the source host could help?
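(A quick way to pull that log, assuming the default libvirt layout where each domain gets its own file under /var/log/libvirt/qemu/ - the domain name below is the one from this bug:)
# tail -n 50 /var/log/libvirt/qemu/pool-10-37-167-238.log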
(In reply to Jiri Denemark from comment #6)
> But there's no info about what happened, as even "query-migrate" called just
> after the failure only says "status": "failed" without any error message.
> Perhaps the QEMU log for the domain on the source host could help?

The source host has:

2022-11-10 14:00:39.232+0000: initiating migration
2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-11-10T14:00:44.384781Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384815Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-11-10T14:00:44.384825Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-11-10 14:00:48.958+0000: shutting down, reason=crashed

but I'm assuming that's the recovery failing rather than the migration failing.

Hmm, still not sure what's going on here; the

2022-11-10T14:00:44.302946Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

is on the source near the end of the migration - so that's bad and suggests it's a block problem; but it doesn't always happen, which makes me suspicious. Still, if that is happening, I wonder if either libvirt or qemu isn't understanding that this is shared storage?

BTW, the migration was started with flags 0x101, that is VIR_MIGRATE_LIVE and VIR_MIGRATE_CHANGE_PROTECTION. There should be no storage migration of any kind involved.
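For reference, decoding those flags against libvirt's virDomainMigrateFlags enum (values from libvirt-domain.h, listed here for illustration):

VIR_MIGRATE_LIVE              = 0x001
VIR_MIGRATE_CHANGE_PROTECTION = 0x100
0x001 | 0x100                 = 0x101

Notably neither VIR_MIGRATE_NON_SHARED_DISK (0x40) nor VIR_MIGRATE_NON_SHARED_INC (0x80) is set, which matches the statement that no storage migration is involved.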
Good news, I've been able to replicate this in my environment (a nested host):

L0: RHEL8.x ish
L1: RHEL9.1
L2: RHEL9.1

L0 provides iSCSI and virt fencing to the L1s.

virsh migrate --live L2rhel9.1 qemu+ssh://dgilbertbz2132247n2/system --verbose

On the source we see:

char device redirected to /dev/pts/0 (label charserial0)
2022-12-13 19:47:07.526+0000: initiating migration
2022-12-13T19:47:07.912528Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-12-13T19:47:07.998369Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998385Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-12-13T19:47:07.998390Z qemu-kvm: Unable to read from socket: Bad file descriptor
qemu-kvm: ../block/io.c:2022: int bdrv_co_write_req_prepare(BdrvChild *, int64_t, int64_t, BdrvTrackedRequest *, int): Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
2022-12-13 19:47:21.241+0000: shutting down, reason=crashed

Downgraded the L1 qemu to 6.2.0-11.el9_0.6.x86_64 and it still fails. Back on the 7.0.0 qemu.
Rolled the kernel back to 5.14.0-70.39.1.el9_0.x86_64 - still fails.
Still fails with libvirt-8.4.0-3.

Hmm, so having rolled pretty much everything interesting back (individually), that's not helping. I'll go back to attaching gdb and tracing it.

Still fails on upstream 7.2 qemu.

There's some selinuxism going on here, I'm not yet understanding what, but on the source during migrate:
Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2. For complete SELinux messages run: sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
Dec 14 13:10:06 dgilbertbz2132247n1 setroubleshoot[295298]: SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.
[root@dgilbertbz2132247n1 ~]# sealert -l 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
SELinux is preventing /opt/bin/qemu-system-x86_64 from read access on the file rhel-guest-image-forl2.qcow2.
***** Plugin restorecon_source (62.6 confidence) suggests *****************
If you want to fix the label.
/opt/bin/qemu-system-x86_64 default label should be bin_t.
Then you can run restorecon.
Do
# /sbin/restorecon -v /opt/bin/qemu-system-x86_64
***** Plugin qemu_file_image (37.7 confidence) suggests *******************
If rhel-guest-image-forl2.qcow2 is a virtualization target
Then you need to change the label on rhel-guest-image-forl2.qcow2'
Do
# semanage fcontext -a -t virt_image_t 'rhel-guest-image-forl2.qcow2'
# restorecon -v 'rhel-guest-image-forl2.qcow2'
***** Plugin catchall (1.12 confidence) suggests **************************
If you believe that qemu-system-x86_64 should be allowed read access on the rhel-guest-image-forl2.qcow2 file by default.
Then you should report this as a bug.
You can generate a local policy module to allow this access.
Do
allow this access for now by executing:
# ausearch -c 'live_migration' --raw | audit2allow -M my-livemigration
# semodule -X 300 -i my-livemigration.pp
Additional Information:
Source Context system_u:system_r:svirt_t:s0:c227,c943
Target Context system_u:object_r:svirt_image_t:s0:c179,c732
Target Objects rhel-guest-image-forl2.qcow2 [ file ]
Source live_migration
Source Path /opt/bin/qemu-system-x86_64
Port <Unknown>
Host dgilbertbz2132247n1
Source RPM Packages
Target RPM Packages
SELinux Policy RPM selinux-policy-targeted-34.1.43-1.el9.noarch
Local Policy RPM selinux-policy-targeted-34.1.43-1.el9.noarch
Selinux Enabled True
Policy Type targeted
Enforcing Mode Enforcing
Host Name dgilbertbz2132247n1
Platform Linux dgilbertbz2132247n1
5.14.0-162.11.1.el9_1.x86_64 #1 SMP
PREEMPT_DYNAMIC Thu Dec 8 05:28:54 EST 2022 x86_64
x86_64
Alert Count 1
First Seen 2022-12-14 13:10:05 EST
Last Seen 2022-12-14 13:10:05 EST
Local ID 0c080ecf-c3c5-4d71-a8b3-875bd2770b49
Raw Audit Messages
type=AVC msg=audit(1671041405.277:1638): avc: denied { read } for pid=293735 comm="live_migration" name="rhel-guest-image-forl2.qcow2" dev="dm-0" ino=97994 scontext=system_u:system_r:svirt_t:s0:c227,c943 tcontext=system_u:object_r:svirt_image_t:s0:c179,c732 tclass=file permissive=0
type=SYSCALL msg=audit(1671041405.277:1638): arch=x86_64 syscall=openat success=no exit=EACCES a0=ffffff9c a1=56367c2e06d1 a2=80000 a3=0 items=0 ppid=1 pid=293735 auid=4294967295 uid=107 gid=107 euid=107 suid=107 fsuid=107 egid=107 sgid=107 fsgid=107 tty=(none) ses=4294967295 comm=live_migration exe=/opt/bin/qemu-system-x86_64 subj=system_u:system_r:svirt_t:s0:c227,c943 key=(null)
Hash: live_migration,svirt_t,svirt_image_t,file,read
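The AVC above shows the live_migration thread running as svirt_t:s0:c227,c943 while the image file carries svirt_image_t:s0:c179,c732 - two different MCS category pairs, which is exactly the kind of mismatch sVirt isolation is designed to deny. A quick way to compare the two on the source host (paths and binary name taken from this reproducer):
# ps -eZ | grep qemu-system-x86_64
# ls -Z /mnt/shared/rhel-guest-image-forl2.qcow2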
With some debug added, I'm seeing:

2022-12-14 18:10:04.902+0000: initiating migration
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: ENTRY driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: bdrv_inactivate: 0
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: ENTRY driver: file filename: /mnt/shared/rhel-guest-image-forl2.qcow2
bdrv_inactivate_recurse: klass->inactivate: 0
bdrv_inactivate_recurse: perms cumulative_perms=b
bdrv_inactivate_recurse: bdrv_inactivate_recurse ret=-1
bdrv_inactivate_all: driver: qcow2 filename: /mnt/shared/rhel-guest-image-forl2.qcow2 recurse=-1
2022-12-14T18:10:05.361844Z qemu-system-x86_64: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

so the failure there is:

bdrv_inactivate_recurse: perms cumulative_perms=b

which is the cumulative_perms check added in qemu a13de40a054:

a13de40 block: bdrv_inactivate_recurse(): check for permissions and fix crash
Vladimir Sementsov-Ogievskiy <vsementsov>
Sat Sep 11 15:00:27 2021 +0300

'b' corresponds to 0xb = 11 = 8+2+1 = BLK_PERM_RESIZE + BLK_PERM_WRITE + BLK_PERM_CONSISTENT_READ, and I think that check is saying we don't expect BLK_PERM_WRITE at that stage - something should have dropped it by then.

Seen on the destination side only:

Dec 14 13:45:40 dgilbertbz2132247n2 virtqemud[296716]: 2022-12-14 18:45:40.987+0000: 296716: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=selinux
Dec 14 13:45:41 dgilbertbz2132247n2 virtqemud[296717]: 2022-12-14 18:45:41.014+0000: 296717: warning : virSecurityValidateTimestamp:205 : Invalid XATTR timestamp detected on /mnt/shared/rhel-guest-image-forl2.qcow2 secdriver=dac

If I hack the check from a13de40 out, I get further but the destination fails with:
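For reference, the permission bits being summed there, as defined by QEMU's BLK_PERM_* flags (values as I read them in block.h; worth double-checking against the exact tree in use):

BLK_PERM_CONSISTENT_READ = 0x01
BLK_PERM_WRITE           = 0x02
BLK_PERM_RESIZE          = 0x08
0x01 + 0x02 + 0x08       = 0x0b

i.e. the 'file' node still holds WRITE (and RESIZE) at the point where bdrv_inactivate_recurse() expects writers to have gone away.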
Dec 14 14:43:42 dgilbertbz2132247n2 virtqemud[336909]: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock
I noticed in the reporter's logs that they have:
2022-11-10 14:00:44.541+0000: 120543: debug : virFileIsSharedFSType:3442 : Check if path /var/lib/libvirt/images/pool-10-37-167-238-delta with FS magic 18225520 is shared
I'm not seeing that in my setup
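(For what it's worth, that FS magic decodes to GFS2 - 18225520 == 0x01161970, the GFS2 superblock magic reported by statfs - so on the reporter's host virFileIsSharedFSType is at least looking at the right filesystem type; whether it then treats it as shared is a separate question.)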
Jiri: Is there some magic that should mean libvirt notices it's shared?
Do those Xattr timestamp things in comment 19 mean anything to you?
For me, disabling selinux is making this work.

Hmm, it might be a labeling issue of some kind. It would be nice to see the domain XML, libvirt logs and the mount table. I see /var/lib/libvirt/images/pool-10-37-167-238-delta is on GFS2, but no idea what filesystem /mnt/shared/rhel-guest-image-forl2.qcow2 is on.

The mount entry in my nested reproducer setup is:

/dev/mapper/shared_vg-shared_lv /mnt/shared gfs2 rw,seclabel,noatime,rgrplvb 0 0

jdenemar pointed me at:
https://listman.redhat.com/archives/libvir-list/2022-December/236418.html
which was asked in the context of ceph, but is the same:

''For shared filesystems, which DO have SELinux labelling, then label assignment needs to be tracked across all hosts which can use that filesystem for VM storage. Libvirt is NOT capable of doing this tracking itself. So mgmt apps need to tell libvirt to use a static label which they've figured out uniqueness for globally. Libvirt can still do relabelling of files. IOW, we'd recommend this: <seclabel type='static' model='selinux' relabel='yes'>''
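A minimal sketch of what that recommendation would look like in the guest XML, assuming the management layer has picked a cluster-wide unique MCS pair (the c100,c200 categories below are placeholders, not values from this bug):

<seclabel type='static' model='selinux' relabel='yes'>
  <label>system_u:system_r:svirt_t:s0:c100,c200</label>
</seclabel>

With relabel='yes' libvirt still relabels the image files, but always to this fixed context, so both cluster nodes agree on the label instead of each picking its own dynamic category pair.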