Bug 1941961 - Libvirt should validate the UID/GID of src and des server before initiating the live migration
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: x86_64
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Assignee: Michal Privoznik
QA Contact: Fangge Jin
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-23 10:00 UTC by Pei Zhang
Modified: 2022-09-23 07:27 UTC
CC: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-23 07:27:49 UTC
Type: ---
Target Upstream Version:
Embargoed:


Attachments
Guest XML (7.92 KB, text/plain)
2021-03-23 10:07 UTC, Pei Zhang

Description Pei Zhang 2021-03-23 10:00:43 UTC
Description of problem:
Installing a fresh VM with virt-install on an rhel8.3.1 host and then migrating it to an rhel8.4 host fails with "qemu-kvm: load of migration failed: Input/output error"

Version-Release number of selected component (if applicable):
src host: RHEL-AV 8.3.1:
4.18.0-240.18.1.el8_3.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
openvswitch2.13-2.13.0-97.el8fdp.x86_64
libvirt-6.6.0-13.1.module+el8.3.1+10185+675b2148.x86_64

des host: RHEL-AV 8.4:
4.18.0-298.el8.x86_64
qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64
openvswitch2.15-2.15.0-3.el8fdp.x86_64
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64

VM:RHEL 8.4
4.18.0-298.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a fresh rhel8.4 VM on rhel8.3.1 host

virt-install  \
	--graphics type=spice,listen=0.0.0.0 \
	--name=rhel8.4 \
	--machine q35 \
	--memory=8192 \
	--vcpus=6 \
	--controller type=scsi,model=virtio-scsi  \
	--disk path=/mnt/nfv/rhel8.4.qcow2,size=40,target.bus=scsi \
	-l http://download.eng.pek2.redhat.com/nightly/RHEL-8.4.0-20210316.n.1/compose/BaseOS/x86_64/os/ \
	--noautoconsole \
	--noreboot \
	-x ks=http://10.73.72.41/kickstart-rhel8.cfg  \
	--network bridge=switch,model=virtio,mac=28:66:11:22:33:44

2. After the VM installation finishes, keep its default VM XML and start the VM.

# virsh start rhel8.4
Domain rhel8.4 started


3. Migrate the VM to the rhel8.4 host; the migration fails. Here are the details:

(1) On src host
# virsh migrate --verbose --persistent --live rhel8.4 qemu+ssh://10.73.72.194/system
Migration: [ 98 %]error: internal error: qemu unexpectedly closed the monitor: 2021-03-23T09:55:25.326295Z qemu-kvm: load of migration failed: Input/output error

(2) On des host

# cat /var/log/libvirt/qemu/rhel8.4.log
...
2021-03-23 09:55:15.555+0000: Domain id=2 is tainted: high-privileges
char device redirected to /dev/pts/1 (label charserial0)
2021-03-23T09:55:25.326295Z qemu-kvm: load of migration failed: Input/output error
2021-03-23 09:55:25.436+0000: shutting down, reason=failed


(3) The VM console shows the error info below

vm-73-219 login: 
Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 4.18.0-298.el8.x86_64 on an x86_64

vm-73-219 login: [   65.641344] sd 0:0:0:0: [sda] tag#53 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[   65.643844] sd 0:0:0:0: [sda] tag#53 Sense Key : Aborted Command [current] 
[   65.645599] sd 0:0:0:0: [sda] tag#53 Add. Sense: I/O process terminated
[   65.647318] sd 0:0:0:0: [sda] tag#53 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[   65.649529] blk_update_request: I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[   65.692325] sd 0:0:0:0: [

Comment 1 Pei Zhang 2021-03-23 10:02:24 UTC
(Sorry, some info was accidentally lost; continuing the update here.)

(3) The VM console shows the error info below

vm-73-219 login: 
Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 4.18.0-298.el8.x86_64 on an x86_64

vm-73-219 login: [   65.641344] sd 0:0:0:0: [sda] tag#53 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[   65.643844] sd 0:0:0:0: [sda] tag#53 Sense Key : Aborted Command [current] 
[   65.645599] sd 0:0:0:0: [sda] tag#53 Add. Sense: I/O process terminated
[   65.647318] sd 0:0:0:0: [sda] tag#53 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[   65.649529] blk_update_request: I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[   65.692325] sd 0:0:0:0: [sda] tag#60 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[   65.695259] sd 0:0:0:0: [sda] tag#60 Sense Key : Aborted Command [current] 
[   65.697138] sd 0:0:0:0: [sda] tag#60 Add. Sense: I/O process terminated
[   65.698937] sd 0:0:0:0: [sda] tag#60 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[   65.701263] blk_update_request: I/O error, dev sda, sector 47200168 op 0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 0
[   65.704236] XFS (dm-0): log I/O error -5
[   65.705423] XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1304 of file fs/xfs/xfs_log.c. Return address = 00000000ff1087a0
[   65.708840] XFS (dm-0): Log I/O Error Detected. Shutting down filesystem
[   65.710809] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)



Actual results:
A fresh VM fails to migrate from rhel8.3.1 to rhel8.4.

Expected results:
VM should be migrated from rhel8.3.1 to rhel8.4 successfully.

Additional info:
1. Src host and des host have the same CPU
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
Stepping:            4
CPU MHz:             2100.122
BogoMIPS:            4200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke md_clear flush_l1d


2. The QEMU command line generated by libvirt is below. The full VM XML is attached in the next comment.
/usr/libexec/qemu-kvm \
-name guest=rhel8.4,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-rhel8.4/master-key.aes \
-machine pc-q35-rhel8.3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \
-m 8192 \
-overcommit mem-lock=off \
-smp 6,sockets=6,cores=1,threads=1 \
-uuid 5d4cb110-f921-4718-ba5a-10929bf29dd8 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
-device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
-blockdev '{"driver":"file","filename":"/mnt/nfv/rhel8.4.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
-netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=38 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=28:66:da:5f:dd:01,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=39,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel1,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Comment 2 Pei Zhang 2021-03-23 10:07:16 UTC
Created attachment 1765474 [details]
Guest XML

Comment 4 John Ferlan 2021-03-31 19:05:44 UTC
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.

Comment 5 Dr. David Alan Gilbert 2021-04-01 10:21:08 UTC
stefan: Any idea where this SCSI error is coming from?

Comment 6 Stefan Hajnoczi 2021-04-07 09:04:07 UTC
(In reply to Dr. David Alan Gilbert from comment #5)
> stefan: Any idea where this SCSI error is coming from?

No. A disk cache flush request (Synchronize Cache(10)) failed but I'm not sure if that was before or after migration. It may have been before since migration drains all in-flight requests.

The flush request most likely got "Sense Key : Aborted Command" and "Sense: I/O process terminated" here:
https://gitlab.com/qemu-project/qemu/-/blob/master/scsi/utils.c#L618

That means the errno was not EDOM, EBADE, ENODATA, EREMOTEIO, ENOMEDIUM, ENOMEM, EINVAL, or ENOSPC. Beyond that I don't think there is enough information to explain the failed flush request.

I suggest looking at the migration load EIO error to understand which device failed and why.

Maybe there is an issue with network connectivity to /mnt/nfv/rhel8.4.qcow2 (e.g. an NFS server has become unreachable)?

Comment 7 Pei Zhang 2021-04-15 08:31:43 UTC
(In reply to Stefan Hajnoczi from comment #6)
> (In reply to Dr. David Alan Gilbert from comment #5)
> > stefan: Any idea where this SCSI error is coming from?
> 
> No. A disk cache flush request (Synchronize Cache(10)) failed but I'm not
> sure if that was before or after migration. It may have been before since
> migration drains all in-flight requests.
> 
> The flush request most likely got "Sense Key : Aborted Command" and "Sense:
> I/O process terminated" here:
> https://gitlab.com/qemu-project/qemu/-/blob/master/scsi/utils.c#L618
> 
> That means the errno was not EDOM, EBADE, ENODATA, EREMOTEIO, ENOMEDIUM,
> ENOMEM, EINVAL, or ENOSPC. Beyond that I don't think there is enough
> information to explain the failed flush request.
> 
> I suggest looking at the migration load EIO error to understand which device
> failed and why.
> 
> Maybe there is an issue with network connectivity to /mnt/nfv/rhel8.4.qcow2
> (e.g. an NFS server has become unreachable)?

Hello Stefan,

Thanks for your analysis. I double-checked that the NFS share was mounted correctly; the VM boots fine from /mnt/nfv/rhel8.4.qcow2 and only hits errors when migrating.

I've set up the environment and kept it ready. Feel free to let me know if you would like to have a look at my setup. Thanks a lot.

Best regards,

Pei

Comment 8 Dr. David Alan Gilbert 2021-04-15 10:56:07 UTC
This feels like an SELinux-ism to me, but I'm not sure. On both hosts:

# getsebool virt_use_nfs
virt_use_nfs --> on

[root@dell-per740-03 qemu]# ls -lZ /mnt/nfv/rhel8.4.qcow2
-rw-------. 1 root qemu system_u:object_r:nfs_t:s0 42956488704 Apr 15 06:07 /mnt/nfv/rhel8.4.qcow2
[root@dell-per740-03 qemu]# getfacl /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---

(qemu is 107 on both)
NFS mount is:
10.73.72.41:/home/nfv-shared /mnt/nfv nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.73.72.196,local_lock=none,addr=10.73.72.41 0 0

[root@dell-per740-03 qemu]# qemu-img info /mnt/nfv/rhel8.4.qcow2
image: /mnt/nfv/rhel8.4.qcow2
file format: qcow2
virtual size: 40 GiB (42949672960 bytes)
disk size: 2.4 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

(unusual to have compression?)


    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/mnt/nfv/rhel8.4.qcow2'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

the migration fails; it's not clear why, but on the source we then see:
Apr 15 06:32:52 dell-per740-03.lab.eng.pek2.redhat.com libvirtd[76973]: internal error: unable to execute QEMU command 'cont': Could not reopen qcow2 layer: Could not read qcow2 header: Permission denied
Apr 15 06:32:52 dell-per740-03.lab.eng.pek2.redhat.com libvirtd[76973]: Failed to resume guest rhel8.4 after failure

Manually resuming it gives a pile of SCSI errors with 'Sense: Medium not present'.

Comment 9 Dr. David Alan Gilbert 2021-04-15 10:58:51 UTC
Hmm, the attached xml shows seclabel entries for selinux and dac, but the one on the source system has no seclabel entry at all.

Comment 10 Dr. David Alan Gilbert 2021-04-15 11:06:51 UTC
I added the seclabel entries back on the host and the behaviour changes:
the source restarts, but the log is full of XFS errors. It's not obvious why, since QEMU holds locks and certainly still has the file open.

Comment 11 Daniel Berrangé 2021-04-15 12:42:57 UTC
I've done some debugging of the QEMU migration process on the source host

First, the libvirt logs show migration running normally until suddenly it transitions to "failed" status:

SEND_MSG: mon=0x7f9cd403c620 msg={"execute":"query-migrate","id":"libvirt-398"}
IO_WRITE: mon=0x7f9cd403c620 buf={"execute":"query-migrate","id":"libvirt-398"}
RECV_REPLY: mon=0x7f9cd403c620 reply={"return": {"expected-downtime": 300, "status": "active", "setup-time": 6, "total-time": 7795, "ram": {"total": 8725020672, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 28524, "page-size": 4096, "remaining": 178008064, "mbps": 936.526099, "transferred": 621245557, "duplicate": 1939550, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 602611712, "normal": 147122}}, "id": "libvirt-398"}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 651109}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664651}, "event": "STOP"}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664877}, "event": "MIGRATION", "data": {"status": "failed"}}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664906}, "event": "RESUME"}


Using qemu-trace-stap I get these probes at the same time:

92342@1618490027162060048 migration_bitmap_sync_end dirty_pages 3746
92342@1618490027162233397 migrate_pending pending size 15343616 max 35370571 (pre = 15343616 compat=0 post=0)
92342@1618490027162236588 migrate_pending pending size 15343616 max 35370571 (pre = 15343616 compat=0 post=0)
92342@1618490027162241464 migration_thread_low_pending 15343616
92342@1618490027162243778 migration_thread_low_pending 15343616
92342@1618490027193707192 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 1
92342@1618490027193711061 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 1
92342@1618490027193715672 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 0
92342@1618490027193717484 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 0
92342@1618490027194352614 migrate_set_state new state failed
92342@1618490027194355126 migrate_set_state new state failed
92342@1618490027194420461 migration_thread_after_loop 
92342@1618490027194422844 migration_thread_after_loop 
92342@1618490027211010419 savevm_state_cleanup 
92342@1618490027211013420 savevm_state_cleanup 
92342@1618490027278068078 migrate_fd_cleanup 
92342@1618490027278071314 migrate_fd_cleanup 


Further investigating with GDB, I find we're in "migration_completion" and the call to "vm_stop_force_state" is what's failing

Further investigation shows that bdrv_flush_all is returning an error, and this comes from "handle_aiocb_flush" in the raw-posix.c file.

That method is seeing  "page_cache_inconsistent == true", which indicates that a previous call to qemu_fdatasync() failed.

Finally, watching fdatasync with strace shows

[pid 92429] fdatasync(18)               = 0
[pid 92429] fdatasync(18)               = -1 EACCES (Permission denied)


This happens pretty much at the end of the migration.

Setting  "setenforce 0" on both hosts has no impact - we still see failure, so it isn't SELinux fault.

Running migration without the "--live" flag doesn't see a failure. 

QEMU does acquire fcntl() locks on the file, but these are supposed to be advisory from the OS kernel, i.e. an fcntl() lock shouldn't cause EACCES from fdatasync...

...but NFS is involved here. I wonder if the NFS server is (mistakenly) turning the advisory fcntl() locks into mandatory locks?
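
The advisory-lock point can be checked locally. The sketch below is hypothetical (not from this bug) and uses flock(1), which takes flock(2) locks rather than QEMU's fcntl() OFD locks, but both kinds are advisory on local filesystems: a lock held by one process does not stop another process from writing the file.

```shell
# advisory_lock_demo: hold a lock on a temp file from one process while
# a second process appends to it.  Prints "write ok", showing the lock
# is advisory on a local filesystem (an NFS server may behave differently).
advisory_lock_demo() {
    f=$(mktemp)
    flock "$f" sleep 1 &        # background process holds the lock
    lockpid=$!
    sleep 0.2                   # give it time to take the lock
    if echo data >> "$f"; then  # write from this (non-locking) process
        echo "write ok"
    fi
    wait "$lockpid"
    rm -f "$f"
}
```

On the NFS mount in this bug, the interesting equivalent operation is the fdatasync() that came back with EACCES.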

Comment 12 Dr. David Alan Gilbert 2021-04-15 13:22:55 UTC
I tcpdump'd the NFS; I'm seeing a couple of denys, one on an open, one on a read:

    Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
        Opcode: SEQUENCE (53)
            sessionid: 0e1ebb5fb6bf162ef402000000000000
            seqid: 0x00007bb2
            slot id: 0
            high slot id: 0
            cache this?: Yes
        Opcode: PUTFH (22)
            FileHandle
                length: 40
                [hash (CRC-32): 0x60e94268]
                FileHandle: 0100078147000000000000004ee18c417b4142a49294ff5b57508d024c00000000000000…
        Opcode: OPEN (18)
            seqid: 0x00000000
            share_access: OPEN4_SHARE_ACCESS_READ (1)
            share_deny: OPEN4_SHARE_DENY_NONE (0)
            clientid: 0x0e1ebb5fb6bf162e
            owner: <DATA>
                length: 24
                contents: <DATA>
            Open Type: OPEN4_NOCREATE (0)
            Claim Type: CLAIM_FH (4)
        Opcode: ACCESS (3), [Check: RD MD XT XE]
            Check access: 0x2d
                .... ...1 = 0x001 READ: allowed?
                .... .1.. = 0x004 MODIFY: allowed?
                .... 1... = 0x008 EXTEND: allowed?
                ..1. .... = 0x020 EXECUTE: allowed?
        Opcode: GETATTR (9)
            Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
                reqd_attr: Type (1)
                reqd_attr: Change (3)
                reqd_attr: Size (4)
                reqd_attr: FSID (8)
                reco_attr: FileId (20)
            Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
                reco_attr: Mode (33)
                reco_attr: NumLinks (35)
                reco_attr: Owner (36)
                reco_attr: Owner_Group (37)
                reco_attr: RawDev (41)
                reco_attr: Space_Used (45)
                reco_attr: Time_Access (47)
                reco_attr: Time_Metadata (52)
                reco_attr: Time_Modify (53)
                reco_attr: Mounted_on_FileId (55)
    [Main Opcode: OPEN (18)]
Network File System, Ops(3): SEQUENCE PUTFH OPEN(NFS4ERR_ACCESS)
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_ACCESS (13)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 3)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: 0e1ebb5fb6bf162ef402000000000000
            seqid: 0x00007bb2
            slot id: 0
            high slot id: 9
            target high slot id: 9
            status flags: 0x00000000
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .0.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Not set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: OPEN (18)
            Status: NFS4ERR_ACCESS (13)
    [Main Opcode: OPEN (18)]

then a bit later:
Network File System, Ops(3): SEQUENCE, PUTFH, READ
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    minorversion: 2
    Operations (count: 3): SEQUENCE, PUTFH, READ
        Opcode: SEQUENCE (53)
            sessionid: 0e1ebb5fb6bf162ef402000000000000
            seqid: 0x00007bb4
            slot id: 0
            high slot id: 0
            cache this?: No
        Opcode: PUTFH (22)
            FileHandle
                length: 40
                [hash (CRC-32): 0x60e94268]
                FileHandle: 0100078147000000000000004ee18c417b4142a49294ff5b57508d024c00000000000000…
        Opcode: READ (25)
            StateID
                [StateID Hash: 0xcdd2]
                StateID seqid: 0
                StateID Other: 0e1ebb5fb6bf162e28010000
                [StateID Other hash: 0xbc0779b9]
            offset: 0
            count: 16384
    [Main Opcode: READ (25)]
Network File System, Ops(3): SEQUENCE PUTFH READ(NFS4ERR_ACCESS)
    [Program Version: 4]
    [V4 Procedure: COMPOUND (1)]
    Status: NFS4ERR_ACCESS (13)
    Tag: <EMPTY>
        length: 0
        contents: <EMPTY>
    Operations (count: 3)
        Opcode: SEQUENCE (53)
            Status: NFS4_OK (0)
            sessionid: 0e1ebb5fb6bf162ef402000000000000
            seqid: 0x00007bb4
            slot id: 0
            high slot id: 9
            target high slot id: 9
            status flags: 0x00000000
                .... .... .... .... .... .... .... ...0 = SEQ4_STATUS_CB_PATH_DOWN: Not set
                .... .... .... .... .... .... .... ..0. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRING: Not set
                .... .... .... .... .... .... .... .0.. = SEQ4_STATUS_CB_GSS_CONTEXTS_EXPIRED: Not set
                .... .... .... .... .... .... .... 0... = SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED: Not set
                .... .... .... .... .... .... ...0 .... = SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED: Not set
                .... .... .... .... .... .... ..0. .... = SEQ4_STATUS_ADMIN_STATE_REVOKED: Not set
                .... .... .... .... .... .... .0.. .... = SEQ4_STATUS_RECALLABLE_STATE_REVOKED: Not set
                .... .... .... .... .... .... 0... .... = SEQ4_STATUS_LEASE_MOVED: Not set
                .... .... .... .... .... ...0 .... .... = SEQ4_STATUS_RESTART_RECLAIM_NEEDED: Not set
                .... .... .... .... .... ..0. .... .... = SEQ4_STATUS_CB_PATH_DOWN_SESSION: Not set
                .... .... .... .... .... .0.. .... .... = SEQ4_STATUS_BACKCHANNEL_FAULT: Not set
                .... .... .... .... .... 0... .... .... = SEQ4_STATUS_DEVID_CHANGED: Not set
                .... .... .... .... ...0 .... .... .... = SEQ4_STATUS_DEVID_DELETED: Not set
        Opcode: PUTFH (22)
            Status: NFS4_OK (0)
        Opcode: READ (25)
            Status: NFS4ERR_ACCESS (13)
    [Main Opcode: READ (25)]

So that surprises me, if this is really related to fdatasync.

Comment 13 Dr. David Alan Gilbert 2021-04-15 13:35:41 UTC
I ran a loop constantly ls'ing and getfacl'ing the file on the source:


09:32:53.086023846
-rw-------. 1 qemu qemu 42956488704 Apr 15 09:32 /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: qemu
# group: qemu
user::rw-
group::---
other::---

09:32:53.090876709
ls: cannot access '/mnt/nfv/rhel8.4.qcow2': Input/output error
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---

09:32:53.097749266
-rw-------. 1 root qemu 42956488704 Apr 15 09:32 /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---


So at some point the ownership changed from qemu.qemu to root.qemu, with an intermediate I/O error.
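
The polling loop itself wasn't pasted; a minimal equivalent (a sketch, with a hypothetical helper name, assuming the same image path) is:

```shell
# watch_owner FILE N: print a timestamp plus FILE's owner.group, N times.
# Pointed at /mnt/nfv/rhel8.4.qcow2 during migration, this catches the
# qemu.qemu -> root.qemu flip (and the transient I/O error in between).
watch_owner() {
    file=$1; n=$2; i=0
    while [ "$i" -lt "$n" ]; do
        date +%T.%N
        stat -c '%U.%G' "$file" 2>/dev/null || echo "stat failed (I/O error?)"
        i=$((i + 1))
    done
}

# e.g.: watch_owner /mnt/nfv/rhel8.4.qcow2 100000
```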

Comment 14 Dr. David Alan Gilbert 2021-04-15 15:59:55 UTC
tracing on the destination:

19315@1618501920271377940 ram_load_loop 0000:00:01.0/qxl.vrom: addr: 0x0 flags: 0x8 host: 0x7f8258400000
19315@1618501920271388884 ram_load_loop 0000:00:01.0/qxl.vrom: addr: 0x1000 flags: 0x28 host: 0x7f8258401000
19315@1618501920271394341 ram_load_loop /rom@etc/table-loader: addr: 0x0 flags: 0x8 host: 0x7f8232800000
19315@1618501920271399415 ram_load_loop /rom@etc/acpi/rsdp: addr: 0x0 flags: 0x8 host: 0x7f8232600000
19315@1618501920271409522 qemu_loadvm_state_post_main -5
19315@1618501920271413482 loadvm_state_cleanup
19315@1618501920272033219 migrate_set_state new state failed

The -5 is an I/O error, but it comes during/at the end of the RAM load; we haven't loaded the other devices or left the loading loop yet,
so I think the destination failure is a consequence of the source failing, not the other way around.

Comment 15 Dr. David Alan Gilbert 2021-04-15 17:10:04 UTC
I tried tracing chowns; it's a bit difficult to follow, but given qemu's uid is 107 (= 0x6b), I think:

source:
           <...>-112635 [023] .... 49286.398953: sys_chown(filename: 7f9c10140970, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398964: sys_chown(filename: 7f9c10140950, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398968: sys_chown(filename: 7f9c1013ff80, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398973: sys_chown(filename: 7f9c1013ffb0, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398977: sys_chown(filename: 7f9c1013ffe0, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398980: sys_chown(filename: 7f9c10140010, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398985: sys_chown(filename: 556b472b4f50, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398990: sys_chown(filename: 7f9c10140040, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398994: sys_chown(filename: 7f9c10140070, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.398997: sys_chown(filename: 7f9c101400a0, user: 6b, group: 6b)
           <...>-112635 [023] .... 49286.399001: sys_chown(filename: 7f9c101401e0, user: 6b, group: 6b)
           <...>-112697 [003] .... 49286.578633: sys_chown(filename: 55f23e9c4010, user: 0, group: 0)
           <...>-112700 [003] .... 49286.594524: sys_chown(filename: 5599400bc6a0, user: 0, group: 0)
           <...>-112266 [011] .... 49317.258315: sys_chown(filename: 5570b7884280, user: 0, group: 0)
           <...>-112266 [011] .... 49317.258550: sys_chown(filename: 5570b7884280, user: 0, group: 0)
           <...>-112705 [024] .... 49340.020061: sys_chown(filename: 7f9c64044200, user: 6b, group: 6b)
           <...>-112707 [024] .... 49340.025003: sys_chown(filename: 7f9c64027ff0, user: 6b, group: 6b)
           <...>-112709 [024] .... 49340.030089: sys_chown(filename: 7f9c64027fb0, user: 6b, group: 6b)
           <...>-112716 [024] .... 49340.077825: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
           <...>-112716 [024] .... 49340.077904: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
           <...>-112716 [024] .... 49340.077975: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
           <...>-112716 [024] .... 49340.078047: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
           <...>-112716 [024] .... 49340.078116: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
           <...>-112716 [024] .... 49340.078188: sys_lchown(filename: 7f9c64029c10, user: 0, group: 5)
           <...>-112716 [024] .... 49340.078247: sys_lchown(filename: 7f9c64029c10, user: 0, group: 24)
           <...>-112716 [024] .... 49340.078566: sys_lchown(filename: 7f9c640357e0, user: 0, group: 0)
  NetworkManager-1726  [017] .... 49340.081779: sys_chown(filename: 55c670470a60, user: 0, group: 0)
           <...>-112756 [020] .... 49340.227858: sys_chown(filename: 7f9c101bb6c0, user: 6b, group: 6b)
           <...>-112756 [020] .... 49340.228016: sys_chown(filename: 7f9c6402e230, user: 6b, group: 6b)

So the source always seems to be setting things to qemu/qemu.
On the dest, however:
      rpc-worker-19693 [009] .... 50283.404186: sys_chown(filename: 7f014c031d50, user: 0, group: 6b)
      rpc-worker-19695 [003] .... 50283.408440: sys_chown(filename: 7f014c0361a0, user: 0, group: 6b)
      rpc-worker-19697 [009] .... 50283.413162: sys_chown(filename: 7f014c0385b0, user: 0, group: 6b)
  NetworkManager-1621  [005] .... 50283.458396: sys_chown(filename: 5603a223dc10, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478358: sys_lchown(filename: 7f014c040fe0, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478408: sys_lchown(filename: 7f014c040c20, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478428: sys_lchown(filename: 7f014c040c40, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478449: sys_lchown(filename: 7f014c040c60, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478474: sys_lchown(filename: 7f014c040c80, user: 0, group: 0)
      rpc-worker-19711 [001] .... 50283.478494: sys_lchown(filename: 7f014c040fa0, user: 0, group: 5)
      rpc-worker-19711 [001] .... 50283.478518: sys_lchown(filename: 7f014c040fc0, user: 0, group: 24)
      rpc-worker-19711 [001] .... 50283.478542: sys_lchown(filename: 7f014c040c00, user: 0, group: 0)
      rpc-worker-19726 [000] .... 50283.507233: sys_chown(filename: 7f014c0243e0, user: 0, group: 6b)
      rpc-worker-19726 [000] .... 50283.507386: sys_chown(filename: 7f014c035be0, user: 0, group: 6b)
  NetworkManager-1621  [007] .... 50292.565646: sys_chown(filename: 5603a2247160, user: 0, group: 0)

so it looks like libvirt's rpc-worker is setting things to root:qemu.
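The user/group arguments in the sys_chown/sys_lchown trace lines above are printed in hex, which is why the two hosts look superficially similar. A small sketch of the decoding (assuming the conventional RHEL `qemu` UID/GID of 107):

```python
def decode_trace_id(hex_str: str) -> int:
    """Convert a trace-reported hex UID/GID argument to its decimal value.

    ftrace prints raw syscall arguments in hexadecimal, so the 'user: 6b'
    seen on the source is UID 107, the default 'qemu' user on RHEL."""
    return int(hex_str, 16)

# Source host chowns to qemu:qemu ...
print(decode_trace_id("6b"), decode_trace_id("6b"))  # 107 107
# ... while the destination's rpc-worker chowns to root:qemu
print(decode_trace_id("0"), decode_trace_id("6b"))   # 0 107
```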

Still, there are some NFS behaviours I'm not understanding; on the source after a failed migration:

[root@dell-per740-03 tracing]# ls -l /mnt/nfv/rhel8.4.qcow2
ls: cannot access '/mnt/nfv/rhel8.4.qcow2': Input/output error
[root@dell-per740-03 tracing]# ls -l /mnt/nfv/rhel8.4.qcow2
-rw-------. 1 root qemu 42956488704 Apr 15 12:57 /mnt/nfv/rhel8.4.qcow2

I can't see why that first ls failed.

Comment 16 Daniel Berrangé 2021-04-15 17:35:19 UTC
The red flag is setting "user: 0, group: 107".

A normally configured libvirt will use "user: 107, group: 107" when granting access, and "user: 0, group: 0" when revoking access.

Looking at /etc/libvirt/qemu.conf on the target host I see

   user="root"

So, we have the target host configured to use a different account from the source host. This is not going to work when you have disk images on shared filesystems that need to be accessible by both.


I'm still not 100% sure at which point the dest libvirtd is setting the ownership to "user: 0, group: 107" though - it is not supposed to touch images on shared filesystems at all for an incoming migration, as it is supposed to assume the source host already has ownership correctly set.

Nonetheless, removing the bogus user="root" setting from qemu.conf makes this work again.
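For reference, a correctly matched setup leaves the relevant QEMU driver settings identical in /etc/libvirt/qemu.conf on both hosts. A sketch with the RHEL default values (not the reporter's exact files):

```ini
# /etc/libvirt/qemu.conf -- these must match on src and dest when disk
# images live on a shared filesystem such as NFS
user = "qemu"            # UID 107 on RHEL
group = "qemu"           # GID 107 on RHEL
dynamic_ownership = 1    # let libvirt chown images on VM start/stop
```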

Comment 17 Daniel Berrangé 2021-04-15 17:54:22 UTC
(In reply to Daniel Berrangé from comment #16)
> I'm still not 100% sure at which point dest libvirtd is setting the
> ownershup to  "user:0, group: 107" though - it is not supposed to touch
> images on shared filesystems at all for an incoming migration, as its
> supposed to assume the source host already has ownership correctly set.

libvirt appears to set the ownership during startup of the incoming QEMU on the target.

Normally this is fine, because both the source and dest hosts are configured with the same UID/GID for running QEMU, and thus when the target QEMU starts, its chown is basically a no-op.

In the case of this mis-configuration though, the chown during startup of the incoming QEMU breaks access from the source host. The effect on the source host seems a little delayed; possibly the NFS client caches the access mode for a while.

Anyway, it is critical that both hosts use the same UID:GID when a shared FS is involved.

Libvirt probably ought to validate this before initiating the migration, i.e. the target libvirt can report back to the source which UID it intends to use for QEMU, and the source libvirtd can validate this if there's any shared FS involved.

Comment 18 Pei Zhang 2021-04-16 04:41:25 UTC
Thanks, David and Daniel, for the debugging and for finding the root cause.

I tested again: after configuring libvirt on src and dest with the same UID, migration works successfully. Sorry I didn't notice this difference in the libvirt configuration (the src server is set up by automation, the dest server is set up manually, which is why I missed the mismatch).


(In reply to Daniel Berrangé from comment #17)
[...]
> 
> Libvirt probably ought to validate this before initiating the migration. ie
> the target libvirt can report back to the source what UID it intends to use
> for QEMU, and source libvirtd can validate this, if there's any shared FS
> involved.

Makes sense. It would be better if libvirt gave users a warning about the UID difference.

So I will move this bug to the libvirt component and change the title accordingly.

Feel free to correct me if anything is wrong. Thanks.

Comment 19 Fangge Jin 2021-04-19 14:09:44 UTC
Changing to low priority, since the test scenario is not a normal configuration.

Comment 20 Stefan Hajnoczi 2021-06-08 12:51:41 UTC
(In reply to Dr. David Alan Gilbert from comment #8)
> [root@dell-per740-03 qemu]# qemu-img info /mnt/nfv/rhel8.4.qcow2
> image: /mnt/nfv/rhel8.4.qcow2
> file format: qcow2
> virtual size: 40 GiB (42949672960 bytes)
> disk size: 2.4 GiB
> cluster_size: 65536
> Format specific information:
>     compat: 1.1
>     compression type: zlib
>     lazy refcounts: true
>     refcount bits: 16
>     corrupt: false
> 
> (unusual to have compression?)

zlib is the default compression type. It does not mean that compression is enabled. All images will show a compression type.

Comment 21 Michal Privoznik 2021-07-20 09:41:31 UTC
(In reply to Daniel Berrangé from comment #17)

> Libvirt probably ought to validate this before initiating the migration. ie
> the target libvirt can report back to the source what UID it intends to use
> for QEMU, and source libvirtd can validate this, if there's any shared FS
> involved.

And libvirt would do that, if only NFS supported XATTRs: if the destination tried to set a different UID, the destination libvirtd would report an error.
But I don't think we should get into permission checking; let the kernel handle that instead. The best we can do is check whether the numerical UID:GID values match on both src and dest. And we would have to do that on the destination, because the moment the source initiates migration, the destination starts QEMU and relabels everything.

Comment 22 Daniel Berrangé 2021-07-20 10:10:56 UTC
(In reply to Michal Privoznik from comment #21)
> (In reply to Daniel Berrangé from comment #17)

> But I don't think we should be getting into permission check, let kernel
> handle that instead. The best we can do is to check whether numerical values
> of UID:GID match on both src and dest. And we would have to do that on the
> destination because the moment source initiates migration the destination
> starts QEMU and relabels everything.

Yes, I was only suggesting comparing the numerical IDs, on the basis that we want to ensure DAC relabelling is a no-op when shared filesystems are involved.

During the "Begin" phase we would need to put the UID:GID in the cookie XML, and then the "Prepare" phase on the dest would validate a match.
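The proposed check could be sketched like this (the `<dac>` cookie element name and both helper functions are hypothetical illustrations, not existing libvirt code):

```python
import xml.etree.ElementTree as ET

def begin_phase_cookie(uid: int, gid: int) -> str:
    """Source 'Begin' phase: advertise the QEMU UID:GID in the cookie XML.

    The <dac> element is a made-up name for illustration; libvirt's real
    migration cookie schema would define its own element."""
    return f'<cookie><dac user="{uid}" group="{gid}"/></cookie>'

def prepare_phase_validate(cookie_xml: str, local_uid: int, local_gid: int,
                           shared_fs: bool) -> None:
    """Destination 'Prepare' phase: refuse the migration on a numerical
    UID:GID mismatch, but only when a shared filesystem is involved."""
    if not shared_fs:
        return
    dac = ET.fromstring(cookie_xml).find("dac")
    uid, gid = int(dac.get("user")), int(dac.get("group"))
    if (uid, gid) != (local_uid, local_gid):
        raise RuntimeError(
            f"QEMU user mismatch: source runs {uid}:{gid}, "
            f"destination configured for {local_uid}:{local_gid}")

# The misconfiguration from this bug: source qemu:qemu (107:107),
# destination root:qemu (0:107) -> migration should be refused up front.
cookie = begin_phase_cookie(107, 107)
try:
    prepare_phase_validate(cookie, 0, 107, shared_fs=True)
except RuntimeError as e:
    print(e)
```

Rejecting the migration in "Prepare" means the destination QEMU is never started, so the bogus relabel of the shared image never happens.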

Comment 23 John Ferlan 2021-09-08 13:31:21 UTC
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.

Comment 25 RHEL Program Management 2022-09-23 07:27:49 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.

