Description of problem:
Install a fresh VM using virt-install on a rhel8.3.1 host, then migrate it to a rhel8.4 host; the migration fails with "qemu-kvm: load of migration failed: Input/output error".

Version-Release number of selected component (if applicable):
src host: RHEL-AV 8.3.1:
4.18.0-240.18.1.el8_3.x86_64
qemu-kvm-5.1.0-20.module+el8.3.1+9918+230f5c26.x86_64
openvswitch2.13-2.13.0-97.el8fdp.x86_64
libvirt-6.6.0-13.1.module+el8.3.1+10185+675b2148.x86_64

dest host: RHEL-AV 8.4:
4.18.0-298.el8.x86_64
qemu-kvm-5.2.0-13.module+el8.4.0+10369+fd280775.x86_64
openvswitch2.15-2.15.0-3.el8fdp.x86_64
libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64

VM: RHEL 8.4
4.18.0-298.el8.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install a fresh rhel8.4 VM on the rhel8.3.1 host:
virt-install \
--graphics type=spice,listen=0.0.0.0 \
--name=rhel8.4 \
--machine q35 \
--memory=8192 \
--vcpus=6 \
--controller type=scsi,model=virtio-scsi \
--disk path=/mnt/nfv/rhel8.4.qcow2,size=40,target.bus=scsi \
-l http://download.eng.pek2.redhat.com/nightly/RHEL-8.4.0-20210316.n.1/compose/BaseOS/x86_64/os/ \
--noautoconsole \
--noreboot \
-x ks=http://10.73.72.41/kickstart-rhel8.cfg \
--network bridge=switch,model=virtio,mac=28:66:11:22:33:44

2. After the VM installation finishes, keep its default VM XML and start the VM.
# virsh start rhel8.4
Domain rhel8.4 started

3. Migrate the VM to the rhel8.4 host; it fails. Here are several detailed items:
(1) On the src host:
# virsh migrate --verbose --persistent --live rhel8.4 qemu+ssh://10.73.72.194/system
Migration: [ 98 %]error: internal error: qemu unexpectedly closed the monitor: 2021-03-23T09:55:25.326295Z qemu-kvm: load of migration failed: Input/output error

(2) On the dest host:
# cat /var/log/libvirt/qemu/rhel8.4.log
...
2021-03-23 09:55:15.555+0000: Domain id=2 is tainted: high-privileges
char device redirected to /dev/pts/1 (label charserial0)
2021-03-23T09:55:25.326295Z qemu-kvm: load of migration failed: Input/output error
2021-03-23 09:55:25.436+0000: shutting down, reason=failed

(3) The VM console shows the error info below:
vm-73-219 login: Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 4.18.0-298.el8.x86_64 on an x86_64
vm-73-219 login: [ 65.641344] sd 0:0:0:0: [sda] tag#53 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 65.643844] sd 0:0:0:0: [sda] tag#53 Sense Key : Aborted Command [current]
[ 65.645599] sd 0:0:0:0: [sda] tag#53 Add. Sense: I/O process terminated
[ 65.647318] sd 0:0:0:0: [sda] tag#53 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ 65.649529] blk_update_request: I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[ 65.692325] sd 0:0:0:0: [
(Sorry, some info was lost above; continuing here.)
(3) The VM console shows the error info below:
vm-73-219 login: Red Hat Enterprise Linux 8.4 Beta (Ootpa)
Kernel 4.18.0-298.el8.x86_64 on an x86_64
vm-73-219 login: [ 65.641344] sd 0:0:0:0: [sda] tag#53 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 65.643844] sd 0:0:0:0: [sda] tag#53 Sense Key : Aborted Command [current]
[ 65.645599] sd 0:0:0:0: [sda] tag#53 Add. Sense: I/O process terminated
[ 65.647318] sd 0:0:0:0: [sda] tag#53 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ 65.649529] blk_update_request: I/O error, dev sda, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[ 65.692325] sd 0:0:0:0: [sda] tag#60 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[ 65.695259] sd 0:0:0:0: [sda] tag#60 Sense Key : Aborted Command [current]
[ 65.697138] sd 0:0:0:0: [sda] tag#60 Add. Sense: I/O process terminated
[ 65.698937] sd 0:0:0:0: [sda] tag#60 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
[ 65.701263] blk_update_request: I/O error, dev sda, sector 47200168 op 0x1:(WRITE) flags 0x1800 phys_seg 1 prio class 0
[ 65.704236] XFS (dm-0): log I/O error -5
[ 65.705423] XFS (dm-0): xfs_do_force_shutdown(0x2) called from line 1304 of file fs/xfs/xfs_log.c. Return address = 00000000ff1087a0
[ 65.708840] XFS (dm-0): Log I/O Error Detected. Shutting down filesystem
[ 65.710809] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)

Actual results:
A fresh VM fails to migrate from rhel8.3.1 to rhel8.4.

Expected results:
The VM should migrate from rhel8.3.1 to rhel8.4 successfully.

Additional info:
1.
The src host and dest host have the same CPU:
# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  2
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
Stepping:            4
CPU MHz:             2100.122
BogoMIPS:            4200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            22528K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts pku ospke md_clear flush_l1d

2. The QEMU command line generated by libvirt is below. The full VM XML will be attached in the next comment.
/usr/libexec/qemu-kvm \
-name guest=rhel8.4,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-rhel8.4/master-key.aes \
-machine pc-q35-rhel8.3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,amd-stibp=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on \
-m 8192 \
-overcommit mem-lock=off \
-smp 6,sockets=6,cores=1,threads=1 \
-uuid 5d4cb110-f921-4718-ba5a-10929bf29dd8 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0 \
-device virtio-scsi-pci,id=scsi0,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
-blockdev '{"driver":"file","filename":"/mnt/nfv/rhel8.4.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=libvirt-1-format,id=scsi0-0-0-0,bootindex=1 \
-netdev tap,fd=37,id=hostnet0,vhost=on,vhostfd=38 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=28:66:da:5f:dd:01,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=39,server,nowait \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel1,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-spice port=5900,addr=0.0.0.0,disable-ticketing,image-compression=off,seamless-migration=on \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
Created attachment 1765474 [details] Guest XML
Assigned to Amnon for initial triage per bz process and age of bug created or assigned to virt-maint without triage.
stefan: Any idea where this SCSI error is coming from?
(In reply to Dr. David Alan Gilbert from comment #5)
> stefan: Any idea where this SCSI error is coming from?

No. A disk cache flush request (Synchronize Cache(10)) failed, but I'm not sure whether that was before or after migration. It may have been before, since migration drains all in-flight requests.

The flush request most likely got "Sense Key : Aborted Command" and "Sense: I/O process terminated" here:
https://gitlab.com/qemu-project/qemu/-/blob/master/scsi/utils.c#L618

That means the errno was not EDOM, EBADE, ENODATA, EREMOTEIO, ENOMEDIUM, ENOMEM, EINVAL, or ENOSPC. Beyond that I don't think there is enough information to explain the failed flush request.

I suggest looking at the migration load EIO error to understand which device failed and why.

Maybe there is an issue with network connectivity to /mnt/nfv/rhel8.4.qcow2 (e.g. an NFS server has become unreachable)?
(In reply to Stefan Hajnoczi from comment #6)
> (In reply to Dr. David Alan Gilbert from comment #5)
> > stefan: Any idea where this SCSI error is coming from?
>
> No. A disk cache flush request (Synchronize Cache(10)) failed but I'm not
> sure if that was before or after migration. It may have been before since
> migration drains all in-flight requests.
>
> The flush request most likely got "Sense Key : Aborted Command" and "Sense:
> I/O process terminated" here:
> https://gitlab.com/qemu-project/qemu/-/blob/master/scsi/utils.c#L618
>
> That means the errno was not EDOM, EBADE, ENODATA, EREMOTEIO, ENOMEDIUM,
> ENOMEM, EINVAL, or ENOSPC. Beyond that I don't think there is enough
> information to explain the failed flush request.
>
> I suggest looking at the migration load EIO error to understand which device
> failed and why.
>
> Maybe there is an issue with network connectivity to /mnt/nfv/rhel8.4.qcow2
> (e.g. an NFS server has become unreachable)?

Hello Stefan,

Thanks for your analysis. I double-checked that the NFS share is mounted correctly; the VM boots fine from /mnt/nfv/rhel8.4.qcow2 and only hits errors when migrating.

I've kept the environment set up, so feel free to let me know if you would like to take a look at it. Thanks a lot.

Best regards,
Pei
This is feeling like an SELinux-ism to me, but I'm not sure.

On both hosts:
# getsebool virt_use_nfs
virt_use_nfs --> on

[root@dell-per740-03 qemu]# ls -lZ /mnt/nfv/rhel8.4.qcow2
-rw-------. 1 root qemu system_u:object_r:nfs_t:s0 42956488704 Apr 15 06:07 /mnt/nfv/rhel8.4.qcow2
[root@dell-per740-03 qemu]# getfacl /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---

(qemu is 107 on both)

NFS mount is:
10.73.72.41:/home/nfv-shared /mnt/nfv nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,soft,nosharecache,proto=tcp,timeo=600,retrans=6,sec=sys,clientaddr=10.73.72.196,local_lock=none,addr=10.73.72.41 0 0

[root@dell-per740-03 qemu]# qemu-img info /mnt/nfv/rhel8.4.qcow2
image: /mnt/nfv/rhel8.4.qcow2
file format: qcow2
virtual size: 40 GiB (42949672960 bytes)
disk size: 2.4 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

(unusual to have compression?)

<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/mnt/nfv/rhel8.4.qcow2'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>

The migration fails; it's not clear why, but on the source we then see:

Apr 15 06:32:52 dell-per740-03.lab.eng.pek2.redhat.com libvirtd[76973]: internal error: unable to execute QEMU command 'cont': Could not reopen qcow2 layer: Could not read qcow2 header: Permission denied
Apr 15 06:32:52 dell-per740-03.lab.eng.pek2.redhat.com libvirtd[76973]: Failed to resume guest rhel8.4 after failure

Manually resuming it gives a pile of SCSI errors with 'Sense: Medium not present'.
Hmm, the attached xml shows seclabel entries for selinux and dac, but the one on the source system has no seclabel entry at all.
I added the seclabel entries back onto the host and the behaviour changes; the source restarts - but the log is full of XFS errors. It's not obvious why, since the qemu has locks and certainly still has the file open.
I've done some debugging of the QEMU migration process on the source host.

First, the libvirt logs show migration running normally until suddenly it transitions to "failed" status:

SEND_MSG: mon=0x7f9cd403c620 msg={"execute":"query-migrate","id":"libvirt-398"}
IO_WRITE: mon=0x7f9cd403c620 buf={"execute":"query-migrate","id":"libvirt-398"}
RECV_REPLY: mon=0x7f9cd403c620 reply={"return": {"expected-downtime": 300, "status": "active", "setup-time": 6, "total-time": 7795, "ram": {"total": 8725020672, "postcopy-requests": 0, "dirty-sync-count": 1, "multifd-bytes": 0, "pages-per-second": 28524, "page-size": 4096, "remaining": 178008064, "mbps": 936.526099, "transferred": 621245557, "duplicate": 1939550, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 602611712, "normal": 147122}}, "id": "libvirt-398"}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 651109}, "event": "MIGRATION_PASS", "data": {"pass": 2}}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664651}, "event": "STOP"}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664877}, "event": "MIGRATION", "data": {"status": "failed"}}
RECV_EVENT: mon=0x7f9cd403c620 event={"timestamp": {"seconds": 1618485766, "microseconds": 664906}, "event": "RESUME"}

Using qemu-trace-stap I get these probes at the same time:

92342@1618490027162060048 migration_bitmap_sync_end dirty_pages 3746
92342@1618490027162233397 migrate_pending pending size 15343616 max 35370571 (pre = 15343616 compat=0 post=0)
92342@1618490027162236588 migrate_pending pending size 15343616 max 35370571 (pre = 15343616 compat=0 post=0)
92342@1618490027162241464 migration_thread_low_pending 15343616
92342@1618490027162243778 migration_thread_low_pending 15343616
92342@1618490027193707192 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 1
92342@1618490027193711061 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 1
92342@1618490027193715672 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 0
92342@1618490027193717484 qcow2_cache_flush co 0x7fc5481c5e90 is_l2_cache 0
92342@1618490027194352614 migrate_set_state new state failed
92342@1618490027194355126 migrate_set_state new state failed
92342@1618490027194420461 migration_thread_after_loop
92342@1618490027194422844 migration_thread_after_loop
92342@1618490027211010419 savevm_state_cleanup
92342@1618490027211013420 savevm_state_cleanup
92342@1618490027278068078 migrate_fd_cleanup
92342@1618490027278071314 migrate_fd_cleanup

Investigating further with GDB, I find we're in "migration_completion", and the call to "vm_stop_force_state" is what's failing. Digging deeper shows that bdrv_flush_all is returning an error, and this comes from raw-posix.c's "handle_aiocb_flush". That method is seeing "page_cache_inconsistent == true", which indicates that a previous call to qemu_fdatasync() failed.

Finally, watching fdatasync with strace shows:

[pid 92429] fdatasync(18) = 0
[pid 92429] fdatasync(18) = -1 EACCES (Permission denied)

This happens pretty much at the end of the migration.

Setting "setenforce 0" on both hosts has no impact - we still see the failure, so it isn't SELinux's fault. Running the migration without the "--live" flag doesn't fail.

QEMU does acquire fcntl() locks on the file, but these are supposed to be advisory from the OS kernel - i.e. an fcntl() lock shouldn't cause EACCES from fdatasync...

...but NFS is involved here - I wonder if the NFS server is (mistakenly) turning the advisory fcntl() locks into mandatory locks?
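The strace observation above can be reproduced with a small watch script. This is a sketch: it assumes a single qemu-kvm process on the host, and simply reports if no process is running or strace is not installed.

```shell
# Watch flush syscalls of the source QEMU during migration; the failure
# mode in this bug shows up as: fdatasync(18) = -1 EACCES (Permission denied)
pid=$(pidof qemu-kvm 2>/dev/null || true)
if [ -n "$pid" ] && command -v strace >/dev/null 2>&1; then
    # Trace only flush-related syscalls across all QEMU threads
    strace -f -e trace=fdatasync,fsync -p "$pid"
    msg="traced qemu-kvm pid $pid"
else
    msg="qemu-kvm not running or strace not installed"
fi
echo "$msg"
```

Run it on the source host just before starting the migration; the EACCES line appears near the end of the migration, matching the strace output above.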
I tcpdump'd the NFS traffic; I'm seeing a couple of denies, one on an OPEN, one on a READ:

Operations (count: 5): SEQUENCE, PUTFH, OPEN, ACCESS, GETATTR
  Opcode: SEQUENCE (53)
    sessionid: 0e1ebb5fb6bf162ef402000000000000  seqid: 0x00007bb2  slot id: 0  high slot id: 0  cache this?: Yes
  Opcode: PUTFH (22)
    FileHandle length: 40 [hash (CRC-32): 0x60e94268]
    FileHandle: 0100078147000000000000004ee18c417b4142a49294ff5b57508d024c00000000000000…
  Opcode: OPEN (18)
    seqid: 0x00000000
    share_access: OPEN4_SHARE_ACCESS_READ (1)
    share_deny: OPEN4_SHARE_DENY_NONE (0)
    clientid: 0x0e1ebb5fb6bf162e
    owner: <DATA> (length: 24, contents: <DATA>)
    Open Type: OPEN4_NOCREATE (0)
    Claim Type: CLAIM_FH (4)
  Opcode: ACCESS (3), [Check: RD MD XT XE]
    Check access: 0x2d (READ 0x001, MODIFY 0x004, EXTEND 0x008, EXECUTE 0x020 allowed?)
  Opcode: GETATTR (9)
    Attr mask[0]: 0x0010011a (Type, Change, Size, FSID, FileId)
    Attr mask[1]: 0x00b0a23a (Mode, NumLinks, Owner, Owner_Group, RawDev, Space_Used, Time_Access, Time_Metadata, Time_Modify, Mounted_on_FileId)
  [Main Opcode: OPEN (18)]

Network File System, Ops(3): SEQUENCE PUTFH OPEN(NFS4ERR_ACCESS)
  Status: NFS4ERR_ACCESS (13)
  Opcode: SEQUENCE (53) - Status: NFS4_OK (0)
    sessionid: 0e1ebb5fb6bf162ef402000000000000  seqid: 0x00007bb2  slot id: 0  high slot id: 9  target high slot id: 9
    status flags: 0x00000000 (all SEQ4_STATUS_* flags: Not set)
  Opcode: PUTFH (22) - Status: NFS4_OK (0)
  Opcode: OPEN (18) - Status: NFS4ERR_ACCESS (13)
  [Main Opcode: OPEN (18)]

Then a bit later:

Network File System, Ops(3): SEQUENCE, PUTFH, READ (minorversion: 2)
  Opcode: SEQUENCE (53)
    sessionid: 0e1ebb5fb6bf162ef402000000000000  seqid: 0x00007bb4  slot id: 0  high slot id: 0  cache this?: No
  Opcode: PUTFH (22)
    FileHandle length: 40 [hash (CRC-32): 0x60e94268]
    FileHandle: 0100078147000000000000004ee18c417b4142a49294ff5b57508d024c00000000000000…
  Opcode: READ (25)
    StateID [StateID Hash: 0xcdd2]  StateID seqid: 0  StateID Other: 0e1ebb5fb6bf162e28010000 [StateID Other hash: 0xbc0779b9]
    offset: 0  count: 16384
  [Main Opcode: READ (25)]

Network File System, Ops(3): SEQUENCE PUTFH READ(NFS4ERR_ACCESS)
  Status: NFS4ERR_ACCESS (13)
  Opcode: SEQUENCE (53) - Status: NFS4_OK (0)
    sessionid: 0e1ebb5fb6bf162ef402000000000000  seqid: 0x00007bb4  slot id: 0  high slot id: 9  target high slot id: 9
    status flags: 0x00000000 (all SEQ4_STATUS_* flags: Not set)
  Opcode: PUTFH (22) - Status: NFS4_OK (0)
  Opcode: READ (25) - Status: NFS4ERR_ACCESS (13)
  [Main Opcode: READ (25)]

So that surprises me if it's related to fdatasync.
I ran a loop constantly ls'ing and getfacl'ing the file on the source:

09:32:53.086023846
-rw-------. 1 qemu qemu 42956488704 Apr 15 09:32 /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: qemu
# group: qemu
user::rw-
group::---
other::---
09:32:53.090876709
ls: cannot access '/mnt/nfv/rhel8.4.qcow2': Input/output error
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---
09:32:53.097749266
-rw-------. 1 root qemu 42956488704 Apr 15 09:32 /mnt/nfv/rhel8.4.qcow2
getfacl: Removing leading '/' from absolute path names
# file: mnt/nfv/rhel8.4.qcow2
# owner: root
# group: qemu
user::rw-
group::---
other::---

So at some point the file's ownership changed from qemu:qemu to root:qemu - with an intermediate I/O error.
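The watch loop used above can be sketched as follows. It is bounded to three iterations here so it terminates; in practice it would run unbounded while the migration is in flight, against the image path from this report.

```shell
# Poll ownership and ACL of the shared image; the symptom in this bug is
# a transient EIO from ls plus the owner flipping from qemu:qemu to root:qemu.
FILE=${FILE:-/mnt/nfv/rhel8.4.qcow2}
i=0
while [ "$i" -lt 3 ]; do
    date +%T.%N
    ls -l "$FILE" 2>&1
    getfacl "$FILE" 2>&1 || true
    i=$((i + 1))
done
```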
Tracing on the destination:

19315@1618501920271377940 ram_load_loop 0000:00:01.0/qxl.vrom: addr: 0x0 flags: 0x8 host: 0x7f8258400000
19315@1618501920271388884 ram_load_loop 0000:00:01.0/qxl.vrom: addr: 0x1000 flags: 0x28 host: 0x7f8258401000
19315@1618501920271394341 ram_load_loop /rom@etc/table-loader: addr: 0x0 flags: 0x8 host: 0x7f8232800000
19315@1618501920271399415 ram_load_loop /rom@etc/acpi/rsdp: addr: 0x0 flags: 0x8 host: 0x7f8232600000
19315@1618501920271409522 qemu_loadvm_state_post_main -5
19315@1618501920271413482 loadvm_state_cleanup
19315@1618501920272033219 migrate_set_state new state failed

The -5 is EIO, but that's coming during/at the end of the RAM load - we've not loaded the other devices or got out of the loading loop. So I think the destination failure is a consequence of the source failing, not the other way around.
I tried tracing chowns; it's a bit difficult to follow, but given qemu's uid is 107 (= 0x6b), I think:

source:
<...>-112635 [023] .... 49286.398953: sys_chown(filename: 7f9c10140970, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398964: sys_chown(filename: 7f9c10140950, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398968: sys_chown(filename: 7f9c1013ff80, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398973: sys_chown(filename: 7f9c1013ffb0, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398977: sys_chown(filename: 7f9c1013ffe0, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398980: sys_chown(filename: 7f9c10140010, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398985: sys_chown(filename: 556b472b4f50, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398990: sys_chown(filename: 7f9c10140040, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398994: sys_chown(filename: 7f9c10140070, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.398997: sys_chown(filename: 7f9c101400a0, user: 6b, group: 6b)
<...>-112635 [023] .... 49286.399001: sys_chown(filename: 7f9c101401e0, user: 6b, group: 6b)
<...>-112697 [003] .... 49286.578633: sys_chown(filename: 55f23e9c4010, user: 0, group: 0)
<...>-112700 [003] .... 49286.594524: sys_chown(filename: 5599400bc6a0, user: 0, group: 0)
<...>-112266 [011] .... 49317.258315: sys_chown(filename: 5570b7884280, user: 0, group: 0)
<...>-112266 [011] .... 49317.258550: sys_chown(filename: 5570b7884280, user: 0, group: 0)
<...>-112705 [024] .... 49340.020061: sys_chown(filename: 7f9c64044200, user: 6b, group: 6b)
<...>-112707 [024] .... 49340.025003: sys_chown(filename: 7f9c64027ff0, user: 6b, group: 6b)
<...>-112709 [024] .... 49340.030089: sys_chown(filename: 7f9c64027fb0, user: 6b, group: 6b)
<...>-112716 [024] .... 49340.077825: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
<...>-112716 [024] .... 49340.077904: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
<...>-112716 [024] .... 49340.077975: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
<...>-112716 [024] .... 49340.078047: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
<...>-112716 [024] .... 49340.078116: sys_lchown(filename: 7f9c64029c10, user: 0, group: 0)
<...>-112716 [024] .... 49340.078188: sys_lchown(filename: 7f9c64029c10, user: 0, group: 5)
<...>-112716 [024] .... 49340.078247: sys_lchown(filename: 7f9c64029c10, user: 0, group: 24)
<...>-112716 [024] .... 49340.078566: sys_lchown(filename: 7f9c640357e0, user: 0, group: 0)
NetworkManager-1726 [017] .... 49340.081779: sys_chown(filename: 55c670470a60, user: 0, group: 0)
<...>-112756 [020] .... 49340.227858: sys_chown(filename: 7f9c101bb6c0, user: 6b, group: 6b)
<...>-112756 [020] .... 49340.228016: sys_chown(filename: 7f9c6402e230, user: 6b, group: 6b)

So the source always seems to be setting things to qemu:qemu. On the dest, however:

rpc-worker-19693 [009] .... 50283.404186: sys_chown(filename: 7f014c031d50, user: 0, group: 6b)
rpc-worker-19695 [003] .... 50283.408440: sys_chown(filename: 7f014c0361a0, user: 0, group: 6b)
rpc-worker-19697 [009] .... 50283.413162: sys_chown(filename: 7f014c0385b0, user: 0, group: 6b)
NetworkManager-1621 [005] .... 50283.458396: sys_chown(filename: 5603a223dc10, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478358: sys_lchown(filename: 7f014c040fe0, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478408: sys_lchown(filename: 7f014c040c20, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478428: sys_lchown(filename: 7f014c040c40, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478449: sys_lchown(filename: 7f014c040c60, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478474: sys_lchown(filename: 7f014c040c80, user: 0, group: 0)
rpc-worker-19711 [001] .... 50283.478494: sys_lchown(filename: 7f014c040fa0, user: 0, group: 5)
rpc-worker-19711 [001] .... 50283.478518: sys_lchown(filename: 7f014c040fc0, user: 0, group: 24)
rpc-worker-19711 [001] .... 50283.478542: sys_lchown(filename: 7f014c040c00, user: 0, group: 0)
rpc-worker-19726 [000] .... 50283.507233: sys_chown(filename: 7f014c0243e0, user: 0, group: 6b)
rpc-worker-19726 [000] .... 50283.507386: sys_chown(filename: 7f014c035be0, user: 0, group: 6b)
NetworkManager-1621 [007] .... 50292.565646: sys_chown(filename: 5603a2247160, user: 0, group: 0)

So it looks like libvirt's rpc-worker is setting stuff to root:qemu.

Still, there are some NFS-isms I'm not understanding. On the source, after a failed migration:

[root@dell-per740-03 tracing]# ls -l /mnt/nfv/rhel8.4.qcow2
ls: cannot access '/mnt/nfv/rhel8.4.qcow2': Input/output error
[root@dell-per740-03 tracing]# ls -l /mnt/nfv/rhel8.4.qcow2
-rw-------. 1 root qemu 42956488704 Apr 15 12:57 /mnt/nfv/rhel8.4.qcow2

I can't see how that ls should have failed.
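The chown trace above came from the kernel's syscall trace events; here is a sketch of the setup. It assumes root and a tracefs mounted under /sys/kernel/debug, and just reports otherwise; the 10-second watch window is arbitrary.

```shell
# Enable system-wide tracing of chown/lchown so ownership flips on the
# shared image can be attributed to a process (qemu's uid is 107 = 0x6b).
t=/sys/kernel/debug/tracing
if [ -w "$t/trace" ]; then
    echo 1 > "$t/events/syscalls/sys_enter_chown/enable"
    echo 1 > "$t/events/syscalls/sys_enter_lchown/enable"
    timeout 10 cat "$t/trace_pipe"   # watch briefly during the migration
    status="traced"
else
    status="tracefs not writable (run as root with debugfs mounted)"
fi
echo "$status"
```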
The red flag is setting "user: 0, group: 107". A normally configured libvirt will use "user: 107, group: 107" when granting access, and "user: 0, group: 0" when revoking access.

Looking at /etc/libvirt/qemu.conf on the target host I see:

user="root"

So we have the target host configured to use a different account from the source host. This is not going to work when you have disk images on shared filesystems that need to be accessible by both.

I'm still not 100% sure at which point the dest libvirtd is setting the ownership to "user: 0, group: 107" though - it is not supposed to touch images on shared filesystems at all for an incoming migration, as it's supposed to assume the source host already has ownership correctly set.

Nonetheless, removing the bogus user="root" from qemu.conf makes this work again.
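A quick way to spot this misconfiguration is to check each host for qemu.conf overrides and compare the numeric IDs; a sketch (run on both hosts and compare the output):

```shell
# Any uncommented user=/group= line in qemu.conf overrides the account
# QEMU runs as; in this bug the destination had user="root" while the
# source used the package default qemu:qemu (107:107).
conf=${conf:-/etc/libvirt/qemu.conf}
overrides=$(grep -E '^[[:space:]]*(user|group)[[:space:]]*=' "$conf" 2>/dev/null || true)
if [ -n "$overrides" ]; then
    echo "overrides in $conf:"
    echo "$overrides"
else
    echo "no user=/group= override in $conf; package default applies"
fi
id -u qemu 2>/dev/null || echo "no qemu user on this host"
```

With a shared filesystem involved, the numeric UID:GID reported on the two hosts must match.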
(In reply to Daniel Berrangé from comment #16)
> I'm still not 100% sure at which point dest libvirtd is setting the
> ownershup to "user:0, group: 107" though - it is not supposed to touch
> images on shared filesystems at all for an incoming migration, as its
> supposed to assume the source host already has ownership correctly set.

libvirt appears to set the ownership during startup of the incoming QEMU on the target. Normally this is fine, because both the source and dest hosts are configured with the same UID/GID for running QEMU, and thus when the target QEMU starts, its chown is basically a no-op.

In the case of this misconfiguration, though, the chown during startup of the incoming QEMU breaks access for the source host. The effect on the source host seems a little delayed; possibly the NFS client is caching the access mode for a while.

Anyway, it is critical that both hosts use the same UID:GID when a shared FS is involved. Libvirt probably ought to validate this before initiating the migration, i.e. the target libvirt can report back to the source what UID it intends to use for QEMU, and the source libvirtd can validate this if there's any shared FS involved.
Thanks David and Daniel for the debugging and for finding the root cause. I tested again: after configuring libvirt on the src and dest with the same UID, migration works successfully. Sorry I didn't notice this difference in the libvirt configuration (the src server is set up by automation while the dest server was set up manually, which is why I missed the mismatch).

(In reply to Daniel Berrangé from comment #17)
[...]
>
> Libvirt probably ought to validate this before initiating the migration. ie
> the target libvirt can report back to the source what UID it intends to use
> for QEMU, and source libvirtd can validate this, if there's any shared FS
> involved.

Makes sense. It would be better for libvirt to give users a warning about the UID difference. So I will move this bug to the libvirt component and change the title accordingly. Feel free to correct me if anything is wrong. Thanks.
Changing to low priority since the test scenario is not a normal configuration.
(In reply to Dr. David Alan Gilbert from comment #8)
> [root@dell-per740-03 qemu]# qemu-img info /mnt/nfv/rhel8.4.qcow2
> image: /mnt/nfv/rhel8.4.qcow2
> file format: qcow2
> virtual size: 40 GiB (42949672960 bytes)
> disk size: 2.4 GiB
> cluster_size: 65536
> Format specific information:
> compat: 1.1
> compression type: zlib
> lazy refcounts: true
> refcount bits: 16
> corrupt: false
>
> (unusual to have compression?)

zlib is the default compression type; it does not mean that compression is enabled. All images will show a compression type.
(In reply to Daniel Berrangé from comment #17)
> Libvirt probably ought to validate this before initiating the migration. ie
> the target libvirt can report back to the source what UID it intends to use
> for QEMU, and source libvirtd can validate this, if there's any shared FS
> involved.

And libvirt would do that, if only NFS supported XATTRs. I mean, if the destination tried to set a different UID then the destination libvirtd would report an error. But I don't think we should be getting into permission checks; let the kernel handle that instead. The best we can do is to check whether the numerical values of UID:GID match on both src and dest. And we would have to do that on the destination, because the moment the source initiates migration, the destination starts QEMU and relabels everything.
(In reply to Michal Privoznik from comment #21)
> (In reply to Daniel Berrangé from comment #17)
> But I don't think we should be getting into permission check, let kernel
> handle that instead. The best we can do is to check whether numerical values
> of UID:GID match on both src and dest. And we would have to do that on the
> destination because the moment source initiates migration the destination
> starts QEMU and relabels everything.

Yes, I was only suggesting comparing the numerical IDs, on the basis that we want to ensure DAC relabelling is a no-op when shared filesystems are involved. During the "Begin" phase we would need to put the UID:GID in the cookie XML, and then the "Prepare" phase on the dest would validate a match.
Bulk update: Move RHEL-AV bugs to RHEL9. If necessary to resolve in RHEL8, then clone to the current RHEL8 release.
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release. Therefore, it is being closed. If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.