Description of problem: Restoring VM failed with the err "Cannot open log file: '/var/log/libvirt/qemu/77.log': Device or resource busy" Version-Release number of selected component (if applicable): 2 hosts libvirt-4.5.0-18.el7.x86_64 qemu-kvm-rhev-2.12.0-30.el7.x86_64 kernel-3.10.0-1052.el7.x86_64 How reproducible: 100% Steps to Reproduce: 1. Mount NFS for 2 hosts and check # mount NFSserver:/home/nfs_data_intel on /tmp 2. Start VM in the 1st host; and then save the VM # virsh start 7701 Domain 7701 started # virsh list --all Id Name State ---------------------------------------------------- 42 7701 running # virsh save 7701 7701.save Domain 7701 saved to 7701.save 3. Scp the save_file to the 2rd host # scp 7701.save root.lab.eng.nay.redhat.com:/root 4. Start the VM in the 1st host again # virsh start 7701 Domain 7701 started # virsh list --all Id Name State ---------------------------------------------------- 43 7701 running 5. Restore VM from the save_file in the 2rd host # virsh restore 7701.save error: Failed to restore domain from 7701.save error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock ##### Thre is no VM list by virsh # virsh list --all No 7701 here ##### There is qemu progress # ps -ef |grep 7701 qemu 18839 1 7 05:56 ? 00:00:00 /usr/libexec/qemu-kvm -name guest=7701,debug-threads=on 6. Restore VM from the save_file in the 2rd host again # virsh restore 77.save error: Failed to restore domain from 77.save error: Cannot open log file: '/var/log/libvirt/qemu/77.log': Device or resource busy Actual results: As step-5. step-6 shows Expected results: While there is qemu process of VM, virsh list should also be able to output the status of VM> Additional info: Opereate the steps above in same host; the following 2 VMs use same img. 1. Start the first VM and save it # virsh start 7701 Domain 7701 started # virsh list --all Id Name State ---------------------------------------------------- 42 7701 running - 7702 shut off # virsh save 7701 7701.save Domain 7701 saved to 7701.save # virsh list --all Id Name State ---------------------------------------------------- - 7701 shut off - 7702 shut off 2. Start the 2rd VM # virsh start 7702 Domain 7702 started # virsh list --all Id Name State ---------------------------------------------------- 43 7702 running 44 7701 shut off 3. Restore the 1st VM and check status # virsh restore 7701.save error: Failed to restore domain from 7701.save error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock # virsh list --all Id Name State ---------------------------------------------------- 43 7702 running 44 7701 paused # ps -ef |grep 7701 qemu 31264 1 0 05:58 ? 00:00:00 /usr/libexec/qemu-kvm -name guest=7701
Just for the record, an error is expected here because the disk on the NFS mount is still in use. Therefore, restore MUST fail. However, it must fail with a different error and certainly, libvirt can't leave qemu process behind. This is why you see "Device or resource busy" error - there is already a qemu process running that libvirt doesn't know of (well, forgot about) - after the first attempt to restore qemu was started but then left behind (with still opened log file). Therefore, the second attempt results in EBUSY error. Anyway, patches posted upstream: https://www.redhat.com/archives/libvir-list/2020-January/msg00584.html
I've merged patches upstream: 4c581527d4 qemu: Stop domain on failed restore 3203ad6cfd qemu: Use g_autoptr() for qemuDomainSaveCookie 82e127e343 qemuDomainSaveImageStartVM: Use g_autoptr() for virCommand 1c16f261d0 qemuDomainSaveImageStartVM: Use VIR_AUTOCLOSE for @intermediatefd
To POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2020-January/msg00224.html
Verify this bug on: libvirt-daemon-6.0.0-4.module+el8.2.0+5642+838f3513.x86_64 qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 Steps: S1. Mount NFS on 2 hosts 1.Save running guest with nfs image on [hosta]: # virsh start vm2-yqz Domain vm2-yqz started # virsh save vm2-yqz vm2-yqz.save Domain vm2-yqz saved to vm2-yqz.save 2. scp saved file to [hostb]: # scp lenovo-***:/root/vm2-yqz.save . 3. Start guest on [hosta] again: # virsh start vm2-yqz Domain vm2-yqz started 4. Try to restore guest from the saved file on [hostb]: # virsh restore vm2-yqz.save error: Failed to restore domain from vm2-yqz.save error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock # ps aux|grep qemu|grep vm2-yqz (nothing output, no qemu process running after restore failed) # virsh list --all|grep vm2-yqz (nothing output) Try 4 more times again, still get same result: # virsh restore vm2-yqz.save error: Failed to restore domain from vm2-yqz.save error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock # ps aux|grep qemu|grep vm2-yqz # virsh list --all|grep vm2-yqz Check qemu log: 2020-02-06 08:15:26.386+0000: starting up libvirt version: 6.0.0... ... 2020-02-06T08:15:27.346353Z qemu-kvm: Failed to get "write" lock Is another process using the image [/s3-qe-team/yanqzhan/RHEL-8.2.0-20191219.0-x86_64.qcow2]? 2020-02-06 08:15:27.358+0000: shutting down, reason=failed 2020-02-06T08:15:27.359107Z qemu-kvm: terminating on signal 15 from pid 28129 (<unknown process>) # ps aux|grep 28129 root 28129 0.0 0.7 2279660 59084 ? Ssl 01:44 0:01 /usr/sbin/libvirtd --timeout 120 S2. On same host, start another guest using same image, then try to restore 1st guest: 1. # virsh start avocado-vt-vm1 Domain avocado-vt-vm1 started 2. # virsh restore vm2-yqz.save error: Failed to restore domain from vm2-yqz.save error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock # ps aux|grep qemu|grep vm2-yqz (nothing output) # virsh list --all|grep vm2-yqz - vm2-yqz shut off Try 4 more times again, still get same result. S3. Additional try for managedsave+start situation: # virsh managedsave avocado-vt-vm1 Domain avocado-vt-vm1 state saved by libvirt # virsh start vm2-yqz Domain vm2-yqz started # virsh start avocado-vt-vm1 error: Failed to start domain avocado-vt-vm1 error: internal error: unable to execute QEMU command 'cont': Failed to get "write" lock # ps aux|grep qemu|grep avocado-vt-vm1 # virsh list --all|grep avocado-vt-vm1 - avocado-vt-vm1 shut off No qemu process left(killed by libvirtd) after restore failed. Since the result is as expected, mark bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2017