Bug 2127324

Summary: VM migration fails if the VM is undefined with --nvram (which removes the nvram file) before migration
Product: Red Hat Enterprise Linux 9
Reporter: Fangge Jin <fjin>
Component: qemu-kvm
Assignee: Virtualization Maintenance <virt-maint>
qemu-kvm sub component: Live Migration
QA Contact: Li Xiaohui <xiaohli>
Status: CLOSED NOTABUG
Docs Contact:
Severity: low
Priority: unspecified
CC: coli, dgilbert, jinzhao, juzhang, virt-maint, xuwei
Version: 9.1
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-20 12:26:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
  vm xml (flags: none)

Description Fangge Jin 2022-09-16 01:14:57 UTC
Created attachment 1912245
vm xml

Description of problem:
Define and start a UEFI VM, undefine it with --nvram, then perform a live migration. The migration fails with the following error messages:
2022-09-14 11:12:25.561+0000: initiating migration
2022-09-14T11:12:30.289346Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)
2022-09-14T11:12:30.377421Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-09-14T11:12:30.377448Z qemu-kvm: Unable to read from socket: Bad file descriptor
2022-09-14T11:12:30.377459Z qemu-kvm: Unable to read from socket: Bad file descriptor


Version-Release number of selected component:
libvirt-8.5.0-6.el9.x86_64
qemu-kvm-7.0.0-13.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Define a UEFI vm

2. Start vm

3. Undefine the VM with --nvram (this removes the nvram file when the VM is undefined)
# virsh undefine avocado-vt-vm1 --nvram

4. Migrate vm:
# virsh migrate avocado-vt-vm1 qemu+ssh://{target_host}/system --live --p2p  --persistent --postcopy

Actual results:
Migration failed.

Expected results:
Migration can succeed.

Additional info:
1. If the VM is undefined with --keep-nvram in step 3, migration succeeds.
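
A minimal sketch of a pre-migration guard based on the observation above: it reads domain XML on stdin, extracts the <nvram> path, and checks that the file still exists on the source host. The function names are hypothetical and not part of virsh or libvirt, and the sed pattern assumes the <nvram> element sits on a single line.

```shell
# Hypothetical helper: print the first <nvram> path found in domain XML on stdin.
# Assumes the whole <nvram>...</nvram> element is on one line.
nvram_path_from_xml() {
    sed -n 's:.*<nvram[^>]*>\([^<]*\)</nvram>.*:\1:p' | head -n 1
}

# Hypothetical helper: return 0 if the XML on stdin names an nvram file that
# exists on this host, and 1 if the element is absent or the file is gone.
nvram_file_present() {
    path=$(nvram_path_from_xml)
    [ -n "$path" ] && [ -e "$path" ]
}
```

For example, `virsh dumpxml avocado-vt-vm1 | nvram_file_present || echo "nvram file is missing"` could be run before step 4, assuming the live XML of the running domain still carries the <nvram> element after the undefine.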

Comment 1 Li Xiaohui 2022-09-16 07:51:27 UTC
1. Boot an OVMF guest on the src and dst hosts (the dst with '-incoming defer');
2. Once the guest is running on the src host, delete the OVMF vars file:
# rm -rf /mnt/xiaohli/rhel920-64-virtio-scsi.qcow2_VARS.fd
3. Start the migration; it fails:
(1) The src HMP prints:
(qemu) 2022-09-16T07:30:13.856017Z qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

(qemu) info migrate
globals:
store-global-state: on
only-migratable: off
send-configuration: on
send-section-footer: on
decompress-error-check: on
clear-bitmap-shift: 18
Migration status: failed
total time: 0 ms
(2) The dst qemu quits automatically with an Input/output error:
(qemu) 2022-09-16T07:30:14.136464Z qemu-kvm: warning: TSC frequency mismatch between VM (2095077 kHz) and host (3503999 kHz), and TSC scaling unavailable
2022-09-16T07:30:14.136680Z qemu-kvm: warning: TSC frequency mismatch between VM (2095077 kHz) and host (3503999 kHz), and TSC scaling unavailable
2022-09-16T07:30:14.136884Z qemu-kvm: warning: TSC frequency mismatch between VM (2095077 kHz) and host (3503999 kHz), and TSC scaling unavailable
2022-09-16T07:30:14.137085Z qemu-kvm: warning: TSC frequency mismatch between VM (2095077 kHz) and host (3503999 kHz), and TSC scaling unavailable
2022-09-16T07:30:14.137205Z qemu-kvm: load of migration failed: Input/output error


I checked the guest after the migration failure: it still works well and can also reboot successfully.


I think the migration failure is the expected result. David, what do you think?


If some change is needed, I think it should be done on the OVMF side: maybe forbid deleting the OVMF vars file after the guest starts, or print an error if the OVMF vars file is deleted.
Xueqiang, I see you're the ovmf feature owner, can you help handle this issue?

Comment 2 Xueqiang Wei 2022-09-16 10:14:59 UTC
> If some change is needed, I think it should be done on the OVMF side: maybe
> forbid deleting the OVMF vars file after the guest starts, or print an error
> if the OVMF vars file is deleted.
> Xueqiang, I see you're the ovmf feature owner, can you help handle this
> issue?


Xiaohui,

I think this is the expected result, not a bug. We should use --keep-nvram; otherwise QEMU fails to save the VM state.

The relevant code is in migration/savevm.c


Error message:
src host:
(qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable: bdrv_inactivate_all() failed (-1)

dst host:
(qemu) qemu-kvm: load of migration failed: Input/output error


Let's also wait for the developers' feedback. Thanks.
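
The source-side message quoted in this comment can serve as a signature when scanning logs for this failure. A minimal sketch follows; the function name is hypothetical, and the match is a fixed-string search for the text shown above, nothing more.

```shell
# Hypothetical helper: return 0 if the log text on stdin contains the
# source-side failure reported here, 1 otherwise.
# -F treats the pattern as a fixed string, so the parentheses are literal.
has_inactivate_failure() {
    grep -qF 'bdrv_inactivate_all() failed'
}
```

It could be fed the per-domain QEMU log, for instance `has_inactivate_failure < /var/log/libvirt/qemu/avocado-vt-vm1.log`, where the log location is an assumption about the host's libvirt setup.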

Comment 3 Xueqiang Wei 2022-09-20 09:38:11 UTC
Confirmed with an EDK2 developer: for multi-host migration, shared storage is needed for the disk image and the EFI vars file.
It is not needed for the OVMF code. So this is the expected result, not a bug; we should use --keep-nvram.
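
A rough sketch of how the shared-storage requirement above might be checked for the disk image and the vars file. The function names are hypothetical, the list of filesystem types is an illustrative assumption (it is not libvirt's or QEMU's own list), and `fstype_of` assumes `findmnt` from util-linux is available.

```shell
# Hypothetical helper: return 0 if the given filesystem type name is treated
# as shared storage. The list is an assumption for illustration only.
is_shared_fstype() {
    case "$1" in
        nfs|nfs4|cifs|ceph|gfs2|ocfs2|fuse.glusterfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the filesystem type of the mount that holds a path.
# Relies on findmnt (util-linux): -n drops the header, -T selects by target path.
fstype_of() {
    findmnt -n -o FSTYPE -T "$1"
}
```

A check such as `is_shared_fstype "$(fstype_of /var/lib/libvirt/qemu/nvram)" || echo "vars file is on local storage"` would then flag a local nvram directory.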

Comment 4 Li Xiaohui 2022-09-20 12:26:57 UTC
Thank you Xueqiang.

I will close this bug as NOTABUG, since the migration failure is the expected result per Comment 3.

Comment 5 Fangge Jin 2022-09-21 04:32:50 UTC
(In reply to Xueqiang Wei from comment #3)
> Confirmed with an EDK2 developer: for multi-host migration, shared storage
> is needed for the disk image and the EFI vars file.
> It is not needed for the OVMF code. So this is the expected result, not a
> bug; we should use --keep-nvram.

In my test, vm nvram files are stored on local storage, and migration can succeed:
  <os>
    <type arch='x86_64' machine='pc-q35-rhel8.6.0'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/q35_uefi_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>

Comment 6 Xueqiang Wei 2022-09-21 06:26:57 UTC
(In reply to Fangge Jin from comment #5)
> (In reply to Xueqiang Wei from comment #3)
> > Confirmed with an EDK2 developer: for multi-host migration, shared storage
> > is needed for the disk image and the EFI vars file.
> > It is not needed for the OVMF code. So this is the expected result, not a
> > bug; we should use --keep-nvram.
> 
> In my test, vm nvram files are stored on local storage, and migration can
> succeed:
>   <os>
>     <type arch='x86_64' machine='pc-q35-rhel8.6.0'>hvm</type>
>     <loader readonly='yes' secure='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.secboot.fd</loader>
>     <nvram>/var/lib/libvirt/qemu/nvram/q35_uefi_VARS.fd</nvram>
>     <boot dev='hd'/>
>   </os>


A successful migration doesn't mean the usage is correct.
The vars file belongs to the VM, so it needs to be either migrated over or placed on shared storage.
For a simple test there may be no problem. But some configuration is saved in the vars file, so with local storage you may hit a save_vm_state error. Please follow the usage recommended by the developer. Thanks.