Bug 2010000
| Summary: | Live storage migration breaks Windows (UEFI) virtual machine | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-engine | Reporter: | Patrick <patrick.lomakin> |
| Component: | BLL.Storage | Assignee: | Arik <ahadas> |
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Avihai <aefrat> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4.8.6 | CC: | ahadas, bugs |
| Target Milestone: | --- | Flags: | ahadas: needinfo? |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-02 17:37:12 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
**Description** Patrick 2021-10-02 19:48:52 UTC

**Automated message:**

The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

**Arik (comment 2):**

Eyal, as it happens after live storage migration, why do we suspect it's a Virt issue?

**Eyal Shenitzky (comment 3):**

(In reply to Arik from comment #2)
> Eyal, as it happens after live storage migration, why do we suspect it's a Virt issue?

I saw Sandro put it on virt, so I assumed you had discussed it on that bug (I just set the assignee to you so you won't miss it). But if you are asking: I suspect it might be related to the new snapshot flow added by the virt team, and it happens only in a running Windows VM, which is a virt area.

**Arik:**

(In reply to Eyal Shenitzky from comment #3)
> I saw Sandro put it on virt, so I assumed you had discussed it on that bug (I just set the assignee to you so you won't miss it).

I see.

> But if you are asking: I suspect it might be related to the new snapshot flow added by the virt team

Can you elaborate on that? I don't recall introducing a new snapshot flow.

> and it happens only in a running Windows VM, which is a virt area.

Because it is a running Windows VM? So if incremental backup were to fail for a Windows VM, the virt team would need to handle that as well? :)

**Arik:**

OK, I'm switching this to the storage team, as the live storage migration flow is the primary suspect here. That doesn't mean it's an RHV/storage issue though; it could also be a platform issue. And I'll try to assist.

Patrick, can you please provide more information:

1. What is the configuration (preferably in the form of a domain XML) of the VM? We are particularly interested in the disk properties: their type, whether they had snapshots, and so on.
2. Does it happen only with Windows+UEFI? Did you try the same type of disks with a different operating system?
3. You wrote that the OS gets into recovery mode. We have a similar bug in which the guest is not able to boot from a copied disk (bz 1983638), but you also mention that the VM crashed. Do you mean the qemu process crashed during the live storage migration?
4. You wrote that "all attempts to restore the machine to work were unsuccessful". Does that mean the VM is no longer able to boot from this disk at all, even after a hard reset?
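For reference, the disk details asked for in item 1 can be pulled from the engine API; the domain XML itself can be dumped on the host with `virsh -r dumpxml <vm-name>`. Below is a minimal sketch using the oVirt Python SDK (ovirtsdk4); the engine URL, credentials, and VM name are illustrative placeholders, not values from this bug.

```python
# Minimal sketch: list a VM's disk attachments and snapshots via ovirtsdk4.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='password',  # placeholder
    ca_file='ca.pem',
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=win2019-uefi')[0]  # hypothetical VM name
vm_service = vms_service.vm_service(vm.id)

# Disk attachments carry the bus/interface and the bootable flag; following
# the link to the disk itself yields its format and allocation policy.
for attachment in vm_service.disk_attachments_service().list():
    disk = connection.follow_link(attachment.disk)
    print(disk.name, disk.format, disk.sparse,
          attachment.interface, attachment.bootable)

# Snapshots, since the questions also ask whether the disks had any.
for snapshot in vm_service.snapshots_service().list():
    print(snapshot.description, snapshot.snapshot_status)

connection.close()
```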
**Patrick (comment 7):**

Arik, more information:

1. First of all, I use iSCSI for storage. The engine added a storage domain over several paths. A preallocated 100G disk with the "bootable" flag was added to the machine. Then I deployed a Windows Server machine in UEFI mode; the firmware type is "Q35 Chipset with UEFI". No snapshots were created.
2. I also have virtual machines with CentOS that, with the same disk configuration and "Q35 Chipset with UEFI", migrated successfully and started successfully after a reboot. I have not tried migrating machines with other chipset/firmware types.
3. The QEMU process was not affected; the problem is specifically with the OS. Here is what I tried: I booted from a live Windows ISO and went into the command line. I installed the VirtIO-SCSI driver and, with the diskpart utility, mounted the system disk and the recovery disk. After mounting, I disabled Windows' automatic recovery mode and could see the error message: "The operating system couldn't be loaded because the digital signature of a file couldn't be verified. Error code: 0xc0000428". Unfortunately, the error does not specify which file failed the verification. An attempt to disable the signature check was also unsuccessful.
4. Yes, I lost some machines, but I was able to copy the files off them. The problem occurs specifically with Windows and only after a live storage migration. If you migrate just the VM's disk while the VM is powered off, everything succeeds and Windows starts without problems.

Please note that a template created from a working virtual machine with Windows Server 2019 also does not work, no matter whether a thin or thick disk is used. A clone of the working virtual machine does not work either: after startup, the virtual machine reports that there is no disk, even though the "bootable" flag is set on the virtual machine's disk.

**Arik:**

Thanks for the quick follow-up. I believe some of what you wrote in comment 7 describes different issues, and it would be hard to help with those unless you provide us with the VM configuration (for the case where the disk is not recognized) and details on why you say the template doesn't work. So let's concentrate on the case we have more information on: the VM enters recovery state after live storage migration. (For clarity, let's not use the term "live migration", which can refer to both live storage migration of the disk and live virtual machine migration.)

Is it correct that you're using UEFI without secure boot in oVirt (as you previously wrote), while secure boot is enabled within the VM's UEFI settings? (You can check that in the UEFI settings menu, the one you see when pressing ESC during the initial phase of the boot process.) In that case, I think a simple workaround for you would be to disable secure boot in the UEFI settings.

But anyway, I wonder whether the different signature is really detected because the active volume changes, or for some other reason. I'd suggest checking whether it also happens with SATA disks (and if not, what version of the virtio drivers you use). It would also be interesting to check whether it happens when you take a working VM, shut it down, create a snapshot of it, and start it on the same host.

**Patrick (comment 9):**

I'm sorry for the late response. I have the protected download turned off. As I noticed, the problem occurs only when migrating an enabled virtual machine between storage domains with the Q35 (UEFI) parameters. If the virtual machine is shut down and I just move its disk to another storage domain, it starts up without any problems.

**Arik:**

(In reply to Patrick from comment #9)
> I'm sorry for the late response. I have the protected download turned off. As I noticed, the problem occurs only when migrating an enabled virtual machine between storage domains with the Q35 (UEFI) parameters.

What do you mean by "protected download"? Does "enabled virtual machine" mean a virtual machine with that "protected download" option enabled?

> If the virtual machine is shut down and I just move its disk to another storage domain, it starts up without any problems.

Sure: if the virtual machine is turned off and managed to boot, moving its disks around shouldn't matter.

**Patrick:**

Arik, I'm sorry, it's a translation mistake. I meant "Secure Boot".

**Arik:**

What version of the virtio drivers did you use? I wasn't able to reproduce this with Win10+UEFI.

**Arik:**

Please reopen if it still happens, and attach the relevant (engine, vdsm) logs and the version of the virtio drivers that were installed.
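The snapshot check suggested above (shut down a working VM, create a snapshot, start it again) can also be scripted against the engine API, which makes it easy to rerun when reopening with logs. A rough sketch with ovirtsdk4 follows; the connection details and VM name are placeholders again, and pinning the VM to the same host is left out for brevity.

```python
# Rough sketch: stop a VM, snapshot it while down, and start it again.
import time
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='password',  # placeholder
    ca_file='ca.pem',
)
vms_service = connection.system_service().vms_service()
vm = vms_service.list(search='name=win2019-uefi')[0]  # hypothetical VM name
vm_service = vms_service.vm_service(vm.id)

# Shut the VM down and wait until the engine reports it as down.
vm_service.stop()
while vm_service.get().status != types.VmStatus.DOWN:
    time.sleep(5)

# Create a snapshot, which switches the active layer to a new volume,
# and wait for it to leave the locked state before starting the VM.
snapshot = vm_service.snapshots_service().add(
    types.Snapshot(description='boot-signature check'),
)
snapshot_service = vm_service.snapshots_service().snapshot_service(snapshot.id)
while snapshot_service.get().snapshot_status != types.SnapshotStatus.OK:
    time.sleep(5)

# If the active-volume change is at fault, the guest should now fail with
# the same 0xc0000428 signature error; otherwise it should boot normally.
vm_service.start()
connection.close()
```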