Bug 1968433
| Field | Value | Field | Value |
|---|---|---|---|
| Summary: | [DR] Failover / Failback: HA VM fails to be started due to 'VM XXX is being imported' | | |
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Ilan Zuckerman <izuckerm> |
| Component: | ovirt-engine | Assignee: | Arik <ahadas> |
| Status: | CLOSED ERRATA | QA Contact: | sshmulev |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.4.6 | CC: | ahadas, bugs, emarcus, gveitmic, mavital, michal.skrivanek, mkalinin, sfishbai |
| Target Milestone: | ovirt-4.5.3 | Keywords: | PrioBumpQA, ZStream |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.5.3.1 | Doc Type: | Bug Fix |
| Doc Text: | Previously, attempts to start highly available virtual machines during failover or failback flows sometimes failed with the error "Cannot run VM. VM X is being imported", leaving the virtual machines down. In this release, virtual machines are no longer started by the disaster-recovery scripts while they are being imported. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-16 12:17:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
*** Bug 1974535 has been marked as a duplicate of this bug. ***

Raising to High severity: a VM not running after DR failover/failback is not a medium-severity bug, and starting an HA VM seems like a basic DR feature expectation. Although the issue does not appear to be a regression from recent 4.4 builds, starting an HA VM is still a basic requirement that should be fixed and get more attention.

Changing to d/w, since it has a customer ticket attached.

This should be solved by the fix to bz 2074112 - import VM from configuration is now synchronous, so we would not get to RunVm while the VM is locked by the ImportVmFromConfiguration command.

(In reply to Arik from comment #8)
> This should be solved by the fix to bz 2074112 - import vm from
> configuration is now synchronous so we would not get to RunVm while the VM
> is locked by the ImportVmFromConfiguration command

The above is true for import-vm [1] but not for register-vm [2], which is what those scripts use. We could change [2] in a similar way, but it would be better if the client set the operation to async=False only for VMs that it is going to run.

[1] https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.5.1/backend/manager/modules/restapi/jaxrs/src/main/java/org/ovirt/engine/api/restapi/resource/BackendVmsResource.java#L402-L407
[2] https://github.com/oVirt/ovirt-engine/blob/ovirt-engine-4.5.1/backend/manager/modules/restapi/jaxrs/src/main/java/org/ovirt/engine/api/restapi/resource/BackendStorageDomainVmResource.java#L120

This bug has low overall severity and is not going to be further verified by QE. If you believe special care is required, feel free to properly align the relevant severity, flags and keywords to raise PM_Score, or use one of the bumps ('PrioBumpField', 'PrioBumpGSS', 'PrioBumpPM', 'PrioBumpQA') in Keywords to raise its PM_Score above the verification threshold (1000).

Verified. The VM was up in both failover and failback in the same flow.
Versions:
RHV 4.5.3-3
ovirt-engine-4.5.3.1-2
vdsm-4.50.3.4-1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.3] bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8502
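To make the client-side behavior discussed in comment #8 and its reply more concrete, a minimal sketch follows. It is only illustrative and is not the actual redhat.rhv.disaster_recovery role code: it assumes the ovirtsdk4 Python SDK, a placeholder engine URL, credentials, storage-domain name and cluster name, while the VM name 'test' is taken from the error quoted in this report. It registers an unregistered VM from the recovered storage domain and then retries the start call for as long as the engine still answers with the "is being imported" fault.

```python
# Illustrative sketch only -- not the actual disaster-recovery role code.
# Assumes ovirtsdk4 (python3-ovirt-engine-sdk4); the URL, credentials,
# storage-domain and cluster names below are placeholders.
import time

import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
system = connection.system_service()

# Locate the recovered storage domain and the unregistered VM on it.
sd = system.storage_domains_service().list(search='name=ge_storage16_nfs_1')[0]
sd_vms = system.storage_domains_service().storage_domain_service(sd.id).vms_service()
unregistered = [v for v in sd_vms.list(unregistered=True) if v.name == 'test']

for vm in unregistered:
    # Register the VM into the secondary cluster. As suggested in the comment
    # above, the client could also request a synchronous registration
    # (async=False) for VMs it intends to run immediately.
    sd_vms.vm_service(vm.id).register(
        cluster=types.Cluster(name='secondary-cluster'),
        allow_partial_import=True,
    )

# Start the HA VM, retrying while the engine still holds the import lock and
# answers with "Cannot run VM. VM ... is being imported" (HTTP 409).
vm = system.vms_service().list(search='name=test')[0]
vm_service = system.vms_service().vm_service(vm.id)
for _ in range(30):
    try:
        vm_service.start()
        break
    except sdk.Error as err:
        if 'is being imported' not in str(err):
            raise
        time.sleep(10)

connection.close()
```

If the registration itself is made synchronous (as the fix for bz 2074112 did for import-vm), the retry loop becomes unnecessary; it is shown here purely as a defensive client-side illustration.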
Created attachment 1789207 [details]
ovirt dr + engine logs

Description of problem:
When invoking the failover/failback flows of DR with a running HA VM on the 'primary' site (for failback) or on the 'secondary' site (for failover), the VM fails to be started [1] (from the ovirt-dr log). The failover flow appears to complete successfully, but the VM is left in 'stopped' state on the 'secondary' site. The error occurs during the 'TASK [redhat.rhv.disaster_recovery : Run VMs]' ansible task.

The flow below describes the issue as seen with failover, but the same behavior can be observed with failback as well.

DR schema:
Master - drives the DR scripts - storage-ge-13
Primary - the env where the disaster occurs - storage-ge-15
Secondary - the env the assets are migrated to - storage-ge-16

Environment state prior to testing:
1. Primary site containing:
   Active data center
   Active cluster
   Active hosts
   One active and attached NFS storage domain
   One template
   No VMs
2. Secondary site containing:
   Active data center
   Active cluster
   Active hosts
   No VMs, templates or attached storage domains

[1]:
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Cannot run VM. VM test is being imported.]". HTTP response code is 409.

Version-Release number of selected component (if applicable):
rhv-release-4.4.6-9-001.noarch

How reproducible:
100%

Steps to Reproduce:
- Create an HA VM on the 'master' site and start it
- Generate the mappings file with ./ovirt-dr generate
- Update the OVF store for the storage domain on master
- Mount the primary and secondary storages:
  mount -t nfs mantis-nfs-xxx.com:/nas01/ge_storage16_nfs_1 /mnt/secondary/ge_storage16_nfs_1
  mount -t nfs mantis-nfs-xxx.com:/nas01/ge_storage15_nfs_1 /mnt/primary/ge_storage15_nfs_1/
- Make sure that the secondary mount point is empty:
  rm -rf /mnt/secondary/ge_storage16_nfs_1/*
- To create a replica, rsync the primary storage content to the secondary one:
  [root@storage-ge-13 files]# rsync -azvh /mnt/primary/ge_storage15_nfs_1/* /mnt/secondary/ge_storage16_nfs_1
- Change ownership of the replicated storage mount folder and its contents:
  chown -R vdsm:kvm /mnt/secondary/ge_storage16_nfs_1/
- Run failover:
  ./ovirt-dr failover

Actual results:
The secondary storage domain is attached and active
The template was imported
The VM was imported BUT NOT running

Expected results:
Wait for the failover to finish and verify that:
The secondary storage domain is attached and active
The template was imported
The VM was imported and running

Additional info:
Attaching the ovirt-dr log, the engine log of the 'secondary' site (where the VM should be up), and the vdsm log
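As an illustration of the expected-results check above, a small verification against the secondary engine could look like the sketch below. It assumes the ovirtsdk4 Python SDK; the engine URL, credentials and the data-center name 'secondary-dc' are placeholders, while the storage-domain name and the VM name 'test' come from the reproduction steps and the error quoted above.

```python
# Illustrative post-failover verification; assumes the ovirtsdk4 Python SDK.
# The URL, credentials and 'secondary-dc' are placeholders; 'test' is the VM
# name from the error quoted in this report.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://storage-ge-16.example.com/ovirt-engine/api',
    username='admin@internal',
    password='password',
    insecure=True,
)
system = connection.system_service()

# The replicated storage domain should be attached to the secondary data
# center and active.
dcs_service = system.data_centers_service()
dc = dcs_service.list(search='name=secondary-dc')[0]
for sd in dcs_service.data_center_service(dc.id).storage_domains_service().list():
    print(sd.name, sd.status)
    assert sd.status == types.StorageDomainStatus.ACTIVE

# The template should have been imported ...
print('templates:', [t.name for t in system.templates_service().list()])

# ... and the HA VM should have been imported and be running.
vm = system.vms_service().list(search='name=test')[0]
print('vm:', vm.name, vm.status)
assert vm.status == types.VmStatus.UP

connection.close()
```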