Bug 1785344
Summary: Failed / cancelled migration shows incorrect status and node in the UI
Product: OpenShift Container Platform
Reporter: Radim Hrazdil <rhrazdil>
Component: Console Kubevirt Plugin
Assignee: Yaacov Zamir <yzamir>
Status: CLOSED ERRATA
QA Contact: Nelly Credi <ncredi>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.3.0
CC: aos-bugs, cnv-qe-bugs, gouyang, tjelinek, yzamir, zpeng
Target Milestone: ---
Keywords: Reopened
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause:
The failed migration was not handled correctly in the UI.
Consequence:
If a migration was cancelled or failed, the UI showed the migration's target node for the VM. This was very confusing for the user, because it looked as if the migration had actually succeeded.
Fix:
The handling of a cancelled or failed migration has been fixed (an illustrative sketch of this kind of logic follows the metadata fields below).
Result:
The UI reports the actual state of the VM even in the case of a failed or cancelled migration.
Story Points: ---
Clone Of:
: 1792847 1806974 (view as bug list)
Environment:
Last Closed: 2020-05-04 11:20:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1806974
Bug Blocks: 1792847
Attachments:
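To make the Doc Text above a little more concrete, here is a minimal, self-contained TypeScript sketch of the kind of node-selection logic it describes. This is not the actual Console Kubevirt Plugin code: the VmiLike/MigrationStateLike interfaces and the displayedNode helper are hypothetical stand-ins, and the field names (status.nodeName, migrationState.completed/failed/targetNode) are assumptions based on the KubeVirt VirtualMachineInstance API.

```typescript
// Hypothetical, simplified shapes standing in for the KubeVirt VMI resource;
// the field names are assumptions, not the plugin's real types.
interface MigrationStateLike {
  completed?: boolean;
  failed?: boolean;
  sourceNode?: string;
  targetNode?: string;
}

interface VmiLike {
  status?: {
    nodeName?: string;
    migrationState?: MigrationStateLike;
  };
}

// Only report the migration target node once the migration has actually
// completed successfully; for a cancelled or failed migration keep showing
// the node the VMI itself reports.
function displayedNode(vmi: VmiLike): string | undefined {
  const mig = vmi.status?.migrationState;
  if (mig && mig.completed && !mig.failed) {
    return mig.targetNode ?? vmi.status?.nodeName;
  }
  return vmi.status?.nodeName;
}

// Example mirroring the report: a cancelled migration should still show
// the original node, not the designated target node.
const cancelled: VmiLike = {
  status: {
    nodeName: 'working-9lt8v-worker-0-n7plb',
    migrationState: {
      completed: false,
      failed: true,
      sourceNode: 'working-9lt8v-worker-0-n7plb',
      targetNode: 'working-9lt8v-worker-0-fvbl4',
    },
  },
};
console.log(displayedNode(cancelled)); // -> working-9lt8v-worker-0-n7plb
```

The same idea applies to the VM status shown on the Overview and Dashboard pages: the source of truth is what the VMI reports, not the migration object.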
Description
Radim Hrazdil
2019-12-19 18:03:04 UTC
I believe the correct way to detect that the VM is being migrated is to check for the existence of the vmim. Moving to virt to investigate the time it takes to remove the vmim. Petr, can you investigate?

I have investigated the code and played with migration cancellation; it occurred almost instantly. I have also discussed this with Radim. Radim repeated the steps and could not reproduce the issue, so I am for closing this and reopening if it appears again. Radim, are you okay with that?

Since I wasn't able to reproduce the issue, I agree with closing this. We can reopen if we start hitting this in the future.

Created attachment 1650441 [details] VMI.yaml

I apologize for the mess, but I need to reopen this. I misinterpreted our tests when I thought that the VM was stuck in the 'Migrating' status. The actual problem is that the VM Overview page displayed an incorrect node name: the page shows the migration's target node instead of the source node the VM was originally running on. The VMI contains the correct node (the original one). Interestingly, after about 5 or 6 minutes the VM status suddenly becomes 'Stopping' without any action being invoked, while the VMI again reports the correct phase, Running.

Updated reproduction steps:
1. Create a Cirros VM from URL.
2. Start the VM and make a note of the node it is running on, in my case working-9lt8v-worker-0-n7plb.
3. Start a migration.
4. Cancel the migration right away.
5. Wait until the VM is back in Running.

At this point the node displayed on the Overview and Dashboard pages is working-9lt8v-worker-0-fvbl4, which is the designated target node of the migration. Wait a couple of minutes (5-6) and the VM status suddenly becomes Stopping.

Video sample of the whole process: http://file.rdu.redhat.com/rhrazdil/migrationcancellation-2020-01-07_16.20.59.mkv
It is a bit long; the important times are:
00:27 VM is in the Running state
00:36 Migration is canceled
01:11 VM is back in the Running state, with an incorrectly displayed node
02:38 VMI content
06:07 VM suddenly goes to the Stopping state

Moving back to User Interface.

*** Bug 1787551 has been marked as a duplicate of this bug. ***

Radim, hi. On my system, after stopping and starting migrations, I get strange things: I have one vm, one vmi, two vmim (state Scheduling) and three pods (one in the Running state). Is my cluster broken?

```
[yzamir@dhcp-2-187 ~]$ oc get vm
NAME      AGE   RUNNING   VOLUME
example   57m   true
[yzamir@dhcp-2-187 ~]$ oc get vmi
NAME      AGE   PHASE     IP            NODENAME
example   57m   Running   10.130.2.12   working-xb67k-worker-0-jbqql
[yzamir@dhcp-2-187 ~]$ oc get vmim
NAME                      AGE
example-migration-b6gmk   55m
example-migration-qsfmq   16m
[yzamir@dhcp-2-187 ~]$ oc get pods
NAME                          READY   STATUS      RESTARTS   AGE
virt-launcher-example-2v9sh   0/2     Completed   0          16m
virt-launcher-example-44dm8   2/2     Running     0          57m
virt-launcher-example-lfp6v   1/2     Error       0          56m
```

Moving to POST - the VM status is not fixed in the case where we have two pods of the same VM (see the sketch at the end of this report).

Verified on 4.4.0-0.nightly-2020-03-06-170328: after cancelling a VM migration, it does not take much time for the VM to get back to Running, and it stays on the previous node.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
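To illustrate the "two pods of the same VM" situation above, here is a minimal TypeScript sketch of how a status computation could ignore finished migration objects and stale virt-launcher pods. The same caveats apply as for the earlier sketch: PodLike, MigrationLike, activeMigration and activeLauncherPod are hypothetical helpers, not the plugin's real API, and mapping the Completed/Error pod STATUS values to the Succeeded/Failed phases is also an assumption.

```typescript
// Hypothetical, simplified shapes; not the console plugin's real types.
interface PodLike {
  name: string;
  phase: 'Pending' | 'Running' | 'Succeeded' | 'Failed';
}

interface MigrationLike {
  name: string;
  phase?: string; // e.g. Scheduling, Running, Succeeded, Failed
}

// Treat the VM as migrating only while some VMIM has not finished yet;
// succeeded or failed (cancelled) migrations left behind are ignored.
function activeMigration(migrations: MigrationLike[]): MigrationLike | undefined {
  return migrations.find((m) => m.phase !== 'Succeeded' && m.phase !== 'Failed');
}

// With several virt-launcher pods around after a cancelled migration,
// only the Running one should drive the displayed VM status and node.
function activeLauncherPod(pods: PodLike[]): PodLike | undefined {
  return pods.find((p) => p.phase === 'Running');
}

// Example mirroring the cluster state quoted above (pod STATUS values
// Completed/Error are assumed to correspond to phases Succeeded/Failed,
// and the VMIM phases below are assumed, not taken from the report).
const pods: PodLike[] = [
  { name: 'virt-launcher-example-2v9sh', phase: 'Succeeded' },
  { name: 'virt-launcher-example-44dm8', phase: 'Running' },
  { name: 'virt-launcher-example-lfp6v', phase: 'Failed' },
];
const migrations: MigrationLike[] = [
  { name: 'example-migration-b6gmk', phase: 'Failed' },
  { name: 'example-migration-qsfmq', phase: 'Scheduling' },
];

console.log(activeLauncherPod(pods)?.name);     // -> virt-launcher-example-44dm8
console.log(activeMigration(migrations)?.name); // -> example-migration-qsfmq
```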