|Summary:||Failed / cancelled migration shows incorrect status and node in the UI|
|Product:||OpenShift Container Platform||Reporter:||Radim Hrazdil <rhrazdil>|
|Component:||Console Kubevirt Plugin||Assignee:||Yaacov Zamir <yzamir>|
|Status:||CLOSED ERRATA||QA Contact:||Nelly Credi <ncredi>|
|Version:||4.3.0||CC:||aos-bugs, cnv-qe-bugs, gouyang, tjelinek, yzamir, zpeng|
|Fixed In Version:||Doc Type:||Bug Fix|
Cause: A failed migration was not handled correctly in the UI. Consequence: If the migration was cancelled or failed, the UI showed the target node of the VM, which was very confusing for the user since it looked like the migration had actually succeeded. Fix: The handling of a cancelled/failed migration has been fixed. Result: The UI reports the actual state of the VM even in the case of a failed/cancelled migration.
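The fix described in the Doc Text can be illustrated with a small sketch. This is hypothetical Python for illustration only (the actual Console Kubevirt Plugin is TypeScript, and the field names here merely mirror the KubeVirt API shape): the UI should only show the migration's target node once the migration has actually succeeded, and otherwise fall back to the node reported by the VMI.

```python
# Hypothetical sketch of the corrected node-display logic.
# "phase" and "targetNode"/"nodeName" echo KubeVirt API fields,
# but this function is illustrative, not the real plugin code.

def displayed_node(vmi_status, vmim_status=None):
    """Return the node name the UI should display for a VM."""
    # A migration object may linger after it finished, failed, or
    # was cancelled; only trust its target node when it succeeded.
    if vmim_status and vmim_status.get("phase") == "Succeeded":
        return vmim_status.get("targetNode") or vmi_status.get("nodeName")
    # Failed/cancelled/in-progress migration: the VMI still
    # reports the node the VM is actually running on.
    return vmi_status.get("nodeName")
```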
|:||1792847 1806974 (view as bug list)||Environment:|
|Last Closed:||2020-05-04 11:20:55 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:||1806974|
Description Radim Hrazdil 2019-12-19 18:03:04 UTC
Description of problem:
When the user cancels a VM migration, the Status of the VM is displayed as Migrating for as long as the vmim resource exists. Deletion of the vmim resource, however, takes a long time, meaning that the VM stays in the Migrating state for an unexpectedly long time after the Virtual Machine Migration was canceled. This is not a serious issue, but it is a poor user experience for sure.

Version-Release number of selected component (if applicable):
OCP 4.3
CNV 2.2

How reproducible: 100%

Steps to Reproduce:
1. Create a live migratable VM
2. Migrate the VM
3. Cancel the VM migration

Actual results:

Expected results:

Additional info:
Comment 1 Tomas Jelinek 2020-01-02 10:07:49 UTC
I believe the correct way to detect that the VM is being migrated is to check for the existence of the vmim. Moving to the virt team to investigate the time it takes to remove the vmim.
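Existence-based detection, as suggested above, is exactly what made the UI report "Migrating" long after cancellation, because the vmim object can linger before it is removed. A sketch of phase-aware detection instead (hypothetical Python helper; the phase names follow the KubeVirt VirtualMachineInstanceMigration API, but the helper itself is illustrative):

```python
# Hypothetical sketch: treat a VM as "migrating" only while a
# VMIM is in an in-progress phase, not merely while one exists.
IN_PROGRESS_PHASES = {
    "Pending", "Scheduling", "Scheduled",
    "PreparingTarget", "TargetReady", "Running",
}

def is_migrating(vmims):
    """vmims: list of VMIM status dicts for this VM."""
    return any(m.get("phase") in IN_PROGRESS_PHASES for m in vmims)
```

With this check, a vmim left in a terminal phase (Succeeded or Failed) no longer keeps the VM pinned in the Migrating state.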
Comment 2 sgott 2020-01-06 18:38:47 UTC
Petr, can you investigate?
Comment 4 Petr Kotas 2020-01-07 14:08:19 UTC
I have investigated the code and played with the migration cancellation. It completed almost instantly. I have also discussed this with Radim; he retried the reproduction steps and could not reproduce the issue. Therefore I am for closing this and reopening if it appears again. Radim, are you okay with that?
Comment 5 Radim Hrazdil 2020-01-07 14:20:06 UTC
Since I wasn't able to reproduce the issue, I agree with closing this. We can reopen it if we start hitting it again in the future.
Comment 6 Radim Hrazdil 2020-01-07 15:44:48 UTC
Created attachment 1650441 [details] VMI.yaml

I apologize for the mess, I need to reopen this. I misinterpreted our tests when I thought that the VM was stuck in the 'Migrating' status. The actual problem is that the VM Overview page displayed an incorrect Node name: the page displays the migration target node instead of the source node the VM was originally running on. The VMI contains the correct Node (the original one). Interestingly, after circa 5 or 6 minutes, the VM Status suddenly becomes 'Stopping', without any action being invoked. The VMI again reads the correct phase, Running.

Updated reproduction steps:
1. Create a Cirros VM from URL
2. Start the VM - make a note of the node the VM is running on, in my case working-9lt8v-worker-0-n7plb
3. Start the migration
4. Cancel the migration right away
5. Wait until the VM is back in Running

At this point, the Node displayed on the Overview and Dashboard pages is working-9lt8v-worker-0-fvbl4, which is the designated target node of the migration. Wait a couple of minutes (5-6) and the VM Status suddenly becomes Stopping.

Video sample of the whole process: http://file.rdu.redhat.com/rhrazdil/migrationcancellation-2020-01-07_16.20.59.mkv
It is a bit longer; the important times are:
00:27 VM is in the Running state
00:36 Migration is canceled
01:11 VM is back in the Running state, with an incorrectly displayed Node
02:38 VMI content
06:07 VM suddenly goes to the Stopping state
Comment 7 Radim Hrazdil 2020-01-07 15:47:59 UTC
Moving back to User Interface
Comment 8 Tomas Jelinek 2020-01-15 11:41:22 UTC
*** Bug 1787551 has been marked as a duplicate of this bug. ***
Comment 12 Yaacov Zamir 2020-02-23 16:13:00 UTC
Radim hi, on my system after stopping and starting migrations I get strange things: I have one vm, one vmi, two vmim (state Scheduling) and three pods (one in the Running state). Is my cluster broken?
```
[yzamir@dhcp-2-187 ~]$ oc get vm
NAME      AGE   RUNNING   VOLUME
example   57m   true
[yzamir@dhcp-2-187 ~]$ oc get vmi
NAME      AGE   PHASE     IP            NODENAME
example   57m   Running   10.130.2.12   working-xb67k-worker-0-jbqql
[yzamir@dhcp-2-187 ~]$ oc get vmim
NAME                      AGE
example-migration-b6gmk   55m
example-migration-qsfmq   16m
[yzamir@dhcp-2-187 ~]$ oc get pods
NAME                          READY   STATUS      RESTARTS   AGE
virt-launcher-example-2v9sh   0/2     Completed   0          16m
virt-launcher-example-44dm8   2/2     Running     0          57m
virt-launcher-example-lfp6v   1/2     Error       0          56m
```
Comment 16 Yaacov Zamir 2020-02-27 12:06:52 UTC
Moving to POST - the VM status is not fixed in the case where we have two pods of the same VM.
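The multiple-pod situation from comment 12 suggests the status logic must derive state from the launcher pod that actually backs the VMI, rather than from any leftover pod from a failed or cancelled migration. A hypothetical sketch (Python for illustration; the real plugin is TypeScript):

```python
# Hypothetical sketch: with several virt-launcher pods left over
# from failed/cancelled migrations, pick the pod that is still
# Running instead of one of the Completed/Error leftovers.
def active_launcher_pod(pods):
    """pods: list of dicts with 'name' and 'status' keys."""
    running = [p for p in pods if p.get("status") == "Running"]
    return running[0] if running else None
```

Applied to the pod list from comment 12, this would select virt-launcher-example-44dm8 and ignore the Completed and Error pods.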
Comment 18 Guohua Ouyang 2020-03-12 07:57:52 UTC
Verified on 4.4.0-0.nightly-2020-03-06-170328: after cancelling the VM migration, it does not take much time for the VM to get back to Running, and it stays on the previous node.
Comment 20 errata-xmlrpc 2020-05-04 11:20:55 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581