Description of problem:
When a user cancels a VM migration, the Status of the VM is displayed as Migrating for as long as the vmim resource exists. Deletion of the vmim resource, however, takes a long time, so the VM stays in the Migrating state for an unexpectedly long time after the Virtual Machine Migration was canceled. This is not a serious issue, but it is certainly a poor user experience.

Version-Release number of selected component (if applicable):
OCP 4.3
CNV 2.2

How reproducible:
100%

Steps to Reproduce:
1. Create a live-migratable VM
2. Migrate the VM
3. Cancel the VM migration

Actual results:

Expected results:

Additional info:
I believe the correct way to detect that the VM is being migrated is to check for the existence of the vmim resource. Moving to the virt team to investigate the time it takes to remove the vmim.
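The detection logic described above can be sketched as follows. This is a minimal illustrative sketch, not the actual console code; the function name and argument shapes are assumptions.

```python
# Illustrative sketch: derive the displayed VM status from the vmim resource.
# (Hypothetical names -- not the real UI implementation.)

def vm_status(vmi_phase: str, vmim_exists: bool) -> str:
    """Derive the status shown for a VM.

    If a VirtualMachineInstanceMigration (vmim) resource still exists,
    the VM is reported as Migrating. This is why a slow vmim deletion
    after cancellation leaves the VM stuck in the Migrating state.
    """
    if vmim_exists:
        return "Migrating"
    return vmi_phase  # e.g. "Running", "Succeeded", ...
```

Under this model the UI is only as fresh as vmim garbage collection: the displayed status flips back to the VMI phase the moment the vmim is gone.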
Petr, can you investigate?
I have investigated the code and experimented with migration cancellation. The cancellation completed almost instantly. I have also discussed this with Radim; Radim repeated the steps and could not reproduce the issue. Therefore I am in favor of closing this and reopening it if it appears again. Radim, are you okay with that?
Since I wasn't able to reproduce the issue, I agree with closing this. We can reopen if we start hitting this in the future.
Created attachment 1650441 [details]
VMI.yaml

I apologize for the mess, but I need to reopen this. I misinterpreted our tests when I thought that the VM was stuck in the Migrating status. The actual problem is that the VM Overview page displays an incorrect Node name: it shows the migration target node instead of the source node the VM was originally running on. The VMI contains the correct Node (the original one). Interestingly, after roughly 5 or 6 minutes, the VM Status suddenly becomes Stopping without any action being invoked, while the VMI again reports the correct phase, Running.

Updated reproduction steps:
1. Create a Cirros VM from URL
2. Start the VM - make a note of the node the VM is running on, in my case working-9lt8v-worker-0-n7plb
3. Start the migration
4. Cancel the migration right away
5. Wait until the VM is back in Running

At this point, the Node displayed on the Overview and Dashboard pages is working-9lt8v-worker-0-fvbl4, which was the designated target node of the migration. Wait a couple of minutes (5-6) and the VM Status suddenly becomes Stopping.

Video sample of the whole process: http://file.rdu.redhat.com/rhrazdil/migrationcancellation-2020-01-07_16.20.59.mkv
It is a bit long; the important times are:
00:27 VM is in the Running state
00:36 Migration is canceled
01:11 VM is back in the Running state, with the incorrectly displayed Node
02:38 VMI content
06:07 VM suddenly goes to the Stopping state
Moving back to User Interface
*** Bug 1787551 has been marked as a duplicate of this bug. ***
Radim hi, on my system, after stopping and starting migrations, I see strange things: I have one VM, one VMI, two vmims (state Scheduling) and three pods (one in the Running state). Is my cluster broken?
```
[yzamir@dhcp-2-187 ~]$ oc get vm
NAME      AGE   RUNNING   VOLUME
example   57m   true
[yzamir@dhcp-2-187 ~]$ oc get vmi
NAME      AGE   PHASE     IP            NODENAME
example   57m   Running   10.130.2.12   working-xb67k-worker-0-jbqql
[yzamir@dhcp-2-187 ~]$ oc get vmim
NAME                      AGE
example-migration-b6gmk   55m
example-migration-qsfmq   16m
[yzamir@dhcp-2-187 ~]$ oc get pods
NAME                          READY   STATUS      RESTARTS   AGE
virt-launcher-example-2v9sh   0/2     Completed   0          16m
virt-launcher-example-44dm8   2/2     Running     0          57m
virt-launcher-example-lfp6v   1/2     Error       0          56m
```
Moving to POST - the VM status is not computed correctly when two pods exist for the same VM.
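The two-pods case above can be sketched as follows: after a canceled migration, launcher pods for both the source and target node may linger, and picking the newest pod reports the wrong node. A minimal sketch (hypothetical names and pod shapes, not the actual console code) of a safer selection:

```python
# Hypothetical sketch: pick the node to display when several virt-launcher
# pods exist for one VM (as after a canceled migration). Choosing the newest
# pod would report the migration target; only the Running pod is reliable.

def active_node(pods):
    """Return the node of the single Running launcher pod, or None.

    `pods` is a list of dicts like {"name": ..., "status": ..., "node": ...}.
    Returning None signals that the caller should fall back to the VMI's
    own status.nodeName, which the report notes stays correct.
    """
    running = [p for p in pods if p["status"] == "Running"]
    if len(running) == 1:
        return running[0]["node"]
    return None  # zero or several Running pods: ambiguous
```

With the pod list from the `oc get pods` output above (one Running, one Completed, one Error), this would resolve to the source node rather than the leftover target-node pod.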
Verified on 4.4.0-0.nightly-2020-03-06-170328: after canceling a VM migration, the VM returns to Running quickly and stays on the previous node.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581