Bug 1937920

Summary: Multiple live migrations can be created at once for the same VMI; they will run in parallel and make a mess
Product: Container Native Virtualization (CNV) Reporter: Jed Lejosne <jlejosne>
Component: Virtualization Assignee: Jed Lejosne <jlejosne>
Status: CLOSED ERRATA QA Contact: Kedar Bidarkar <kbidarka>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.6.0 CC: cnv-qe-bugs, kbidarka, sgott
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-07-27 14:28:41 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Jed Lejosne 2021-03-11 18:00:37 UTC
Description of problem:
Nothing prevents a user from creating multiple live migration objects for the same VMI in a quick burst, and nothing ensures those will be handled correctly.
That will result in all of them trying to run in parallel, potentially creating race conditions.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a VMI, either directly or by starting a VM
2. Create multiple migrations for that VMI, either in a single YAML file or in several files that are created in quick succession
3.
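
Step 2 can be scripted. The following is a minimal sketch, assuming a POSIX shell. The function name `gen_migrations`, the `burst-N` object names, and the VMI name `my-vmi` are hypothetical placeholders; the manifest fields mirror the ones used in the verification manifests in this report. The function only prints a multi-document YAML stream and does not contact the cluster.

```shell
#!/bin/sh
# Sketch: emit N VirtualMachineInstanceMigration objects for one VMI as a
# single multi-document YAML stream. All names are illustrative placeholders.
gen_migrations() {
    vmi="$1"      # name of the VMI to migrate
    count="$2"    # number of migration objects to emit
    i=1
    while [ "$i" -le "$count" ]; do
        cat <<EOF
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: burst-$i
  namespace: default
spec:
  vmiName: $vmi
EOF
        i=$((i + 1))
    done
}

# Example: print three migrations for a hypothetical VMI named "my-vmi".
gen_migrations my-vmi 3
```

Piping the output into `oc create -f -` would submit all of the objects in one burst, which is the "single yaml file" variant of step 2.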

Actual results:
*Usually* the migration that was created last succeeds; the others fail and leave behind a Completed virt-launcher pod

Expected results:
Either the creation of more than one migration for one VMI is denied (could be hard to prevent all race conditions),
or all the migrations run one after the other (seems kind of pointless),
or all but one fail "gracefully", i.e. before even attempting to create a target virt-launcher pod (probably a good compromise).

Additional info:

Comment 2 sgott 2021-03-15 21:05:54 UTC
This situation sounds like it would be easy to avoid by not creating multiple migrations at once. However, it's reasonable to assume there will exist workflows that make this more likely.

The impact of this is fairly significant because if this is triggered, it can cause data corruption.

Comment 3 Shaul Garbourg 2021-03-18 13:11:44 UTC
https://github.com/kubevirt/kubevirt/pull/5242

Comment 4 Jed Lejosne 2021-03-31 17:26:31 UTC
Master PR merged.
PR backported to:
- release-0.36 (CNV 2.6.z): https://github.com/kubevirt/kubevirt/pull/5365
- release-0.34 (CNV 2.5.z): https://github.com/kubevirt/kubevirt/pull/5366

Comment 5 sgott 2021-05-19 11:43:18 UTC
To verify:

1) Attempt to create 2 migrations at the same time. Scripting this is suggested.
2) Observe that the second migration is rejected.
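
One way to script the verification is sketched below, assuming a POSIX shell with `mktemp` available. The function name `create_concurrently` and the `OC` override variable are hypothetical helpers introduced here for illustration; only `oc apply -f` is taken from this report. Each manifest is applied in a background subshell so the requests reach the API server at nearly the same time, and the number of accepted requests is printed at the end.

```shell
#!/bin/sh
# Sketch: apply several manifests concurrently and count how many succeed.
# OC may be overridden to point at a different client binary (assumption).
OC="${OC:-oc}"

create_concurrently() {
    tmp="$(mktemp -d)" || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # Background subshell: capture the client output and its exit status.
        ( "$OC" apply -f "$f" >"$tmp/$n.log" 2>&1; echo "$?" >"$tmp/$n.rc" ) &
    done
    wait    # block until every background apply has finished

    accepted=0
    i=1
    while [ "$i" -le "$n" ]; do
        if [ "$(cat "$tmp/$i.rc")" -eq 0 ]; then
            accepted=$((accepted + 1))
        fi
        i=$((i + 1))
    done

    cat "$tmp"/*.log          # show what the client printed for each request
    rm -rf "$tmp"
    echo "accepted=$accepted of $n"
}

# Example invocation (requires a cluster and two migration manifests):
# create_concurrently migration-a.yaml migration-b.yaml
```

With the fix in place, the expectation from the steps above is that only one of two migrations targeting the same VMI is accepted, so the last line should read `accepted=1 of 2`.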

Comment 6 Kedar Bidarkar 2021-05-31 16:29:20 UTC
Summary: Multiple migrations are rejected successfully, as seen below.


[kbidarka@localhost migration]$ cat migration-job2-multi1.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  creationTimestamp: null
  name: job22-multi1
  namespace: default
spec:
  vmiName: vm2-rhel84-secref
status: {}
[kbidarka@localhost migration]$ cat migration-job2-multi2.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  creationTimestamp: null
  name: job22-multi2
  namespace: default
spec:
  vmiName: vm2-rhel84-secref
status: {}
[kbidarka@localhost migration]$ oc get vmi
NAME                AGE   PHASE     IP            NODENAME
vm2-rhel84          45h   Running   10.xxx.y.zz   node-07.redhat.com
vm2-rhel84-secref   18m   Running   10.xxx.y.mm   node-06.redhat.com

[kbidarka@localhost migration]$ for i in migration-job2-multi1.yaml migration-job2-multi2.yaml
> do
> oc apply -f $i
> done
virtualmachineinstancemigration.kubevirt.io/job22-multi1 created
Error from server: error when creating "migration-job2-multi2.yaml": admission webhook "migration-create-validator.kubevirt.io" denied the request: in-flight migration detected. Active migration job (0d9a0dac-bd4a-4a4d-8e67-b5de565dd846) is currently already in progress for VMI vm2-rhel84-secref.

Comment 10 errata-xmlrpc 2021-07-27 14:28:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920