Bug 1937920 - Multiple live migrations can be created at once for the same VMI; they will run in parallel and make a mess
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.6.0
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Jed Lejosne
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-03-11 18:00 UTC by Jed Lejosne
Modified: 2021-07-27 14:29 UTC
CC List: 3 users

Fixed In Version: hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-27 14:28:41 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
- Github kubevirt/kubevirt pull 5095 (open): [WIP] migration: create MigrationState on VMI before creating the target pod (last updated 2021-03-11 18:00:37 UTC)
- Github kubevirt/kubevirt pull 5242 (open): migration admitter: prevent creation of duplicate migrations (last updated 2021-03-16 19:07:50 UTC)
- Red Hat Product Errata RHSA-2021:2920 (last updated 2021-07-27 14:29:15 UTC)

Description Jed Lejosne 2021-03-11 18:00:37 UTC
Description of problem:
Nothing prevents a user from creating multiple live migration objects for the same VMI in a quick burst, and nothing ensures those will be handled correctly.
That will result in all of them trying to run in parallel, potentially creating race conditions.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Create a VMI, either directly or by starting a VM
2. Create multiple migrations for that VMI, either in a single YAML file (see the example below), or in multiple files that get created one after the other rather quickly
3.
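
For illustration, the single-file variant of step 2 could look like the following, with two migration objects separated by a YAML document separator (file and object names here are made up; the apiVersion matches the one used in the verification below):

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-burst-1
  namespace: default
spec:
  vmiName: my-vmi
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-burst-2
  namespace: default
spec:
  vmiName: my-vmi

A single "oc apply -f" on such a file creates both migrations in one burst.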

Actual results:
*Usually*, the migration that was created last succeeds; the other ones fail and leave behind a Completed virt-launcher pod

Expected results:
Either the creation of more than one migration for one VMI is denied (could be hard to prevent all race conditions),
or all the migrations run one after the other (seems kind of pointless),
or all but one fail "gracefully", i.e. before even attempting to create a target virt-launcher pod (probably a good compromise).
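
Until such a server-side check exists, a workflow can approximate the "deny duplicates" behavior on the client side by listing existing migrations for the VMI before creating a new one. A rough sketch, assuming the vmim short name resolves to VirtualMachineInstanceMigration (otherwise spell out the full resource name) and using the default namespace:

$ oc get vmim -n default -o jsonpath='{range .items[*]}{.spec.vmiName}{" "}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'

If the target VMI already appears with a migration whose phase is not terminal (e.g. Succeeded or Failed), skip creating another one.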

Additional info:

Comment 2 sgott 2021-03-15 21:05:54 UTC
This situation sounds easy to avoid by simply not creating multiple migrations at once. However, it's reasonable to assume that some workflows will make it more likely.

The impact is fairly significant: if this is triggered, it can cause data corruption.

Comment 3 Shaul Garbourg 2021-03-18 13:11:44 UTC
https://github.com/kubevirt/kubevirt/pull/5242

Comment 4 Jed Lejosne 2021-03-31 17:26:31 UTC
Master PR merged.
PR backported to:
- release-0.36 (CNV 2.6.z): https://github.com/kubevirt/kubevirt/pull/5365
- release-0.34 (CNV 2.5.z): https://github.com/kubevirt/kubevirt/pull/5366

Comment 5 sgott 2021-05-19 11:43:18 UTC
To verify:

1) Attempt to create 2 migrations at the same time; scripting this is suggested (see the sketch below).
2) Observe that the second migration is rejected.
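
One minimal way to script step 1, assuming two migration manifests like the ones shown in the next comment, is to fire both creates in parallel so they reach the API server at nearly the same time:

oc create -f migration-job2-multi1.yaml &
oc create -f migration-job2-multi2.yaml &
wait

With the fix in place, exactly one of the two creates should succeed; the other should be denied by the migration-create-validator.kubevirt.io webhook.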

Comment 6 Kedar Bidarkar 2021-05-31 16:29:20 UTC
Summary: A second migration for the same VMI is successfully rejected, as seen below.


[kbidarka@localhost migration]$ cat migration-job2-multi1.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  creationTimestamp: null
  name: job22-multi1
  namespace: default
spec:
  vmiName: vm2-rhel84-secref
status: {}
[kbidarka@localhost migration]$ cat migration-job2-multi2.yaml
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  creationTimestamp: null
  name: job22-multi2
  namespace: default
spec:
  vmiName: vm2-rhel84-secref
status: {}
 [kbidarka@localhost migration]$ oc get vmi 
NAME                AGE   PHASE     IP             NODENAME
vm2-rhel84          45h   Running   10.xxx.y.zz   node-07.redhat.com
vm2-rhel84-secref   18m   Running   10.xxx.y.mm    node-06.redhat.com

[kbidarka@localhost migration]$ for i in migration-job2-multi1.yaml migration-job2-multi2.yaml
> do
> oc apply -f $i
> done
virtualmachineinstancemigration.kubevirt.io/job22-multi1 created
Error from server: error when creating "migration-job2-multi2.yaml": admission webhook "migration-create-validator.kubevirt.io" denied the request: in-flight migration detected. Active migration job (0d9a0dac-bd4a-4a4d-8e67-b5de565dd846) is currently already in progress for VMI vm2-rhel84-secref.

Comment 10 errata-xmlrpc 2021-07-27 14:28:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920

