Bug 1895414 - Virt-operator is accepting updates to the placement of its workload components even with running VMs
Summary: Virt-operator is accepting updates to the placement of its workload components even with running VMs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 2.6.0
Assignee: aschuett
QA Contact: vsibirsk
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-11-06 16:19 UTC by Simone Tiraboschi
Modified: 2021-03-10 11:19 UTC
CC List: 3 users

Fixed In Version: hco-bundle-registry-container-v2.6.0-489 virt-operator-container-v2.6.0-100
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-10 11:18:59 UTC
Target Upstream Version:
Embargoed:




Links:
- GitHub kubevirt/kubevirt pull 4619 (closed): [release-0.36] add webhook to validate kubevirt CR updates (last updated 2021-02-16 11:18:40 UTC)
- Red Hat Product Errata RHSA-2021:0799 (last updated 2021-03-10 11:19:51 UTC)

Description Simone Tiraboschi 2020-11-06 16:19:26 UTC
Description of problem:

With running VMs:

[cloud-user@ocp-psi-executor ~]$ oc get vmi -A
NAMESPACE       NAME           AGE     PHASE        IP             NODENAME
openshift-cnv   test-satya     4h54m   Running      10.128.3.215   ginger-np-8k84k-worker-0-l4lhf
openshift-cnv   test-satya02   4h42m   Running      10.128.3.217   ginger-np-8k84k-worker-0-l4lhf
openshift-cnv   test-satya03   4h5m    Running      10.128.3.219   ginger-np-8k84k-worker-0-l4lhf
openshift-cnv   test-satya04   3h55m   Running      10.128.3.223   ginger-np-8k84k-worker-0-l4lhf
openshift-cnv   test-satya05   3h55m   Scheduling                  
openshift-cnv   test-satya06   3h55m   Scheduling

The user can successfully edit the workloads stanza of the HCO CR:

[cloud-user@ocp-psi-executor ~]$ oc get hyperconverged -n openshift-cnv kubevirt-hyperconverged -o json | jq '.spec.workloads.nodePlacement.nodeSelector["work-comp"]'
"work3"
[cloud-user@ocp-psi-executor ~]$ oc patch hyperconverged -n openshift-cnv kubevirt-hyperconverged --type='json' -p='[{"op": "replace", "path": "/spec/workloads/nodePlacement/nodeSelector/work-comp", "value": "work2" }]'
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
[cloud-user@ocp-psi-executor ~]$ oc get hyperconverged -n openshift-cnv kubevirt-hyperconverged -o json | jq '.spec.workloads.nodePlacement.nodeSelector["work-comp"]'
"work2"

According to the HCO validating webhook logs:

[cloud-user@ocp-psi-executor ~]$ oc logs -n openshift-cnv hco-webhook-79d78f6797-jwh5b
...
{"level":"info","ts":1604677768.0789814,"logger":"hyperconverged-resource","msg":"Validating update","name":"kubevirt-hyperconverged"}
{"level":"info","ts":1604677768.097826,"logger":"hyperconverged-resource","msg":"dry-run update the object passed","kind":"&TypeMeta{Kind:KubeVirt,APIVersion:kubevirt.io/v1alpha3,}"}
{"level":"info","ts":1604677768.1084776,"logger":"hyperconverged-resource","msg":"dry-run update the object passed","kind":"&TypeMeta{Kind:CDI,APIVersion:cdi.kubevirt.io/v1beta1,}"}


As designed, HCO validated the user's update request by first applying it in dry-run mode, and it really propagates the update to all the components if and only if every component operator accepts the dry-run update.

In this case, all the operators accepted the update even though VMs were running.
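
For illustration, the dry-run gate described above can be sketched with a controller-runtime client. This is a minimal sketch under that assumption; the function and variable names are hypothetical, not the actual HCO code:

package main

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// tryDryRunUpdate submits the mutated component CR with the DryRunAll
// option: the component operator's admission webhooks run server-side,
// but nothing is persisted. Only if the dry run is accepted is the
// update replayed for real. (Hypothetical helper, not HCO source.)
func tryDryRunUpdate(ctx context.Context, cl client.Client, obj client.Object) error {
	if err := cl.Update(ctx, obj, client.DryRunAll); err != nil {
		// A component operator's webhook rejected the change.
		return err
	}
	// The dry run passed; apply the update for real.
	return cl.Update(ctx, obj)
}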


Version-Release number of selected component (if applicable):
2.5.0

How reproducible:
100%

Steps to Reproduce:
1. deploy CNV
2. configure a workload placement configuration through the HCO CR
3. start a VM
4. try to edit the workload placement configuration through the HCO CR

Actual results:
virt-operator accepts the change even with running VMs, so HCO accepts it as well and propagates it to the other components.
Existing VMs are not going to be migrated, but the workload-related components managed by the other operators are.

Expected results:
virt-operator should accept updates to the placement of its workload components if and only if no VMs are running

Additional info:
we have exactly the same bug in the CDI operator
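
The eventual fix (PR 4619, linked above) adds a validating webhook on KubeVirt CR updates. Below is a minimal sketch of the expected check, assuming a controller-runtime client and the kubevirt.io/client-go v1alpha3 API types; the names are illustrative, not the merged code:

package main

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/api/equality"
	"sigs.k8s.io/controller-runtime/pkg/client"

	kvv1 "kubevirt.io/client-go/api/v1"
)

// validateWorkloadPlacement denies changes to spec.workloads while any
// VirtualMachineInstance exists; the denial message matches the one
// observed during verification in comment 4.
func validateWorkloadPlacement(ctx context.Context, cl client.Client, oldKV, newKV *kvv1.KubeVirt) error {
	// Placement unchanged: nothing to validate.
	if equality.Semantic.DeepEqual(oldKV.Spec.Workloads, newKV.Spec.Workloads) {
		return nil
	}
	vmis := &kvv1.VirtualMachineInstanceList{}
	if err := cl.List(ctx, vmis); err != nil {
		return err
	}
	if len(vmis.Items) > 0 {
		return fmt.Errorf("can't update placement of workload pods while there are running vms")
	}
	return nil
}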

Comment 2 aschuett 2020-11-27 11:38:42 UTC
PR Fix: https://github.com/kubevirt/kubevirt/pull/4599

Comment 3 aschuett 2020-12-17 14:39:19 UTC
Merged into release branch https://github.com/kubevirt/kubevirt/pull/4619

Comment 4 vsibirsk 2021-01-25 09:53:46 UTC
Verified on:
$ oc get csv -n openshift-cnv
NAME                                      DISPLAY                    VERSION   REPLACES                                  PHASE
kubevirt-hyperconverged-operator.v2.6.0   OpenShift Virtualization   2.6.0     kubevirt-hyperconverged-operator.v2.5.2   Succeeded

$ oc get clusterversions.config.openshift.io 
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-fc.1   True        False         13d     Cluster version is 4.7.0-fc.3

Updates to the HCO workloads placement are not accepted if there is any running VM:
$ oc edit hco -n openshift-cnv
error: hyperconvergeds.hco.kubevirt.io "kubevirt-hyperconverged" could not be patched: admission webhook "validate-hco.kubevirt.io" denied the request: admission webhook "kubevirt-update-validator.kubevirt.io" denied the request: can't update placement of workload pods while there are running vms

Comment 7 errata-xmlrpc 2021-03-10 11:18:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 2.6.0 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0799

