1868099 – virt-operator does not continually reconcile objects

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Container Native Virtualization (CNV)
Classification:	Red Hat
Component:	Virtualization
Sub Component:
Version:	2.4.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	aschuett
QA Contact:	Kedar Bidarkar
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-11 16:58 UTC by David Vossel
Modified:	2021-07-27 14:21 UTC
CC List:	8 users
Fixed In Version:	hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 14:20:49 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2021:2920	0	None	None	None	2021-07-27 14:21:52 UTC

Description David Vossel 2020-08-11 16:58:53 UTC

Description of problem:

Mutating one of the deployments that virt-operator creates (for example, the virt-controller deployment spec) does not cause virt-operator to re-reconcile that deployment and return it to the expected spec.

This leaves open the possibility that KubeVirt-related components could drift from their expected installation specs.
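
To make "drift" concrete, here is a minimal sketch of comparing an expected spec against a live one. It is not virt-operator's code (virt-operator is written in Go); the function name `find_drift` and the dictionary layout are hypothetical stand-ins for a deployment spec.

```python
from typing import Any


def find_drift(expected: dict[str, Any], live: dict[str, Any], path: str = "") -> list[str]:
    """Return dotted paths where `live` differs from `expected`.

    Only keys present in `expected` are compared, so fields the
    operator does not manage are ignored.
    """
    drifted = []
    for key, want in expected.items():
        here = f"{path}.{key}" if path else key
        have = live.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            # Descend into nested sections of the spec.
            drifted.extend(find_drift(want, have, here))
        elif want != have:
            drifted.append(here)
    return drifted


# Hypothetical expected spec for the virt-controller deployment.
expected_spec = {"replicas": 2, "template": {"resources": {"cpu": "10m"}}}
# The same spec after a manual edit to the replica count.
live_spec = {"replicas": 5, "template": {"resources": {"cpu": "10m"}}}

print(find_drift(expected_spec, live_spec))
```

With the behavior reported here, a difference like this would persist until the next upgrade.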

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Mutate the virt-controller deployment, for example by changing the replica count, adding pods, or making any other change.
2. Observe that virt-operator does not return the virt-controller deployment to the expected install version.

Actual results:

Manual changes to the virt-controller deployment are not reversed by virt-operator until a CNV upgrade occurs.

Expected results:

Manual changes to the virt-controller deployment are immediately reversed by virt-operator.
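
As an illustration of the expected behavior, the following sketch overlays the expected spec onto the live one each time it runs. The function `reconcile` and the sample fields are hypothetical. Preserving fields that are absent from the expected spec is a choice made for this sketch, not something this report specifies.

```python
import copy
from typing import Any


def reconcile(expected: dict[str, Any], live: dict[str, Any]) -> tuple[dict[str, Any], bool]:
    """Overlay `expected` onto a copy of `live`; report whether anything changed."""
    result = copy.deepcopy(live)
    changed = False
    for key, want in expected.items():
        have = result.get(key)
        if isinstance(want, dict) and isinstance(have, dict):
            result[key], sub_changed = reconcile(want, have)
            changed = changed or sub_changed
        elif have != want:
            # Managed field was modified or is missing: restore it.
            result[key] = copy.deepcopy(want)
            changed = True
    return result, changed


# Hypothetical specs: the replica count was edited by hand.
expected = {"replicas": 2, "strategy": {"type": "RollingUpdate"}}
live = {"replicas": 5, "strategy": {"type": "RollingUpdate"}, "paused": False}

fixed, changed = reconcile(expected, live)
print(fixed, changed)
```

Running `reconcile` again on the result reports no change, which is the property a continually running reconcile loop relies on.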

Additional info:

We need to be careful in fixing this, as some logic has been introduced in production environments that may depend on this broken behavior.

Comment 2 sgott 2020-08-11 20:45:56 UTC

Deferring this for now due to the potential to break existing deployments.

Comment 4 Stephen Gordon 2020-08-13 21:16:04 UTC

(In reply to sgott from comment #2)
> Deferring this for now due to the potential to break existing deployments.

I filed the admittedly naive https://issues.redhat.com/browse/CNV-6028 to track the need for a metric that shows how widespread such modifications are beyond the engagement we know about.
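
One possible shape for such a metric, sketched under assumptions: a counter incremented whenever a managed object is found modified. What CNV-6028 actually specifies is not stated here; the names `out_of_band_changes`, `record_out_of_band_change`, and `changes_by_kind` are hypothetical, and an in-memory `Counter` stands in for a real metrics backend.

```python
from collections import Counter

# Hypothetical stand-in for a real metrics counter; the "kind/name"
# label format is illustrative only.
out_of_band_changes = Counter()


def record_out_of_band_change(kind: str, name: str) -> None:
    """Count one detected manual modification of a managed object."""
    out_of_band_changes[f"{kind}/{name}"] += 1


def changes_by_kind() -> dict:
    """Aggregate the per-object counts by resource kind."""
    totals = Counter()
    for label, count in out_of_band_changes.items():
        kind = label.split("/", 1)[0]
        totals[kind] += count
    return dict(totals)


record_out_of_band_change("Deployment", "virt-controller")
record_out_of_band_change("Deployment", "virt-controller")
record_out_of_band_change("Service", "virt-api")
print(changes_by_kind())
```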

Comment 6 aschuett 2020-11-25 13:51:24 UTC

A PR with the fix is ready for review: https://github.com/kubevirt/kubevirt/pull/4464

Comment 11 Kedar Bidarkar 2021-06-03 19:06:42 UTC

It was decided that we would check at least one component for each resource type, as there could be multiple components of each resource type.

1) virt-controller: Tried updating cpu to 20m in the deployment, which was reconciled successfully.

2) virt-handler: Tried updating virt-handler command to support "--max-device=250", which was reconciled successfully.

3) Role and ClusterRole: Automation tests passed for both. (Updates to the verbs were reconciled.)

5) Secrets: A new cert bundle was created successfully after the certs were overwritten with random data.

6) RoleBinding and ClusterRoleBinding: Filed a separate bug to track this: https://bugzilla.redhat.com/show_bug.cgi?id=1965050

A few more resource types remain; I will update on them here soon.

Comment 12 Kedar Bidarkar 2021-06-07 13:33:37 UTC

7) ConfigMap: Filed a separate bug to track this: https://bugzilla.redhat.com/show_bug.cgi?id=1968410

8) PDB: Updated "minAvailable" of the PDB below to 2, which was reconciled successfully.
]$ oc get pdb virt-api-pdb -n openshift-cnv -o yaml 
spec:
  minAvailable: 1

9) CRD: Updated vmcrd.Spec.Names.ShortNames to include "new", which was reconciled successfully.
]$ oc get crd virtualmachines.kubevirt.io -n openshift-cnv -o yaml
vmcrd.Spec.Names.ShortNames

shortNames:
    - vm
    - vms

10) Service: Updated the virt-api service port from "443" to "123", which was reconciled successfully.
]$ oc get service virt-api -n openshift-cnv -o yaml   
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443

Summary:
1) Filed separate bugs for the RoleBinding/ClusterRoleBinding and ConfigMap resource types; these will be tracked separately now.

2) All other resource types were reconciled successfully.

Moving this bug to VERIFIED state.
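
The mutate-then-check pattern used in the verification above could be automated roughly as follows. This is a sketch only: how the checks were actually performed is not stated beyond the `oc get` output, and `wait_until_reconciled` and its parameters are hypothetical. The reader callback stands in for whatever query returns the field under test.

```python
import time
from typing import Any, Callable


def wait_until_reconciled(
    read_value: Callable[[], Any],
    expected: Any,
    timeout: float = 60.0,
    interval: float = 1.0,
) -> bool:
    """Poll `read_value` until it returns `expected` or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if read_value() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Fake reader standing in for a cluster query: it reports the mutated
# minAvailable value twice, then the reconciled one.
observations = iter([2, 2, 1])
print(wait_until_reconciled(lambda: next(observations), expected=1, interval=0.01))
```

A timeout is used rather than a single read because reconciliation happens asynchronously after the mutation.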

Comment 15 errata-xmlrpc 2021-07-27 14:20:49 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920
