Description of problem:
Mutating one of the deployments that virt-operator creates (such as the virt-controller deployment spec) does not cause virt-operator to re-reconcile that deployment and return it to the expected spec. This leaves open the possibility that KubeVirt-related components could drift from their expected installation specs.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Mutate the virt-controller deployment by adding to the replica count, adding pods, or any other action.
2. virt-operator does not return the virt-controller deployment to the expected install version.

Actual results:
Manual virt-controller deployment changes are not reversed by virt-operator until an upgrade of CNV occurs.

Expected results:
Manual virt-controller deployment changes are immediately reversed by virt-operator.

Additional info:
We need to take careful consideration in fixing this, as some logic has been introduced in production environments that may depend on this broken behavior.
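For illustration only, the expected behavior described above (detect a manual change, then restore the operator-managed fields) can be sketched as a small drift check. This is not virt-operator's actual implementation; the function names `find_drift` and `reconcile`, the dictionary layout, and the replica values are all hypothetical, and lists are compared as whole values for simplicity.

```python
import copy


def find_drift(expected, observed, path=""):
    """Return dotted paths where `observed` differs from `expected`.

    Only fields present in `expected` are checked, so extra fields that
    exist only in `observed` are not reported as drift.
    """
    drift = []
    for key, want in expected.items():
        here = f"{path}.{key}" if path else key
        have = observed.get(key) if isinstance(observed, dict) else None
        if isinstance(want, dict):
            nested = have if isinstance(have, dict) else {}
            drift.extend(find_drift(want, nested, here))
        elif have != want:
            drift.append(here)
    return drift


def _overlay(target, expected):
    """Write every field of `expected` into `target`, in place."""
    for key, want in expected.items():
        if isinstance(want, dict):
            if not isinstance(target.get(key), dict):
                target[key] = {}
            _overlay(target[key], want)
        else:
            target[key] = copy.deepcopy(want)


def reconcile(expected, observed):
    """Return (restored_object, drifted_paths) without mutating the inputs."""
    drifted = find_drift(expected, observed)
    if not drifted:
        return observed, []
    restored = copy.deepcopy(observed)
    _overlay(restored, expected)
    return restored, drifted


# Hypothetical values: the expected spec says 2 replicas, someone set 5.
expected_spec = {"spec": {"replicas": 2}}
mutated_spec = {"spec": {"replicas": 5, "paused": False}}
restored_spec, drifted_paths = reconcile(expected_spec, mutated_spec)
```

In this sketch the mutated `replicas` field is reset while the unrelated `paused` field is left alone, which mirrors the idea that only the fields the operator owns should be reverted.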
Deferring this for now due to the potential to break existing deployments.
(In reply to sgott from comment #2)
> Deferring this for now due to the potential to break existing deployments.

I filed the super naive https://issues.redhat.com/browse/CNV-6028 to track that. I think we need a metric to determine how widespread such modifications are beyond the engagement we know about.
A PR with the fix is ready for review: https://github.com/kubevirt/kubevirt/pull/4464
It was decided that we would check at least one component for each resource type, as there can be multiple components of each resource type.

1) virt-controller: Tried updating cpu to 20m in the deployment, which was reconciled successfully.
2) virt-handler: Tried updating the virt-handler command to support "--max-device=250", which was reconciled successfully.
3) Role and ClusterRole: Automation tests passed for both. (Updates to the verbs were reconciled.)
5) Secrets: A new cert bundle was created successfully after the certs were overwritten with random data.
6) RoleBinding and ClusterRoleBinding: Filed a separate bug to track this: https://bugzilla.redhat.com/show_bug.cgi?id=1965050

A few more resource types remain; I will update about them here soon.
7) cfgMap: Filed a separate bug to track this: https://bugzilla.redhat.com/show_bug.cgi?id=1968410

8) PDB: Updated "minAvailable" in the PDB below to 2, which was reconciled successfully.

]$ oc get pdb virt-api-pdb -n openshift-cnv -o yaml
spec:
  minAvailable: 1

9) CRD: Updated vmcrd.Spec.Names.ShortNames to include "new", which was reconciled successfully.

]$ oc get crd virtualmachines.kubevirt.io -n openshift-cnv -o yaml
vmcrd.Spec.Names.ShortNames
    shortNames:
    - vm
    - vms

10) Service: Updated the virt-api service port from "443" to "123", which was reconciled successfully.

]$ oc get service virt-api -n openshift-cnv -o yaml
spec:
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8443

Summary:
1) Filed separate bugs for the RoleBinding/ClusterRoleBinding and cfgMap resource types; these will be tracked separately now.
2) All other resource types were reconciled successfully.

Moving this bug to VERIFIED state.
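The manual checks above all follow the same pattern: change a field, then confirm it returns to its original value. A minimal sketch of that polling step is shown below, assuming a caller-supplied `read_value` function; the helper name `wait_until_reverted`, its parameters, and the simulated reader are hypothetical and are not part of any KubeVirt or `oc` tooling.

```python
import time


def wait_until_reverted(read_value, expected, timeout=60.0, interval=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll `read_value()` until it equals `expected` or `timeout` elapses.

    Returns True if the value was reverted in time, False otherwise.
    `sleep` and `clock` are injectable so the helper can be exercised
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if read_value() == expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Simulated reader (hypothetical): the field reads 2 twice, then is back at 1,
# as in the minAvailable check above.
_observations = iter([2, 2, 1])
reverted = wait_until_reverted(lambda: next(_observations), expected=1,
                               sleep=lambda _seconds: None)
```

In a real check, `read_value` could wrap whatever command is used to read the field, for example the `oc get pdb virt-api-pdb -n openshift-cnv -o yaml` call shown above, with the parsing left to the caller.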
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2920