Description of problem: The operator patches the virt-template-validator (TV) deployment and the node-labeller (NL) daemonset using a merge patch or JSON patch, but it does not appear to be able to remove or edit the placement fields once they exist. For example:

- A CR is created with the placement API - works as expected
- A CR is created without the placement API - works as expected
- A CR is created without the placement API and then updated to add the placement API - works as expected
- A CR with the placement API is updated to remove or change the placement fields - the operator does not apply the change
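A likely mechanism (my reading of standard Kubernetes patch semantics, not confirmed against the operator code): with a JSON merge patch, a field that is simply omitted from the patch body is left untouched on the server, so clearing placement requires an explicit null; with a JSON patch, removal must be an explicit "remove" op, which fails if the path is absent. A sketch of both styles against the TV deployment (the exact paths the operator patches are an assumption on my part):

# JSON merge patch: an omitted field is kept; an explicit null is needed to delete it
$ oc patch deployment virt-template-validator -n openshift-cnv --type=merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":null}}}}'

# JSON patch: removal needs an explicit "remove" op, and it errors if the path does not exist
$ oc patch deployment virt-template-validator -n openshift-cnv --type=json \
    -p '[{"op":"remove","path":"/spec/template/spec/affinity"}]'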
After editing HCO, nodeSelector parameters work fine.

When attempting to reverse the changes, the following error is received:

error: hyperconvergeds.hco.kubevirt.io "kubevirt-hyperconverged" could not be patched: Internal error occurred: failed calling webhook "validate-hco.kubevirt.io": Post "https://hco-operator-service.openshift-cnv.svc:4343/validate-hco-kubevirt-io-v1beta1-hyperconverged?timeout=30s": no endpoints available for service "hco-operator-service"
You can run `oc replace -f /tmp/oc-edit-ikelt.yaml` to try this update again.

The only way I have found to fix this is to redeploy CNV.
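The error indicates the validating webhook's backing service has no ready endpoints. A quick way to confirm that (command sketch; the operator pod name pattern is an assumption) would be:

$ oc get endpoints hco-operator-service -n openshift-cnv
$ oc get pods -n openshift-cnv | grep hco-operator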
(In reply to Or Bairey-Sehayek from comment #1)
> After editing HCO, nodeSelector parameters work fine.
>
> When attempting to reverse changes, the following error is received:
>
> error: hyperconvergeds.hco.kubevirt.io "kubevirt-hyperconverged" could not
> be patched: Internal error occurred: failed calling webhook
> "validate-hco.kubevirt.io": Post
> "https://hco-operator-service.openshift-cnv.svc:4343/validate-hco-kubevirt-
> io-v1beta1-hyperconverged?timeout=30s": no endpoints available for service
> "hco-operator-service"
> You can run `oc replace -f /tmp/oc-edit-ikelt.yaml` to try this update again.
>
> The only way I have found to fix this is to redeploy CNV.

Changes made:

$ oc -n openshift-cnv edit hyperconverged

BEFORE:
spec:
  infra: {}
  workloads: {}

AFTER:
spec:
  infra:
    nodePlacement:
      nodeSelector:
        foo: bar
  workloads:
    nodePlacement:
      nodeSelector:
        foo: bar

It should be noted that I tried doing this twice: once with no nodes labelled foo=bar and once with a worker labelled foo=bar. The result was the same: attempting to revert the change caused the error above.

Opened a separate bug about this: https://bugzilla.redhat.com/show_bug.cgi?id=1889401
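The revert above was done via `oc edit`; an equivalent explicit patch (a sketch of the same removal, not the exact command I ran) would be:

$ oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv --type=json \
    -p '[{"op":"remove","path":"/spec/infra/nodePlacement"},{"op":"remove","path":"/spec/workloads/nodePlacement"}]'

Either way, the request is rejected by the validating webhook for as long as hco-operator-service has no endpoints.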
Checking for Infra/template-validator
---------------------------------------
[kbidarka@localhost ocp-python-wrapper]$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"hco.kubevirt.io/v1beta1","kind":"HyperConverged","metadata":{"annotations":{},"name":"kubevirt-hyperconverged","namespace":"openshift-cnv"},"spec":{"bareMetalPlatform":true}}
  finalizers:
  ...
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  bareMetalPlatform: true
  infra:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeType
                operator: In
                values:
                - infra
  version: v2.5.0
  workloads: {}

[kbidarka@localhost ocp-python-wrapper]$ oc get pods -n openshift-cnv | grep validator
virt-template-validator-6876b65456-z6g5r   0/1   Pending   0   54s
virt-template-validator-69df488767-cg9cv   1/1   Running   0   8m36s
virt-template-validator-69df488767-wxn8h   1/1   Running   0   8m40s

[kbidarka@localhost ocp-python-wrapper]$ oc get deployment -n openshift-cnv virt-template-validator -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
  ...
spec:
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeType
                operator: In
                values:
                - infra

After reverting:
-----------------
[kbidarka@localhost ocp-python-wrapper]$ oc get pods -n openshift-cnv | grep validator
virt-template-validator-75f6c79bdf-9pdsk   1/1   Running   0   28s
virt-template-validator-75f6c79bdf-zw4ph   1/1   Running   0   24s

--------------------------------------------------------------------------------------------------------------------------------------------------

Checking for Workloads/node-labeller
------------------------------------
[kbidarka@localhost ocp-python-wrapper]$ oc get hyperconverged kubevirt-hyperconverged -n openshift-cnv -o yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"hco.kubevirt.io/v1beta1","kind":"HyperConverged","metadata":{"annotations":{},"name":"kubevirt-hyperconverged","namespace":"openshift-cnv"},"spec":{"bareMetalPlatform":true}}
  finalizers:
  ...
  name: kubevirt-hyperconverged
  namespace: openshift-cnv
spec:
  bareMetalPlatform: true
  infra: {}
  version: v2.5.0
  workloads:
    nodePlacement:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: nodeType
                operator: In
                values:
                - infra

[kbidarka@localhost ocp-python-wrapper]$ oc get pods -n openshift-cnv | grep labeller
[kbidarka@localhost ocp-python-wrapper]$

After reverting the hyperconverged CR
----------------------------------------
[kbidarka@localhost ocp-python-wrapper]$ oc get pods -n openshift-cnv | grep labeller
kubevirt-node-labeller-c4xth   1/1   Running   0   23s
kubevirt-node-labeller-mssgf   1/1   Running   0   23s
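A quicker way to read the same result is to pull the affinity stanza straight off the workload objects (the daemonset name is assumed from the pod names above):

$ oc get deployment virt-template-validator -n openshift-cnv -o jsonpath='{.spec.template.spec.affinity}'
$ oc get daemonset kubevirt-node-labeller -n openshift-cnv -o jsonpath='{.spec.template.spec.affinity}'

Both should come back empty once nodePlacement has been removed from the HyperConverged CR and the pods have rolled over.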
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.5.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5127