Description of problem:
When installing CNV v4.11.0-304, v4.11.0-306, or v4.11.0-307, HCO fails to reconcile its state.

Version-Release number of selected component (if applicable):
CNV v4.11.0-304
CNV v4.11.0-307

How reproducible:
100%

Steps to Reproduce:
1. Install CNV.
2. Check HCO.

More Info:

HCO Condition Message:
KubeVirt is degraded: An error occurred during deployment: unable to create ClusterRole &ClusterRole{ObjectMeta:{kubevirt.io:admin 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/component:compute app.kubernetes.io/managed-by:virt-operator app.kubernetes.io/part-of:hyperconverged-cluster app.kubernetes.io/version:4.11.0 kubevirt.io: rbac.authorization.k8s.io/aggregate-to-admin:true] map[kubevirt.io/generation:1 kubevirt.io/install-strategy-identifier:4436e2203a4eb79f2547251abb4dc86dd6521959 kubevirt.io/install-strategy-registry:registry.redhat.io/container-native-virtualization kubevirt.io/install-strategy-version:sha256:af751d79e58a5d37d6510670970bcd81d6a7e384dc7441b27d3c2a8307e6e0a0] [] [] []},Rules:[]PolicyRule{PolicyRule{Verbs:[get],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachineinstances/console virtualmachineinstances/vnc virtualmachineinstances/guestosinfo virtualmachineinstances/filesystemlist virtualmachineinstances/userlist],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[update],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachineinstances/pause virtualmachineinstances/unpause virtualmachineinstances/addvolume virtualmachineinstances/removevolume virtualmachineinstances/freeze virtualmachineinstances/unfreeze virtualmachineinstances/softreboot virtualmachineinstances/portforward],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[update],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachines/start virtualmachines/stop virtualmachines/restart virtualmachines/addvolume virtualmachines/removevolume],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete 
create update patch list watch deletecollection],APIGroups:[kubevirt.io],Resources:[virtualmachines virtualmachineinstances virtualmachineinstancepresets virtualmachineinstancereplicasets virtualmachineinstancemigrations],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[snapshot.kubevirt.io],Resources:[virtualmachinesnapshots virtualmachinesnapshotcontents virtualmachinerestores],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[flavor.kubevirt.io],Resources:[virtualmachineflavors virtualmachineclusterflavors],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[pool.kubevirt.io],Resources:[virtualmachinepools],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get list watch],APIGroups:[migrations.kubevirt.io],Resources:[migrationpolicies],ResourceNames:[],NonResourceURLs:[],},},AggregationRule:nil,}: clusterroles.rbac.authorization.k8s.io "kubevirt.io:admin" is forbidden: user "system:serviceaccount:openshift-cnv:kubevirt-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cnv" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: Logs from hco-operator pod: {"level":"error","ts":1652016328.255472,"logger":"controller_hyperconverged","msg":"Failed to update HCO Status","Request.Namespace":"openshift-cnv","Request.Name":"kubevirt-hyperconverged","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try 
again","stacktrace":"github.com/kubevirt/hyperconverged-cluster-operator/controllers/hyperconverged.(*ReconcileHyperConverged).updateHyperConverged\n\t/remote-source/app/controllers/hyperconverged/hyperconverged_controller.go:548\ngithub.com/kubevirt/hyperconverged-cluster-operator/controllers/hyperconverged.(*ReconcileHyperConverged).Reconcile\n\t/remote-source/app/controllers/hyperconverged/hyperconverged_controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1652016328.2555847,"logger":"controller.hyperconverged-controller","msg":"Reconciler error","name":"hco-controlled-cr-bda5152f-a88e-4ebc-92e1-0ad512c4bc7b","namespace":"openshift-cnv","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
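The "is attempting to grant RBAC permissions not currently held" message is Kubernetes RBAC escalation prevention: a subject can only create or update a role containing permissions it already holds (unless it has the `escalate` verb on roles). A rough way to compare what the operator's service account actually holds against the rules in the rejected ClusterRole (a sketch, not a verified runbook; it requires impersonation rights, and the spot-checked rule below is just one example picked from the error message):

```shell
# List the permissions currently held by the kubevirt-operator service
# account, to compare against the rules in the rejected ClusterRole:
oc auth can-i --list \
  --as=system:serviceaccount:openshift-cnv:kubevirt-operator

# Spot-check a single rule from the error message, e.g. the
# subresources.kubevirt.io "update" permission on virtualmachines/start:
oc auth can-i update virtualmachines.subresources.kubevirt.io/start \
  --as=system:serviceaccount:openshift-cnv:kubevirt-operator
```

Any rule that appears in the ClusterRole but not in the `can-i --list` output would explain the forbidden error.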
I get the same behavior on the following CNV builds: v4.11.0-309, v4.11.0-310, v4.11.0-313.
Bundle v4.11.0-314 correctly passed the smoke test, and bundle v4.11.0-315 is also correctly deployable.
We are hitting this issue both with the 4.10.0->4.10.1 upgrade and with a fresh 4.11 install.

==================
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$ kubectl get csv -n openshift-cnv kubevirt-hyperconverged-operator.v4.11.0 -o json | jq ".spec.relatedImages" | grep ssp
    "image": "registry.redhat.io/container-native-virtualization/kubevirt-ssp-operator@sha256:9c2c478ed7ae920d4e7ea351bb393bccd77d68c3f355e7786952add9ce958c5b",
    "name": "registry.redhat.io/container-native-virtualization/kubevirt-ssp-operator:v4.11.0-29"
    "image": "registry.redhat.io/openshift4/ose-csi-livenessprobe@sha256:6578cc878df9c7339cc8207ae8cbe185361b1fcc8ed69da7ec0346bb94d4a588",
    "name": "registry.redhat.io/openshift4/ose-csi-livenessprobe:v4.10"
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$
====================
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$ kubectl get ssp ssp-kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status"
{
  "conditions": [
    {
      "lastHeartbeatTime": "2022-05-09T13:39:02Z",
      "lastTransitionTime": "2022-05-09T13:39:02Z",
      "message": "Reconciling SSP resources",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastHeartbeatTime": "2022-05-09T13:39:02Z",
      "lastTransitionTime": "2022-05-09T13:39:02Z",
      "message": "Reconciling SSP resources",
      "reason": "Progressing",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastHeartbeatTime": "2022-05-09T13:39:02Z",
      "lastTransitionTime": "2022-05-09T13:39:02Z",
      "message": "Reconciling SSP resources",
      "reason": "Degraded",
      "status": "True",
      "type": "Degraded"
    }
  ],
  "observedGeneration": 1,
  "observedVersion": "4.11.0",
  "operatorVersion": "4.11.0",
  "phase": "Deploying",
  "targetVersion": "4.11.0"
}
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$

HCO.status.versions shows:
========================
"versions": [
  {
    "name": "operator",
    "version": "4.11.0"
  }
]
========================

It is not very clear what is causing this. Will attach the SSP and HCO logs.
{"level":"info","ts":1652102943.1696093,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102947.14979,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102950.0840895,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102950.3263416,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
...
{"level":"info","ts":1652103640.047834,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103642.159796,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103642.3250394,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103644.900461,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103645.1145089,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103647.6483648,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103647.7232087,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}

SSP is definitely hot-looping here, but the logs are not very informative. @dholler can you please take a look?
I have tried enabling debug logging for SSP using these steps:
- Scale the HCO deployment to 0, so it does not reconcile SSP.
- Modify the SSP deployment by adding the command-line parameter "--zap-log-level debug".

But after SSP restarted, it was not looping anymore. Then I tried scaling HCO back to 1, which reverted SSP to its original definition, but it is still not looping. I will create a PR to add the reason for reconciliation to the SSP log.
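For reference, the debug-logging attempt above can be scripted roughly like this (a sketch only; the deployment names, the container index, and the exact `--zap-log-level=debug` flag spelling are assumptions based on typical HCO/SSP deployments):

```shell
# Stop HCO from reconciling (and reverting) the SSP deployment.
oc scale deployment/hco-operator -n openshift-cnv --replicas=0

# Append the zap debug flag to the ssp-operator container args
# (container index 0 assumed; adjust for the actual deployment layout).
oc patch deployment/ssp-operator -n openshift-cnv --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--zap-log-level=debug"}]'

# When done, scale HCO back up; it reverts SSP to its managed definition.
oc scale deployment/hco-operator -n openshift-cnv --replicas=1
```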
Hello, just updating that I can see this "unable to create ClusterRole" issue again on CNV v4.11.0-331. So far, I haven't seen this issue on later builds (332, 333).
Hey, updating that I haven't seen this issue for a while. Can I close this bug?
Hello again, I'm seeing something similar: the SSP state loops between the Deploying and Deployed states for a long time until the reconciliation eventually completes.

Some logs from the ssp-operator log:

{"level":"info","ts":1652957658.8573565,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957658.8573945,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957659.195631,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-stream8-image-cron"}
{"level":"info","ts":1652957659.1956854,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-stream8-image-cron"}
{"level":"info","ts":1652957659.3559809,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/rhel8-image-cron"}
{"level":"info","ts":1652957659.3560662,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/rhel8-image-cron"}
{"level":"info","ts":1652957659.719083,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957659.7191408,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957660.2953465,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957660.2954412,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957661.2083428,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.2084458,"logger":"controllers.SSP","msg":"Starting reconciliation","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.3715484,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}

And it loops like that, for example:

[cnv-qe-jenkins@verify-411-hx422-executor mocohen]$ oc logs -n openshift-cnv ssp-operator-bb9c945f8-5hxvr | grep "CR status updated"
{"level":"info","ts":1652956654.7442966,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956659.1829374,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956662.8284736,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956666.4708722,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956670.1044624,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956673.72668,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956676.98464,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956681.7213516,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
.....
{"level":"info","ts":1652957639.081216,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957642.684177,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957645.4238188,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957648.6668222,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957652.0845282,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957654.97079,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957658.388607,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.2083428,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957665.176946,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}

Until it stops.

Build v4.11.0-373
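The loop period can be quantified from a saved log by diffing consecutive "CR status updated" timestamps, for example with a small sed/awk pipeline over the JSON "ts" field (a sketch; shown here on two abbreviated sample entries, but in practice you would pipe in the output of `oc logs` for the ssp-operator pod instead):

```shell
# Extract the "ts" field from consecutive "CR status updated" entries
# and print the gap between them in seconds.
sample='{"level":"info","ts":1652956654.7442966,"logger":"controllers.SSP","msg":"CR status updated"}
{"level":"info","ts":1652956659.1829374,"logger":"controllers.SSP","msg":"CR status updated"}'

intervals=$(printf '%s\n' "$sample" \
  | grep '"msg":"CR status updated"' \
  | sed -n 's/.*"ts":\([0-9.]*\).*/\1/p' \
  | awk 'NR > 1 { printf "%.1f\n", $1 - prev } { prev = $1 }')

echo "$intervals"   # gap between the two sample entries, ~4.4 seconds
```

A steady stream of 3-5 second gaps, as in the log above, is what a hot loop looks like; a healthy operator should show updates only when something actually changed.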
Same with CNV build 4.11.0-380.
Hey, since kubevirt/ssp-operator/pull/361 has been merged, I don't see this issue anymore. I will update here if it comes up again for some reason. Thanks, everyone.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6526