Bug 2082912
| Summary: | [CNV-4.11] HCO Being Unable to Reconcile State | |||
|---|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Mor Cohen <mocohen> | |
| Component: | SSP | Assignee: | Andrej Krejcir <akrejcir> | |
| Status: | CLOSED ERRATA | QA Contact: | Geetika Kapoor <gkapoor> | |
| Severity: | urgent | Docs Contact: | ||
| Priority: | urgent | |||
| Version: | 4.11.0 | CC: | akrejcir, bkorren, cnv-qe-bugs, dbasunag, dholler, kmajcher, stirabos, ycui | |
| Target Milestone: | --- | Keywords: | AutomationBlocker, Regression, Reopened, TestBlocker | |
| Target Release: | 4.11.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | kubevirt-ssp-operator-container-v4.11.0-43 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 2089661 (view as bug list) | Environment: | ||
| Last Closed: | 2022-09-14 19:31:44 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 2089661 | |||
I get the same behavior on the following CBV builds: v4.11.0-309 v4.11.0-310 v4.11.0-313 bundle v4.11.0-314 correctly passed the smoke test and bundle v4.11.0-315 is also correctly deployable We are hitting this issue both with 4.10.0->4.10.1 upgrade and fresh 4.11 install.
==================
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$ kubectl get csv -n openshift-cnv kubevirt-hyperconverged-operator.v4.11.0 -o json | jq ".spec.relatedImages" | grep ssp
"image": "registry.redhat.io/container-native-virtualization/kubevirt-ssp-operator@sha256:9c2c478ed7ae920d4e7ea351bb393bccd77d68c3f355e7786952add9ce958c5b",
"name": "registry.redhat.io/container-native-virtualization/kubevirt-ssp-operator:v4.11.0-29"
"image": "registry.redhat.io/openshift4/ose-csi-livenessprobe@sha256:6578cc878df9c7339cc8207ae8cbe185361b1fcc8ed69da7ec0346bb94d4a588",
"name": "registry.redhat.io/openshift4/ose-csi-livenessprobe:v4.10"
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$
====================
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$ kubectl get ssp ssp-kubevirt-hyperconverged -n openshift-cnv -o json | jq ".status"
{
"conditions": [
{
"lastHeartbeatTime": "2022-05-09T13:39:02Z",
"lastTransitionTime": "2022-05-09T13:39:02Z",
"message": "Reconciling SSP resources",
"reason": "Available",
"status": "False",
"type": "Available"
},
{
"lastHeartbeatTime": "2022-05-09T13:39:02Z",
"lastTransitionTime": "2022-05-09T13:39:02Z",
"message": "Reconciling SSP resources",
"reason": "Progressing",
"status": "True",
"type": "Progressing"
},
{
"lastHeartbeatTime": "2022-05-09T13:39:02Z",
"lastTransitionTime": "2022-05-09T13:39:02Z",
"message": "Reconciling SSP resources",
"reason": "Degraded",
"status": "True",
"type": "Degraded"
}
],
"observedGeneration": 1,
"observedVersion": "4.11.0",
"operatorVersion": "4.11.0",
"phase": "Deploying",
"targetVersion": "4.11.0"
}
[cnv-qe-jenkins@verify411-8f8hz-executor ~]$
HCO.status.versions shows
========================
"versions": [
{
"name": "operator",
"version": "4.11.0"
}
]
========================
It is not very clear on what is causing this.
Will attach ssp and hco log.
{"level":"info","ts":1652102943.1696093,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102947.14979,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102950.0840895,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652102950.3263416,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
...
{"level":"info","ts":1652103640.047834,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103642.159796,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103642.3250394,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103644.900461,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103645.1145089,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103647.6483648,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652103647.7232087,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
SSP is definitively hot-looping here but the logs are not that explicative.
@dholler can you please take a look?
I have tried enabling debug logging for SSP using steps: - Scale HCO deployment to 0, so it does not reconcile SSP - Modify SSP deployment by adding command line parameter "--zap-log-level debug" But after SSP restarted, it was not looping anymore. Then I tried scaling HCO back to 1, which reverted SSP to its original definition, but it is still not looping. I will create a PR to add the reason for reconciliation into SSP log. Hello, just updating that I can see this "unable to create ClusterRole" issue again on CNV-v4.11.0-331. So far, I didn't see this issue on later builds (332, 333). Hey, Updating that I didn't see this issue for a while. Can I close this bug? Hello again,
I'm seeing something similar:
SSP state is looping between Deploying and Deployed states for a long time until the reconciliation eventually completed.
some logs from the ssp-operator log:
{"level":"info","ts":1652957658.8573565,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957658.8573945,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957659.195631,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-stream8-image-cron"}
{"level":"info","ts":1652957659.1956854,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-stream8-image-cron"}
{"level":"info","ts":1652957659.3559809,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/rhel8-image-cron"}
{"level":"info","ts":1652957659.3560662,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/rhel8-image-cron"}
{"level":"info","ts":1652957659.719083,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957659.7191408,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957660.2953465,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957660.2954412,"logger":"controllers.SSP","msg":"Reconciliation event received","ssp":{"namespace":"openshift-cnv","name":"ssp-kubevirt-hyperconverged"},"cause_type":"DataImportCron","cause_object":"openshift-virtualization-os-images/centos-7-image-cron"}
{"level":"info","ts":1652957661.2083428,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.2084458,"logger":"controllers.SSP","msg":"Starting reconciliation","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.3715484,"logger":"controllers.SSP","msg":"Reconciling operands...","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
And it loops like that, for example:
[cnv-qe-jenkins@verify-411-hx422-executor mocohen]$ oc logs -n openshift-cnv ssp-operator-bb9c945f8-5hxvr | grep "CR status updated"
{"level":"info","ts":1652956654.7442966,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956659.1829374,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956662.8284736,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956666.4708722,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956670.1044624,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956673.72668,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956676.98464,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652956681.7213516,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
.....
{"level":"info","ts":1652957639.081216,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957642.684177,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957645.4238188,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957648.6668222,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957652.0845282,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957654.97079,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957658.388607,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957661.2083428,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
{"level":"info","ts":1652957665.176946,"logger":"controllers.SSP","msg":"CR status updated","ssp":"openshift-cnv/ssp-kubevirt-hyperconverged"}
Until it stops.
Build v4.11.0-373
Same with CNV build 4.11.0-380. Hey, Since kubevirt/ssp-operator/pull/361 has been merged, I don't see this issue anymore. I will update here if it comes up again for some reason. Thanks for everyone. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Virtualization 4.11.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6526 |
Description of problem: When installing CNV: v4.11.0-304, v4.11.0-306, and v4.11.0-307 HCO is failing to reconcile its state. Version-Release number of selected component (if applicable): CNV: v4.11.0-304 CNV: v4.11.0-307 How reproducible: 100 Steps to Reproduce: 1. Install CNV. 2. Check HCO. 3. More Info: HCO Condition Message: KubeVirt is degraded: An error occurred during deployment: unable to create ClusterRole &ClusterRole{ObjectMeta:{kubevirt.io:admin 0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[app.kubernetes.io/component:compute app.kubernetes.io/managed-by:virt-operator app.kubernetes.io/part-of:hyperconverged-cluster app.kubernetes.io/version:4.11.0 kubevirt.io: rbac.authorization.k8s.io/aggregate-to-admin:true] map[kubevirt.io/generation:1 kubevirt.io/install-strategy-identifier:4436e2203a4eb79f2547251abb4dc86dd6521959 kubevirt.io/install-strategy-registry:registry.redhat.io/container-native-virtualization kubevirt.io/install-strategy-version:sha256:af751d79e58a5d37d6510670970bcd81d6a7e384dc7441b27d3c2a8307e6e0a0] [] [] []},Rules:[]PolicyRule{PolicyRule{Verbs:[get],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachineinstances/console virtualmachineinstances/vnc virtualmachineinstances/guestosinfo virtualmachineinstances/filesystemlist virtualmachineinstances/userlist],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[update],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachineinstances/pause virtualmachineinstances/unpause virtualmachineinstances/addvolume virtualmachineinstances/removevolume virtualmachineinstances/freeze virtualmachineinstances/unfreeze virtualmachineinstances/softreboot virtualmachineinstances/portforward],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[update],APIGroups:[subresources.kubevirt.io],Resources:[virtualmachines/start virtualmachines/stop virtualmachines/restart virtualmachines/addvolume virtualmachines/removevolume],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[kubevirt.io],Resources:[virtualmachines virtualmachineinstances virtualmachineinstancepresets virtualmachineinstancereplicasets virtualmachineinstancemigrations],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[snapshot.kubevirt.io],Resources:[virtualmachinesnapshots virtualmachinesnapshotcontents virtualmachinerestores],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[flavor.kubevirt.io],Resources:[virtualmachineflavors virtualmachineclusterflavors],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get delete create update patch list watch deletecollection],APIGroups:[pool.kubevirt.io],Resources:[virtualmachinepools],ResourceNames:[],NonResourceURLs:[],},PolicyRule{Verbs:[get list watch],APIGroups:[migrations.kubevirt.io],Resources:[migrationpolicies],ResourceNames:[],NonResourceURLs:[],},},AggregationRule:nil,}: clusterroles.rbac.authorization.k8s.io "kubevirt.io:admin" is forbidden: user "system:serviceaccount:openshift-cnv:kubevirt-operator" (groups=["system:serviceaccounts" "system:serviceaccounts:openshift-cnv" "system:authenticated"]) is attempting to grant RBAC permissions not currently held: Logs from hco-operator pod: {"level":"error","ts":1652016328.255472,"logger":"controller_hyperconverged","msg":"Failed to update HCO Status","Request.Namespace":"openshift-cnv","Request.Name":"kubevirt-hyperconverged","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"github.com/kubevirt/hyperconverged-cluster-operator/controllers/hyperconverged.(*ReconcileHyperConverged).updateHyperConverged\n\t/remote-source/app/controllers/hyperconverged/hyperconverged_controller.go:548\ngithub.com/kubevirt/hyperconverged-cluster-operator/controllers/hyperconverged.(*ReconcileHyperConverged).Reconcile\n\t/remote-source/app/controllers/hyperconverged/hyperconverged_controller.go:303\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"} {"level":"error","ts":1652016328.2555847,"logger":"controller.hyperconverged-controller","msg":"Reconciler error","name":"hco-controlled-cr-bda5152f-a88e-4ebc-92e1-0ad512c4bc7b","namespace":"openshift-cnv","error":"Operation cannot be fulfilled on hyperconvergeds.hco.kubevirt.io \"kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}