Bug 2058167
| Summary: | Post deploy on a baremetal cluster SSP is looping attempting to reconcile | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Debarati Basu-Nag <dbasunag> |
| Component: | SSP | Assignee: | Andrej Krejcir <akrejcir> |
| Status: | CLOSED ERRATA | QA Contact: | Geetika Kapoor <gkapoor> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 4.10.0 | CC: | agilboa, akrejcir, cnv-qe-bugs, dholler, rkishner, stirabos, ycui |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kubevirt-ssp-operator-container-v4.10.0-50, hco-bundle-registry-4.10.0-696 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-16 16:07:15 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |

Moving to storage, as triage by Oren indicated this is a CDI issue.

Moved to SSP after having a debug session with @akrejcir.

I reproduced this on my development cluster. It is not related to bare metal. The problem is that SSP and CDI modify the same labels in a loop. This is the update done by SSP:

    @ ["metadata","labels","app.kubernetes.io/component"]
    - "storage"
    + "templating"
    @ ["metadata","labels","app.kubernetes.io/managed-by"]
    - "cdi-controller"
    + "ssp-operator"

CDI then reverts it. I will post a PR to SSP to break the loop.

Verified on kubevirt-ssp-operator-container-v4.10.0-50.
Note: The fix mentions that the labels are now being set when auto-update is disabled. This could mean the bug will occur again when auto-update is disabled, but I did not manage to reproduce it even then, so I can't say for sure.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.10.0 Images security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0947
Created attachment 1863175: ssp log

Description of problem:
Post deployment of BM cluster bm02-cnvqe2-rdu2, noticed that HCO is in a degraded state because SSP is not available. From the SSP operator log, it looks like it is continuously attempting to reconcile and failing.

Version-Release number of selected component (if applicable):
4.10.0 - 686

How reproducible:
Not sure.

Steps to Reproduce:
1. Not sure.
2.
3.

Actual results:
HCO Status condition:
====================
{ "lastTransitionTime": "2022-02-23T16:59:54Z", "message": "Reconcile completed successfully", "observedGeneration": 73, "reason": "ReconcileCompleted", "status": "True", "type": "ReconcileComplete" },
{ "lastTransitionTime": "2022-02-24T00:24:04Z", "message": "SSP is not available: Reconciling SSP resources", "observedGeneration": 73, "reason": "SSPNotAvailable", "status": "False", "type": "Available" },
{ "lastTransitionTime": "2022-02-24T00:24:04Z", "message": "SSP is progressing: Reconciling SSP resources", "observedGeneration": 73, "reason": "SSPProgressing", "status": "True", "type": "Progressing" },
{ "lastTransitionTime": "2022-02-24T00:46:30Z", "message": "SSP is degraded: Reconciling SSP resources", "observedGeneration": 73, "reason": "SSPDegraded", "status": "True", "type": "Degraded" },
{ "lastTransitionTime": "2022-02-24T00:24:04Z", "message": "SSP is progressing: Reconciling SSP resources", "observedGeneration": 73, "reason": "SSPProgressing", "status": "False", "type": "Upgradeable" }
===================
SSP status:
===================
{
  "conditions": [
    { "lastHeartbeatTime": "2022-02-24T00:49:32Z", "lastTransitionTime": "2022-02-24T00:49:32Z", "message": "Reconciling SSP resources", "reason": "Available", "status": "False", "type": "Available" },
    { "lastHeartbeatTime": "2022-02-24T00:49:32Z", "lastTransitionTime": "2022-02-24T00:49:32Z", "message": "Reconciling SSP resources", "reason": "Progressing", "status": "True", "type": "Progressing" },
    { "lastHeartbeatTime": "2022-02-24T00:49:32Z", "lastTransitionTime": "2022-02-24T00:49:32Z", "message": "Reconciling SSP resources", "reason": "Degraded", "status": "True", "type": "Degraded" }
  ],
  "observedGeneration": 6,
  "observedVersion": "4.10.0",
  "operatorVersion": "4.10.0",
  "phase": "Deploying",
  "targetVersion": "4.10.0"
}

From the SSP operator log, this message shows up again and again:
=================
{"level":"error","ts":1645655734.9615152,"logger":"controller-runtime.manager.controller.ssp","msg":"Reconciler error","reconciler group":"ssp.kubevirt.io","reconciler kind":"SSP","name":"ssp-kubevirt-hyperconverged","namespace":"openshift-cnv","error":"Operation cannot be fulfilled on ssps.ssp.kubevirt.io \"ssp-kubevirt-hyperconverged\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
================

Attached are the HCO operator log and the SSP operator log.

Expected results:

Additional info: