Bug 2087608
| Summary: | Failure to create stable HCO CR after deleting it previously | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | SATHEESARAN <sasundar> |
| Component: | Storage | Assignee: | Arnon Gilboa <agilboa> |
| Status: | CLOSED MIGRATED | QA Contact: | SATHEESARAN <sasundar> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 4.11.0 | CC: | agilboa, akrejcir, alitke, dafrank, dholler, kbidarka, kmajcher, nunnatsa, ocohen, stirabos, ycui |
| Target Milestone: | --- | Flags: | kmajcher: needinfo+ |
| Target Release: | 4.13.7 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kubevirt-ssp-operator-container-v4.12.0-45 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| Clones: | 2238295 | Environment: | |
| Last Closed: | 2023-12-14 16:10:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2238295 | | |
Created attachment 1880759 (CDI operator logs)
The underlying problem is that cdi-operator does not remove its Secrets and ConfigMaps when the CDI CR is deleted.
Then, on reinstallation, the leftover Secrets and ConfigMaps are detected as orphan objects and prevent cdi-operator from proceeding, e.g.:
{"level":"error","ts":1652795514.7434072,"logger":"cdi-operator","msg":"error getting apiserver ca bundle","error":"ConfigMap \"cdi-apiserver-signer-bundle\" not found","stacktrace":"kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.getAPIServerCABundle\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:542\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDataImportCronValidatingWebhook\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:244\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDynamicAPIServerResources\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:57\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createResourceGroup\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:102\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createAllResources\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:88\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.CreateAllDynamicResources\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:77\nkubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).GetAllResources\n\t/remote-source/app/pkg/operator/controller/cr-manager.go:126\nkubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).CheckForOrphans\n\t/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:363\nkubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).Reconcile\n\t/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:152\nkubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).Reconcile\n\t/remote-source/app/pkg/operator/controller/controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1652795514.7443233,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"openshift-cnv","name":"cdi-uploadserver-client-signer"}}
The workaround (W/A) is to manually delete the CDI Secrets and ConfigMaps so that cdi-operator can complete the reconciliation.
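The leftover objects can be read straight from the operator's own output. Below is a minimal Go sketch, assuming the cdi-operator log is one JSON object per line as in the excerpts here, that collects the Secrets and ConfigMaps named in "Orphan object exists" entries and prints matching `oc delete` commands. The `deleteCommands` helper is hypothetical. In these excerpts each orphan entry names a single object per reconcile pass, so a list built from logs may be incomplete; the explicit list used later in this thread is the safer reference.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// orphanObj mirrors the "obj" field of an "Orphan object exists" log entry.
type orphanObj struct {
	Kind      string `json:"kind"`
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// logEntry keeps only the fields this sketch needs from a cdi-operator log line.
type logEntry struct {
	Msg string     `json:"msg"`
	Obj *orphanObj `json:"obj"`
}

// deleteCommands (hypothetical helper) scans cdi-operator JSON log lines and
// returns one "oc delete" command per (resource, namespace) pair covering every
// orphaned Secret or ConfigMap it finds. Other kinds and non-JSON lines are skipped.
func deleteCommands(logs string) ([]string, error) {
	groups := map[[2]string]map[string]bool{} // {resource, namespace} -> set of names
	sc := bufio.NewScanner(strings.NewReader(logs))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // stack traces make lines long
	for sc.Scan() {
		var e logEntry
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue
		}
		if e.Msg != "Orphan object exists" || e.Obj == nil {
			continue
		}
		var resource string
		switch e.Obj.Kind {
		case "Secret":
			resource = "secret"
		case "ConfigMap":
			resource = "cm"
		default:
			continue // the workaround only covers Secrets and ConfigMaps
		}
		key := [2]string{resource, e.Obj.Namespace}
		if groups[key] == nil {
			groups[key] = map[string]bool{}
		}
		groups[key][e.Obj.Name] = true
	}
	if err := sc.Err(); err != nil {
		return nil, err // e.g. a line longer than the 1 MiB buffer limit
	}
	var cmds []string
	for key, set := range groups {
		names := make([]string, 0, len(set))
		for n := range set {
			names = append(names, n)
		}
		sort.Strings(names)
		cmds = append(cmds, fmt.Sprintf("oc delete %s -n %s %s", key[0], key[1], strings.Join(names, " ")))
	}
	sort.Strings(cmds) // map iteration order is random; sort for stable output
	return cmds, nil
}

func main() {
	// The orphan entry below is copied from the cdi-operator log in this report.
	logs := `{"level":"info","ts":1652795514.7443233,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"openshift-cnv","name":"cdi-uploadserver-client-signer"}}
{"level":"info","ts":1652864508.1824028,"logger":"cdi-operator","msg":"CDI CR does not exist","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}`
	cmds, err := deleteCommands(logs)
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, c := range cmds {
		fmt.Println(c)
	}
}
```

Review the printed commands before running them; they only cover objects the log actually reported.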
Moving to CDI team.
Looking at the CDI operator logs, we see that shortly after the deletion of the CDI CR,
the CDI operator recreates the Secrets and ConfigMaps:
{"level":"info","ts":1652864475.2948112,"logger":"cdi-operator","msg":"Reconciling CDI","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652864475.2948842,"logger":"cdi-operator","msg":"Doing reconcile update","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652864476.3855922,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-insecure-registries","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864476.4569244,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.ServiceAccount"}
{"level":"info","ts":1652864476.5323896,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.RoleBinding"}
{"level":"info","ts":1652864476.5842297,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.Role"}
{"level":"info","ts":1652864476.7492573,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.ServiceAccount"}
{"level":"info","ts":1652864476.7851727,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.RoleBinding"}
{"level":"info","ts":1652864476.8056452,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.Role"}
{"level":"info","ts":1652864476.8206687,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"v1beta1.upload.cdi.kubevirt.io","type":"*v1.APIService"}
{"level":"info","ts":1652864476.927518,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"v1alpha1.upload.cdi.kubevirt.io","type":"*v1.APIService"}
{"level":"info","ts":1652864476.9706755,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-datavolume-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.0325625,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-datavolume-mutate","type":"*v1.MutatingWebhookConfiguration"}
{"level":"info","ts":1652864477.0888171,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.1155927,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"objecttransfer-api-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.1779423,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-dataimportcron-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.2343497,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.2936044,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.3253348,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-server-cert","type":"*v1.Secret"}
{"level":"info","ts":1652864477.3650944,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.3879237,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.414795,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-server-cert","type":"*v1.Secret"}
{"level":"info","ts":1652864477.4489758,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.4908574,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.5296195,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.5725944,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.6052103,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-cert","type":"*v1.Secret"}
and then it starts hot-looping on:
{"level":"info","ts":1652864508.1824028,"logger":"cdi-operator","msg":"CDI CR does not exist","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652865362.5791483,"logger":"cdi-operator","msg":"Reconciling CDI","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652865362.5931349,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer"}}
and it will never progress until those objects are explicitly removed (or everything is removed by deleting the namespace).
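Why the operator never progresses can be illustrated with a toy model of the reconcile loop. This Go sketch is hypothetical: the `state` type and `reconcile` function are not CDI or controller-lifecycle-operator-sdk code, and the model assumes, from the log excerpts above, that the orphan check runs before anything is created for a new CR and aborts the pass when any managed object already exists. Because an aborted pass changes nothing, every retry logs the same orphan until the leftover object is deleted.

```go
package main

import "fmt"

// state is a toy model of what cdi-operator sees; these names are hypothetical
// and not taken from the CDI or controller-lifecycle-operator-sdk code.
type state struct {
	crExists bool
	crPhase  string          // "" for a freshly created CR
	objects  map[string]bool // objects present in the cluster, e.g. "Secret/cdi-apiserver-signer"
}

// reconcile returns the message one reconcile pass would log. Assumption: the
// orphan check runs only while the CR has no phase yet, and any pre-existing
// managed object counts as an orphan.
func reconcile(s *state, managed []string) string {
	if !s.crExists {
		return "CDI CR does not exist"
	}
	if s.crPhase == "" {
		for _, obj := range managed {
			if s.objects[obj] {
				// Return before creating anything; the CR keeps its initial
				// phase, so the next pass hits the same check again.
				return "Orphan object exists: " + obj
			}
		}
		s.crPhase = "Deploying"
	}
	for _, obj := range managed {
		s.objects[obj] = true
	}
	return "Resources reconciled"
}

func main() {
	managed := []string{"Secret/cdi-apiserver-signer", "ConfigMap/cdi-apiserver-signer-bundle"}
	// A leftover object recreated after the CDI CR was deleted.
	s := &state{crExists: true, objects: map[string]bool{"Secret/cdi-apiserver-signer": true}}
	for i := 0; i < 3; i++ {
		fmt.Println(reconcile(s, managed)) // same orphan message on every pass
	}
	delete(s.objects, "Secret/cdi-apiserver-signer") // the manual workaround
	fmt.Println(reconcile(s, managed))               // now proceeds
}
```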
Tested with the workaround suggested by Simone and Oren of removing the CDI Secrets and ConfigMaps.

Removed the CDI Secrets:
[ ~]$ oc delete secret -n openshift-cnv cdi-apiserver-server-cert cdi-apiserver-signer cdi-uploadproxy-server-cert cdi-uploadproxy-signer cdi-uploadserver-client-cert cdi-uploadserver-client-signer cdi-uploadserver-signer

Removed the CDI ConfigMaps:
[ ~]$ oc delete cm -n openshift-cnv cdi-apiserver-signer-bundle cdi-uploadproxy-signer-bundle cdi-uploadserver-client-signer-bundle cdi-uploadserver-signer-bundle

Creating the HCO CR after deleting the Secrets and ConfigMaps works, and the CR reaches a stable condition.

SSP team, can you take a look? It seems a fix on your end might do the work here.

Do you mean that the SSP operator should remove the CDI Secrets and ConfigMaps? That seems unrelated to SSP.

I'm not sure it's the ConfigMaps and the Secrets; maybe they are just a side effect. I can see that SSP is still trying to read DataSource and DataImportCron after CDI and its CRDs are removed.

True, the SSP operator does not behave correctly when CRDs are removed. We can fix it, but I'm not sure it will fix this bug.

Removing the target release due to the change of component.

Deferring this due to capacity.

This bug may have been fixed by the same PR as Bug 2122236. This bug can be verified in 4.12.

I've tried to verify this bug on my cluster with CNV 4.12.0-745, and it is still happening.
The problem is that after recreating the HCO CR, the CDI CRDs are not created, and SSP is waiting for them (datasources.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io, datavolumes.cdi.kubevirt.io).
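To confirm this state on a cluster, it helps to check which of these three CRDs actually exist. A minimal Go sketch, assuming its input is the output of `oc get crd -o name` (one `customresourcedefinition.apiextensions.k8s.io/<name>` per line); `missingCRDs` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// requiredCRDs lists the CDI CRDs that SSP waits for, as named in this report.
var requiredCRDs = []string{
	"datasources.cdi.kubevirt.io",
	"dataimportcrons.cdi.kubevirt.io",
	"datavolumes.cdi.kubevirt.io",
}

// missingCRDs (hypothetical helper) takes `oc get crd -o name` output and
// returns the required CRDs that are absent, in the order listed above.
func missingCRDs(output string) []string {
	present := map[string]bool{}
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		// Keep only the CRD name after the "<resource>.<group>/" prefix.
		if i := strings.LastIndex(line, "/"); i >= 0 {
			line = line[i+1:]
		}
		present[line] = true
	}
	var missing []string
	for _, crd := range requiredCRDs {
		if !present[crd] {
			missing = append(missing, crd)
		}
	}
	return missing
}

func main() {
	out := "customresourcedefinition.apiextensions.k8s.io/datavolumes.cdi.kubevirt.io\n"
	fmt.Println(missingCRDs(out))
}
```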
There is an error in the cdi-operator log:
{
"level": "error",
"ts": 1669714626.7532284,
"logger": "cdi-operator",
"msg": "error getting apiserver ca bundle",
"error": "ConfigMap \"cdi-apiserver-signer-bundle\" not found",
"stacktrace": "{STACKTRACE_BELOW}"
}
This is the stacktrace:
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.getAPIServerCABundle
/remote-source/app/pkg/operator/resources/cluster/apiserver.go:553
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDataVolumeMutatingWebhook
/remote-source/app/pkg/operator/resources/cluster/apiserver.go:541
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDynamicAPIServerResources
/remote-source/app/pkg/operator/resources/cluster/apiserver.go:54
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createResourceGroup
/remote-source/app/pkg/operator/resources/cluster/factory.go:102
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createAllResources
/remote-source/app/pkg/operator/resources/cluster/factory.go:88
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.CreateAllDynamicResources
/remote-source/app/pkg/operator/resources/cluster/factory.go:77
kubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).GetAllResources
/remote-source/app/pkg/operator/controller/cr-manager.go:126
kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).CheckForOrphans
/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:363
kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).Reconcile
/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:152
kubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).Reconcile
/remote-source/app/pkg/operator/controller/controller.go:236
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
*** Bug 2233413 has been marked as a duplicate of this bug. ***

This issue is also seen with CNV v4.14 interim builds. Do we need a separate bug to track this issue for CNV v4.14?

@sasundar sure, please clone it to 4.14 as well.

Arnon, can you please update?

No fix has been attached or verified; postponing to the next z-stream.
Description of problem:
------------------------
When the HCO CR is deleted and created again, the HCO CR does not reach a stable condition.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
CNV 4.11 (bundle - v4.11.0-360)
Index Image - registry-proxy.engineering.redhat.com/rh-osbs/iib:233912
hyperconverged-cluster-operator - v4.11.0-65
OCP 4.11 nightly (4.11.0-0.nightly-2022-05-11-054135)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Set the 'uninstallStrategy' of the HCO CR to 'RemoveWorkloads'
   # oc edit hco kubevirt-hyperconverged -n openshift-cnv
2. Remove the HCO CR
   # oc delete hco kubevirt-hyperconverged -n openshift-cnv
3. After the 'oc delete' command completes successfully, create the HCO CR ("HyperConverged") from the web console (Installed Operators -> create HyperConverged)

Actual results:
---------------
The HCO CR never reaches a stable condition:

<snip>
{
  "lastTransitionTime": "2022-05-18T05:03:33Z",
  "message": "SSP is progressing: Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
  "observedGeneration": 2,
  "reason": "SSPProgressing",
  "status": "True",
  "type": "Progressing"
},
{
  "lastTransitionTime": "2022-05-18T05:03:33Z",
  "message": "SSP is degraded: Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
  "observedGeneration": 2,
  "reason": "SSPDegraded",
  "status": "True",
  "type": "Degraded"
},
</snip>

Expected results:
-----------------
The HCO CR should reach a stable condition after being recreated.

Additional info:
-----------------
SSP is stuck in the Deploying state:

[ ~]$ oc get ssps ssp-kubevirt-hyperconverged -n openshift-cnv
NAME                          PHASE
ssp-kubevirt-hyperconverged   Deploying

[ ~]$ oc get ssps ssp-kubevirt-hyperconverged -n openshift-cnv
NAME                          PHASE
ssp-kubevirt-hyperconverged   Deploying

[ ~]$ oc get ssps ssp-kubevirt-hyperconverged -n openshift-cnv -o json | jq '.status'
{
  "conditions": [
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Progressing",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Degraded",
      "status": "True",
      "type": "Degraded"
    }
  ],
  "observedGeneration": 1,
  "operatorVersion": "4.11.0",
  "phase": "Deploying",
  "targetVersion": "4.11.0"
}
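Whether a recreated CR has settled can be checked mechanically from a status like the one above. A minimal Go sketch, assuming "stable" means Available=True, Progressing=False and Degraded=False (a common operator-condition convention, not a definition taken from HCO or SSP); any other condition types are ignored, and `unstableConditions` is a hypothetical helper. Applied to the SSP status above, it reports all three conditions as off.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// condition keeps only the fields of a status condition this sketch reads.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Message string `json:"message"`
}

type status struct {
	Conditions []condition `json:"conditions"`
}

// expected encodes the assumed meaning of "stable" (see the lead-in).
var expected = []struct{ Type, Status string }{
	{"Available", "True"},
	{"Progressing", "False"},
	{"Degraded", "False"},
}

// unstableConditions (hypothetical helper) parses a `.status` JSON object and
// lists every expected condition that is missing or has an unexpected status.
// An empty result means the resource looks stable under the assumption above.
func unstableConditions(raw []byte) ([]string, error) {
	var s status
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	byType := map[string]condition{}
	for _, c := range s.Conditions {
		byType[c.Type] = c
	}
	var problems []string
	for _, e := range expected {
		c, ok := byType[e.Type]
		switch {
		case !ok:
			problems = append(problems, e.Type+": missing")
		case c.Status != e.Status:
			problems = append(problems, fmt.Sprintf("%s=%s: %s", c.Type, c.Status, c.Message))
		}
	}
	return problems, nil
}

func main() {
	// Abbreviated from the SSP status in this report.
	raw := []byte(`{"conditions":[
	  {"type":"Available","status":"False","message":"Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"},
	  {"type":"Progressing","status":"True","message":"Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"},
	  {"type":"Degraded","status":"True","message":"Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"}]}`)
	problems, err := unstableConditions(raw)
	if err != nil {
		fmt.Println("bad status JSON:", err)
		return
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```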