This bug has been migrated to another issue tracking site. It has been closed here and may no longer be monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2087608 - Failure to create stable HCO CR after deleting it previously
Summary: Failure to create stable HCO CR after deleting it previously
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Storage
Version: 4.11.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.7
Assignee: Arnon Gilboa
QA Contact: SATHEESARAN
URL:
Whiteboard:
Duplicates: 2233413
Depends On:
Blocks: 2238295
Reported: 2022-05-18 05:47 UTC by SATHEESARAN
Modified: 2023-12-14 16:10 UTC (History)

Fixed In Version: kubevirt-ssp-operator-container-v4.12.0-45
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2238295
Environment:
Last Closed: 2023-12-14 16:10:44 UTC
Target Upstream Version:
Embargoed:
kmajcher: needinfo+



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 2122236 | 0 | high | CLOSED | Failing to delete HCO with SSP sticking around | 2023-01-24 13:50:28 UTC
Red Hat Issue Tracker | CNV-18397 | 0 | None | None | None | 2023-12-14 16:10:44 UTC

Description SATHEESARAN 2022-05-18 05:47:02 UTC
Description of problem:
------------------------
When the HCO CR is deleted and then recreated, the new HCO CR never reaches a stable condition.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
CNV 4.11 ( bundle - v4.11.0-360 )
Index Image - registry-proxy.engineering.redhat.com/rh-osbs/iib:233912
hyperconverged-cluster-operator - v4.11.0-65
OCP 4.11 nightly ( 4.11.0-0.nightly-2022-05-11-054135 )

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Set the 'uninstallStrategy' of HCO CR to 'RemoveWorkloads'
# oc edit hco kubevirt-hyperconverged -n openshift-cnv

2. Remove the HCO CR
# oc delete hco kubevirt-hyperconverged -n openshift-cnv

3. After the 'oc delete' command completes successfully, create the HCO CR ("HyperConverged") from the web console (Installed Operators -> create HyperConverged)

Actual results:
---------------
The HCO CR never reaches a stable condition

<snip>
  {
    "lastTransitionTime": "2022-05-18T05:03:33Z",
    "message": "SSP is progressing: Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
    "observedGeneration": 2,
    "reason": "SSPProgressing",
    "status": "True",
    "type": "Progressing"
  },
  {
    "lastTransitionTime": "2022-05-18T05:03:33Z",
    "message": "SSP is degraded: Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
    "observedGeneration": 2,
    "reason": "SSPDegraded",
    "status": "True",
    "type": "Degraded"
  },
</snip>

Expected results:
-----------------
The HCO CR should reach a stable condition after it is recreated

Additional info:
-----------------
SSP is stuck in the Deploying phase

[ ~]$ oc get ssps ssp-kubevirt-hyperconverged -n openshift-cnv
NAME                          PHASE
ssp-kubevirt-hyperconverged   Deploying

[ ~]$ oc get ssps ssp-kubevirt-hyperconverged -n openshift-cnv -o json | jq '.status'
{
  "conditions": [
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Available",
      "status": "False",
      "type": "Available"
    },
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Progressing",
      "status": "True",
      "type": "Progressing"
    },
    {
      "lastHeartbeatTime": "2022-05-18T05:24:40Z",
      "lastTransitionTime": "2022-05-18T05:03:32Z",
      "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)",
      "reason": "Degraded",
      "status": "True",
      "type": "Degraded"
    }
  ],
  "observedGeneration": 1,
  "operatorVersion": "4.11.0",
  "phase": "Deploying",
  "targetVersion": "4.11.0"
}
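
The status above can be checked mechanically. The following is a minimal sketch, not part of any operator: it assumes the common convention that a resource is stable when Available is "True" and both Progressing and Degraded are "False". The function name `blocking_conditions` is hypothetical, and the sample is trimmed from the SSP status shown above.

```python
import json


def blocking_conditions(status):
    """Return (type, message) for each condition that keeps the resource unstable."""
    # Assumed healthy values; this mirrors the usual operator convention,
    # it is not taken from the HCO or SSP source code.
    healthy = {"Available": "True", "Progressing": "False", "Degraded": "False"}
    blocking = []
    for cond in status.get("conditions", []):
        expected = healthy.get(cond.get("type"))
        if expected is not None and cond.get("status") != expected:
            blocking.append((cond["type"], cond.get("message", "")))
    return blocking


# Sample trimmed from the `jq '.status'` output in this report.
sample = json.loads("""
{
  "conditions": [
    {"type": "Available", "status": "False",
     "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"},
    {"type": "Progressing", "status": "True",
     "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"},
    {"type": "Degraded", "status": "True",
     "message": "Error: the server could not find the requested resource (post datasources.cdi.kubevirt.io)"}
  ],
  "phase": "Deploying"
}
""")

for cond_type, message in blocking_conditions(sample):
    print(f"{cond_type}: {message}")
```

An empty result would mean that none of the three conditions is blocking. The same function should also work on the HCO CR conditions, since they use the same type and status fields.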

Comment 2 Simone Tiraboschi 2022-05-18 09:35:37 UTC
Created attachment 1880759 [details]
CDI operator logs

Comment 3 Oren Cohen 2022-05-18 09:36:36 UTC
The underlying problem is that cdi-operator doesn't remove its secrets and configmaps during the CDI deletion process.
Then, on reinstallation, the leftover secrets and configmaps are treated as orphan objects and prevent cdi-operator from proceeding, e.g.


{"level":"error","ts":1652795514.7434072,"logger":"cdi-operator","msg":"error getting apiserver ca bundle","error":"ConfigMap \"cdi-apiserver-signer-bundle\" not found","stacktrace":"kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.getAPIServerCABundle\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:542\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDataImportCronValidatingWebhook\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:244\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDynamicAPIServerResources\n\t/remote-source/app/pkg/operator/resources/cluster/apiserver.go:57\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createResourceGroup\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:102\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createAllResources\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:88\nkubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.CreateAllDynamicResources\n\t/remote-source/app/pkg/operator/resources/cluster/factory.go:77\nkubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).GetAllResources\n\t/remote-source/app/pkg/operator/controller/cr-manager.go:126\nkubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).CheckForOrphans\n\t/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:363\nkubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).Reconcile\n\t/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:152\nkubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).Reconcile\n\t/remote-source/app/pkg/operator/controller/controller.go:236\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227"}
{"level":"info","ts":1652795514.7443233,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"openshift-cnv","name":"cdi-uploadserver-client-signer"}}

The workaround is to manually delete the CDI secrets and configmaps so that cdi-operator can complete the reconciliation.

Moving to CDI team.

Comment 4 Simone Tiraboschi 2022-05-18 09:41:47 UTC
Looking at the CDI operator logs, we see that shortly after the deletion of the CDI CR,
the CDI operator recreates secrets and config maps:

{"level":"info","ts":1652864475.2948112,"logger":"cdi-operator","msg":"Reconciling CDI","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652864475.2948842,"logger":"cdi-operator","msg":"Doing reconcile update","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652864476.3855922,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-insecure-registries","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864476.4569244,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.ServiceAccount"}
{"level":"info","ts":1652864476.5323896,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.RoleBinding"}
{"level":"info","ts":1652864476.5842297,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy","type":"*v1.Role"}
{"level":"info","ts":1652864476.7492573,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.ServiceAccount"}
{"level":"info","ts":1652864476.7851727,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.RoleBinding"}
{"level":"info","ts":1652864476.8056452,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-cronjob","type":"*v1.Role"}
{"level":"info","ts":1652864476.8206687,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"v1beta1.upload.cdi.kubevirt.io","type":"*v1.APIService"}
{"level":"info","ts":1652864476.927518,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"v1alpha1.upload.cdi.kubevirt.io","type":"*v1.APIService"}
{"level":"info","ts":1652864476.9706755,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-datavolume-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.0325625,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-datavolume-mutate","type":"*v1.MutatingWebhookConfiguration"}
{"level":"info","ts":1652864477.0888171,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.1155927,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"objecttransfer-api-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.1779423,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"","name":"cdi-api-dataimportcron-validate","type":"*v1.ValidatingWebhookConfiguration"}
{"level":"info","ts":1652864477.2343497,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.2936044,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.3253348,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-server-cert","type":"*v1.Secret"}
{"level":"info","ts":1652864477.3650944,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.3879237,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.414795,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadproxy-server-cert","type":"*v1.Secret"}
{"level":"info","ts":1652864477.4489758,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.4908574,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.5296195,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-signer","type":"*v1.Secret"}
{"level":"info","ts":1652864477.5725944,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-signer-bundle","type":"*v1.ConfigMap"}
{"level":"info","ts":1652864477.6052103,"logger":"cdi-operator","msg":"Resource created","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","namespace":"kubevirt-hyperconverged","name":"cdi-uploadserver-client-cert","type":"*v1.Secret"}

and then it starts hot looping on:
{"level":"info","ts":1652864508.1824028,"logger":"cdi-operator","msg":"CDI CR does not exist","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652865362.5791483,"logger":"cdi-operator","msg":"Reconciling CDI","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}
{"level":"info","ts":1652865362.5931349,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer"}}


and it's never going to progress until those objects are explicitly removed (or everything is removed by deleting the namespace).
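
To see which objects are blocking the reconciliation, the "Orphan object exists" entries can be pulled out of the operator log. This is a minimal sketch that assumes the log is available as one JSON object per line, as in the excerpts above. The function name `orphan_objects` is hypothetical, and the two sample lines are copied from this comment.

```python
import json


def orphan_objects(log_lines):
    """Collect (kind, namespace, name) for each object the operator reports as an orphan."""
    orphans = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if not isinstance(entry, dict) or entry.get("msg") != "Orphan object exists":
            continue
        obj = entry.get("obj", {})
        key = (obj.get("kind"), obj.get("namespace"), obj.get("name"))
        if key not in orphans:  # the operator repeats the message while it loops
            orphans.append(key)
    return orphans


sample_log = [
    '{"level":"info","ts":1652864508.1824028,"logger":"cdi-operator","msg":"CDI CR does not exist","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged"}',
    '{"level":"info","ts":1652865362.5931349,"logger":"cdi-operator","msg":"Orphan object exists","Request.Namespace":"","Request.Name":"cdi-kubevirt-hyperconverged","obj":{"apiVersion":"v1","kind":"Secret","namespace":"kubevirt-hyperconverged","name":"cdi-apiserver-signer"}}',
]

print(orphan_objects(sample_log))
```

The log excerpts here show one orphan per reconcile attempt, so the log may need to be collected again after each object is removed in order to find the next one.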

Comment 5 SATHEESARAN 2022-05-19 01:42:56 UTC
Tested with the workaround suggested by Simone and Oren of removing the CDI secrets and configmaps

Removed CDI secrets
[ ~]$ oc delete secret -n openshift-cnv cdi-apiserver-server-cert cdi-apiserver-signer cdi-uploadproxy-server-cert cdi-uploadproxy-signer cdi-uploadserver-client-cert cdi-uploadserver-client-signer cdi-uploadserver-signer

Removed CDI configmaps
[ ~]$ oc delete cm -n openshift-cnv cdi-apiserver-signer-bundle cdi-uploadproxy-signer-bundle cdi-uploadserver-client-signer-bundle cdi-uploadserver-signer-bundle


Creating the HCO CR after deleting the secrets and configmaps works well, and it reaches a stable condition
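
The two delete commands above can be generated from a list of leftover objects instead of being typed by hand. This is a rough sketch under stated assumptions: the helper name `delete_commands` is hypothetical, the input is a list of (kind, namespace, name) tuples, and the commands are only printed, never executed. The sample names are a subset of the objects deleted in this comment.

```python
import shlex


def delete_commands(objects):
    """Group objects by (kind, namespace) and build one `oc delete` argv per group."""
    groups = {}
    for kind, namespace, name in objects:
        groups.setdefault((kind.lower(), namespace), []).append(name)
    return [
        ["oc", "delete", kind, "-n", namespace, *sorted(names)]
        for (kind, namespace), names in groups.items()
    ]


leftovers = [
    ("Secret", "openshift-cnv", "cdi-apiserver-signer"),
    ("Secret", "openshift-cnv", "cdi-apiserver-server-cert"),
    ("ConfigMap", "openshift-cnv", "cdi-apiserver-signer-bundle"),
]

# Print the commands for review; running them is left to the operator.
for argv in delete_commands(leftovers):
    print(shlex.join(argv))
```

Printing rather than executing keeps the sketch safe to run anywhere, and the output can be reviewed before it is pasted into a shell.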

Comment 6 Krzysztof Majcher 2022-06-14 13:20:33 UTC
SSP team, can you take a look? It seems a fix on your end might do the work here.

Comment 7 Andrej Krejcir 2022-06-15 12:19:48 UTC
Do you mean that SSP operator should remove the CDI Secrets and ConfigMaps? It seems unrelated to SSP.

Comment 8 Nahshon Unna-Tsameret 2022-06-15 12:58:57 UTC
I'm not sure it's the configmap and the secret; maybe they are just a side effect. I can see that SSP is still trying to read DataSource and DataImportCron after CDI and its CRDs are removed.

Comment 9 Andrej Krejcir 2022-06-15 15:03:01 UTC
True, the ssp operator does not behave correctly when CRDs are removed. We can fix it, but I'm not sure if it will fix this bug.

Comment 10 Dominik Holler 2022-06-21 05:58:55 UTC
Removing target release due to the change of component.

Comment 11 sgott 2022-08-17 13:58:13 UTC
Deferring this due to capacity.

Comment 12 Andrej Krejcir 2022-11-21 09:49:52 UTC
This bug may have been fixed by the same PR as Bug 2122236.

Comment 13 Andrej Krejcir 2022-11-22 09:47:53 UTC
This bug can be verified in 4.12.

Comment 14 Andrej Krejcir 2022-11-29 09:51:27 UTC
I've tried to verify this bug on my cluster with CNV 4.12.0-745, and it is still happening.

The problem is that after recreating HCO, the CDI CRDs are not created, and SSP is waiting for them (datasources.cdi.kubevirt.io, dataimportcrons.cdi.kubevirt.io, datavolumes.cdi.kubevirt.io)

There is an error in the cdi-operator log:

{
  "level": "error",
  "ts": 1669714626.7532284,
  "logger": "cdi-operator",
  "msg": "error getting apiserver ca bundle",
  "error": "ConfigMap \"cdi-apiserver-signer-bundle\" not found",
  "stacktrace": "{STACKTRACE_BELOW}"
}


This is the stacktrace:

kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.getAPIServerCABundle
	/remote-source/app/pkg/operator/resources/cluster/apiserver.go:553
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDataVolumeMutatingWebhook
	/remote-source/app/pkg/operator/resources/cluster/apiserver.go:541
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createDynamicAPIServerResources
	/remote-source/app/pkg/operator/resources/cluster/apiserver.go:54
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createResourceGroup
	/remote-source/app/pkg/operator/resources/cluster/factory.go:102
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.createAllResources
	/remote-source/app/pkg/operator/resources/cluster/factory.go:88
kubevirt.io/containerized-data-importer/pkg/operator/resources/cluster.CreateAllDynamicResources
	/remote-source/app/pkg/operator/resources/cluster/factory.go:77
kubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).GetAllResources
	/remote-source/app/pkg/operator/controller/cr-manager.go:126
kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).CheckForOrphans
	/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:363
kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler.(*Reconciler).Reconcile
	/remote-source/app/vendor/kubevirt.io/controller-lifecycle-operator-sdk/pkg/sdk/reconciler/reconciler.go:152
kubevirt.io/containerized-data-importer/pkg/operator/controller.(*ReconcileCDI).Reconcile
	/remote-source/app/pkg/operator/controller/controller.go:236
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
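
The missing-CRD condition described in this comment can be checked from the output of `oc get crd -o name`. The following is a minimal sketch, assuming that output is passed in as text with one `customresourcedefinition.apiextensions.k8s.io/<name>` entry per line. The helper name `missing_crds` is hypothetical, and the sample output is illustrative rather than captured from the affected cluster.

```python
# CRDs that SSP is waiting for, as listed in this comment.
REQUIRED_CRDS = (
    "datasources.cdi.kubevirt.io",
    "dataimportcrons.cdi.kubevirt.io",
    "datavolumes.cdi.kubevirt.io",
)


def missing_crds(oc_output, required=REQUIRED_CRDS):
    """Return the required CRD names that do not appear in `oc get crd -o name` output."""
    present = set()
    for line in oc_output.splitlines():
        line = line.strip()
        if line:
            # Keep only the part after the resource-type prefix.
            present.add(line.rsplit("/", 1)[-1])
    return [name for name in required if name not in present]


# Illustrative output for a cluster where the CDI CRDs were not recreated.
sample_output = """\
customresourcedefinition.apiextensions.k8s.io/hyperconvergeds.hco.kubevirt.io
customresourcedefinition.apiextensions.k8s.io/ssps.ssp.kubevirt.io
"""

print(missing_crds(sample_output))
```

An empty list would mean that all three CRDs exist, in which case the SSP error would need a different explanation.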

Comment 17 SATHEESARAN 2023-08-23 10:14:52 UTC
*** Bug 2233413 has been marked as a duplicate of this bug. ***

Comment 18 SATHEESARAN 2023-08-30 06:00:23 UTC
This issue is also seen with CNV v4.14 interim builds. Do we need a separate bug to track it for CNV v4.14?

Comment 19 Arnon Gilboa 2023-08-30 10:17:52 UTC
@sasundar sure, please clone it to 4.14 as well.

Comment 21 dalia 2023-09-13 12:24:08 UTC
Arnon, can you please update?

Comment 22 Dominik Holler 2023-10-31 07:29:11 UTC
Neither fix attached nor verified, postponing to next z-stream.

