Description of problem:
Deployer version v2.0.3 now includes https://issues.redhat.com/browse/RHSTOR-3353 (Prevent uninstallation if storage consumers are present in the Provider cluster). During testing, the provider cluster uninstallation was observed getting stuck in the 'deleting service' status. ocs-osd-controller-manager shows the log "INFO controllers.ManagedOCS Found OCS storage consumers, cannot proceed with uninstallation"

Version-Release number of selected component (if applicable):

oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.4                      NooBaa Operator               4.10.4            mcg-operator.v4.10.3                      Succeeded
ocs-operator.v4.10.4                      OpenShift Container Storage   4.10.4            ocs-operator.v4.10.3                      Succeeded
ocs-osd-deployer.v2.0.3                   OCS OSD Deployer              2.0.3             ocs-osd-deployer.v2.0.2                   Succeeded
odf-csi-addons-operator.v4.10.4           CSI Addons                    4.10.4            odf-csi-addons-operator.v4.10.3           Succeeded
odf-operator.v4.10.4                      OpenShift Data Foundation     4.10.4            odf-operator.v4.10.3                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.422-151be96   Route Monitor Operator        0.1.422-151be96   route-monitor-operator.v0.1.420-b65f47e   Succeeded

oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.21   True        False         3h1m    Cluster version is 4.10.21

Deployer
Mediatype: image/svg+xml
Image: quay.io/openshift/origin-kube-rbac-proxy:4.10.0
Image: quay.io/osd-addons/ocs-osd-deployer:2.0.3-2
Image: quay.io/osd-addons/ocs-osd-deployer:2.0.3-2

How reproducible:
3/4

Steps to Reproduce:
1. Deploy an appliance mode provider cluster with 2 consumers
2. Uninstall both consumers
3. rosa delete service --id=<cluster_service_id>

Actual results:
ocs-osd-controller-manager logs "INFO controllers.ManagedOCS Found OCS storage consumers, cannot proceed with uninstallation". The provider addon is stuck in the 'deleting' state and the cluster uninstall is stuck in 'deleting service' until the manual workaround is applied.

Expected results:
The provider cluster should get uninstalled.

Additional info:
Command output after initiating uninstallation:

$ rosa list service
SERVICE_ID                    SERVICE           SERVICE_STATE      CLUSTER_NAME
2C3n2uNkBWPzfrBZCVfqcdazVja   ocs-provider-qe   deleting service   alayani-p17j

$ rosa list addon -c alayani-p17j | grep ocs-provider-qe
ocs-provider-qe   Red Hat OpenShift Data Foundation Managed Service Provider (QE)   deleting

$ rosa list cluster | grep alayani
1tgm0trfact2ed113uoq2c9o96rdek67   alayani-p17j   ready

$ oc get storageconsumer -n openshift-storage
NAME                                                   AGE
storageconsumer-73f24e41-4040-4394-a345-e93a7422a11e   31h
storageconsumer-f4e18f4d-2bb2-4794-b561-8efc6583f09f   31h

Workaround: Delete storageconsumers

=====After applying workaround cluster uninstallation resumed===========

$ oc get storageconsumer -n openshift-storage | awk 'NR>1{print $1}' | xargs -t oc delete storageconsumer -n openshift-storage
oc delete storageconsumer -n openshift-storage storageconsumer-f4e18f4d-2bb2-4794-b561-8efc6583f09f
storageconsumer.ocs.openshift.io "storageconsumer-f4e18f4d-2bb2-4794-b561-8efc6583f09f" deleted
storageconsumer.ocs.openshift.io "storageconsumer-73f24e41-4040-4394-a345-e93a7422a11e" deleted

$ oc logs -f -n openshift-storage ocs-osd-controller-manager-6f67967567-fthw4 -c manager
2022-07-18T16:05:42.815Z INFO controllers.ManagedOCS Found OCS storage consumers, cannot proceed with uninstallation
2022-07-18T16:05:52.821Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:52.821Z INFO controllers.ManagedOCS Reconciling onboardingValidationKeySecret
2022-07-18T16:05:52.821Z INFO controllers.ManagedOCS Reconciling StorageCluster
2022-07-18T16:05:52.821Z INFO controllers.ManagedOCS Requested add-on settings {"size": "20", "enable-mcg": "false"}
2022-07-18T16:05:52.821Z INFO controllers.ManagedOCS Setting storage device set count {"Current": 5, "New": 5}
2022-07-18T16:05:52.822Z INFO controllers.ManagedOCS Reconciling CSVs
2022-07-18T16:05:52.822Z INFO controllers.ManagedOCS Reconciling alertRelabelConfigSecret
2022-07-18T16:05:52.822Z INFO controllers.ManagedOCS Reconciling kubeRBACConfigMap
2022-07-18T16:05:52.822Z INFO controllers.ManagedOCS Reconciling PrometheusService
2022-07-18T16:05:52.822Z INFO controllers.ManagedOCS Reconciling Prometheus
2022-07-18T16:05:52.832Z INFO controllers.ManagedOCS Reconciling Alertmanager
2022-07-18T16:05:52.832Z INFO controllers.ManagedOCS Reconciling AlertmanagerConfig secret
2022-07-18T16:05:52.832Z WARN controllers.ManagedOCS Customer Email for alert notification is not provided
2022-07-18T16:05:52.839Z INFO controllers.ManagedOCS Reconciling k8sMetricsServiceMonitorAuthSecret
2022-07-18T16:05:52.842Z INFO controllers.ManagedOCS Unable to find v1 grafana-datasources secret
2022-07-18T16:05:52.844Z INFO controllers.ManagedOCS Reconciling k8sMetricsServiceMonitor
2022-07-18T16:05:52.845Z INFO controllers.ManagedOCS reconciling monitoring resources
2022-07-18T16:05:52.908Z INFO controllers.ManagedOCS Reconciling DMS Prometheus Rule
2022-07-18T16:05:52.908Z INFO controllers.ManagedOCS Reconciling OCSInitialization
2022-07-18T16:05:52.908Z INFO controllers.ManagedOCS reconciling PrometheusProxyNetworkPolicy resources
2022-07-18T16:05:52.908Z INFO controllers.ManagedOCS Non converged deployment, skipping reconcile for egress network policy
2022-07-18T16:05:52.908Z INFO controllers.ManagedOCS starting OCS uninstallation - deleting managedocs
2022-07-18T16:05:52.919Z ERROR controller-runtime.manager.controller.managedocs Reconciler error {"reconciler group": "ocs.openshift.io", "reconciler kind": "ManagedOCS", "name": "managedocs", "namespace": "openshift-storage", "error": "Operation cannot be fulfilled on managedocs.ocs.openshift.io \"managedocs\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /tmp/go/ocs-osd-deployer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /tmp/go/ocs-osd-deployer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2022-07-18T16:05:52.919Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:52.919Z INFO controllers.ManagedOCS deleting storagecluster
2022-07-18T16:05:52.929Z INFO controllers.ManagedOCS deleting storageSystems
2022-07-18T16:05:53.054Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:53.055Z INFO controllers.ManagedOCS deleting storagecluster
2022-07-18T16:05:53.143Z INFO controllers.ManagedOCS deleting storageSystems
2022-07-18T16:05:53.336Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:53.336Z INFO controllers.ManagedOCS deleting storagecluster
2022-07-18T16:05:53.345Z INFO controllers.ManagedOCS deleting storageSystems
2022-07-18T16:05:57.502Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.502Z INFO controllers.ManagedOCS deleting OCS CSV
2022-07-18T16:05:57.549Z INFO controllers.ManagedOCS removing finalizer from the ManagedOCS resource
2022-07-18T16:05:57.577Z INFO controllers.ManagedOCS finallizer removed successfully
2022-07-18T16:05:57.596Z ERROR controller-runtime.manager.controller.managedocs Reconciler error {"reconciler group": "ocs.openshift.io", "reconciler kind": "ManagedOCS", "name": "managedocs", "namespace": "openshift-storage", "error": "Operation cannot be fulfilled on managedocs.ocs.openshift.io \"managedocs\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/ocs.openshift.io/managedocs/openshift-storage/managedocs, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 170869a8-17dc-40ea-8402-6d3101c73372, UID in object meta: "}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /tmp/go/ocs-osd-deployer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /tmp/go/ocs-osd-deployer/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2022-07-18T16:05:57.596Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.596Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.596Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.621Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.621Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.621Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.621Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.645Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.724Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.724Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.724Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.725Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.729Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.729Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.729Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.729Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.729Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.729Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.729Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.730Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.739Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.739Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.739Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.739Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.744Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.744Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.744Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.744Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.744Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.744Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.745Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.745Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.745Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.745Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.745Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.745Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.746Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.746Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.746Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.746Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.751Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.751Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.751Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.751Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.763Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.763Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.763Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.763Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.768Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.768Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.768Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.769Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:57.984Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:57.984Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:57.984Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:57.984Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:58.000Z INFO controllers.ManagedOCS Starting reconcile for ManagedOCS {"req.Namespace": "openshift-storage", "req.Name": "managedocs"}
2022-07-18T16:05:58.000Z WARN controllers.ManagedOCS ManagedOCS resource not found
2022-07-18T16:05:58.000Z INFO controllers.ManagedOCS deleting deployer csv
2022-07-18T16:05:58.000Z INFO controllers.ManagedOCS Deployer csv removed successfully
2022-07-18T16:05:58.140Z INFO controller-runtime.manager.controller.managedocs Shutdown signal received, waiting for all workers to finish {"reconciler group": "ocs.openshift.io", "reconciler kind": "ManagedOCS"}
2022-07-18T16:05:58.140Z INFO controller-runtime.manager.controller.managedocs All workers finished {"reconciler group": "ocs.openshift.io", "reconciler kind": "ManagedOCS"}
=====================================================

The rosa cluster and service get deleted after some time.
The root cause looks like storage consumers being left over in the storage consumer list even after the consumers are offboarded. This is already reported in https://bugzilla.redhat.com/show_bug.cgi?id=2069389
Looking at this, the deployer is working as expected. I think a bug needs to be raised on the product side to remove storage consumers after offboarding. Ohad, WDYT?
@sgatfane Can you provide the storage consumer CR yaml after the consumer has offboarded? It seems that after offboarding the Storage Consumer CR is marked for deletion but does not get deleted immediately. We might have to update our uninstallation check to take deletionTimestamp into account.
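To make the deletionTimestamp idea concrete, here is a minimal sketch of such a check, assuming the uninstall gate should only be blocked by consumers that are not already marked for deletion. It is written in Python against the JSON shape returned by `oc get storageconsumer -n openshift-storage -o json`; the deployer itself is written in Go, and the function names and the sample deletionTimestamp value below are hypothetical, not taken from the deployer code.

```python
# Sketch only: classify StorageConsumer objects by whether they are
# already marked for deletion (metadata.deletionTimestamp is set).
# Function names are hypothetical; the real deployer logic is in Go.

def split_consumers(consumer_list):
    """Return (active, terminating) lists of StorageConsumer names."""
    active, terminating = [], []
    for item in consumer_list.get("items", []):
        meta = item.get("metadata", {})
        name = meta.get("name", "<unnamed>")
        if meta.get("deletionTimestamp"):
            terminating.append(name)
        else:
            active.append(name)
    return active, terminating


def can_uninstall(consumer_list):
    """Assumed policy: only consumers not marked for deletion block uninstall."""
    active, _ = split_consumers(consumer_list)
    return not active


# Sample shaped like `oc get storageconsumer -o json` output. The names come
# from this report; the deletionTimestamp value is made up for illustration.
SAMPLE = {
    "items": [
        {"metadata": {"name": "storageconsumer-73f24e41-4040-4394-a345-e93a7422a11e",
                      "deletionTimestamp": "2022-07-17T09:00:00Z"}},
        {"metadata": {"name": "storageconsumer-f4e18f4d-2bb2-4794-b561-8efc6583f09f"}},
    ]
}

active, terminating = split_consumers(SAMPLE)
print("active:", active)
print("terminating:", terminating)
print("can uninstall:", can_uninstall(SAMPLE))
```

With real data, the dictionary would come from `json.loads()` on the command output instead of SAMPLE. Whether terminating consumers should be ignored by the check is a design decision for the deployer, so treat the policy here as one option to discuss.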
@kmajumde Storage Consumer will not get deleted until all of the rook resources that are owned by it are deleted. The underlying rook resource might not get deleted because of an unclean removal of PV/PVC on the consumer cluster side. Let's try to identify which rook resource is stuck on deleting, but does not get deleted, and try to address that.
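One way to narrow down the stuck resource is sketched below, under the assumption that the Rook resources point back at the StorageConsumer through metadata.ownerReferences. Given candidate objects exported as JSON, it lists those that reference the consumer's UID and already carry a deletionTimestamp, together with their finalizers. The function name, the sample objects, their UIDs and the finalizer string are all hypothetical illustrations, not data from this cluster.

```python
# Sketch only: find objects owned by a given StorageConsumer that are marked
# for deletion but still present (usually held back by a finalizer).
# Assumes ownership is expressed through metadata.ownerReferences.

def stuck_dependents(consumer_uid, objects):
    """Return (kind, name, finalizers) for owned objects pending deletion."""
    stuck = []
    for obj in objects:
        meta = obj.get("metadata", {})
        owners = meta.get("ownerReferences", [])
        owned = any(ref.get("uid") == consumer_uid for ref in owners)
        if owned and meta.get("deletionTimestamp"):
            stuck.append((obj.get("kind", "?"),
                          meta.get("name", "?"),
                          meta.get("finalizers", [])))
    return stuck


# Hypothetical sample data: kinds, names, UIDs and finalizer are invented.
CONSUMER_UID = "11111111-2222-3333-4444-555555555555"
OBJECTS = [
    {"kind": "CephClient",
     "metadata": {"name": "example-client-a",
                  "ownerReferences": [{"uid": CONSUMER_UID}],
                  "deletionTimestamp": "2022-07-17T09:00:00Z",
                  "finalizers": ["example.com/hypothetical-finalizer"]}},
    {"kind": "CephClient",
     "metadata": {"name": "example-client-b",
                  "ownerReferences": [{"uid": CONSUMER_UID}]}},
    {"kind": "CephClient",
     "metadata": {"name": "unrelated-client",
                  "ownerReferences": [{"uid": "some-other-uid"}],
                  "deletionTimestamp": "2022-07-17T09:00:00Z"}},
]

for kind, name, finalizers in stuck_dependents(CONSUMER_UID, OBJECTS):
    print(kind, name, "finalizers:", finalizers)
```

An object that shows up here with a non-empty finalizer list is a candidate for the resource blocking the StorageConsumer deletion; the finalizer name hints at which controller still has cleanup pending.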
Try it on the latest build
As the issue is not always reproducible, I asked the team members who created clusters lately if anyone faced any issues with uninstallation. I'll gather the responses and update the BZ.
Verified on deployer version v2.0.12 (on QE addon/Dev addon). The issue did not reproduce even once in 5 uninstallations, so considering it fixed in the latest version. Marking the BZ as verified.