Bug 2121786

Summary: Deployer upgrade from v2.0.4 to v2.0.5 stuck
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Elena Bondarenko <ebondare>
Component: odf-managed-service
Assignee: Ohad <omitrani>
Status: CLOSED CURRENTRELEASE
QA Contact: Filip Balák <fbalak>
Severity: urgent
Priority: unspecified
Version: 4.10
CC: aeyal, dbindra, fbalak, lgangava, ocs-bugs, odf-bz-bot, sgatfane
Keywords: UpgradeBlocker
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 2.0.4
Last Closed: 2022-11-02 05:17:04 UTC
Type: Bug

Description Elena Bondarenko 2022-08-26 15:46:54 UTC
Description of problem:

During the upgrade to v2.0.5, the ocs-osd-deployer.v2.0.4 CSV got stuck in the Replacing phase and ocs-osd-deployer.v2.0.5 stayed in the Pending phase.

Version-Release number of selected component (if applicable):

$ oc get csv -n openshift-storage
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.5                      NooBaa Operator               4.10.5            mcg-operator.v4.10.4                      Succeeded
ocs-operator.v4.10.5                      OpenShift Container Storage   4.10.5            ocs-operator.v4.10.4                      Succeeded
ocs-osd-deployer.v2.0.4                   OCS OSD Deployer              2.0.4             ocs-osd-deployer.v2.0.3                   Replacing
ocs-osd-deployer.v2.0.5                   OCS OSD Deployer              2.0.5             ocs-osd-deployer.v2.0.4                   Pending
odf-csi-addons-operator.v4.10.5           CSI Addons                    4.10.5            odf-csi-addons-operator.v4.10.4           Succeeded
odf-operator.v4.10.5                      OpenShift Data Foundation     4.10.5            odf-operator.v4.10.4                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.422-151be96   Route Monitor Operator        0.1.422-151be96   route-monitor-operator.v0.1.420-b65f47e   Succeeded
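The stuck state is visible in the REPLACES and PHASE columns: the old CSV stays in Replacing while the CSV that replaces it sits in Pending. The following is a minimal Python sketch that flags this pattern in saved `oc get csv` output. It assumes the columns are left-aligned under their header names, as in the listing above; `parse_csv_table` and `find_stuck_upgrades` are hypothetical helpers, not part of OLM or the `oc` CLI.

```python
import sys
from typing import Dict, List, Tuple

# Column names as printed by `oc get csv` in the listing above (assumed fixed order).
COLUMNS = ["NAME", "DISPLAY", "VERSION", "REPLACES", "PHASE"]


def parse_csv_table(text: str) -> List[Dict[str, str]]:
    """Parse the table by slicing each row at the header's column offsets.

    Offset slicing (rather than splitting on whitespace) keeps multi-word
    DISPLAY values intact and tolerates an empty REPLACES cell.
    """
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    starts = [header.index(col) for col in COLUMNS]
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [
        {col: row[s:e].strip() for col, s, e in zip(COLUMNS, starts, ends)}
        for row in rows
    ]


def find_stuck_upgrades(rows: List[Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (old CSV, new CSV, new phase) for every CSV in Replacing whose
    successor (the CSV listing it under REPLACES) has not reached Succeeded."""
    by_replaces = {r["REPLACES"]: r for r in rows if r["REPLACES"]}
    stuck = []
    for r in rows:
        if r["PHASE"] != "Replacing":
            continue
        successor = by_replaces.get(r["NAME"])
        if successor is not None and successor["PHASE"] != "Succeeded":
            stuck.append((r["NAME"], successor["NAME"], successor["PHASE"]))
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: oc get csv -n openshift-storage > csv.txt && python3 stuck_csv.py csv.txt
    with open(sys.argv[1]) as fh:
        for old, new, phase in find_stuck_upgrades(parse_csv_table(fh.read())):
            print(f"{old} is Replacing but {new} is {phase}")
```

Against the listing above, this would report the ocs-osd-deployer.v2.0.4 / ocs-osd-deployer.v2.0.5 pair. It only reads a snapshot, so a pair caught mid-upgrade is also reported; re-running after the expected 5-10 minutes distinguishes a slow upgrade from a stuck one.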

How reproducible:

5/5 clusters

Steps to Reproduce:
1. Install the ODF add-on v2.0.4.
2. Upgrade it to v2.0.5.

Actual results:

The ocs-osd-deployer.v2.0.4 CSV is stuck in the Replacing phase and ocs-osd-deployer.v2.0.5 is stuck in the Pending phase.

Expected results:

ocs-osd-deployer.v2.0.5 reaches the Succeeded phase within 5-10 minutes.

Additional info:

In the odf-operator-controller-manager pod I see the following errors:
ERROR    controllers.StorageSystem    failed to validate CSV    {"instance": "openshift-storage/ocs-storagecluster-storagesystem", "ClusterServiceVersion": "ocs-operator.v4.10.5", "error": "Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"ocs-operator.v4.10.5\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/red-hat-data-services/odf-operator/controllers.(*StorageSystemReconciler).reconcile
    /remote-source/app/controllers/storagesystem_controller.go:163
github.com/red-hat-data-services/odf-operator/controllers.(*StorageSystemReconciler).Reconcile
    /remote-source/app/controllers/storagesystem_controller.go:87
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2022-08-26T13:39:55.740Z    INFO    controllers.StorageSystem    vendor CSV is installed and ready    {"instance": "openshift-storage/ocs-storagecluster-storagesystem", "ClusterServiceVersion": "odf-csi-addons-operator.v4.10.5"}
2022-08-26T13:39:55.761Z    ERROR    controller-runtime.manager.controller.subscription    Reconciler error    {"reconciler group": "operators.coreos.com", "reconciler kind": "Subscription", "name": "odf-operator-stable-4.10-redhat-operators-openshift-marketplace", "namespace": "openshift-storage", "error": "Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"odf-csi-addons-operator.v4.10.5\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214
2022-08-26T13:39:55.841Z    ERROR    controller-runtime.manager.controller.storagesystem    Reconciler error    {"reconciler group": "odf.openshift.io", "reconciler kind": "StorageSystem", "name": "ocs-storagecluster-storagesystem", "namespace": "openshift-storage", "error": "Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"ocs-operator.v4.10.5\": the object has been modified; please apply your changes to the latest version and try again"}

Comment 1 Elena Bondarenko 2022-08-26 16:04:29 UTC
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.26   True        False         11h     Cluster version is 4.10.26

$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   11h   Ready              2022-08-26T04:32:57Z

$ oc get managedocs managedocs -n openshift-storage -oyaml
apiVersion: ocs.openshift.io/v1alpha1
kind: ManagedOCS
metadata:
  creationTimestamp: "2022-08-26T04:32:28Z"
  finalizers:
  - managedocs.ocs.openshift.io
  generation: 1
  name: managedocs
  namespace: openshift-storage
  resourceVersion: "1160588"
  uid: ef45a009-f3a1-4b73-9e9b-db32043b49ba
spec: {}
status:
  components:
    alertmanager:
      state: Ready
    prometheus:
      state: Ready
    storageCluster:
      state: Ready
  reconcileStrategy: strict

$ rosa list addon -c sgatfane-26-pr | grep ocs-provider-qe
ocs-provider-qe             Red Hat OpenShift Data Foundation Managed Service Provider (QE)       ready
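The ManagedOCS status above reports a state per component. The following is a minimal sketch that lists components not in the Ready state. It assumes the object is fetched as JSON (for example with `oc get managedocs managedocs -n openshift-storage -o json`) and that every entry under `status.components` carries a `state` field, as in the YAML above. `unready_components` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def unready_components(managedocs: Dict[str, Any]) -> List[str]:
    """Return the sorted names of status.components entries whose state is not "Ready"."""
    components = managedocs.get("status", {}).get("components", {})
    return sorted(
        name for name, info in components.items() if info.get("state") != "Ready"
    )


# The status reported in the comment above, trimmed to the fields used here.
SAMPLE = json.loads("""
{
  "kind": "ManagedOCS",
  "status": {
    "components": {
      "alertmanager": {"state": "Ready"},
      "prometheus": {"state": "Ready"},
      "storageCluster": {"state": "Ready"}
    },
    "reconcileStrategy": "strict"
  }
}
""")
```

For the cluster in this report the list is empty, which is consistent with the storage cluster being Ready while the deployer CSV upgrade was stuck.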

Comment 5 Elena Bondarenko 2022-09-05 11:30:54 UTC
The next attempt to upgrade the provider add-on, on September 1st, was successful. I'll update the BZ if we hit the issue again.

Comment 6 Filip Balák 2022-09-19 10:13:24 UTC
The upgrade problem was resolved and the rollout has been released. --> VERIFIED

Tested with:
ocs-osd-deployer.v2.0.5