Bug 2102498

Summary: [MS-ODF UPGRADE] MS-ODF clusters with previous ODF version (4.10.2-3) and deployer version 2.0.2 do not upgrade to deployer v2.0.3
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: suchita <sgatfane>
Component: odf-managed-service
Assignee: Dhruv Bindra <dbindra>
Status: CLOSED CURRENTRELEASE
QA Contact: suchita <sgatfane>
Severity: high
Priority: unspecified
Version: 4.10
CC: aeyal, dbindra, ebenahar, fbalak, nberry, ocs-bugs, odf-bz-bot, vavuthu
Keywords: Tracking
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 2.0.2
Last Closed: 2022-11-02 05:17:49 UTC
Type: Bug

Description suchita 2022-06-30 04:41:09 UTC
Description of problem:

Consumer and provider clusters with addon deployer version v2.0.2 failed to upgrade to addon deployer version v2.0.3.

The upgrade failed on both OCP 4.10.18 and OCP 4.8.43.

While preparing for the deployer upgrade from v2.0.2 to v2.0.3 on the staging QE add-on, we had 3 types of cluster setup:
Setup 1. Provider with OCP4.10.18 + ODF addon v2.0.2 and 2 consumers with OCP4.10.43 and ODF consumer add-on v2.0.2
Setup 2. Provider with OCP4.10.18 + ODF addon v2.0.2 and 2 consumers with OCP4.8.43 and ODF consumer add-on v2.0.2
Setup 3. Private link cluster: provider with OCP4.10.18 + ODF addon v2.0.2 and 2 consumers with OCP4.10.43 and ODF consumer add-on v2.0.2

The upgrade failed on all of the above setups. A freshly deployed cluster has deployer version v2.0.3.

Version-Release number of selected component (if applicable):

++++++++++++++++++++++++++++++++++
Wed Jun 29 17:40:10 UTC 2022
Deployer
    Mediatype:   image/svg+xml
                Image:  quay.io/openshift/origin-kube-rbac-proxy:4.10.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.2-3
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.2-3
-----------
ODF version
"4.10.2-3"

========CSV ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.4                      NooBaa Operator               4.10.4            mcg-operator.v4.10.3                      Succeeded
ocs-operator.v4.10.2                      OpenShift Container Storage   4.10.2            ocs-operator.v4.10.1                      Succeeded
ocs-osd-deployer.v2.0.2                   OCS OSD Deployer              2.0.2             ocs-osd-deployer.v2.0.1                   Succeeded
odf-csi-addons-operator.v4.10.4           CSI Addons                    4.10.4            odf-csi-addons-operator.v4.10.2           Succeeded
odf-operator.v4.10.2                      OpenShift Data Foundation     4.10.2            odf-operator.v4.10.1                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.422-151be96   Route Monitor Operator        0.1.422-151be96   route-monitor-operator.v0.1.420-b65f47e   Succeeded
--------------
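
The deployer version can also be read out of the CSV listing above programmatically. The following is a minimal sketch, not the tooling used for this report: it assumes the column-aligned text layout shown above (columns separated by two or more spaces), and the helper names `parse_csv_table` and `deployer_version` are hypothetical. The sample rows are copied from the output above.

```python
import re

# Three rows copied from the `oc get csv` output in this report.
CSV_TABLE = """\
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.4                      NooBaa Operator               4.10.4            mcg-operator.v4.10.3                      Succeeded
ocs-osd-deployer.v2.0.2                   OCS OSD Deployer              2.0.2             ocs-osd-deployer.v2.0.1                   Succeeded
odf-operator.v4.10.2                      OpenShift Data Foundation     4.10.2            odf-operator.v4.10.1                      Succeeded
"""


def parse_csv_table(text):
    """Turn the column-aligned table into a list of dicts keyed by header name.

    Assumption: columns are separated by at least two spaces, and no cell
    contains two consecutive spaces.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = re.split(r"\s{2,}", lines[0].strip())
    rows = []
    for ln in lines[1:]:
        cells = re.split(r"\s{2,}", ln.strip())
        # Skip anything that is not a well-formed table row (stray log lines).
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def deployer_version(rows, prefix="ocs-osd-deployer."):
    """Return the VERSION cell of the first CSV whose NAME starts with prefix."""
    for row in rows:
        if row["NAME"].startswith(prefix):
            return row["VERSION"]
    return None


rows = parse_csv_table(CSV_TABLE)
current = deployer_version(rows)
print(f"deployer version: {current}, upgraded to 2.0.3: {current == '2.0.3'}")
```

For the listing above this reports deployer version 2.0.2, that is, the upgrade has not happened.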



How reproducible:
6/6

Steps to Reproduce:
1. Create an appliance provider cluster with OCP 4.10 and ocs-provider addon
(rosa create service --type ocs-provider-qe --name $CLUSTER_NAME --size 20 --onboarding-validation-key $CONSUMER_KEY  --subnet-ids $SUBNET_IDS )

2. Create a ROSA consumer cluster with OCP 4.8 and the ocs-consumer-qe addon
3. Create a ROSA consumer cluster with OCP 4.10 and the ocs-consumer-qe addon
 
4. Initiate upgrade
Provider: https://gitlab.cee.redhat.com/service/managed-tenants/-/merge_requests/2559
Consumer: https://gitlab.cee.redhat.com/service/managed-tenants/-/merge_requests/2558
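
After step 4, the upgrade can only be judged once the deployer has had time to move to the new version. The following is a minimal sketch of such a wait loop, under stated assumptions: `wait_for_deployer_version` and the `fetch_version` callable are hypothetical, the timeout and interval defaults are arbitrary, and how the version is actually fetched from the cluster is deliberately left out.

```python
import time


def wait_for_deployer_version(fetch_version, target, timeout_s=1800, interval_s=30,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_version() until it returns target or timeout_s elapses.

    fetch_version is a hypothetical callable returning the currently installed
    deployer version as a string. Returns (True, last_seen) on success and
    (False, last_seen) on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        last_seen = fetch_version()
        if last_seen == target:
            return True, last_seen
        if clock() >= deadline:
            return False, last_seen
        sleep(interval_s)


# Usage with a fake fetcher that reports 2.0.3 on the third poll.
_observed = iter(["2.0.2", "2.0.2", "2.0.3"])
ok, seen = wait_for_deployer_version(lambda: next(_observed), "2.0.3",
                                     timeout_s=5, interval_s=0)
print(ok, seen)
```

Passing the fetcher in as a callable keeps the loop independent of any particular CLI or API and makes it testable without a cluster.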



Actual results:
Consumer and provider clusters with OCP4.10/OCP4.8 + ODF4.10 and deployer v2.0.2 failed to upgrade to deployer version v2.0.3.

Expected results:
All clusters, both provider and consumer, should upgrade from deployer v2.0.2 to deployer v2.0.3.

Additional info:


The PRs were merged around Wed Jun 29 13:03:00 UTC 2022.
June 29 6:30 IST [ssotest01ue1] SelectorSyncSet addon-ocs-provider-qe applied
June 29 6:30 IST [hive-stage-01] SelectorSyncSet addon-ocs-provider-qe applied

June 29 6:33 IST [hives02ue1] SelectorSyncSet addon-ocs-consumer-qe applied
June 29 6:33 IST [ssotest01ue1] SelectorSyncSet addon-ocs-consumer-qe applied



A few oc command outputs, traced with timestamps, are available here:
Provider: 

http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-29jp1/sgatfane-29jp1_20220629T033315/logs/upgrade_logs/nohup.out

Consumer OCP4.8 deployer v2.0.2: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-29jc1/sgatfane-29jc1_20220629T044715/logs/upgrade_logs/nohup.out

Consumer OCP4.10 deployer v2.0.2: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-29jc5/sgatfane-29jc5_20220629T050952/logs/upgrade_logs/nohup.out
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-29jc2/sgatfane-29jc2_20220629T044717/logs/upgrade_logs/nohup.out

Comment 1 suchita 2022-06-30 12:37:24 UTC
Discussed with the odf-ms engineering folks Dhruv and Leela. According to engineering, the bundle image index and the subscription are updated as expected for the deployer.

The issue has been raised with MTSRE: https://issues.redhat.com/browse/MTSRE-590?filter=-2

Comment 2 suchita 2022-07-01 06:56:28 UTC
More discussion is in the Slack thread https://coreos.slack.com/archives/C01L46M0FQC/p1656569714574329.
New bug raised: https://issues.redhat.com/browse/RHSTOR-3455

Comment 3 suchita 2022-07-19 06:04:24 UTC
The upgrade from v2.0.2 to v2.0.3 now completes successfully.
Upgrade results: 
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12p3/sgatfane-p12p3_20220712T040139/logs/upgrade_test_report_1657612661.html
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12c3/sgatfane-p12c3_20220712T065127/logs/upgrade_test_report_1657615630.html
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12c4/sgatfane-p12c4_20220712T065124/logs/upgrade_test_report_1657615613.html
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12j/sgatfane-p12j_20220712T040156/logs/test_report_1657695621.html
Consumer1 (OCP4.8):
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12c3/sgatfane-p12c3_20220712T065127/logs/upgrade_test_report_1657615630.html

oc command output during upgrade:
Privatelink Cluster:
Provider: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12p3/sgatfane-p12p3_20220712T040139/logs/upgrade_logs/nohup.out
Consumer1: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12c3/sgatfane-p12c3_20220712T065127/logs/upgrade_logs/nohup.out
Consumer2: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12c4/sgatfane-p12c4_20220712T065124/logs/upgrade_logs/nohup.out

Non-Private Link appliance mode:
Provider: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-p12j/sgatfane-p12j_20220712T040156/logs/upgrade_logs/nohup.out
Consumer1: 
Consumer2:http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/sgatfane-j12c2/sgatfane-j12c2_20220712T051842/logs/upgrade_logs/nohup.out

Verified on:
    Mediatype:   image/svg+xml
                Image:  quay.io/openshift/origin-kube-rbac-proxy:4.10.0
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.2-3
                Image:             quay.io/osd-addons/ocs-osd-deployer:2.0.2-3
========CSV ======
E0712 10:12:22.251346   51028 v2.go:105] read /dev/stdin: bad file descriptor
UID          PID    PPID  C STIME TTY          TIME CMD
1001050+       1       0  0 10:09 ?        00:00:00 /usr/bin/openshift-deploy
1001050+      85       0  0 10:12 ?        00:00:00 ps -ef
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.4                      NooBaa Operator               4.10.4            mcg-operator.v4.10.3                      Succeeded
ocs-operator.v4.10.2                      OpenShift Container Storage   4.10.2            ocs-operator.v4.10.1                      Succeeded
ocs-osd-deployer.v2.0.2                   OCS OSD Deployer              2.0.2             ocs-osd-deployer.v2.0.1                   Succeeded
odf-csi-addons-operator.v4.10.4           CSI Addons                    4.10.4            odf-csi-addons-operator.v4.10.2           Succeeded
odf-operator.v4.10.2                      OpenShift Data Foundation     4.10.2            odf-operator.v4.10.1                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.422-151be96   Route Monitor Operator        0.1.422-151be96   route-monitor-operator.v0.1.420-b65f47e   Succeeded
--------------
Verified on clusters:
Non-Private Link: OCP 4.10.22 provider; OCP 4.10.22 and OCP 4.8.46 consumers
Private Link: OCP 4.10.22 provider; OCP 4.10.22 and OCP 4.8.46 consumers