Bug 1834327 - Upgrade from OCS 4.4 to OCS 4.5 does not upgrade csi-* pods
Summary: Upgrade from OCS 4.4 to OCS 4.5 does not upgrade csi-* pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Madhu Rajanna
QA Contact: Aviad Polak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-11 14:13 UTC by Neha Berry
Modified: 2020-09-23 09:07 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:17:01 UTC
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-operator pull 512 | 0 | None | closed | Bug 1834327: Add ROOK_CSI_ALLOW_UNSUPPORTED_VERSION to rook operator env | 2021-01-23 12:26:35 UTC
Red Hat Product Errata | RHBA-2020:3754 | 0 | None | None | None | 2020-09-15 10:17:23 UTC

Description Neha Berry 2020-05-11 14:13:11 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
----------------------------------------------------------------------
This BZ was spun off from Bug 1832889, based on comment 8 (Bug 1832889#c8), to track the CSI version upgrade issue.

On a cluster with OCP 4.5 and OCS 4.4, an upgrade to OCS 4.5 was performed via the CLI. The upgrade was reported as successful; however, the csi-* and osd pods were not upgraded and were still running OCS 4.4 builds.

Bug tracking the OSD version issue: Bug 1832889

LOGS AND OUTPUTS FROM THE CLUSTER:
==================================

Logs available at: http://rhsqe-repo.lab.eng.blr.redhat.com/cns/ocs-qe-bugs/1832889/



$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.5.0-419.ci   OpenShift Container Storage   4.5.0-419.ci   ocs-operator.v4.4.0-414.ci   Succeeded


sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)": 3,
        "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)": 7
    }
}
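
The skew in the output above (OSDs on 14.2.4 while every other daemon is on 14.2.8) can also be picked out programmatically. The following is only a rough sketch, not part of the product or of the fix: the function names are made up for illustration, and it assumes that the version run by the most daemons is the expected one. The sample data is the `ceph versions` output pasted above.

```python
import json

def short_version(banner: str) -> str:
    # "ceph version 14.2.8-35.el8cp (<hash>) nautilus (stable)" -> "14.2.8-35.el8cp"
    return banner.split()[2]

def find_version_skew(raw: str) -> dict[str, list[str]]:
    """Return {daemon_type: [unexpected versions]} from `ceph versions` JSON."""
    report = json.loads(raw)
    overall = report.get("overall", {})
    if len(overall) <= 1:
        return {}  # every daemon reports the same version
    # Assumption: the version run by the most daemons is the upgrade target.
    expected = max(overall, key=overall.get)
    skew = {}
    for daemon_type, versions in report.items():
        if daemon_type == "overall":
            continue
        stale = [short_version(v) for v in versions if v != expected]
        if stale:
            skew[daemon_type] = stale
    return skew

# Version banners and counts copied from the output in this report.
NEW = "ceph version 14.2.8-35.el8cp (b32eac9fd60c00c62a2d3c85d88b483be7b55ba1) nautilus (stable)"
OLD = "ceph version 14.2.4-125.el8cp (db63624068590e593c47150c7574d08c1ec0d3e4) nautilus (stable)"
sample = json.dumps({
    "mon": {NEW: 3}, "mgr": {NEW: 1}, "osd": {OLD: 3},
    "mds": {NEW: 2}, "rgw": {NEW: 1},
    "overall": {OLD: 3, NEW: 7},
})
print(find_version_skew(sample))  # {'osd': ['14.2.4-125.el8cp']}
```

On a live cluster the JSON could be fed in from the toolbox pod instead of the hard-coded sample.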

Version of all relevant components (if applicable):

4.5.0-0.nightly-2020-05-04-113741
ocs-operator.v4.5.0-419.ci



Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes, having a partial upgrade results in a mismatch of versions across the ceph components and prevents the user from accessing the features of the latest release.


Is there any workaround available to the best of your knowledge?

Not that I am aware of


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
--------------------------------------------------------------------------------
2

Is this issue reproducible?
--------------------------------------------------------------------------------
I have tried this only once


Can this issue be reproduced from the UI?
--------------------------------------------------------------------------------
The upgrade was done via CLI

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
--------------------------------------------------------------------------------
On an OCP 4.5 + OCS 4.4 cluster, perform an upgrade to OCS 4.5 as follows:
  1. oc edit catsrc/ocs-catalogsource -n openshift-marketplace 
      image: quay.io/rhceph-dev/ocs-olm-operator:4.5.0-419.ci
   
  2. oc edit subscriptions.operators.coreos.com ocs-subscription
     spec:
       channel: stable-4.5

  3. Wait for the upgrade to complete. Check the CSV status:

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES                     PHASE
ocs-operator.v4.5.0-419.ci   OpenShift Container Storage   4.5.0-419.ci   ocs-operator.v4.4.0-414.ci   Succeeded

  4. Check the status of the pods and their versions, especially the CSI plugin and provisioner pods


Actual results:
--------------------------------------------------------------------------------
  csi-* pods were not upgraded and were still running with OCS 4.4 builds

Expected results:
--------------------------------------------------------------------------------
  All the pods should be upgraded to the latest OCS 4.5 builds. There should be no mismatch in the versions across different components


Additional Info
=======================

None of the CSI pods were upgraded to the required version either:
   cephcsi:4.5-2.b38f2c5c.release_4.5
    quay.io/rhceph-dev/cephcsi@sha256:86087a7123945ce4f7f720539693395e5a6fc8175318d050d0d983af8ea0e216


>> Builds in the CSV


                - name: ROOK_CSI_CEPH_IMAGE
                  value: quay.io/rhceph-dev/cephcsi@sha256:86087a7123945ce4f7f720539693395e5a6fc8175318d050d0d983af8ea0e216
                - name: ROOK_CSI_REGISTRAR_IMAGE
                  value: registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:b17e943c72cfd2696db2388e817739c23c0427dde4737e14cf58a5f5db50ce60
                - name: ROOK_CSI_RESIZER_IMAGE
                  value: registry.redhat.io/openshift4/ose-csi-external-resizer-rhel7@sha256:e7302652fe3f698f8211742d08b2dcea9d77925de458eb30c20789e12ee7ae33
                - name: ROOK_CSI_PROVISIONER_IMAGE
                  value: registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:49b470f8f5ce1edb883a03a0b6a726add01fb762cfd42f8941d6841f7d776318
                - name: ROOK_CSI_ATTACHER_IMAGE
                  value: registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:fb9f73ed22b4241eba25e71b63aa6729daa2d7e9bce6a13a060fe4c236735140
                image: quay.io/rhceph-dev/rook-ceph@sha256:e4e20a1e8756a8b9847def42a60aa117d8ab5633c6eaec3f8013132c2800c72c



>> Builds in one of the prov pods

csi-cephfsplugin-provisioner-679dd5d8b5-67pwr
====
    Image:         registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:e07525ae9a8a772ac2e7db1b8f8d8df2dcbc79d66792f570577a7904858b6abb
    Image ID:      registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:e07525ae9a8a772ac2e7db1b8f8d8df2dcbc79d66792f570577a7904858b6abb
    Image:         registry.redhat.io/openshift4/ose-csi-external-resizer-rhel7@sha256:e7302652fe3f698f8211742d08b2dcea9d77925de458eb30c20789e12ee7ae33
    Image ID:      registry.redhat.io/openshift4/ose-csi-external-resizer-rhel7@sha256:e7302652fe3f698f8211742d08b2dcea9d77925de458eb30c20789e12ee7ae33
    Image:         registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:9fc69e94f111343a6482e94e413e267dfc3ba17973c321da8fe20f1ac4c09155
    Image ID:      registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:9fc69e94f111343a6482e94e413e267dfc3ba17973c321da8fe20f1ac4c09155
    Image:         quay.io/rhceph-dev/cephcsi@sha256:9c55c32aa16e719888c408effe4e800495a70501e82c7a463bc826e3d8b5130f
    Image ID:      quay.io/rhceph-dev/cephcsi@sha256:9c55c32aa16e719888c408effe4e800495a70501e82c7a463bc826e3d8b5130f
    Image:         quay.io/rhceph-dev/cephcsi@sha256:9c55c32aa16e719888c408effe4e800495a70501e82c7a463bc826e3d8b5130f
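
To make the mismatch between the two listings above easier to see, the image references can be compared by repository and digest. This is just an illustrative sketch: the helper names are hypothetical, it assumes every reference has the form `repo@sha256:<digest>`, and the two lists are copied from the CSV env values and the provisioner pod output in this report.

```python
def image_digests(images: list[str]) -> dict[str, str]:
    """Map repository -> digest for references of the form repo@sha256:<digest>."""
    out = {}
    for ref in images:
        repo, _, digest = ref.partition("@")
        out[repo] = digest
    return out

def stale_images(expected: list[str], running: list[str]) -> list[str]:
    """Repositories present in both lists whose running digest differs from the CSV."""
    want = image_digests(expected)
    have = image_digests(running)
    return sorted(r for r in want if r in have and have[r] != want[r])

RH = "registry.redhat.io/openshift4/"
# Images declared in the OCS 4.5 CSV (ROOK_CSI_* env values).
csv_images = [
    "quay.io/rhceph-dev/cephcsi@sha256:86087a7123945ce4f7f720539693395e5a6fc8175318d050d0d983af8ea0e216",
    RH + "ose-csi-driver-registrar@sha256:b17e943c72cfd2696db2388e817739c23c0427dde4737e14cf58a5f5db50ce60",
    RH + "ose-csi-external-resizer-rhel7@sha256:e7302652fe3f698f8211742d08b2dcea9d77925de458eb30c20789e12ee7ae33",
    RH + "ose-csi-external-provisioner-rhel7@sha256:49b470f8f5ce1edb883a03a0b6a726add01fb762cfd42f8941d6841f7d776318",
    RH + "ose-csi-external-attacher@sha256:fb9f73ed22b4241eba25e71b63aa6729daa2d7e9bce6a13a060fe4c236735140",
]
# Images actually running in csi-cephfsplugin-provisioner-679dd5d8b5-67pwr.
pod_images = [
    RH + "ose-csi-external-attacher@sha256:e07525ae9a8a772ac2e7db1b8f8d8df2dcbc79d66792f570577a7904858b6abb",
    RH + "ose-csi-external-resizer-rhel7@sha256:e7302652fe3f698f8211742d08b2dcea9d77925de458eb30c20789e12ee7ae33",
    RH + "ose-csi-external-provisioner-rhel7@sha256:9fc69e94f111343a6482e94e413e267dfc3ba17973c321da8fe20f1ac4c09155",
    "quay.io/rhceph-dev/cephcsi@sha256:9c55c32aa16e719888c408effe4e800495a70501e82c7a463bc826e3d8b5130f",
]

for repo in stale_images(csv_images, pod_images):
    print("not upgraded:", repo)
```

With the data from this report, the cephcsi, external-attacher and external-provisioner images are flagged; the external-resizer digest is the same in both listings, and the driver-registrar image does not run in the provisioner pod, so it is not compared.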

Comment 4 Michael Adam 2020-05-12 13:14:00 UTC
This is a bug against 4.5, and if I understand it correctly, Madhu's patch is also against the target version of the upgrade, not the starting version.

Comment 10 Aviad Polak 2020-08-06 14:51:02 UTC
Looks OK in build ocs-operator.v4.5.0-508.ci:

oc exec rook-ceph-tools-66b74bdf95-qft52 -- ceph versions
{
    "mon": {
        "ceph version 14.2.8-81.el8cp (0336e23b7404496341b988c8057538b8185ca5ec) nautilus (stable)": 4
    },
    "mgr": {
        "ceph version 14.2.8-81.el8cp (0336e23b7404496341b988c8057538b8185ca5ec) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.8-81.el8cp (0336e23b7404496341b988c8057538b8185ca5ec) nautilus (stable)": 3
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.8-81.el8cp (0336e23b7404496341b988c8057538b8185ca5ec) nautilus (stable)": 8
    }
}

Comment 12 errata-xmlrpc 2020-09-15 10:17:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

