Bug 2208079

Summary: rbd mirror daemon is commonly not upgraded
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Annette Clewett <aclewett>
Component: rook
Assignee: Travis Nielsen <tnielsen>
Status: CLOSED ERRATA
QA Contact: Sidhant Agrawal <sagrawal>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.13
CC: ebenahar, kramdoss, ocs-bugs, odf-bz-bot, tnielsen
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: ODF 4.13.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: 4.13.0-203
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-06-21 15:25:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Annette Clewett 2023-05-17 22:36:51 UTC
Created attachment 1965273 [details]
rook-ceph-operator log at DEBUG level

Description of problem (please be as detailed as possible and provide log
snippets):
Upgrading ODF 4.13 from build 187 to build 201 did not update the rbd-mirror Ceph version.

$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1,
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 10
    }
}
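
The mismatch in this output can also be detected programmatically. The following is a minimal Python sketch, not part of the original report, that takes the parsed JSON from "ceph versions" and lists the daemon types still reporting a version other than the most common one. The function name find_lagging_daemons is hypothetical, and treating the most common version as the upgrade target is an assumption.

```python
def find_lagging_daemons(versions):
    """Map each daemon type to the versions it reports that differ from the majority."""
    overall = versions.get("overall", {})
    if not overall:
        return {}
    # Assumption: the version reported by the most daemons is the upgrade target.
    target = max(overall, key=overall.get)
    lagging = {}
    for daemon_type, counts in versions.items():
        if daemon_type == "overall":
            continue
        stale = sorted(v for v in counts if v != target)
        if stale:
            lagging[daemon_type] = stale
    return lagging


# Version strings and counts copied from the output in this report.
OLD = "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)"
NEW = "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)"

observed = {
    "mon": {NEW: 3},
    "mgr": {NEW: 1},
    "osd": {NEW: 3},
    "mds": {NEW: 2},
    "rbd-mirror": {OLD: 1},
    "rgw": {NEW: 1},
    "overall": {OLD: 1, NEW: 10},
}

print(find_lagging_daemons(observed))
```

With the data above, only the rbd-mirror entry is reported as lagging.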


Version of all relevant components (if applicable):

OCP: 4.13.0-rc.5
ODF: 4.13.0-201

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. While testing RDR, the rbd-mirror Ceph versions do not match across the two Ceph clusters: one cluster was updated to the latest Ceph version and the other was not.
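
A cross-cluster comparison like the one described here could be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, and the second cluster's output is not included in this report, so its value below is an assumption based on the statement that it was updated.

```python
def rbd_mirror_versions(versions):
    """Return the set of version strings reported by rbd-mirror daemons."""
    return set(versions.get("rbd-mirror", {}))


def rbd_mirror_in_sync(cluster_a, cluster_b):
    """True when both clusters report exactly one, identical rbd-mirror version."""
    a = rbd_mirror_versions(cluster_a)
    b = rbd_mirror_versions(cluster_b)
    return len(a) == 1 and a == b


OLD = "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)"
NEW = "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)"

# Cluster whose rbd-mirror daemon was not upgraded (from this report).
stale_cluster = {"rbd-mirror": {OLD: 1}}
# Assumed output of the peer cluster that did upgrade (not shown in the report).
upgraded_cluster = {"rbd-mirror": {NEW: 1}}

print(rbd_mirror_in_sync(stale_cluster, upgraded_cluster))
```

The check deliberately fails when either cluster has no rbd-mirror entry, since mirroring is expected to be enabled on both sides in this scenario.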

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes, intermittently

Steps to Reproduce:
1. Install ODF and enable mirroring
2. Update ODF to new version
3. Log in via the Rook toolbox and check "ceph versions"

Actual results:
The Ceph version for rbd-mirror does not match the Ceph version of the other daemons

Expected results:
The Ceph version for rbd-mirror matches the Ceph version of the other daemons


Additional info:
Expected result for ODF 4.13.0-201:
$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 11
    }
}
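
The expected state above has a single entry under "overall". A verification step could test for that condition, as in this hedged Python sketch. The helper names short_version and uniform_version are hypothetical, and the regular expression assumes version strings follow the "ceph version <build> ..." layout seen in this report.

```python
import re


def short_version(version_string):
    """Extract the build (for example 17.2.6-47.el9cp) from a full version string."""
    match = re.match(r"ceph version (\S+)", version_string)
    return match.group(1) if match else version_string


def uniform_version(versions):
    """Return the single build all daemons run, or None if versions are mixed."""
    overall = versions.get("overall", {})
    if len(overall) != 1:
        return None
    (only,) = overall  # unpack the one key in the dict
    return short_version(only)


OLD = "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)"
NEW = "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)"

# "overall" sections copied from the expected and observed outputs in this report.
expected = {"overall": {NEW: 11}}
observed = {"overall": {OLD: 1, NEW: 10}}

print(uniform_version(expected))
print(uniform_version(observed))
```

A test could then assert that the returned build is not None and equals the build shipped with the target ODF version.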

Comment 2 Travis Nielsen 2023-05-17 23:13:28 UTC
Troubleshot Annette's cluster and found a bug in the reconcile for the rbd mirror daemon that frequently prevents the rbd mirror CRs from being reconciled after the operator is restarted. In particular, the rbd mirror daemon may not be upgraded during an upgrade, which leaves it running on the previous version.

This is a very low-risk fix, and it is important for ensuring that the rbd mirror daemon runs the correct version after an upgrade.

Comment 12 errata-xmlrpc 2023-06-21 15:25:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0
enhancement and bug fix update), and where to find the updated files,
follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742