Bug 2208079 - rbd mirror daemon is commonly not upgraded
Summary: rbd mirror daemon is commonly not upgraded
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: Travis Nielsen
QA Contact: Sidhant Agrawal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-05-17 22:36 UTC by Annette Clewett
Modified: 2023-08-09 17:03 UTC (History)

Fixed In Version: 4.13.0-203
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-21 15:25:37 UTC
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | red-hat-storage rook pull 492 | 0 | None | open | Bug 2208079: rbdmirror: Retry reconcile if cluster not initialized | 2023-05-18 14:53:34 UTC
Github | rook rook pull 12247 | 0 | None | open | rbdmirror: Retry reconcile if cluster not initialized | 2023-05-17 23:13:27 UTC
Red Hat Product Errata | RHBA-2023:3742 | 0 | None | None | None | 2023-06-21 15:25:48 UTC

Description Annette Clewett 2023-05-17 22:36:51 UTC
Created attachment 1965273 [details]
rook-ceph-operator log at DEBUG level

Description of problem (please be as detailed as possible and provide log
snippets):
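Upgrading ODF from 4.13 build 187 to 4.13 build 201 did not update the Ceph version of the rbd-mirror daemon. In the output below, rbd-mirror still reports 17.2.6-26.el9cp while every other daemon reports 17.2.6-47.el9cp.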

$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1,
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 10
    }
}


Version of all relevant components (if applicable):

OCP: 4.13.0-rc.5
ODF: 4.13.0-201

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. We are testing RDR, and the rbd-mirror Ceph versions do not match across the two Ceph clusters: one cluster was updated to the latest Ceph version and the other was not.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes, intermittently

Steps to Reproduce:
1. Install ODF and enable mirroring
2. Update ODF to new version
3. Log in via the Rook toolbox and check "ceph versions"

Actual results:
The Ceph version for rbd-mirror does not match the Ceph version of the other daemons

Expected results:
The Ceph version for rbd-mirror matches the Ceph version of the other daemons


Additional info:
Expected result for ODF 4.13.0-201
$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 11
    }
}
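
The comparison between the actual and expected "ceph versions" output can be scripted. The following is only a rough sketch, not part of ODF or Rook: the function name mismatched_daemons is hypothetical, the sample is trimmed from the output in this report (mon and rbd-mirror only, with the "overall" counts adjusted to match), and it assumes the version reported by the most daemons in "overall" is the intended one, which may not hold in the middle of an upgrade.

```python
import json

# Version strings copied from the output in this report.
OLD = "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)"
NEW = "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)"


def mismatched_daemons(versions: dict) -> dict:
    """Return the daemon types that report a version other than the majority one.

    Assumption: the version with the highest count under "overall" is the
    version every daemon is expected to run.
    """
    overall = versions.get("overall", {})
    if not overall:
        return {}
    expected = max(overall, key=overall.get)
    return {
        daemon: sorted(v for v in reported if v != expected)
        for daemon, reported in versions.items()
        if daemon != "overall" and any(v != expected for v in reported)
    }


# Trimmed, illustrative sample in the same shape as the "ceph versions" JSON.
sample = json.loads(json.dumps({
    "mon": {NEW: 3},
    "rbd-mirror": {OLD: 1},
    "overall": {OLD: 1, NEW: 3},
}))

# Maps each lagging daemon type to the unexpected version(s) it reports.
print(mismatched_daemons(sample))
```

An empty result means every daemon type reports the majority version, which is what the expected output above would produce.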

Comment 2 Travis Nielsen 2023-05-17 23:13:28 UTC
Troubleshot Annette's cluster and found a bug in the reconcile for the rbd mirror daemon that frequently prevents the rbd mirror CRs from being reconciled after the operator is restarted. In particular, the rbd mirror daemon may not be upgraded during an upgrade, which leaves it running the previous version.

This is a very low-risk fix, and it is important for ensuring that the rbd mirror daemon runs the correct version after an upgrade.
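
For readers unfamiliar with the pattern named in the linked pull request title ("Retry reconcile if cluster not initialized"), here is a minimal sketch of the general idea. It is not the Rook code, which is written in Go. ReconcileResult, reconcile_rbd_mirror, the cluster_initialized flag, and the 10-second delay are all hypothetical names and values chosen for illustration. The assumption is that a reconcile that runs before the cluster is ready should ask to be retried instead of returning as if it had finished.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReconcileResult:
    # Hypothetical result type: whether the controller should call us again.
    requeue: bool = False
    requeue_after_seconds: float = 0.0


def reconcile_rbd_mirror(cluster_initialized: bool,
                         update_daemon: Callable[[], None]) -> ReconcileResult:
    """Sketch of a reconcile that retries until the cluster is initialized."""
    if not cluster_initialized:
        # Returning plain success here would drop the event, and the daemon
        # would keep running the old version. Request a retry instead.
        return ReconcileResult(requeue=True, requeue_after_seconds=10.0)
    update_daemon()
    return ReconcileResult()


# Simulate an operator restart: the first reconcile arrives too early,
# the retried one runs after the cluster has been initialized.
updates = []
first = reconcile_rbd_mirror(False, lambda: updates.append("rbd-mirror"))
second = reconcile_rbd_mirror(True, lambda: updates.append("rbd-mirror"))
print(first.requeue, second.requeue, updates)
```

In this sketch the daemon update only happens on the retried call, which mirrors the symptom described above: without a retry, the early reconcile is the only one and the update never runs.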

Comment 12 errata-xmlrpc 2023-06-21 15:25:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

