Created attachment 1965273
rook-ceph-operator log at DEBUG level

Description of problem (please be as detailed as possible and provide log snippets):
Upgrading ODF 4.13 build 187 to 4.13 build 201 did not update the Ceph version of the rbd-mirror daemon.

$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)": 1,
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 10
    }
}

Version of all relevant components (if applicable):
OCP: 4.13.0-rc.5
ODF: 4.13.0-201

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes. While testing RDR, the rbd-mirror Ceph versions do not match across the 2 Ceph clusters: one was updated to the latest Ceph version and the other was not.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Can this issue be reproduced?
Intermittently

Steps to Reproduce:
1. Install ODF and enable mirroring.
2. Update ODF to a new version.
3. Log in via the Rook toolbox and check "ceph versions".

Actual results:
The Ceph version for rbd-mirror does not match the other Ceph versions.

Expected results:
The Ceph version for rbd-mirror matches the other Ceph versions.

Additional info:
Expected result for ODF 4.13.0-201:

$ kubectl rook-ceph -n openshift-storage ceph versions
{
    "mon": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)": 11
    }
}
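To check the outputs above mechanically instead of by eye, a small script can parse the JSON printed by "ceph versions" and list the daemon types that are not on the cluster's most common version. This is a minimal sketch, not part of Rook or ODF: it assumes the JSON has been captured from the "kubectl rook-ceph -n openshift-storage ceph versions" command shown in this report, and the helper name find_stale_daemons is hypothetical.

```python
import json
from collections import Counter

# Version strings copied from the outputs in this report.
NEW = "ceph version 17.2.6-47.el9cp (6add4f24d1eff88e1db808ecdc16fd5b2db96dd4) quincy (stable)"
OLD = "ceph version 17.2.6-26.el9cp (ef7b8da24916178ade693b2fd0de13b917f53865) quincy (stable)"


def find_stale_daemons(versions_json: str) -> dict:
    """Hypothetical helper: map each daemon type to the versions it runs that
    differ from the cluster's most common version in `ceph versions` output."""
    data = json.loads(versions_json)
    overall = data.get("overall", {})
    if len(overall) <= 1:
        return {}  # every daemon reports the same version
    # Assumption: the version with the largest daemon count is the upgrade target.
    target = Counter(overall).most_common(1)[0][0]
    stale = {}
    for daemon_type, versions in data.items():
        if daemon_type == "overall":
            continue  # summary entry, not a daemon type
        behind = sorted(v for v in versions if v != target)
        if behind:
            stale[daemon_type] = behind
    return stale


# The "Actual results" output from this report, rebuilt as JSON.
REPORTED = json.dumps({
    "mon": {NEW: 3},
    "mgr": {NEW: 1},
    "osd": {NEW: 3},
    "mds": {NEW: 2},
    "rbd-mirror": {OLD: 1},
    "rgw": {NEW: 1},
    "overall": {OLD: 1, NEW: 10},
})

if __name__ == "__main__":
    # Reports only the rbd-mirror entry, running the 17.2.6-26 build.
    print(find_stale_daemons(REPORTED))
```

The "most common version wins" rule is a heuristic: it matches this report, where only the single rbd-mirror daemon lagged behind, but it can pick the wrong target partway through an upgrade, when most daemons are still on the old build. Comparing against an explicitly expected version string is safer in that case.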
Troubleshot Annette's cluster and found a bug in the reconcile logic for the rbd-mirror daemon that frequently prevents the rbd-mirror CRs from being reconciled after the operator restarts. In particular, the rbd-mirror daemon may not be upgraded during an upgrade and keeps running the previous version. The fix is very low risk and is important to ensure that the rbd-mirror daemon runs the correct version after an upgrade.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742