Description of problem (please be as detailed as possible and provide log snippets):
I observe a ceph version and corresponding image mismatch between the rook-ceph-operator/toolbox pods and the OSD/MON/MGR pods.

Version of all relevant components (if applicable):
OCS-4.5.1
OCP-4.5.16

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
No, but it can lead to confusion while debugging issues.

Is there any workaround available to the best of your knowledge?
Nothing I am aware of.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1 - very simple

Is this issue reproducible?
Yes. Deploy a fresh new OCS-4.5 cluster.

Can this issue be reproduced from the UI?
No

If this is a regression, please provide more details to justify this:
Not sure

Steps to Reproduce:
1. Install OCP-4.5
2. Install OCS-4.5
3. Check the ceph version and ceph image corresponding to rook-ceph-operator, toolbox, OSD, MON, MGR, etc.

Actual results:
--------------
$ oc rsh rook-ceph-mon-a-69fdf4544b-4965x
sh-4.4# ceph -v
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
--------------
$ oc rsh rook-ceph-mgr-a-68544f48bc-4qscx
sh-4.4# ceph -v
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
--------------
$ oc rsh rook-ceph-osd-0-f79777f9-hp9p6
sh-4.4# ceph -v
ceph version 14.2.8-91.el8cp (75b4845da7d469665bd48d1a49badcc3677bf5cd) nautilus (stable)
--------------
$ oc rsh rook-ceph-tools-6658bc55fb-gqb2w
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
--------------
$ oc rsh rook-ceph-operator-677cfd7cf8-67pjt
sh-4.4$ ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
--------------

Here the rook-ceph-operator and rook-ceph-tools pods show version 14.2.8-111.el8cp, whereas the MON/OSD/MGR pods show version 14.2.8-91.el8cp.

Expected results:
All the pods should display the same ceph version.
Additional info:

Image information:
--------------
*MON*
Container ID:  cri-o://83e1a1dc2624af1aab5d86c97968ad7e6f1ffb4b50fc89a947db80326db4bc81
Image:         registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:eafd1acb0ada5d7cf93699056118aca19ed7a22e4938411d307ef94048746cc8
Image ID:      registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:3def885ad9e8440c5bd6d5c830dafdd59edf9c9e8cce0042b0f44a5396b5b0f6
--------------
*MGR*
Container ID:  cri-o://7c2ef4c663cdb573d9c7475139779a55024e7252dede53502ef73e22b7dd3e7b
Image:         registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:eafd1acb0ada5d7cf93699056118aca19ed7a22e4938411d307ef94048746cc8
Image ID:      registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:3def885ad9e8440c5bd6d5c830dafdd59edf9c9e8cce0042b0f44a5396b5b0f6
--------------
*OSD*
Container ID:  cri-o://ad61eaa5b7ef0df256f16305adf9b3899297944ae96c1c5bf5692a727e02ef14
Image:         registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:eafd1acb0ada5d7cf93699056118aca19ed7a22e4938411d307ef94048746cc8
Image ID:      registry.redhat.io/rhceph/rhceph-4-rhel8@sha256:3def885ad9e8440c5bd6d5c830dafdd59edf9c9e8cce0042b0f44a5396b5b0f6
--------------
*rook-ceph-tools*
Container ID:  cri-o://d3b356466a0a2e6af16a52555845dadf21e30b9968ca7f7c6a617023fb6ae3b1
Image:         registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:b9fd6c06423d6fbe213837089cac9a68cc0fb431c1b5c2b4fcf2cf6d19a910a4
Image ID:      registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:b9fd6c06423d6fbe213837089cac9a68cc0fb431c1b5c2b4fcf2cf6d19a910a4
--------------
*rook-ceph-operator*
Container ID:  cri-o://81f4a11afdbb4ec6ea751c4fe6a89a868d72f37bdc5401bb543100251a34593c
Image:         registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:b9fd6c06423d6fbe213837089cac9a68cc0fb431c1b5c2b4fcf2cf6d19a910a4
Image ID:      registry.redhat.io/ocs4/rook-ceph-rhel8-operator@sha256:b9fd6c06423d6fbe213837089cac9a68cc0fb431c1b5c2b4fcf2cf6d19a910a4
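(For reference, a quick way to collect the same image information for every pod in one command. This is only a sketch; it assumes the default openshift-storage namespace, so adjust the namespace if your deployment uses a different one.)

$ oc -n openshift-storage get pods \
    -o custom-columns='POD:.metadata.name,IMAGE:.spec.containers[*].image'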
Marking as a regression - https://bugzilla.redhat.com/show_bug.cgi?id=1754892
To the best of my knowledge this has zero user impact, so I am lowering the severity. Please correct me if you see any impact.
The Rook-Ceph pods use a different image than the Ceph pods: registry.redhat.io/ocs4/rook-ceph-rhel8-operator vs registry.redhat.io/rhceph/rhceph-4-rhel8. Any discrepancy in Ceph versions between the two is down to the DS build process and which dependencies the Rook-Ceph project pulls in. That said, if there is no technical problem, then I don't think this matters. If there were a significant difference, we would probably have already made sure both were at advanced enough versions to remove the problem. Moving this to the rook component and OCS 4.7 in case there's further need for discussion.
As Jose said, the mismatch comes from the DS build. There is nothing Rook can do at this point. Also, I don't know how the build could always guarantee the exact same Ceph package version between the Ceph image and the operator image. If you update the operator image, you might just get newer Ceph packages while the running cluster still has "older" point-release packages. Typically, newer Ceph packages are not an issue since they are compatible with earlier versions.
@Boris: We suspect a minor issue with the downstream builds: the rook-ceph-operator shows a different ceph build version (14.2.8-111) than the ceph component containers (osd/mon/mgr...) (14.2.8-91). Can you comment?
We run dnf update -y in the rook Dockerfile to pull in any security fixes. However, we consume the rhceph-4 repos in the rook build, which also updates the ceph packages in that container. We could remove that line, but then we would no longer get security fixes for the other packages. Alternatively, we could add --exclude (or --disablerepo) flags to the command to avoid updating the ceph packages, as sketched below.
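(Illustrative only -- a minimal sketch of the kind of Dockerfile change discussed above; the actual downstream Dockerfile and repo names may differ.)

# Keep security updates but don't pull newer ceph packages from the rhceph repos:
RUN dnf update -y --disablerepo='rhceph-4*' && dnf clean all
# Alternative: update everything except the ceph packages themselves:
# RUN dnf update -y --exclude='ceph*' && dnf clean all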
I pushed a fix for this issue to the ocs-4.4 (and onwards) dist-git branches. Do we want to do another RC of OCS 4.5.2 for this? It would be a relatively simple rebuild (we would just have to rebuild rook and the operator bundle). BTW: we are still doing security updates in rook; I just disabled the rhceph repos to prevent the ceph packages from being updated.
This should already be fixed in 4.6.0 in the latest build (it did rebuild rook). Do we know if we want to target 4.5.2 as well? In any case, this would be fixed by any 4.5 rebuild (e.g. even in 4.5.3, if we ever release one).
If the fix is already in 4.6, can I retarget this bug to 4.6? Or is another RC needed? If so, the fix will only be included if one more RC is required.
Bug fixed: all the pods display the same ceph version.

SetUp:
Provider: Vmware_Dynamic
OCP Version: 4.6.0-0.nightly-2020-11-21-194817
OCS Version: ocs-operator.v4.6.0-160.ci

Test Process:
1. Check the ceph version on all relevant components.
============================================================================================
$ oc rsh rook-ceph-mon-a-74b5fcb97d-s972f
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-mgr-a-5f8695cc48-7j7l7
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-mon-b-54fdc9c889-wj47l
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-mon-c-8bbd8f9d4-rdp7l
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-osd-0-75cb6d6fc8-6gvnk
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-osd-1-55545b4fcc-4m6t2
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-osd-2-5bbd9d949b-7fvzj
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-tools-85dc5f7bc8-tnlqj
sh-4.4# ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
$ oc rsh rook-ceph-operator-54f449df55-bzwrq
sh-4.4$ ceph -v
ceph version 14.2.8-111.el8cp (2e6029d57bc594eceba4751373da6505028c2650) nautilus (stable)
============================================================================================
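(A quick way to run the same check across all Rook/Ceph pods in one pass -- just a sketch, assuming the default openshift-storage namespace; pods without a ceph binary are skipped silently:)

$ for pod in $(oc -n openshift-storage get pods -o name | grep rook-ceph); do
    echo "== $pod"
    oc -n openshift-storage exec "$pod" -- ceph -v 2>/dev/null
  done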
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.6.0 security, bug fix, enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5605