Bug 1960784
| Summary: | After editing the CSV, the Rook operator does not update MDS or RGW to apply new changes in the Ceph image for a hotfix until the operator is restarted | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Petr Balogh <pbalogh> |
| Component: | rook | Assignee: | Santosh Pillai <sapillai> |
| Status: | VERIFIED | QA Contact: | Petr Balogh <pbalogh> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | bkunal, dkhandel, muagarwa, shan |
| Target Milestone: | --- | Keywords: | AutomationBackLog |
| Target Release: | OCS 4.8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.8.0-416.ci | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Petr Balogh
2021-05-14 21:01:21 UTC
The operator does not need to be restarted; the operator should just respond to the event that the CephCluster was updated. I see in the log that the CephCluster was updated, but I am not sure why the mds and rgw controllers were not also triggered to update.

I talked to Neha and she told me that if we reproduced it I should open a BZ for it (and that you, Travis, told her to do this), so I did. I reproduced it 2 times in a row, so I think there is a bug if it is supposed to do that reload, hence I opened the BZ. Thanks

Thanks, good to hear there is a consistent repro. I agree there is a bug here; my previous comment was just trying to say it still needs investigation.

The issue is that the version of Ceph did not change. The Rook operator will notify the file and object controllers that they need to reconcile only when the Ceph version has changed. The version check is currently only based on the build number. The two versions in this test are:

"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 7,
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 4

In this case, the build numbers 14.2.11-139 are equivalent; the rest of the build version is ignored by Rook for the comparison. If the version is detected as changed [1], the operator log would show the message:

"upgrade in progress, notifying child CRs"

@Petr Is the Ceph build number actually expected to be unchanged during the hotfix? Or is this just an artifact found during testing?

[1] https://github.com/openshift/rook/blob/release-4.6/pkg/operator/ceph/cluster/cluster.go#L122

Thanks Travis for the clarification. @bkunal this is more a question for Bipin. I got the image I should test, which is mentioned in the article: quay.io/rh-storage-partners/rhceph:4-50.0.hotfix.bz1959254. Bipin, can you please take a look at Travis's input? This will affect applying the hotfix if the version is the same.

(In reply to Travis Nielsen from comment #5)
> The issue is that the version of Ceph did not change. The Rook operator will
> notify the file and object controllers that they need to reconcile only when
> the Ceph version has changed. The version check is currently only based on
> the build number.

The build number won't change for a hotfix build; a hotfix must be created on the same build. We do add a suffix (0.hotfix.bz1959254.el8cp), but I guess that doesn't get checked. Then why did we see the image getting updated for OSD, MON, etc.? In my cluster, I did not even observe issues for MDS. In my cluster, I saw the ceph-detect-version pods getting respun as well.

@Bipin The main reconcile is triggered, which updates all the mon/mgr/osd daemons, but the mds and rgw controllers also need to be triggered. This is being missed if the Ceph version didn't change.
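To make the comparison above concrete, here is a minimal Go sketch of a build-number-only version check, assuming a simple regex parse. This is illustrative only, not Rook's actual implementation (the real check lives in the cluster.go file linked in [1]):

```go
package main

import (
	"fmt"
	"regexp"
)

// A build-number-only parse captures just "major.minor.patch-build"
// (e.g. "14.2.11-139") and discards anything after it, including a
// hotfix suffix such as ".0.hotfix.bz1959254.el8cp".
var buildRe = regexp.MustCompile(`\d+\.\d+\.\d+-\d+`)

func buildNumber(version string) string {
	return buildRe.FindString(version)
}

func main() {
	before := "ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)"
	after := "ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)"

	fmt.Println(buildNumber(before)) // 14.2.11-139
	fmt.Println(buildNumber(after))  // 14.2.11-139

	// Both strings parse to the same build number, so a check like this
	// never reports "upgrade in progress, notifying child CRs".
	fmt.Println(buildNumber(before) == buildNumber(after)) // true
}
```

Under this kind of check the hotfix image is indistinguishable from the original build, so the file and object controllers are never notified.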
Upstream issue opened for this: https://github.com/rook/rook/issues/7964

Santosh, could you take a look?

(In reply to Travis Nielsen from comment #9)
> Santosh could you take a look?

On it.
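For context, a hypothetical sketch of the kind of change the upstream issue points toward: also consider the container image when deciding whether to notify the child CRs. The names clusterState and needsChildReconcile are invented for illustration; the actual fix that landed in 4.8.0-416.ci may differ.

```go
package main

import "fmt"

type clusterState struct {
	cephVersion string // parsed build, e.g. "14.2.11-139"
	image       string // full container image reference
}

// needsChildReconcile reports whether child CRs (CephFilesystem,
// CephObjectStore) should be reconciled after a CephCluster update.
// Comparing only cephVersion is the bug: a hotfix image with the same
// build number would return false. Including the image closes that gap.
func needsChildReconcile(prev, curr clusterState) bool {
	return prev.cephVersion != curr.cephVersion || prev.image != curr.image
}

func main() {
	prev := clusterState{cephVersion: "14.2.11-139", image: "quay.io/rhceph-dev/rhceph@sha256:aaaa"}
	curr := clusterState{cephVersion: "14.2.11-139", image: "quay.io/rhceph-dev/rhceph@sha256:bbbb"}
	fmt.Println(needsChildReconcile(prev, curr)) // true: only the image digest changed
}
```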
In order to test this I will need a hotfix build for one of the Ceph images. E.g. I just deployed the latest 4.8 cluster (ocs-operator.v4.8.0-432.ci) and I see this image used in the CSV:
- name: CEPH_IMAGE
  value: quay.io/rhceph-dev/rhceph@sha256:725f93133acc0fb1ca845bd12e77f20d8629cad0e22d46457b2736578698eb6c
ceph versions returns:
{
"mon": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
},
"mds": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 2
},
"overall": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 9
}
}
Can someone create a hotfix build with a version like 14.2.11-181.0.hotfix.bzXXXXXX.el8cp so I can really verify this on the latest 4.8 build?
Maybe @branto or @muagarwa can help here?
Thanks
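As an aside, since this BZ carries the AutomationBackLog keyword: a hedged Go sketch of how an automated check could consume the `ceph versions` output shown above, failing if any daemon section (including mds and rgw) still reports a non-hotfix version. The function name and structure here are assumptions, not ocs-ci code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// allOnVersion checks a `ceph versions` dump: it passes only when every
// daemon section reports version strings containing the expected hotfix
// suffix. The output shape is section -> version string -> daemon count.
func allOnVersion(raw []byte, want string) (bool, error) {
	var sections map[string]map[string]int
	if err := json.Unmarshal(raw, &sections); err != nil {
		return false, err
	}
	for section, versions := range sections {
		for v := range versions {
			if !strings.Contains(v, want) {
				return false, fmt.Errorf("%s still reports %q", section, v)
			}
		}
	}
	return true, nil
}

func main() {
	// Trimmed-down example input in the same shape as the dumps above.
	raw := []byte(`{
	  "mds": {"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (...) nautilus (stable)": 2},
	  "rgw": {"ceph version 14.2.11-139.el8cp (...) nautilus (stable)": 1}
	}`)
	ok, err := allOnVersion(raw, "14.2.11-139.0.hotfix.bz1959254")
	fmt.Println(ok, err) // false, with an error naming the rgw section
}
```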
OCS 4.8 hotfix build is available now: quay.io/rhceph-dev/ocs-registry:4.8.0-449.ci

Build artifacts can be found here: https://ceph-downstream-jenkins-csb-storage.apps.ocp4.prod.psi.redhat.com/job/OCS%20Build%20Pipeline%204.8/162/

ocs-ci is still running though: https://ceph-downstream-jenkins-csb-storage.apps.ocp4.prod.psi.redhat.com/job/ocs-ci/455/

"name": "rhceph",
"tag": "4-50.0.hotfix.bz1959254",
"image": "quay.io/rhceph-dev/rhceph@sha256:6dbe1a5abfe1f3bf054b584d82f4011c0b0fec817924583ad834b4ff2a63c769",
"nvr": "rhceph-container-4-50.0.hotfix.bz1959254"
},
Deepshikha, does this 4-50.0.hotfix.bz1959254 image have a version like 14.2.11-181.0.hotfix.bzXXXXXX.el8cp? I ask because I see it has 4-50.0 in the name.
Deepshikha, please confirm so I can continue with verification.
Thanks
I am preparing a cluster for verification here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/4535/

As I didn't get an answer from Deepshikha, I will try with the image quay.io/rhceph-dev/rhceph@sha256:6dbe1a5abfe1f3bf054b584d82f4011c0b0fec817924583ad834b4ff2a63c769 and let you know the results.

I somehow missed the comment on this BZ, so sorry about that, Petr. Yes, you will have an image version like `2:14.2.11-139.0.hotfix.bz1959254.el8cp`. I can confirm from here: https://quay.io/repository/rhceph-dev/rhceph/manifest/sha256:286820cca8aa3d6b72eef6c59779c8931c14cf28dafabbb229235c3ccc26e763?tab=packages

Deepshikha, that version is not good enough. I need the exact same base version, which is supposed to be 14.2.11-181.el8cp, in order to test this. So I need an image like 14.2.11-181.0.hotfix.bz1959254.el8cp. For now I see all versions changed, but I cannot verify this BZ, as I need the exact same version we have in the build itself in order to test this.

$ cat versions-after-hotfix.txt
{
"mon": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 3
},
"mds": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 2
},
"overall": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 9
}
}
$ cat versions-before-hotfix.txt
{
"mon": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 3
},
"mds": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 2
},
"overall": {
"ceph version 14.2.11-181.el8cp (68fea1005601531fe60d2979c56ea63bc073c84f) nautilus (stable)": 9
}
}

Deepshikha, what Petr is asking for is a temporary 4.8 build with the ceph tag 4-50.0.hotfix.bz1959254.

So the rhceph hotfix build 4-50.0.hotfix.bz1959254 has the ceph version `14.2.11-139.0.hotfix.bz1959254.el8cp`. Currently there is no hotfix rhceph image available for the same version, i.e., 4-57. We can probably create a recent 4.8 custom build with rhceph 4-50, and you can upgrade from this new build to the hotfix build I provided earlier for verification. Let me know if that is fine for you.

I have triggered a custom build with rhceph tag 4-50. Link to the build pipeline: https://ceph-downstream-jenkins-csb-storage.apps.ocp4.prod.psi.redhat.com/job/OCS%20Build%20Pipeline%204.8/167/ It should probably help.

Tested with the custom build, and it looks like it works well now.
$ cat versions-after-hotfix.txt
{
"mon": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 3
},
"mds": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 2
},
"rgw": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 1
},
"overall": {
"ceph version 14.2.11-139.0.hotfix.bz1959254.el8cp (5c0dc966af809fd1d429ec7bac48962a746af243) nautilus (stable)": 10
}
}
$ cat versions-before-hotfix.txt
{
"mon": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 1
},
"osd": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 3
},
"mds": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 2
},
"rgw": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 1
},
"overall": {
"ceph version 14.2.11-139.el8cp (b8e1f91c99491fb2e5ede748a1c0738ed158d0f5) nautilus (stable)": 10
}
}
Marking as verified