Bug 1846095
| Summary: | During OCS upgrade: No wait time enforced between OSD pod respins in case there is change only in ROOK_CEPH_IMAGE and not CEPH_IMAGE | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Neha Berry <nberry> |
| Component: | rook | Assignee: | Travis Nielsen <tnielsen> |
| Status: | CLOSED ERRATA | QA Contact: | Aviad Polak <apolak> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.4 | CC: | madam, ocs-bugs, shan, tnielsen |
| Target Milestone: | --- | ||
| Target Release: | OCS 4.5.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-15 10:17:41 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Neha Berry
2020-06-10 18:22:49 UTC
Since Rook v1.3 upstream (and OCS 4.5), the OSD upgrade behavior has already changed. Previously, the upgrade waited between OSD restarts only if the Ceph image was updated, not if the Rook image was upgraded. This is what you are seeing in the 4.3 and 4.4 releases. Now the OSD upgrade checks whether the pod spec differs to determine whether it should wait during the upgrade. I wouldn't expect to hit this issue anymore. @leseb, please correct me if needed. @Neha, in that case, please confirm whether it is already fixed in 4.5 builds.

That's correct, Travis. Acking as fixed in 4.5 and moving to ON_QA to validate.

Hi Travis,

With the current upgrade builds (OCS 4.4.2 and OCS 4.5), even if we select two builds whose internal Ceph versions are the same, e.g. OCS 4.5 (v4.5.0-43.ci) and OCS 4.4.2 GA, there are the following differences, so replicating the exact same behavior to verify is difficult:

1. OCS 4.4 had both the rhceph and rook-ceph-rhel8 images in the pod containers. http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-vu1cs33-t1/jnk-vu1cs33-t1_20200805T161916/logs/failed_testcase_ocs_logs_1596649015/test_add_capacity_ocs_logs/ocs_must_gather/quay-io-rhceph-dev-ocs-must-gather-sha256-183cc9be0eaec7e3ecf74cce99cfe511f296f1e023798bb5296953d3c3ffb14f/ceph/namespaces/openshift-storage/pods/rook-ceph-osd-0-777ff99fcd-dxjv4/rook-ceph-osd-0-777ff99fcd-dxjv4.yaml
2. OCS 4.5 only has rhceph-dev. http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bug-1860418/must-gather.local.6020862434318975465/ceph/namespaces/openshift-storage/pods/rook-ceph-osd-0-b7859999b-xr5q6/rook-ceph-osd-0-b7859999b-xr5q6.yaml

So, could you let us know what we need to verify, or whether there is another upgrade path by which we can test this?

@Neha You could simulate the upgrade scenarios as follows:

1) Simulate a change in only the Ceph image:
   - Install OCS 4.5.
   - Change the Ceph image tag so that it appears to be a different image, and set it in the storage cluster CR.
   - Watch that the Ceph pods are all updated and that pod restarts wait for clean PGs.
2) Simulate a change in the Rook deployment:
   - Install OCS 4.5.
   - Change something in the deployment/pod spec for an OSD, such as adding a new label.
   - Restart the Rook operator.
   - Watch that the OSD is restarted because its pod spec changed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754
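The behavior change discussed in this thread can be sketched in Python. Before Rook v1.3 the upgrade waited only when the Ceph image changed. From Rook v1.3 onward it waits when the OSD pod spec differs in any way. This is a minimal, hypothetical model, not Rook's actual Go implementation. It assumes the following:

- An OSD pod spec is a plain dict of hypothetical fields (`ceph_image`, `rook_image`, `labels`).
- Cluster status follows the `pgmap` section of `ceph status -f json`.
- All function names (`needs_wait_old`, `pod_spec_changed`, `wait_for_clean_pgs`, `rolling_restart`) are illustrative only.

```python
"""Hypothetical model of the OSD upgrade wait logic discussed in this bug.

Assumptions (illustrative only, not Rook's Go code): an OSD pod spec is a
plain dict, cluster status follows the "pgmap" section of
`ceph status -f json`, and every function name below is made up.
"""
import time
from typing import Any, Callable, Dict, List

PodSpec = Dict[str, Any]
Status = Dict[str, Any]


def needs_wait_old(old_ceph_image: str, new_ceph_image: str) -> bool:
    """OCS 4.3/4.4 behavior per the thread: wait only if the Ceph image changed."""
    return old_ceph_image != new_ceph_image


def pod_spec_changed(old_spec: PodSpec, new_spec: PodSpec) -> bool:
    """Rook v1.3 / OCS 4.5 behavior per the thread: any pod-spec difference counts."""
    return old_spec != new_spec


def pgs_all_clean(status: Status) -> bool:
    """True if every PG is active+clean (assumed `ceph status -f json` layout)."""
    pgmap = status.get("pgmap", {})
    clean = sum(
        s["count"]
        for s in pgmap.get("pgs_by_state", [])
        if s["state_name"] == "active+clean"
    )
    return clean == pgmap.get("num_pgs", 0)


def wait_for_clean_pgs(
    get_status: Callable[[], Status],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> bool:
    """Poll until all PGs are clean; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if pgs_all_clean(get_status()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def rolling_restart(
    current: Dict[str, PodSpec],
    desired: Dict[str, PodSpec],
    restart: Callable[[str], None],
    get_status: Callable[[], Status],
    timeout_s: float = 600.0,
) -> List[str]:
    """Restart one OSD at a time whose spec changed, waiting for clean PGs after each."""
    restarted = []
    for osd in sorted(desired):
        if not pod_spec_changed(current.get(osd, {}), desired[osd]):
            continue  # unchanged OSDs are left running
        restart(osd)
        restarted.append(osd)
        # This sketch aborts on timeout; the real operator's handling may differ.
        if not wait_for_clean_pgs(get_status, timeout_s):
            raise TimeoutError(f"PGs not active+clean after restarting {osd}")
    return restarted


if __name__ == "__main__":
    # Scenario from this bug: the Rook image changes, the Ceph image does not.
    old = {"ceph_image": "ceph:A", "rook_image": "rook:1", "labels": {}}
    new = dict(old, rook_image="rook:2")
    print(needs_wait_old(old["ceph_image"], new["ceph_image"]))  # False: no wait
    print(pod_spec_changed(old, new))  # True: restart and wait for clean PGs
    # Verification scenario 2: only a new label is added to the OSD spec.
    labeled = dict(old, labels={"test": "yes"})
    print(pod_spec_changed(old, labeled))  # True
```

The sketch compares whole specs rather than a single image field. This is why, in this model, a Rook-only image change or a new label (as in the second verification scenario) triggers the wait. It also restarts OSDs one at a time, so only one OSD is down while PGs recover.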