Bug 2135626
| Summary: | Do not use rook master tag in job template [4.12] | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Subham Rai <srai> |
| Component: | ocs-operator | Assignee: | Subham Rai <srai> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Itzhak <ikave> |
| Severity: | low | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.9 | CC: | ebenahar, kramdoss, muagarwa, ocs-bugs, odf-bz-bot, sostapov |
| Target Milestone: | --- | | |
| Target Release: | ODF 4.12.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 4.12.0-113 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2135631 2135632 2135636 2135736 | Environment: | |
| Last Closed: | 2023-02-08 14:06:28 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2135631, 2135632, 2135636, 2135736 | | |
Description (Subham Rai, 2022-10-18 06:14:25 UTC)
*** Bug 2135736 has been marked as a duplicate of this bug. ***

@shubam:
1) What would be the steps to verify this bug?
2) Do we need to add any new tests due to this change?

I tested it on a vSphere OCP 4.12 and ODF 4.12 dynamic cluster. The steps I took to reproduce the bug:

1. Deleted a disk from vSphere.

2. Checked the OSD status and observed the OSD that is down:

```
$ oc get pods -o wide | grep osd
rook-ceph-osd-0-76748c9b6-vpwz9    2/2   Running            0             69m   10.130.2.22   compute-1   <none>   <none>
rook-ceph-osd-1-54749698d7-2jp48   1/2   CrashLoopBackOff   3 (41s ago)   69m   10.129.2.20   compute-0   <none>   <none>
rook-ceph-osd-2-99f58954-k42nk     2/2   Running            0             68m   10.128.2.20   compute-2   <none>   <none>
```

3. Scaled down the osd-1 deployment and deleted the osd-1 pod, as described in the documentation.

4. Ran the "ocs-osd-removal" job and saw that it completed successfully:

```
$ oc get jobs
NAME                                                COMPLETIONS   DURATION   AGE
ocs-osd-removal-job                                 1/1           11s        22s
rook-ceph-osd-prepare-ocs-deviceset-0-data-0vbd6h   1/1           72s        76m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0269xm   1/1           40s        76m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0w6njt   0/1           1s         1s

$ oc get pod -l job-name=ocs-osd-removal-job -n openshift-storage
NAME                        READY   STATUS      RESTARTS   AGE
ocs-osd-removal-job-8b449   0/1     Completed   0          59s
```

5. Checked the logs of the "ocs-osd-removal-job":

```
$ oc logs ocs-osd-removal-job-8b449
2022-11-22 11:41:25.506021 I | rookcmd: starting Rook v4.12.0-0.e237b7ff0b9225db1a5f8a95dc50f9f8e2d55206 with arguments '/usr/local/bin/rook ceph osd remove --osd-ids=1 --force-osd-removal true'
2022-11-22 11:41:25.506069 I | rookcmd: flag values: --force-osd-removal=true, --help=false, --log-level=DEBUG, --operator-image=, --osd-ids=1, --preserve-pvc=false, --service-account=
```

The first line shows that the Rook version is v4.12.0-0, without the master tag (a scripted version of these checks is sketched after this comment).

Additional info:

Link to the Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/18259/

Versions:

OC version:
```
Client Version: 4.10.24
Server Version: 4.12.0-0.nightly-2022-11-22-012345
Kubernetes Version: v1.25.2+5533733
```

OCS version:
```
ocs-operator.v4.12.0-114.stable   OpenShift Container Storage   4.12.0-114.stable   Succeeded
```

Cluster version:
```
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-11-22-012345   True        False         6h44m   Cluster version is 4.12.0-0.nightly-2022-11-22-012345
```

Rook version:
```
rook: v4.12.0-0.e237b7ff0b9225db1a5f8a95dc50f9f8e2d55206
go: go1.18.7
```

Ceph version:
```
ceph version 16.2.10-72.el8cp (3311949c2d1edf5cabcc20ba0f35b4bfccbf021e) pacific (stable)
```

According to the two comments above, I am moving the bug to Verified.
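
For future re-runs, the manual check above can be scripted. The sketch below is illustrative only, not part of the fix or the test suite: it assumes the standard ODF names used in this comment (the ocs-osd-removal template, the ocs-osd-removal-job Job, the openshift-storage namespace) and the FORCE_OSD_REMOVAL template parameter seen in the log flags above. It creates the removal job and then reads back the container image to confirm it carries a release tag rather than master.

```
# Illustrative only: assumes the standard ODF object names used above
# (the "ocs-osd-removal" template and the "ocs-osd-removal-job" Job in
# the openshift-storage namespace) and the FORCE_OSD_REMOVAL template
# parameter; adjust names and the failed OSD id for your cluster.

NS=openshift-storage
FAILED_OSD_ID=1   # the OSD shown in CrashLoopBackOff in step 2

# Render the removal job from the OpenShift template and create it.
oc process -n "$NS" ocs-osd-removal \
  -p FAILED_OSD_IDS="$FAILED_OSD_ID" \
  -p FORCE_OSD_REMOVAL=true | oc create -n "$NS" -f -

# Inspect the image the job was templated with; it should carry the
# release tag (as in 4.12.0-113), not a rook "master" tag.
IMAGE=$(oc get job ocs-osd-removal-job -n "$NS" \
  -o jsonpath='{.spec.template.spec.containers[0].image}')
echo "removal job image: $IMAGE"
case "$IMAGE" in
  *:master*) echo "FAIL: job template still references the master tag" ;;
  *)         echo "OK: no master tag in the job image" ;;
esac
```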
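
A second illustrative sketch, mirroring step 5: fetch the removal pod by its job-name label, confirm the Rook build string in its log is a release version, and clean up the completed job afterwards as the device-replacement procedure does.

```
# Illustrative only: automates the log check from step 5 and the final
# cleanup of the completed job, using the job-name label shown above.

NS=openshift-storage
POD=$(oc get pod -n "$NS" -l job-name=ocs-osd-removal-job \
  -o jsonpath='{.items[0].metadata.name}')

# The first "rookcmd: starting Rook ..." line should report a release
# build (v4.12.0-0.<sha>), not a master build.
oc logs -n "$NS" "$POD" | grep 'starting Rook'

# Once the removal is confirmed, delete the completed job.
oc delete job ocs-osd-removal-job -n "$NS"
```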