Bug 2216803
| Summary: | Rook ceph exporter pod remains stuck in terminating state when node is offline |
|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation |
| Component: | rook |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | unspecified |
| Version: | 4.13 |
| Target Release: | ODF 4.16.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | 4.16.0-94 |
| Doc Type: | No Doc Update |
| Reporter: | Aman Agrawal <amagrawa> |
| Assignee: | Santosh Pillai <sapillai> |
| QA Contact: | Itzhak <ikave> |
| CC: | ikave, muagarwa, nberry, odf-bz-bot, sapillai, sheggodu, tnielsen |
| Type: | Bug |
| Last Closed: | 2024-07-17 13:11:03 UTC |
| Cloned As: | 2227161 (view as bug list) |
| Bug Blocks: | 2227161 |
Comment 7
Itzhak
2024-04-08 12:54:59 UTC
When testing with vSphere UPI 4.16, the test fails, and the failure is different from the previous error: after shutting down the worker node, the status of the rook-ceph pods not in the node did not change after 6 minutes.

In summary, the current issue with 4.16 vSphere UPI is the following:

1. Shut down a worker node.
2. Wait for the status of the rook-ceph pods not in the node to change.

Actual result: the status of the rook-ceph pods not in the node did not change after 6 minutes.

Expected result: the status of the rook-ceph pods not in the node should change, or the pods should be deleted.

Additional info:
Report portal link: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/20685/991918/991929/log
Versions: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-020vup1cs33-t4a/j-020vup1cs33-t4a_20240423T010622/logs/test_report_1713834101.html

---

@ikave Comment 12 is a bit confusing for me. The BZ is about the `rook-ceph-exporter` pod stuck in the terminating state when the node is offline. We fixed that by adding changes that remove any rook-ceph-exporter pod stuck in the terminating state.

> Expected result:
> The status of the rook-ceph pods not in the node should change, or the pods should be deleted

What rook-ceph pods are you referring to?

---

Yes, this is a different issue. The test checks that the status of the rook-ceph pods not in the node changes after shutting down a worker node. This is the first step before checking the issue described in this BZ.

---

Itzhak, please open a new issue for the recent failure.

---

I raised a new BZ, https://bugzilla.redhat.com/show_bug.cgi?id=2279538, regarding the issue mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2216803#c12. I will wait another few days to see if the error here appears again, and if not, I will close it.

---

I reran the test "test_check_pods_status_after_node_failure" with AWS and vSphere 4.16, and it passed. I also checked the console output to confirm that the process was accurate. AWS test result: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ikave-aws416/ikave-aws416_20240521T052423/logs/test_report_1716300838.html

Therefore, I am moving the BZ to Verified.

---

Please update the RDT flag/text appropriately.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591
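For reference, below is a minimal sketch of the kind of status check discussed in this thread (shut down a worker node, then poll the rook-ceph pods for a status change within the 6-minute window). This is a hypothetical illustration assuming the `kubernetes` Python client and the default `openshift-storage` namespace; it is not the actual ocs-ci implementation of `test_check_pods_status_after_node_failure`, and the function names are made up for the example.

```python
# Hypothetical sketch -- not the actual ocs-ci test code.
import time

from kubernetes import client, config

NAMESPACE = "openshift-storage"  # assumption: default ODF namespace


def rook_ceph_pod_statuses(v1: client.CoreV1Api) -> dict:
    """Map each rook-ceph* pod name to (phase, is_terminating)."""
    statuses = {}
    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        if pod.metadata.name.startswith("rook-ceph"):
            # A non-None deletion_timestamp is what `oc get pods` shows as "Terminating".
            statuses[pod.metadata.name] = (
                pod.status.phase,
                pod.metadata.deletion_timestamp is not None,
            )
    return statuses


def wait_for_pod_status_change(timeout: int = 360, interval: int = 30) -> bool:
    """Poll for up to `timeout` seconds (the 6-minute window mentioned above) and
    return True if any rook-ceph pod changes status or disappears."""
    config.load_kube_config()
    v1 = client.CoreV1Api()
    before = rook_ceph_pod_statuses(v1)
    deadline = time.time() + timeout
    while time.time() < deadline:
        time.sleep(interval)
        if rook_ceph_pod_statuses(v1) != before:
            return True
    return False


if __name__ == "__main__":
    # Run after shutting down a worker node; the test described above expects True.
    print("status changed:", wait_for_pod_status_change())
```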
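The fix referenced in the thread (removing any `rook-ceph-exporter` pod stuck in the terminating state) lives in the Rook operator itself, which is written in Go. As a rough illustration only, the following hypothetical Python sketch shows the equivalent cleanup; the function name and the use of the `kubernetes` client are assumptions for the example, not the operator's actual code.

```python
# Hypothetical sketch of the cleanup the fix performs -- not the Rook operator code.
from kubernetes import client, config

NAMESPACE = "openshift-storage"  # assumption: default ODF namespace


def force_delete_stuck_exporter_pods() -> None:
    """Force-delete rook-ceph-exporter pods stuck in Terminating.

    When the pod's node is offline, the kubelet can never confirm the deletion,
    so the pod keeps its deletion timestamp indefinitely. Deleting with
    grace_period_seconds=0 (the API equivalent of
    `oc delete pod <name> --force --grace-period=0`) removes it.
    """
    config.load_kube_config()
    v1 = client.CoreV1Api()
    for pod in v1.list_namespaced_pod(NAMESPACE).items:
        name = pod.metadata.name
        if name.startswith("rook-ceph-exporter") and pod.metadata.deletion_timestamp:
            v1.delete_namespaced_pod(
                name=name, namespace=NAMESPACE, grace_period_seconds=0
            )
            print(f"force-deleted stuck exporter pod: {name}")
```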