+++ This bug was initially created as a clone of Bug #2216803 +++

Description of problem (please be detailed as possible and provide log snippets):
Discussed here - https://chat.google.com/room/AAAAREGEba8/JEtejTNWSEI

Version of all relevant components (if applicable):
OCP 4.13
ODF 4.13.0-rhodf

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. In test_check_pods_status_after_node_failure, power off a node. All pods running on that node get deleted except the rook-ceph-exporter pod, which remains stuck in the Terminating state and is only deleted once the node is powered back on.
2.
3.

Actual results:
Console logs - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/26008/consoleFull
Must-gather logs - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr7865b4745/jnk-pr7865b4745_20230620T140055/logs/failed_testcase_ocs_logs_1687276075/test_check_pods_status_after_node_failure_ocs_logs/jnk-pr7865b4745/

Expected results:
The rook-ceph-exporter pod should get deleted while the node is offline and should not remain stuck in the Terminating state until the node is powered on.

Additional info:

--- Additional comment from RHEL Program Management on 2023-06-22 21:45:56 IST ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.14.0' to '?', and so is proposed to be fixed in the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were previously set while the release flag was missing, have now been reset, since the Acks are to be set against a release flag.
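For reference, the stuck pod can be observed and (if needed) manually cleared with standard kubectl commands. This is a hedged sketch only, not part of this BZ's fix: the node name `worker-1` and the pod name suffix are illustrative placeholders, and force deletion skips graceful shutdown, so it should be used with care on pods stranded on an unreachable node.

```shell
# List pods scheduled on the powered-off node (node name is an assumption)
kubectl -n openshift-storage get pods -o wide \
  --field-selector spec.nodeName=worker-1

# A pod stuck in Terminating on an unreachable node can be force-removed;
# the pod name below is a hypothetical example, not taken from the logs
kubectl -n openshift-storage delete pod rook-ceph-exporter-worker-1-xxxx \
  --grace-period=0 --force
```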
--- Additional comment from Aman Agrawal on 2023-07-27 17:28:47 IST ---

Hi Santosh,

Could we please prioritize the fix for this BZ? The dependent tests are repeatedly failing in CI for every z-stream release.

Thanks!

--- Additional comment from Santosh Pillai on 2023-07-27 20:30:15 IST ---

(In reply to Aman Agrawal from comment #2)
> Hi Santosh,
>
> Could we please prioritize the fix for this BZ?
> The dependent tests are repeatedly failing in CI for every z-stream release.
>
> Thanks!

Hi Aman. This was merged upstream a few days back; I forgot to update the status here. I'll create a backport for downstream soon.

--- Additional comment from Travis Nielsen on 2023-07-28 00:20:23 IST ---

This will be fixed for the 4.14 release with https://github.com/red-hat-storage/rook/pull/501. Aman, do you want to open a clone for 4.13.z?
Santosh, please open a backport PR for 4.13. Thanks!
Backport PR for 4.13: https://github.com/red-hat-storage/rook/pull/502
Wait until you get the acks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.13.3 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2023:5376