Bug 2227161

Summary: Rook ceph exporter pod remains stuck in terminating state when node is offline
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Aman Agrawal <amagrawa>
Component: rook
Assignee: Santosh Pillai <sapillai>
Status: POST ---
QA Contact: Neha Berry <nberry>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.13
CC: kramdoss, muagarwa, nberry, odf-bz-bot, sapillai, tnielsen
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2216803
Environment:
Last Closed:
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2216803
Bug Blocks:

Description Aman Agrawal 2023-07-28 08:00:16 UTC
+++ This bug was initially created as a clone of Bug #2216803 +++

Description of problem (please be as detailed as possible and provide log
snippets):
Discussed here: https://chat.google.com/room/AAAAREGEba8/JEtejTNWSEI

Version of all relevant components (if applicable):
OCP 4.13
ODF 4.13.0-rhodf


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Run test_check_pods_status_after_node_failure. When a node is powered off, all the pods running on that node are deleted except the rook-ceph-exporter pod, which remains stuck in the Terminating state and is deleted only when the node is powered on.
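
The check described in step 1 can be sketched as follows. This is a minimal illustration, not the code of test_check_pods_status_after_node_failure. It assumes a pod list in the JSON shape returned by `oc get pods -o json` and treats a pod as Terminating when `metadata.deletionTimestamp` is set. The function name, pod names, and node names are hypothetical.

```python
def pods_stuck_terminating(pod_list, node_name):
    """Return names of pods scheduled on node_name that are marked for deletion.

    A pod is shown as Terminating once metadata.deletionTimestamp is set
    and the pod object still exists in the API.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata", {})
        spec = pod.get("spec", {})
        on_node = spec.get("nodeName") == node_name
        marked_for_deletion = bool(metadata.get("deletionTimestamp"))
        if on_node and marked_for_deletion:
            stuck.append(metadata.get("name"))
    return stuck


# Hypothetical sample data: pod and node names are made up for illustration.
SAMPLE_POD_LIST = {
    "items": [
        {
            "metadata": {
                "name": "rook-ceph-exporter-example",
                "deletionTimestamp": "2023-06-20T14:30:00Z",
            },
            "spec": {"nodeName": "compute-0"},
        },
        {
            "metadata": {"name": "rook-ceph-mon-example"},
            "spec": {"nodeName": "compute-1"},
        },
        {
            "metadata": {"name": "rook-ceph-osd-example"},
            "spec": {"nodeName": "compute-0"},
        },
    ]
}
```

Calling `pods_stuck_terminating(SAMPLE_POD_LIST, "compute-0")` on the sample returns only the exporter pod, which matches the symptom reported here: a pod on the powered-off node that still exists with a deletion timestamp.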


Actual results: Console logs: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/26008/consoleFull

Must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr7865b4745/jnk-pr7865b4745_20230620T140055/logs/failed_testcase_ocs_logs_1687276075/test_check_pods_status_after_node_failure_ocs_logs/jnk-pr7865b4745/


Expected results: The rook-ceph-exporter pod should be deleted when the node is offline and should not remain stuck in the Terminating state until the node is powered on.


Additional info:

--- Additional comment from RHEL Program Management on 2023-06-22 21:45:56 IST ---

This bug, having no release flag set previously, is now set with release flag 'odf-4.14.0' to '?', and so is being proposed to be fixed at the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any previously set while release flag was missing, have now been reset since the Acks are to be set against a release flag.

--- Additional comment from Aman Agrawal on 2023-07-27 17:28:47 IST ---

Hi Santosh,

Could we pls prioritize the fix for this BZ?
The dependent tests are repeatedly failing in CI for every z-stream release.

Thanks!

--- Additional comment from Santosh Pillai on 2023-07-27 20:30:15 IST ---

(In reply to Aman Agrawal from comment #2)
> Hi Santosh,
> 
> Could we pls prioritize the fix for this BZ?
> The dependent tests are repeatedly failing in CI for every z-stream release.
> 
> Thanks!

Hi Aman. This was merged upstream a few days back. I forgot to update the status here. I'll create a backport for downstream soon.

--- Additional comment from Travis Nielsen on 2023-07-28 00:20:23 IST ---

This will be fixed for the 4.14 release with https://github.com/red-hat-storage/rook/pull/501.
Aman, do you want to open a clone for 4.13.z?

Comment 2 Travis Nielsen 2023-07-28 14:06:37 UTC
Santosh, please open a backport PR for 4.13, thanks.

Comment 3 Santosh Pillai 2023-07-31 02:50:33 UTC
Backport PR for 4.13: https://github.com/red-hat-storage/rook/pull/502

Comment 5 Mudit Agarwal 2023-07-31 06:04:07 UTC
Wait until you get the acks.