Bug 2227161 - Rook ceph exporter pod remains stuck in terminating state when node is offline
Summary: Rook ceph exporter pod remains stuck in terminating state when node is offline
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.13.3
Assignee: Santosh Pillai
QA Contact: Aman Agrawal
URL:
Whiteboard:
Depends On: 2216803
Blocks:
 
Reported: 2023-07-28 08:00 UTC by Aman Agrawal
Modified: 2023-09-27 14:24 UTC
CC List: 6 users

Fixed In Version: 4.13.3-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2216803
Environment:
Last Closed: 2023-09-27 14:22:42 UTC
Embargoed:




Links
System ID                               Status  Summary                                                  Last Updated
Github red-hat-storage/rook pull 502    open    Bug 2227161: core: force delete rook-ceph-exporter pod  2023-07-31 02:50:33 UTC
Red Hat Product Errata RHSA-2023:5376   None    None                                                     2023-09-27 14:24:13 UTC

Description Aman Agrawal 2023-07-28 08:00:16 UTC
+++ This bug was initially created as a clone of Bug #2216803 +++

Description of problem (please be as detailed as possible and provide log
snippets):
Discussed here: https://chat.google.com/room/AAAAREGEba8/JEtejTNWSEI

Version of all relevant components (if applicable):
OCP 4.13
ODF 4.13.0-rhodf


Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate the complexity of the scenario you performed that caused this bug
from 1 to 5 (1 - very simple, 5 - very complex):


Is this issue reproducible?


Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. In test_check_pods_status_after_node_failure, when a node is powered off, all the pods running on that node get deleted except the rook-ceph-exporter pod, which remains stuck in the Terminating state and is deleted only when the node is powered back on (see the client-go sketch after this list).
2.
3.
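For context on why the pod lingers: when the node is powered off, its kubelet can never confirm that the pod's containers have stopped, so the API server keeps the pod object around with a deletionTimestamp set and the pod shows as Terminating indefinitely. The following client-go sketch shows how such stuck exporter pods could be detected. It is an illustration only, not code from Rook; the rook-ceph namespace, the app=rook-ceph-exporter label selector, and the function names are assumptions.

package stuckpods

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeIsReady reports whether the node's Ready condition is True.
func nodeIsReady(node *corev1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue
		}
	}
	return false
}

// findStuckExporterPods returns exporter pods that are Terminating
// (deletionTimestamp set) while scheduled on a node that is not Ready.
func findStuckExporterPods(ctx context.Context, cs kubernetes.Interface) ([]corev1.Pod, error) {
	pods, err := cs.CoreV1().Pods("rook-ceph").List(ctx, metav1.ListOptions{
		LabelSelector: "app=rook-ceph-exporter", // assumed label
	})
	if err != nil {
		return nil, err
	}
	var stuck []corev1.Pod
	for _, pod := range pods.Items {
		if pod.DeletionTimestamp == nil {
			continue // pod is not Terminating
		}
		node, err := cs.CoreV1().Nodes().Get(ctx, pod.Spec.NodeName, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		if !nodeIsReady(node) {
			stuck = append(stuck, pod)
		}
	}
	return stuck, nil
}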


Actual results: Console logs: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/26008/consoleFull

Must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jnk-pr7865b4745/jnk-pr7865b4745_20230620T140055/logs/failed_testcase_ocs_logs_1687276075/test_check_pods_status_after_node_failure_ocs_logs/jnk-pr7865b4745/


Expected results: The rook-ceph-exporter pod should be deleted when the node is offline and should not remain stuck in the Terminating state until the node is powered back on.
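The linked PRs are titled "core: force delete rook-ceph-exporter pod", so the fix appears to resolve the stuck state by deleting the pod with a zero grace period, which tells the API server to remove the object immediately instead of waiting for the unreachable kubelet. A minimal client-go sketch of that technique follows, under the same assumptions as above; it is not the PR's actual code.

package stuckpods

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// forceDeletePod deletes a pod with GracePeriodSeconds set to 0, so the
// API server removes the object immediately rather than waiting for the
// (offline) node's kubelet to confirm container shutdown.
func forceDeletePod(ctx context.Context, cs kubernetes.Interface, namespace, name string) error {
	gracePeriod := int64(0)
	return cs.CoreV1().Pods(namespace).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &gracePeriod,
	})
}

The equivalent manual workaround from the CLI would be `oc delete pod <pod-name> -n rook-ceph --grace-period=0 --force`.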


Additional info:

--- Additional comment from RHEL Program Management on 2023-06-22 21:45:56 IST ---

This bug previously had no release flag set; the release flag 'odf-4.14.0' has now been set to '?', proposing the fix for the ODF 4.14.0 release. Note that the 3 Acks (pm_ack, devel_ack, qa_ack), if any were set while the release flag was missing, have now been reset, since Acks must be set against a release flag.

--- Additional comment from Aman Agrawal on 2023-07-27 17:28:47 IST ---

Hi Santosh,

Could we please prioritize the fix for this BZ?
The dependent tests are failing repeatedly in CI for every z-stream release.

Thanks!

--- Additional comment from Santosh Pillai on 2023-07-27 20:30:15 IST ---

(In reply to Aman Agrawal from comment #2)
> Hi Santosh,
> 
> Could we please prioritize the fix for this BZ?
> The dependent tests are failing repeatedly in CI for every z-stream release.
> 
> Thanks!

Hi Aman. This was merged upstream a few days ago; I forgot to update the status here. I'll create a downstream backport soon.

--- Additional comment from Travis Nielsen on 2023-07-28 00:20:23 IST ---

This will be fixed for the 4.14 release with https://github.com/red-hat-storage/rook/pull/501.
Aman, do you want to open a clone for 4.13.z?

Comment 2 Travis Nielsen 2023-07-28 14:06:37 UTC
Santosh please open a backport PR for 4.13, thanks

Comment 3 Santosh Pillai 2023-07-31 02:50:33 UTC
Backport PR for 4.13: https://github.com/red-hat-storage/rook/pull/502

Comment 5 Mudit Agarwal 2023-07-31 06:04:07 UTC
Wait until you get the acks.

Comment 18 errata-xmlrpc 2023-09-27 14:22:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.13.3 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:5376

