Bug 2279538

Summary: The status of all the rook-ceph pods not associated with the worker node doesn't change after the worker node is shut down
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Itzhak <ikave>
Component: rook    Assignee: Santosh Pillai <sapillai>
Status: CLOSED NOTABUG QA Contact: Neha Berry <nberry>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 4.16    CC: odf-bz-bot, sapillai, tnielsen
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-05-09 11:36:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Itzhak 2024-05-07 12:14:31 UTC
Description of problem (please be detailed as possible and provide log
snippets):
The status of all the rook-ceph pods not associated with the worker node doesn't change after the worker node is shut down.
However, the expectation is that after shutting down a worker node, at least one of the rook-ceph pods not associated with the worker node will change status, or one of the pods will be deleted.

Version of all relevant components (if applicable):
vSphere UPI OCP 4.16, ODF 4.16


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
No.

Is there any workaround available to the best of your knowledge?
Yes. After powering on the worker node, the pods return to a Ready state.

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes, but it's not consistent.

Can this issue be reproduced from the UI?
No.

If this is a regression, please provide more details to justify this:
I am not sure. I have also seen this error in ODF 4.15.

Steps to Reproduce:
1. Shut down a worker node.
2. Check the status of the rook-ceph pods not associated with the worker node.
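The check that ocs-ci performs here can be sketched as a small filtering step: take snapshots of the rook-ceph pods before and after the shutdown, exclude the pods scheduled on the shut-down node, and verify that at least one of the remaining pods changed status or disappeared. A minimal, self-contained sketch of that logic (the pod data and function names are hypothetical, for illustration only; the real test gathers pod state via the Kubernetes API):

```python
# Hypothetical sketch of the ocs-ci check described in the steps above.
# Pod records are (name, node, status) snapshots; in the real test these
# would come from the Kubernetes API (e.g. `oc get pods -o wide`).

def pods_not_on_node(pods, node):
    """Return the pods that are NOT scheduled on the given node."""
    return [p for p in pods if p["node"] != node]

def changed_or_deleted(before, after, node):
    """True if at least one pod not on `node` changed status or was deleted."""
    before_other = {p["name"]: p["status"] for p in pods_not_on_node(before, node)}
    after_other = {p["name"]: p["status"] for p in pods_not_on_node(after, node)}
    for name, status in before_other.items():
        if name not in after_other:       # pod was deleted
            return True
        if after_other[name] != status:   # pod status changed
            return True
    return False

# Simulated snapshots around shutting down "worker-1":
before = [
    {"name": "rook-ceph-osd-0", "node": "worker-1", "status": "Running"},
    {"name": "rook-ceph-osd-1", "node": "worker-2", "status": "Running"},
    {"name": "rook-ceph-mon-a", "node": "worker-3", "status": "Running"},
]
# As the bug reports: only the pod on the shut-down node changes;
# pods on the other nodes keep their status.
after = [
    {"name": "rook-ceph-osd-0", "node": "worker-1", "status": "Terminating"},
    {"name": "rook-ceph-osd-1", "node": "worker-2", "status": "Running"},
    {"name": "rook-ceph-mon-a", "node": "worker-3", "status": "Running"},
]

print(changed_or_deleted(before, after, "worker-1"))  # False
```

Note that the check returning False is exactly the observed behavior this BZ describes; as the discussion below concludes, that behavior is in fact correct Kubernetes behavior, which is why the step was ultimately removed from the test.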


Actual results:
The status of all rook-ceph pods not associated with the worker node didn't change.

Expected results:
At least one of the rook-ceph pods not associated with the worker node should have its status changed, or one of the pods should be deleted.

Additional info:

Report portal link: https://reportportal-ocs4.apps.ocp-c1.prod.psi.redhat.com/ui/#ocs/launches/all/20685/991918/991929/log.

Versions: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-020vup1cs33-t4a/j-020vup1cs33-t4a_20240423T010622/logs/test_report_1713834101.html

Comment 3 Santosh Pillai 2024-05-08 11:55:04 UTC
(In reply to Itzhak from comment #0)
> Description of problem (please be detailed as possible and provide log
> snippets):

> Actual results:
> The status of all rook-ceph pods not associated with the worker node didn't
> change.

Slight confusion. Why would the status of the pods "not" associated with the worker node that was shut down change?

Are you referring to the pods that were on the node that was shut down?

> 
> Expected results:
> At least one of the rook-ceph pods not associated with the worker node
> should have its status changed, or one of the pods should be deleted.

Comment 4 Itzhak 2024-05-08 15:44:10 UTC
TBH, I am not quite sure where this step originally came from. I remember we also expected a change in the pods "not" on the worker node. If this doesn't make sense, I can close the bug and delete this step from the ocs-ci test.

Comment 5 Santosh Pillai 2024-05-09 03:59:03 UTC
To the best of my knowledge, for both graceful and non-graceful shutdowns of a node, only the pods that were running on the node that was shut down should be affected. The status of pods on other nodes should not change.

So my suggestion would be to close this BZ. 

Or, if there is a valid reason why ocs-ci has this test checking the status of the "not" associated pods, please update the BZ with that reasoning.

For now, I'll move it to 4.17.

Comment 6 Itzhak 2024-05-09 11:36:14 UTC
After a discussion with other QE members, we decided that we could remove this step. 
So, I am closing the BZ and will fix the test in ocs-ci accordingly.