Bug 2232106

Summary: Cluster needs a few hours to recover after shutting down 2 worker nodes (10-minute shutdown)
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation Reporter: Aviad Polak <apolak>
Component: unclassified Assignee: Mudit Agarwal <muagarwa>
Status: NEW --- QA Contact: Elad <ebenahar>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.14 CC: odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Aviad Polak 2023-08-15 10:16:09 UTC
Observed after running the automated test https://github.com/red-hat-storage/ocs-ci/blob/master/tests/manage/z_cluster/nodes/test_check_pod_status_after_two_nodes_shutdown_recovery.py as part of our Tier4 testing.
The flow is to shut down 2 (out of 3) worker nodes for 10 minutes, then start them again and check the cluster status. After the run, a few pods went into CrashLoopBackOff (CLBO) or Init status. It took a few hours before the cluster recovered.
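For triage, the pod check described above could be approximated with a small script. This is only a minimal sketch and not the ocs-ci test's own logic: the function name `unhealthy_pods`, the set of waiting reasons, and the rule for detecting stuck init containers are all assumptions made for illustration. It expects the JSON produced by `oc get pods -n openshift-storage -o json`.

```python
import json

# Waiting reasons treated as unhealthy in this sketch (an assumption, not an exhaustive list).
BAD_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def unhealthy_pods(pod_list_json: str) -> list[tuple[str, str]]:
    """Return (pod name, reason) for pods that are crash-looping or stuck in init.

    Hypothetical helper: the input is the text of a Kubernetes PodList in JSON form.
    """
    problems = []
    for pod in json.loads(pod_list_json).get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        # An init container that has not terminated with exit code 0
        # keeps the pod in an Init state.
        for cs in status.get("initContainerStatuses", []):
            terminated = cs.get("state", {}).get("terminated")
            if not (terminated and terminated.get("exitCode") == 0):
                problems.append((name, "Init:" + cs["name"]))
                break
        else:
            # All init containers finished; look for crash-looping app containers.
            for cs in status.get("containerStatuses", []):
                reason = cs.get("state", {}).get("waiting", {}).get("reason")
                if reason in BAD_WAITING_REASONS:
                    problems.append((name, reason))
                    break
    return problems
```

Running this periodically after the nodes are started again, and noting when the returned list becomes empty, would give a rough measure of the recovery time reported here.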

Version of all relevant components (if applicable):
ODF: full_version: 4.14.0-102
OCP:
openshiftVersion: 4.14.0-0.nightly-2023-08-08-005757
releaseClientVersion: 4.14.0-0.ci-2023-07-11-133509
serverVersion:
  buildDate: "2023-08-03T17:26:35Z"
  compiler: gc
  gitCommit: ee9c1a1f13b06f5e2a79dcbd06285ec3f8315448
  gitTreeState: clean
  gitVersion: v1.27.3+e123787
  goVersion: go1.20.5 X:strictfipsruntime
  major: "1"
  minor: "27"
  platform: linux/amd64

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Steps to Reproduce:
1. Run the automated test described above.