Bug 2178978

Summary: One monitor is down after OCP upgrade and doesn't recover
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Filip Balák <fbalak>
Component: odf-operator
Assignee: Nitin Goyal <nigoyal>
Status: CLOSED NOTABUG
QA Contact: Elad <ebenahar>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.8
CC: muagarwa, ocs-bugs, odf-bz-bot, pbalogh, tnielsen
Target Milestone: ---
Keywords: Regression
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-30 16:41:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Filip Balák 2023-03-16 10:45:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After the OCP upgrade, one Ceph monitor is reported as down and does not recover in time.

Version of all relevant components (if applicable):
OCS 4.8.18-2

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Steps to Reproduce:
1. Prepare AWS IPI 3AZ RHCOS 3M 3W Cluster.
2. Perform OCP upgrade from OCP 4.8 to OCP 4.9.
3. Check ceph health.

Actual results:
The upgrade completes, but one Ceph monitor is down (Ceph cluster health is not OK. Health: HEALTH_WARN 1/3 mons down, quorum a,c).
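
To identify which monitor is out of quorum, the JSON from `ceph quorum_status --format json` can be compared against the monitor map. The following is a minimal sketch, not part of the original report. It assumes the command is run from a Rook toolbox pod (which may not be deployed on every cluster) and that the output contains the `quorum_names` and `monmap.mons[].name` fields. The function name `mons_out_of_quorum` is hypothetical, and the sample input is illustrative data shaped after the health message above, not output captured from the affected cluster.

```python
import json


def mons_out_of_quorum(quorum_status):
    """Return the names of monitors in the monmap that are not in quorum."""
    all_mons = [mon["name"] for mon in quorum_status["monmap"]["mons"]]
    in_quorum = set(quorum_status["quorum_names"])
    return sorted(name for name in all_mons if name not in in_quorum)


# Illustrative input only: three monitors with quorum "a,c", as in the
# HEALTH_WARN message above. Real output contains many more fields.
sample = json.loads("""
{
  "quorum_names": ["a", "c"],
  "monmap": {"mons": [{"name": "a"}, {"name": "b"}, {"name": "c"}]}
}
""")

print(mons_out_of_quorum(sample))
```

For the health message reported here (three monitors, quorum a,c), this would point at monitor b as the one to inspect.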

Expected results:
Ceph should be healthy after upgrade.

Additional info:
Runs where this was observed:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7177/#showFailuresLink
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7139/#showFailuresLink

Comment 1 Nitin Goyal 2023-03-16 11:41:00 UTC
Filip, can you please give us a working cluster? Both of the clusters above are gone.