Bug 2178978 - One monitor is down after OCP upgrade and doesn't recover
Summary: One monitor is down after OCP upgrade and doesn't recover
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: odf-operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Nitin Goyal
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-03-16 10:45 UTC by Filip Balák
Modified: 2023-08-09 17:00 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-30 16:41:46 UTC
Embargoed:




Links
System ID: Red Hat Bugzilla 2178602
Private: 0
Priority: unspecified
Status: CLOSED
Summary: [UI] ODF dashboard crashes when OCP is upgraded
Last Updated: 2023-08-09 17:00:26 UTC

Description Filip Balák 2023-03-16 10:45:30 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
After the OCP upgrade, one Ceph monitor is reported as down and it does not recover within the expected time.

Version of all relevant components (if applicable):
OCS 4.8.18-2

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Steps to Reproduce:
1. Prepare an AWS IPI cluster spanning 3 availability zones, with RHCOS nodes, 3 masters, and 3 workers.
2. Perform an OCP upgrade from OCP 4.8 to OCP 4.9.
3. Check Ceph health (see the sketch after these steps).
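
A minimal sketch of step 3, assuming the rook-ceph toolbox is deployed and the usual ODF defaults (openshift-storage namespace, app=rook-ceph-tools label); names may differ on other deployments:

  # Locate the toolbox pod and run Ceph health commands inside it
  TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n 1)
  oc -n openshift-storage rsh "$TOOLS_POD" ceph health detail
  oc -n openshift-storage rsh "$TOOLS_POD" ceph mon stat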

Actual results:
The upgrade completes, but a Ceph monitor is down (Ceph cluster health is not OK: HEALTH_WARN 1/3 mons down, quorum a,c).
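
Given the quoted quorum (a,c), mon b is the member that is out. A sketch of how one might confirm which monitor is down and pull its logs, reusing $TOOLS_POD from the sketch above (the rook-ceph-mon-b deployment name is an assumption derived from that health output):

  # Ask Ceph which mons are in quorum
  oc -n openshift-storage rsh "$TOOLS_POD" ceph quorum_status --format json-pretty
  # Inspect the mon pods and the logs of the missing one
  oc -n openshift-storage get pods -l app=rook-ceph-mon -o wide
  oc -n openshift-storage logs deploy/rook-ceph-mon-b --tail=100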

Expected results:
Ceph should be healthy after the upgrade.

Additional info:
Runs where this was observed:
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7177/#showFailuresLink
https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/7139/#showFailuresLink

Comment 1 Nitin Goyal 2023-03-16 11:41:00 UTC
Filip, can you please give us a working cluster? Both of the clusters above are gone.

