Description of problem (please be as detailed as possible and provide log snippets):

On one vSphere UPI ENCRYPTION 1AZ RHCOS VSAN LSO VMDK 3M 3W cluster, two ceph mons crashed:

HEALTH_WARN 2 daemons have recently crashed
[WRN] RECENT_CRASH: 2 daemons have recently crashed
    mon.a crashed on host rook-ceph-mon-a-76df6b948c-fdlpj at 2022-05-08T08:47:27.674276Z
    mon.b crashed on host rook-ceph-mon-b-57c6466c5d-zvp5w at 2022-05-08T08:47:42.688845Z

Version of all relevant components (if applicable):
OCP: 4.10.0-0.nightly-2022-05-07-205137
ODF: ocs-registry:4.10.1-5

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
We saw this issue only once; the re-triggered job passed, so reproducibility is unclear.

Can this issue be reproduced from the UI?
N/A

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy OCP cluster.
2. Deploy ODF on top of OCP.
3. Check ceph health status.

Actual results:
Ceph cluster health is not OK:
HEALTH_WARN 2 daemons have recently crashed

Expected results:
Ceph health is HEALTH_OK and no daemon crashes.

Additional info:
I'll post links to the job and must-gather logs in a following comment.
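For reference, the crash details and the RECENT_CRASH warning above can be inspected from the rook-ceph toolbox pod. This is a sketch, not part of the reported reproduction: the namespace (openshift-storage) and pod label (app=rook-ceph-tools) are the usual ODF defaults and may differ on a given cluster, and <crash-id> is a placeholder for an ID returned by `ceph crash ls`.

```shell
# Assumed defaults: openshift-storage namespace, app=rook-ceph-tools label.
TOOLS_POD=$(oc -n openshift-storage get pod -l app=rook-ceph-tools -o name | head -n1)

# Overall health, including detail on the RECENT_CRASH warning
oc -n openshift-storage rsh "$TOOLS_POD" ceph health detail

# List recent crashes, then dump metadata/backtrace for one of them
oc -n openshift-storage rsh "$TOOLS_POD" ceph crash ls
oc -n openshift-storage rsh "$TOOLS_POD" ceph crash info <crash-id>

# After triage, archiving the crash reports clears the HEALTH_WARN
oc -n openshift-storage rsh "$TOOLS_POD" ceph crash archive-all
```

Archiving only silences the warning; the `ceph crash info` backtrace is what should go into the must-gather analysis.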
Verifying based on multiple recent passed executions, and also on the fact that the main bug 2086419 was fixed in ceph 16.2.8-49.el8cp and verified against 16.2.8-50.el8cp.

The ODF CI executions were triggered against the following versions (a quick selection only):

* ODF: 4.11.0-105, Ceph: 16.2.8-59.el8cp
* ODF: 4.11.0-109, Ceph: 16.2.8-59.el8cp
* ODF: 4.11.0-111, Ceph: 16.2.8-65.el8cp
* ODF: 4.11.0-113, Ceph: 16.2.8-65.el8cp
* ODF: 4.11.0-127, Ceph: 16.2.8-80.el8cp
* ODF: 4.11.0-129, Ceph: 16.2.8-80.el8cp
* ODF: 4.11.0-131, Ceph: 16.2.8-84.el8cp
* ODF: 4.11.0-137, Ceph: 16.2.8-84.el8cp

>> VERIFIED
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.11.0 security, enhancement, & bugfix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6156