This is a crash during shutdown, so it has very little user impact. Additionally it is a race condition seen rarely in the thousands of runs upstream. Thus marking it low/low severity and priority.
*** Bug 1842536 has been marked as a duplicate of this bug. ***
*** Bug 1879962 has been marked as a duplicate of this bug. ***
(In reply to Josh Durgin from comment #4)
> This is a crash during shutdown, so it has very little user impact.
> Additionally it is a race condition seen rarely in the thousands of runs
> upstream. Thus marking it low/low severity and priority.
It was seen in OCS in a customer deployment, where 2 MONs crashed. I'm raising it to High/High for the time being, in the hope to understand when it happens and how OCS recovers from it.
Assigning this to the 5.0 rc so it can be attached to the OCS 4.8 release.
(In reply to Yaniv Kaul from comment #8)
> (In reply to Josh Durgin from comment #4)
> > This is a crash during shutdown, so it has very little user impact.
> > Additionally it is a race condition seen rarely in the thousands of runs
> > upstream. Thus marking it low/low severity and priority.
>
> It was seen in OCS in a customer deployment, where 2 MONs crashed. I'm
> raising it to High/High for the time being, in the hope to understand when
> it happens and how OCS recovers from it.
Obviously we should not crash, however there's no user impact here.
It's an assert hit when the monitor is already shutting down. OCS recovers by continuing to do what it was already going to do - start up new monitors.
That we expose things with no user impact as alerts in OCS is a supportability bug.
(In reply to Josh Durgin from comment #10)
> (In reply to Yaniv Kaul from comment #8)
> > (In reply to Josh Durgin from comment #4)
> > > This is a crash during shutdown, so it has very little user impact.
> > > Additionally it is a race condition seen rarely in the thousands of runs
> > > upstream. Thus marking it low/low severity and priority.
> >
> > It was seen in OCS in a customer deployment, where 2 MONs crashed. I'm
> > raising it to High/High for the time being, in the hope to understand when
> > it happens and how OCS recovers from it.
>
> Obviously we should not crash, however there's no user impact here.
>
> It's an assert hit when the monitor is already shutting down.
> OCS recovers by continuing to do what it was already going to do - start up
> new monitors.
>
> That we expose things with no user impact as alerts in OCS is a
> supportability bug.
The impact is indeed indirect - the health is not OK and cannot be solved without support.
Created attachment 1721173 [details] mon logs
We'd need a coredump or logs with messenger debugging enabled to debug this. Is it reproducible?
Raz - can we reproduce as Josh asked in comment 14 above?
*** Bug 1953345 has been marked as a duplicate of this bug. ***
Pawan, adding needinfo on you to track the reproduction of this BZ.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.1 Security, Enhancement, and Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:1174