Bug 2176079

Summary: [MDR][Stretch Cluster] Monitor crash observed during upgrade from 5.3 to 5.3z1 GA Versions
Product: [Red Hat Storage] Red Hat Ceph Storage Reporter: Pawan <pdhiran>
Component: RADOS Assignee: Neha Ojha <nojha>
Status: NEW --- QA Contact: Pawan <pdhiran>
Severity: high Docs Contact:
Priority: unspecified    
Version: 5.3 CC: bhubbard, ceph-eng-bugs, cephqe-warriors, sostapov, vumrao
Target Milestone: ---   
Target Release: 6.1z2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
Embargoed:

Comment 2 Brad Hubbard 2023-03-10 04:54:58 UTC
https://github.com/ceph/ceph/blob/19428c64d26b4faec85822b08771a37159f8442f/src/mon/OSDMonitor.cc#L14668-L14679

So it appears we are trying to manipulate a bucket that no longer exists in the
current crushmap and that is considered a fatal error.

Based on that code, we should be able to get a much better idea of what's
happening by gathering the monitor log with debug_mon=20 and debug_paxos=20
when this occurs. Are you in a position to attempt to reproduce this with
the above log settings?
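
In case it helps, here is a minimal sketch of one way to raise those two
debug levels before reproducing. It assumes the centralized config interface
(`ceph config set`) is reachable from where the script runs, for example
inside a cephadm shell, and that setting the levels on all monitors ("mon")
is acceptable. The helper names build_commands and apply are made up for
illustration. By default it only prints the commands and does not run them.

```python
import shlex
import subprocess

# Debug levels requested in the comment above.
DEBUG_SETTINGS = {"debug_mon": 20, "debug_paxos": 20}


def build_commands(settings=DEBUG_SETTINGS, target="mon"):
    """Return one `ceph config set` argv list per debug option.

    `target` is the config section to change; "mon" (all monitors) is an
    assumption, not something stated in this bug.
    """
    return [
        ["ceph", "config", "set", target, name, str(level)]
        for name, level in settings.items()
    ]


def apply(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching the cluster.
    apply(build_commands())
```

Once the logs are captured, the overrides can be dropped again with
`ceph config rm mon debug_mon` and `ceph config rm mon debug_paxos`, since
level 20 logging is very verbose.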

Comment 4 Scott Ostapovicz 2023-07-12 12:36:43 UTC
Missed the 6.1 z1 window.  Retargeting to 6.1 z2.