Bug 2357179

Summary: cross namespace mirror group enters into split-brain on a normal relocate operation
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Chaitanya <cdommeti>
Component: RBD-Mirror
Assignee: Prasanna Kumar Kalever <prasanna.kalever>
Status: CLOSED ERRATA
QA Contact: Chaitanya <cdommeti>
Severity: medium
Priority: unspecified
Version: 8.1
CC: ceph-eng-bugs, cephqe-warriors, idryomov, tserlin
Target Release: 8.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: ceph-19.2.1-109.el9cp
Last Closed: 2025-06-26 12:22:32 UTC
Type: Bug

Description Chaitanya 2025-04-03 12:35:29 UTC
Description of problem:
Seeing 'up+error' and 'split-brain detected' after I run 'demote' on the primary and 'promote' on the secondary. (The initial state is up+stopped on the primary and up+replaying on the secondary.)

[root@ceph-rbd1-cd-cg-x49tk0-node2 ~]# rbd mirror group demote p1/ns1/g1
2025-04-03T07:59:50.180+0000 7f5791140640 -1 librbd::mirror::snapshot::GroupUnlinkPeerRequest: 0x559bac3c9ba0 handle_remove_group_snapshot: failed to remove image snapshot metadata: (30) Read-only file system
Group demoted to non-primary

[root@ceph-rbd2-cd-cg-x49tk0-node2 ~]# rbd mirror group promote p1/ns1/g1
Group promoted to primary

[root@ceph-rbd1-cd-cg-x49tk0-node2 ~]# rbd mirror group status p1/ns1/g1
g1:
 global_id:  6c3e1329-ac3a-46cf-8234-27c0a854d11a
 state:    up+error
 description: split-brain detected
 service:   ceph-rbd1-cd-cg-x49tk0-node5.qphwpz on ceph-rbd1-cd-cg-x49tk0-node5
 last_update: 2025-04-03 08:03:01
 images:
 peer_sites:
  name: ceph-rbd2
  state: up+stopped
  description: 
  last_update: 2025-04-03 08:03:05
  images:
   image:    3/49bec464-6559-49bd-a2b3-61982915cb07
   state:    up+stopped
   description: local image is primary

   image:    3/5904fee2-e6da-4e2f-ba1a-a80f9dfb2f9c
   state:    up+stopped
   description: local image is primary
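For triage, the failure state in the status output above can be flagged programmatically. This is a minimal sketch, assuming the plain-text layout shown in this transcript (top-level `state:` and `description:` fields followed by a `peer_sites:` section); the helper names `local_group_status` and `is_split_brain` are hypothetical and not part of any Ceph tooling.

```python
import re

# Hypothetical triage helper (not part of Ceph). Assumes the plain-text
# layout of `rbd mirror group status` output as captured in this report.


def local_group_status(status_text):
    """Return (state, description) for the local group, i.e. the fields
    that appear before the 'peer_sites:' section."""
    local_part = status_text.split("peer_sites:", 1)[0]
    state = re.search(r"^\s*state:[ \t]*(\S+)", local_part, re.MULTILINE)
    desc = re.search(r"^\s*description:[ \t]*(.*)$", local_part, re.MULTILINE)
    return (
        state.group(1) if state else None,
        desc.group(1).strip() if desc else None,
    )


def is_split_brain(status_text):
    """True when the local group reports up+error with a split-brain description."""
    state, desc = local_group_status(status_text)
    return state == "up+error" and desc == "split-brain detected"


if __name__ == "__main__":
    # Trimmed copy of the status output captured on the demoted site.
    sample = """g1:
 global_id:  6c3e1329-ac3a-46cf-8234-27c0a854d11a
 state:    up+error
 description: split-brain detected
 peer_sites:
  name: ceph-rbd2
  state: up+stopped
"""
    print(is_split_brain(sample))  # prints True
```

Only the local section is inspected; entries under `peer_sites:` are ignored, since in this report the peer site stayed up+stopped while the local group reported the error.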


This happens only with groups in namespaces.

According to dev, a fix is already in progress and will be available in the next build.

Raising this BZ for tracking/reference purposes.

Version-Release number of selected component (if applicable):
ceph version 19.2.1-57.el9cp

How reproducible:
Always

Steps to Reproduce:
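1. Set up a mirrored group in a namespace (here p1/ns1/g1), with the group up+stopped on the primary and up+replaying on the secondary.
2. On the primary, demote the group: rbd mirror group demote p1/ns1/g1
3. On the secondary, promote the group: rbd mirror group promote p1/ns1/g1, then run 'rbd mirror group status p1/ns1/g1' on the original primary.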

Actual results:
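Split-brain is reported: the group status on the original primary shows 'up+error' with the description 'split-brain detected'.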

Expected results:
No split-brain should be reported.

Additional info:

Comment 6 errata-xmlrpc 2025-06-26 12:22:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2025:9775