Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2105308

Summary:	[rbd-mirror]: secondary images reporting error (stopping_replay, stopped, error); split-brain and client blocklist seen on secondary
Product:	[Red Hat Storage] Red Hat Ceph Storage	Reporter:	Vasishta <vashastr>
Component:	RBD-Mirror	Assignee:	Ilya Dryomov <idryomov>
Status:	CLOSED INSUFFICIENT_DATA	QA Contact:	Sunil Angadi <sangadi>
Severity:	medium	Docs Contact:
Priority:	unspecified
Version:	5.2	CC:	ceph-eng-bugs, cephqe-warriors, idryomov, jdurgin, sangadi, vereddy
Target Milestone:	---	Flags:	sangadi: needinfo+
Target Release:	7.1
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2024-01-23 16:32:09 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Vasishta 2022-07-08 13:50:08 UTC

Description of problem:
Configured mirroring with 26 images on both clusters, with the snapshot schedule set to 2 min on individual images. Ran I/O on a few images.
(No relocate operations)
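
For reference, a minimal sketch of how a per-image 2-minute schedule like the one above could be scripted. It only builds the `rbd mirror snapshot schedule add` command lines and does not run them. The pool and image names are hypothetical and not taken from this setup.

```python
import shlex


def schedule_cmd(pool, image, interval="2m"):
    """Return the argv for adding a per-image mirror snapshot schedule."""
    return [
        "rbd", "mirror", "snapshot", "schedule", "add",
        "--pool", pool, "--image", image, interval,
    ]


if __name__ == "__main__":
    # Hypothetical pool and image names, for illustration only.
    for i in range(3):
        print(shlex.join(schedule_cmd("mirror_pool", f"image_{i}")))
```

Each printed line can be reviewed before being run on the cluster that holds the primary images.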

Created and deleted 25-75 images with a snapshot schedule on one of the clusters.

Upon observing a backlog of mirror snapshot copies to the peer clusters, moved the rbd-mirror daemon on both clusters to a node with higher network capacity. Relocated the mirroring daemon in the cluster containing the affected images to another host.

Scaled up the number of monitors in the cluster with primary images (the one with the above issues), appending to public_network.

Observed that the set of images in the cluster with 102 primary images + 26 secondary images reported that all images were in an error state (some images fluctuating between stopping_replay, stopped, and error).

The mirror image descriptions were:
failed to refresh remote image
failed to unlink local peer from remote image
stopping replay
stopped
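
A rough sketch of how the states and descriptions above could be tallied across many images. It assumes each status has been parsed into a dict with `name`, `state`, and `description` keys, for example from `rbd mirror image status --format json`; those field names are an assumption, and the sample records are illustrative, not captured from this cluster.

```python
from collections import Counter

HEALTHY_STATE = "up+replaying"


def summarize(statuses):
    """Count images per mirror state and collect those not replaying."""
    counts = Counter(s.get("state", "unknown") for s in statuses)
    unhealthy = [s for s in statuses if s.get("state") != HEALTHY_STATE]
    return counts, unhealthy


# Illustrative records only; names and values are hypothetical.
sample = [
    {"name": "image_0", "state": "up+error",
     "description": "failed to refresh remote image"},
    {"name": "image_1", "state": "up+stopping_replay",
     "description": "stopping replay"},
    {"name": "image_2", "state": "up+stopped", "description": "stopped"},
    {"name": "image_3", "state": "up+replaying", "description": "replaying"},
]

if __name__ == "__main__":
    counts, unhealthy = summarize(sample)
    for state, n in sorted(counts.items()):
        print(f"{state}: {n}")
    for s in unhealthy:
        print(f"{s['name']}: {s['description']}")
```

Running it periodically would show whether images settle in one state or keep fluctuating between several.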

Version-Release number of selected component (if applicable):
16.2.8-65.el8cp

How reproducible:
Tried once

Steps to Reproduce:
(Mentioned in the description)

Actual results:
All secondary images reporting error (some images fluctuating between stopping_replay, stopped, and error).

Expected results:
Secondary images should be up+replaying

Additional info: