Bug 2105308 - [rbd-mirror]: secondary images reporting error (stopping_replay, stopped, error); split-brain and client blocklist seen on secondary
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD-Mirror
Version: 5.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.1
Assignee: Ilya Dryomov
QA Contact: Sunil Angadi
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-07-08 13:50 UTC by Vasishta
Modified: 2024-01-23 16:32 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-01-23 16:32:09 UTC
Embargoed:
sangadi: needinfo+


Links
System                     ID           Private  Priority  Status  Summary  Last Updated
Ceph Project Bug Tracker   56490        0        None      None    None     2022-07-08 13:50:08 UTC
Red Hat Issue Tracker      RHCEPH-4688  0        None      None    None     2022-07-08 13:53:25 UTC

Description Vasishta 2022-07-08 13:50:08 UTC
Description of problem:
Configured mirroring with 26 images on both clusters, with the snapshot schedule set to 2 minutes on individual images. Ran I/O on a few images.
(No relocate operations)
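
For readers reproducing the setup, a per-image schedule like the one described above can be sketched as follows. This is a minimal illustration, not the reporter's actual procedure: the pool and image names are hypothetical, and the helper only builds the `rbd mirror snapshot schedule add` command line without running it.

```python
def schedule_add_argv(pool, image, interval="2m"):
    """Build (but do not run) an rbd CLI call that adds a per-image
    mirror snapshot schedule. The default interval of 2 minutes matches
    the schedule mentioned in this report."""
    return [
        "rbd", "mirror", "snapshot", "schedule", "add",
        "--pool", pool,
        "--image", image,
        interval,
    ]


# Hypothetical pool and image names, for illustration only.
argv = schedule_add_argv("mirror_pool", "image1")
print(" ".join(argv))
```

On a node with the rbd CLI and cluster credentials, the resulting list could be passed to `subprocess.run(argv, check=True)`.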

Created and deleted 25 to 75 images with a snapshot schedule on one of the clusters.

Upon observing a backlog of mirror snapshot copies to the peer clusters, moved the rbd-mirror daemon on both clusters to a node with higher network capacity. Also relocated the rbd-mirror daemon in the cluster containing the affected images to another host.

Scaled up the number of monitors in the cluster with primary images (the one with the above issues), appending public_network.

Observed that, in the cluster with 102 primary images + 26 secondary images, all images were reported in error state (some images fluctuating between stopping_replay, stopped, and error).

Mirror image descriptions were:
failed to refresh remote image
failed to unlink local peer from remote image
stopping replay
stopped
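
When many images fluctuate between states, a tally per state makes the picture easier to read than individual status lines. The sketch below assumes status records have already been collected as dictionaries with `name`, `state`, and `description` keys. That record shape and the sample data are assumptions for illustration, not output captured from the affected clusters.

```python
from collections import Counter

HEALTHY_STATE = "up+replaying"


def replay_state(state):
    """Return the part after '+', e.g. 'up+stopped' -> 'stopped'.
    A state without '+' is returned unchanged."""
    return state.split("+", 1)[-1]


def summarize(records):
    """Count images per replay state."""
    return Counter(replay_state(r["state"]) for r in records)


def unhealthy(records):
    """Names of images whose state is anything other than up+replaying."""
    return [r["name"] for r in records if r["state"] != HEALTHY_STATE]


# Hypothetical records; the descriptions reuse the strings from this report.
sample = [
    {"name": "image1", "state": "up+error",
     "description": "failed to refresh remote image"},
    {"name": "image2", "state": "up+stopping_replay",
     "description": "stopping replay"},
    {"name": "image3", "state": "up+stopped", "description": "stopped"},
    {"name": "image4", "state": "up+replaying", "description": "replaying"},
]

print(summarize(sample))
print(unhealthy(sample))
```

Running the tally at intervals would show whether images are genuinely stuck in error or cycling through stopping_replay and stopped.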

Version-Release number of selected component (if applicable):
16.2.8-65.el8cp

How reproducible:
Tried once

Steps to Reproduce:
(Mentioned in the description)

Actual results:
All secondary images reported an error state (some images fluctuating between stopping_replay, stopped, and error).

Expected results:
Secondary images should be in the up+replaying state.

Additional info:

