Bug 2162479

Summary: rbd-mirror daemon unresponsive to images during and after a rolling re-provision of the cluster
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Vasishta <vashastr>
Component: RBD-Mirror
Assignee: Ilya Dryomov <idryomov>
Status: NEW ---
QA Contact:
Severity: high
Docs Contact:
Priority: unspecified
Version: 5.3
CC: ceph-eng-bugs, cephqe-warriors
Target Milestone: ---
Target Release: 6.1z2
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Vasishta 2023-01-19 17:22:51 UTC
Description of problem:
As part of the RHCS 5.3 to 6.0 cluster upgrade, we performed a rolling re-provision.
*********************
That is:
Drain **all** daemons (mgr, mon, rbd-mirror, node-exporter, OSDs, etc.) from a node to other nodes
Re-provision the node from RHEL 8.7 to 9.0 and upgrade to 9.1
Add the node back to the cluster
Add the daemons back on the re-provisioned node

Repeat on the other nodes of the cluster.

Repeat on the peer cluster.
********************
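
For readers trying to reproduce this, the per-node sequence above can be sketched as an ordered plan. This is only an illustration: the report does not say which commands were used, so the `ceph orch host drain`, `ceph orch host rm` and `ceph orch host add` commands are an assumption based on a cephadm-managed cluster, and the function name and host names are hypothetical.

```python
from typing import List, Optional, Tuple

# (description, command); command is None for steps done outside Ceph.
Step = Tuple[str, Optional[str]]


def rolling_reprovision_plan(hosts: List[str]) -> List[Step]:
    """Return the ordered steps for re-provisioning hosts one at a time.

    The ceph orch commands are an assumption (cephadm-managed cluster);
    they are not taken from the bug report.
    """
    plan: List[Step] = []
    for host in hosts:
        plan.append((f"drain all daemons from {host}", f"ceph orch host drain {host}"))
        plan.append((f"remove {host} from the cluster", f"ceph orch host rm {host}"))
        plan.append((f"re-provision the OS on {host}", None))
        plan.append((f"add {host} back to the cluster", f"ceph orch host add {host}"))
        plan.append((f"add daemons back on {host}", None))
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for description, command in rolling_reprovision_plan(["node1", "node2"]):
        print(f"{description}: {command or '(manual step)'}")
```

The plan finishes one host completely before starting the next, which is what makes the re-provision "rolling"; the same plan would then be run against the peer cluster.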

While re-provisioning the first cluster (the primary cluster), the mirroring daemons on the primary cluster nodes became unresponsive to images.
- All images were reported as down.
- Newly created primary images were initially reported as unknown.
- Snapshot fetching for secondary images was too slow and confusing.
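
The symptoms above could be triaged with a small script. The sketch below assumes the status was captured with `rbd mirror pool status <pool> --verbose --format json` and that the output has an `images` list whose entries carry `name` and `state` fields. That JSON shape is an assumption to verify against the installed release, and the sample data and function names are hypothetical.

```python
import json
from typing import Dict, List


def summarize_mirror_states(status_json: str) -> Dict[str, List[str]]:
    """Group image names by their reported mirror state string."""
    status = json.loads(status_json)
    by_state: Dict[str, List[str]] = {}
    # Assumed shape: {"images": [{"name": ..., "state": ...}, ...]}
    for image in status.get("images", []):
        state = image.get("state", "unknown")
        by_state.setdefault(state, []).append(image.get("name", "?"))
    return by_state


def unhealthy_images(by_state: Dict[str, List[str]]) -> List[str]:
    """Return names whose state starts with 'down' or contains 'unknown'."""
    flagged: List[str] = []
    for state, names in by_state.items():
        if state.startswith("down") or "unknown" in state:
            flagged.extend(names)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical sample data, not output from the affected cluster.
    sample = json.dumps({
        "images": [
            {"name": "img1", "state": "down+unknown"},
            {"name": "img2", "state": "up+replaying"},
        ]
    })
    print(unhealthy_images(summarize_mirror_states(sample)))
```

Grouping by the raw state string keeps the sketch independent of any particular list of states; only the "down" prefix and the "unknown" substring are treated as unhealthy here.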

Version-Release number of selected component (if applicable):
16.2.10-87.el8cp

How reproducible:
Tried once

Steps to Reproduce:
Mentioned in the description

Actual results:
Mirroring daemon unresponsive to local images

Expected results:
Snapshot mirroring functionality works as expected

Additional info: