Tested using ceph version 16.2.10-260.el8cp (b20e1a5452628262667a6b060687917fde010343) pacific (stable)

Followed these test steps
---
1. Set up bidirectional mirroring on a test pool as usual
2. Verify that "rbd mirror pool status" reports "health: OK" on both clusters
3. Grab service_id and instance_id from "rbd mirror pool status --verbose" output on cluster B
4. Grab peer UUID ("UUID: ...", not "Mirror UUID: ...") from "rbd mirror pool info" output on cluster B
5. Run "rbd mirror pool peer set <peer UUID from step 4> client client.invalid" command on cluster B
6. Wait 30-90 seconds and verify that "rbd mirror pool status" reports "health: ERROR" on cluster B and "health: WARNING" on cluster A (see the polling sketch after the peer set command below)
7. Run "rbd mirror pool peer set <peer UUID from step 4> client client.rbd-mirror-peer" command on cluster B
8. Wait 30-90 seconds and verify that "rbd mirror pool status" reports "health: OK" on both clusters again
9. Grab service_id and instance_id from "rbd mirror pool status --verbose" output on cluster B again
10. Verify that service_id from step 3 is equal to the one from step 9
11. Verify that instance_id from step 3 is less than the one from step 9

On primary
---
[ceph: root@ceph-rbd1-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: OK
daemon health: OK
image health: OK
images: 1 total
    1 replaying

DAEMONS
service 24425:
  instance_id: 24611
  client_id: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw
  hostname: ceph-rbd1-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: true
  health: OK

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: up+stopped
  description: local image is primary
  service: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw on ceph-rbd1-sangadi-bz-tfrmy6-node5
  last_update: 2024-06-04 08:54:15
  peer_sites:
    name: ceph-rbd2
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1717491240,"remote_snapshot_timestamp":1717491240,"replay_state":"idle"}
    last_update: 2024-06-04 08:54:20

On secondary
---
[ceph: root@ceph-rbd2-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: OK
daemon health: OK
image health: OK
images: 1 total
    1 replaying

DAEMONS
service 24418:
  instance_id: 24586
  client_id: ceph-rbd2-sangadi-bz-tfrmy6-node5.hfmyci
  hostname: ceph-rbd2-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: true
  health: OK

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1717491000,"remote_snapshot_timestamp":1717491000,"replay_state":"idle"}
  service: ceph-rbd2-sangadi-bz-tfrmy6-node5.hfmyci on ceph-rbd2-sangadi-bz-tfrmy6-node5
  last_update: 2024-06-04 08:50:20
  peer_sites:
    name: ceph-rbd1
    state: up+stopped
    description: local image is primary
    last_update: 2024-06-04 08:50:45

[ceph: root@ceph-rbd2-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool info -p ec_img_pool_EXnhoJmOKa
Mode: image
Site Name: ceph-rbd2

Peer Sites:

UUID: c468309d-1c30-4e4f-83df-a4c5550a84d5
Name: ceph-rbd1
Mirror UUID: 51e60cf3-b64f-4efd-a3ee-ab6240c36f40
Direction: rx-tx
Client: client.rbd-mirror-peer

Set the invalid client id for the mirror peer:

[ceph: root@ceph-rbd2-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool peer set --pool ec_img_pool_EXnhoJmOKa c468309d-1c30-4e4f-83df-a4c5550a84d5 client client.invalid
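For reference, steps 5-6 can be scripted roughly as below. This is only a minimal sketch: the POOL and PEER_UUID variables reuse the values from the output above, the 12 x 10 second polling window is illustrative, and the awk pattern assumes the plain-text "health:" summary line shown in the status output.

POOL=ec_img_pool_EXnhoJmOKa
PEER_UUID=c468309d-1c30-4e4f-83df-a4c5550a84d5

# Step 5: point the peer at a client that does not exist (same command as above).
rbd mirror pool peer set --pool "$POOL" "$PEER_UUID" client client.invalid

# Step 6: poll the pool summary until it leaves OK (expected within roughly 30-90 seconds).
for i in $(seq 1 12); do
    health=$(rbd mirror pool status -p "$POOL" | awk '/^health:/ {print $2; exit}')
    echo "attempt $i: health=$health"
    [ "$health" != "OK" ] && break
    sleep 10
done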
As expected, the status changed accordingly.

On secondary
---
[ceph: root@ceph-rbd2-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: ERROR
daemon health: ERROR
image health: OK
images: 1 total
    1 stopped

DAEMONS
service 24418:
  instance_id: 24586
  client_id: ceph-rbd2-sangadi-bz-tfrmy6-node5.hfmyci
  hostname: ceph-rbd2-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: false
  health: ERROR
  callouts: unable to connect to remote cluster

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: down+stopped
  description: stopped
  last_update: 2024-06-04 08:59:52
  peer_sites:
    name: ceph-rbd1
    state: up+stopped
    description: local image is primary
    last_update: 2024-06-04 09:02:15

On primary
---
[ceph: root@ceph-rbd1-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: WARNING
daemon health: OK
image health: WARNING
images: 1 total
    1 unknown

DAEMONS
service 24425:
  instance_id: 24611
  client_id: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw
  hostname: ceph-rbd1-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: true
  health: OK

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: up+stopped
  description: local image is primary
  service: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw on ceph-rbd1-sangadi-bz-tfrmy6-node5
  last_update: 2024-06-04 09:03:15
  peer_sites:
    name: ceph-rbd2
    state: down+stopped
    description: stopped
    last_update: 2024-06-04 08:59:52

After resetting the peer client back to client.rbd-mirror-peer, the status recovered properly.

On secondary
---
[ceph: root@ceph-rbd2-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: OK
daemon health: OK
image health: OK
images: 1 total
    1 replaying

DAEMONS
service 24418:
  instance_id: 24742
  client_id: ceph-rbd2-sangadi-bz-tfrmy6-node5.hfmyci
  hostname: ceph-rbd2-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: true
  health: OK

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1717492080,"remote_snapshot_timestamp":1717492080,"replay_state":"idle"}
  service: ceph-rbd2-sangadi-bz-tfrmy6-node5.hfmyci on ceph-rbd2-sangadi-bz-tfrmy6-node5
  last_update: 2024-06-04 09:08:53
  peer_sites:
    name: ceph-rbd1
    state: up+stopped
    description: local image is primary
    last_update: 2024-06-04 09:08:45

On primary
---
[ceph: root@ceph-rbd1-sangadi-bz-tfrmy6-node1-installer /]# rbd mirror pool status -p ec_img_pool_EXnhoJmOKa --verbose
health: OK
daemon health: OK
image health: OK
images: 1 total
    1 replaying

DAEMONS
service 24425:
  instance_id: 24611
  client_id: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw
  hostname: ceph-rbd1-sangadi-bz-tfrmy6-node5
  version: 16.2.10-260.el8cp
  leader: true
  health: OK

IMAGES
ec_imageBkzgATmZuh:
  global_id: 8b2121a1-93d6-4d02-80e0-d95324a285e9
  state: up+stopped
  description: local image is primary
  service: ceph-rbd1-sangadi-bz-tfrmy6-node5.qxijuw on ceph-rbd1-sangadi-bz-tfrmy6-node5
  last_update: 2024-06-04 09:09:45
  peer_sites:
    name: ceph-rbd2
    state: up+replaying
    description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1717492140,"remote_snapshot_timestamp":1717492140,"replay_state":"idle"}
    last_update: 2024-06-04 09:09:53

Also noted that the service_id on cluster B remains the same (24418) and the instance_id from before the reset (24586) is less than the one after it (24742), as required by steps 10 and 11 (a comparison sketch follows below).
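A minimal sketch of the step 9-11 comparison, assuming the plain-text field names shown in the --verbose output above; the variable names and the idea of capturing the values once before step 5 and once after step 8 are illustrative only.

POOL=ec_img_pool_EXnhoJmOKa

status=$(rbd mirror pool status -p "$POOL" --verbose)
# "service 24418:" -> 24418
service_id=$(echo "$status" | awk '/^service / {gsub(":", "", $2); print $2; exit}')
# "instance_id: 24586" -> 24586
instance_id=$(echo "$status" | awk '/instance_id:/ {print $2; exit}')
echo "service_id=$service_id instance_id=$instance_id"

# Run once before step 5 and once after step 8, then verify:
#   - service_id is unchanged    (here: 24418 == 24418)
#   - instance_id has increased  (here: 24586 -> 24742)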
RBD mirror sanity also looks good. Moving this to Verified.
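For completeness, the kind of per-image recheck used for the sanity pass can be sketched as below on either cluster; the pool and image names are the ones from the test above, and the expected states are taken from the outputs shown earlier.

POOL=ec_img_pool_EXnhoJmOKa
IMAGE=ec_imageBkzgATmZuh

rbd mirror pool status -p "$POOL"         # expect "health: OK" on both sites
rbd mirror image status "$POOL/$IMAGE"    # expect up+replaying on the secondary,
                                          # up+stopped ("local image is primary") on the primary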
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 5.3 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4118