Description of problem:

In a cluster with separate cluster and public networks, while upgrading the cluster using ceph orch upgrade, and while "ceph orch upgrade status" was reporting that OSDs were being upgraded, rbd mirror pool status reported:

=============================
health: ERROR
daemon health: WARNING
image health: ERROR
images: 106 total
    79 error
    26 replaying
    1 stopping_replay
=============================

*** After 13+ minutes, pool status and image status were back to OK. ***

There were no recent image operations involved (no failover or failback). The cluster was hosting ~26 secondary and ~80 primary images. The peer cluster reported all images as unknown; the peer cluster also reported all OK after 13+ minutes.

Later, after ~2.5 hours, the pool mirror status showed:

==============================
health: ERROR
daemon health: WARNING
image health: ERROR
images: 106 total
    3 error
    26 replaying
    2 stopping_replay
    75 stopped
==============================

Around 10 hours later, the pool status was the same. Upon inspecting the snapshot schedule, it had stopped on 1/26 primary images. Mirroring on the images stopped some time after the upgrade completed successfully.

Version-Release number of selected component (if applicable):
(from ceph orch ps)
rbd-mirror.e22-h24-b01-fc640.xrklhe  e22-h24-b01-fc640.rdu2.scalelab.redhat.com  running (21h)  7m ago  21h  1221M  -  16.2.10-82.el8cp  9600fe784925  79bd65b3b55d

How reproducible:
Observed once

Steps to Reproduce:
1. Explained above in the description

Actual results:
Mirror pool health went to ERROR during the upgrade, and the snapshot schedule remained stopped on some primary images after the upgrade completed.

Expected results:
No snapshot schedule misses and healthy mirroring

Additional info:
Observed multiple blocklists from OSDs; will provide more details in upcoming updates.
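For reference, a minimal sketch of the commands that surface the state described above; the pool name "mirror_pool" is an assumption, substitute the actual mirrored pool:

# Upgrade progress while the upgrade is running
ceph orch upgrade status

# Pool-level mirroring health summary (output shown above)
rbd mirror pool status mirror_pool

# Per-image detail, to identify which images are in error/stopped state
rbd mirror pool status mirror_pool --verbose

# Snapshot-based mirroring schedule status for the pool
rbd mirror snapshot schedule status --pool mirror_pool

# OSD blocklist entries (multiple blocklists were observed, see Additional info)
ceph osd blocklist ls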
Observed snapshot scheduling across multiple upgrades over a period of more than a week, and tried multiple rbd-mirror daemon restarts (see the sketch below). Did not observe the snapshot schedule getting stuck for any image. Moving to Verified state.
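For completeness, a sketch of how the rbd-mirror daemon restarts can be performed with cephadm; the daemon name below is taken from the ceph orch ps output in the description:

# Restart all rbd-mirror daemons managed by the orchestrator
ceph orch restart rbd-mirror

# Or restart a single daemon by name
ceph orch daemon restart rbd-mirror.e22-h24-b01-fc640.xrklhe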
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat Ceph Storage 6.1 security and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:3623