Description of problem (please be as detailed as possible and provide log snippets):

With ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (a test image used for snaptrim verification of bz2119217), "rbd mirror snapshot schedule ls -R" and "rbd mirror snapshot schedule status" fail.

Error messages:

$ rbd mirror snapshot schedule ls -R
rbd: rbd mirror snapshot schedule list failed: (11) Resource temporarily unavailable

$ rbd mirror snapshot schedule status
rbd: rbd mirror snapshot schedule status failed: (11) Resource temporarily unavailable
rbd: invalid schedule status JSON received

At the same time, on the same cluster:

$ rbd -p ocs-storagecluster-cephblockpool mirror pool status
health: OK
daemon health: OK
image health: OK
images: 100 total
    100 replaying

Version of all relevant components (if applicable):

oc rsh -n openshift-storage $TOOLS_POD
sh-5.1$ ceph versions
{
    "mon": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 1
    },
    "osd": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 21
    },
    "mds": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 2
    },
    "rbd-mirror": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 1
    },
    "rgw": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 1
    },
    "overall": {
        "ceph version 17.2.6-70.0.TEST.bz2119217.el9cp (6d74fefa15d1216867d1d112b47bb83c4913d28f) quincy (stable)": 29
    }
}

OCP version:

get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-rc.5   True

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
3

Can this issue be reproduced?
Yes, hit it twice (out of two runs) of my 10h test runs.

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:
NA

Steps to Reproduce:
1. Create an ODF DR setup with the above ceph version (or upgrade to it).
2. Create 100 pods with one PVC per pod and write 10 GB per pod using fio randrw (70% write, 30% read) with --runtime=36000 (10h); a hedged fio invocation sketch is included after the "ceph df" output below.
3. Leave the test running for 10-15 hours; after some time "rbd mirror snapshot schedule ls -R" and "rbd mirror snapshot schedule status" stop working.

Actual results:

The following queries fail:
"rbd mirror snapshot schedule ls -R" and "rbd mirror snapshot schedule status"

Must-gather from cluster1/cluster2 (OCP/ODF):
http://perf148b.perf.lab.eng.bos.redhat.com/bz/bz-snapshot-schedule-not/

This cluster has "ceph config set mgr mgr/rbd_support/log_level debug" enabled (see the log-collection sketch after the "ceph df" output below).

Replication appears to stall while the issue is present; compare the "ceph df" outputs from cluster1 and cluster2 below.
cluster1:

ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL   USED     RAW USED  %RAW USED
ssd     18 TiB   11 TiB  6.9 TiB   6.9 TiB       37.58
TOTAL   18 TiB   11 TiB  6.9 TiB   6.9 TiB       37.58

--- POOLS ---
POOL                                                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                                                     1    1   38 MiB       11  113 MiB      0    1.9 TiB
ocs-storagecluster-cephblockpool                         2  512  4.9 TiB    1.35M  6.8 TiB  53.85    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.otp               3    8      0 B        0      0 B      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.control           4    8      0 B        8      0 B      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index     5    8  8.0 KiB       11   24 KiB      0    1.9 TiB
.rgw.root                                                6    8  5.7 KiB       16  180 KiB      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec    7    8      0 B        0      0 B      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.log               8    8  1.2 MiB      340  5.4 MiB      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.meta              9    8  7.8 KiB       14  136 KiB      0    1.9 TiB
ocs-storagecluster-cephfilesystem-metadata              10   16   15 MiB       27   45 MiB      0    1.9 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data     11   32    1 KiB        1   12 KiB      0    1.9 TiB
ocs-storagecluster-cephfilesystem-data0                 12   32      0 B        0      0 B      0    1.9 TiB

cluster2:

ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL   USED     RAW USED  %RAW USED
ssd     18 TiB   15 TiB  3.4 TiB   3.4 TiB       18.72
TOTAL   18 TiB   15 TiB  3.4 TiB   3.4 TiB       18.72

--- POOLS ---
POOL                                                    ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
.mgr                                                     1    1   40 MiB       11  120 MiB      0    4.0 TiB
ocs-storagecluster-cephblockpool                         2  512  2.1 TiB  548.12k  3.4 TiB  22.10    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.control           3    8      0 B        8      0 B      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.index     4    8  9.6 KiB       11   29 KiB      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.log               5    8  1.3 MiB      340  5.7 MiB      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.meta              6    8   10 KiB       14  144 KiB      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.non-ec    7    8      0 B        0      0 B      0    4.0 TiB
.rgw.root                                                8    8  5.7 KiB       16  180 KiB      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.otp               9    8      0 B        0      0 B      0    4.0 TiB
ocs-storagecluster-cephfilesystem-metadata              10   16   33 KiB       22  189 KiB      0    4.0 TiB
ocs-storagecluster-cephobjectstore.rgw.buckets.data     11  128    1 KiB        1   12 KiB      0    4.0 TiB
ocs-storagecluster-cephfilesystem-data0                 12  128      0 B        0      0 B      0    4.0 TiB
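For reference, a minimal sketch of the per-pod fio invocation described in step 2 of Steps to Reproduce. Only the randrw pattern, 70/30 write/read mix, 10 GB size and --runtime=36000 come from this report; the block size, iodepth, ioengine, direct I/O flag and the target path under the pod's PVC mount are assumptions and may differ from the actual test harness.

# Hedged sketch of the per-pod workload (step 2): randrw, 70% write / 30% read,
# 10 GB per pod, 10h time-based run. Path, bs, iodepth, ioengine are assumptions.
$ fio --name=dr-load \
      --filename=/mnt/pvc/testfile \
      --rw=randrw --rwmixwrite=70 \
      --size=10G \
      --time_based --runtime=36000 \
      --ioengine=libaio --direct=1 --bs=4k --iodepth=16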
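For completeness, a sketch of how the rbd_support mgr-module debug data referenced above can be raised and collected from the ODF toolbox/namespace. The config command is the one already enabled on this cluster; the rook-ceph-mgr label selector and grep filter are assumptions based on standard Rook deployments, not taken from the must-gather.

# Raise the rbd_support mgr module (which serves the schedule commands) to debug,
# as already done on this cluster:
$ ceph config set mgr mgr/rbd_support/log_level debug

# Pull the active mgr log from the ODF namespace; the label selector is an
# assumption based on common Rook labels and may differ per deployment:
$ oc get pods -n openshift-storage -l app=rook-ceph-mgr
$ oc logs -n openshift-storage <rook-ceph-mgr-pod> | grep -i rbd_support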
*** Bug 2221716 has been marked as a duplicate of this bug. ***