Created attachment 2018018 [details] ceph.log Description of problem: Version-Release number of selected component (if applicable): How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
[1] cluster stuck in backfilling for more than 24 hours # ceph -s cluster: id: d41b35fc-20ef-11ef-bda4-000af7995d6c health: HEALTH_WARN noscrub,nodeep-scrub flag(s) set Degraded data redundancy: 313071902/1477314630 objects degraded (21.192%), 3732 pgs degraded, 3732 pgs undersized services: mon: 3 daemons, quorum f28-h21-000-r630,f28-h22-000-r630,f28-h23-000-r630 (age 43h) mgr: f28-h23-000-r630.nbspst(active, since 2d), standbys: f28-h22-000-r630.kaarli, f28-h21-000-r630.eksini osd: 192 osds: 192 up (since 38h), 192 in (since 38h); 3953 remapped pgs flags noscrub,nodeep-scrub rgw: 8 daemons active (8 hosts, 1 zones) data: pools: 7 pools, 4769 pgs objects: 246.22M objects, 16 TiB usage: 36 TiB used, 325 TiB / 361 TiB avail pgs: 313071902/1477314630 objects degraded (21.192%) 38533716/1477314630 objects misplaced (2.608%) 3731 active+undersized+degraded+remapped+backfill_wait 816 active+clean 221 active+remapped+backfill_wait 1 active+undersized+degraded+remapped+backfilling io: client: 31 KiB/s rd, 0 B/s wr, 31 op/s rd, 20 op/s wr [2] PG 7.109 stuck in backfilling for more than 24 hours. Due to this more than 3000 PGs are stuck in backfill_wait. # ceph pg ls backfilling PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG LOG_DUPS STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP LAST_SCRUB_DURATION SCRUB_SCHEDULING 7.109 60223 120446 0 0 4162054144 0 0 1343 3000 active+undersized+degraded+remapped+backfilling 35h 1907'68103 2411:157724 [103,146,142,156,180,89]p103 [NONE,146,142,NONE,180,89]p146 2024-06-02T15:03:35.735370+0000 2024-06-02T15:03:35.735370+0000 0 queued for scrub [3] the backfill_targets are 103(0) and 156(3) which are missing all 60223 objects # cat pg_7.109.query { "snap_trimq": "[]", "snap_trimq_len": 0, "state": "active+undersized+degraded+remapped+backfilling", "epoch": 2250, "up": [ 103, 146, 142, 156, 180, 89 ], "acting": [ 2147483647, 146, 142, 2147483647, 180, 89 ], "backfill_targets": [ "103(0)", "156(3)" ], "acting_recovery_backfill": [ "89(5)", "103(0)", "142(2)", "146(1)", "156(3)", "180(4)" ], .... "up": [ 103, 146, 142, 156, 180, 89 ], "acting": [ 2147483647, 146, 142, 2147483647, 180, 89 ], "avail_no_missing": [ "146(1)", "89(5)", "142(2)", "180(4)" ], "object_location_counts": [ { "shards": "89(5),142(2),146(1),180(4)", "objects": 60223 } ], "blocked_by": [], "up_primary": 103, "acting_primary": 146, "purged_snaps": [] }, [4] The 7.109 PG started backfiling for shard0 from osd.1(0) to osd.103(0) and for shard3 from osd.125(3) to osd.156(3). # grep -r "7\.109" /var/log/ceph/d41b35fc-20ef-11ef-bda4-000af7995d6c/ceph.log 2024-06-02T18:46:07.109482+0000 mgr.f28-h23-000-r630.nbspst (mgr.14283) 6806 : cluster 0 pgmap v6810: 4769 pgs: 4769 active+clean; 10 TiB data, 28 TiB used, 333 TiB / 361 TiB avail; 586 MiB/s rd, 406 MiB/s wr, 190.59k op/s 2024-06-02T19:43:17.109450+0000 mgr.f28-h23-000-r630.nbspst (mgr.14283) 8524 : cluster 0 pgmap v8508: 4769 pgs: 4769 active+clean; 11 TiB data, 30 TiB used, 331 TiB / 361 TiB avail; 428 MiB/s rd, 302 MiB/s wr, 140.31k op/s 2024-06-02T21:25:46.258120+0000 osd.146 (osd.146) 30 : cluster 0 7.109s1 starting backfill to osd.1(0) from (0'0,0'0] MAX to 620'50964 2024-06-02T21:53:27.109562+0000 osd.115 (osd.115) 1508 : cluster 3 10 slow requests (by type [ 'queued for pg' : 10 ] most affected pool [ 'default.rgw.buckets.data' : 10 ]) 2024-06-02T23:34:19.734731+0000 osd.146 (osd.146) 1128 : cluster 1 7.109s1 continuing backfill to osd.1(0) from (615'50002,1471'51574] 7:9085d2cc:::b5aa4475-a87c-41d2-a21a-65355e281454.19284.189_hX2YOVGF%2f77781.NmZhvuybxNiNp2xC.rnd:head to 1471'51574 2024-06-02T23:44:25.878502+0000 osd.146 (osd.146) 1149 : cluster 1 7.109s1 continuing backfill to osd.1(0) from (615'50602,1505'52181] 7:9085d2cc:::b5aa4475-a87c-41d2-a21a-65355e281454.19284.189_hX2YOVGF%2f77781.NmZhvuybxNiNp2xC.rnd:head to 1505'52181 2024-06-02T23:44:25.878518+0000 osd.146 (osd.146) 1150 : cluster 0 7.109s1 starting backfill to osd.125(3) from (0'0,0'0] MAX to 1505'52181 2024-06-03T00:30:47.109852+0000 osd.40 (osd.40) 225 : cluster 3 14 slow requests (by type [ 'queued for pg' : 14 ] most affected pool [ 'default.rgw.buckets.data' : 14 ]) 2024-06-03T02:15:39.176352+0000 osd.146 (osd.146) 1194 : cluster 1 7.109s1 continuing backfill to osd.1(0) from (1635'57260,1773'58638] 7:9085d2cc:::b5aa4475-a87c-41d2-a21a-65355e281454.19284.189_hX2YOVGF%2f77781.NmZhvuybxNiNp2xC.rnd:head to 1773'58638 2024-06-03T02:15:39.176482+0000 osd.146 (osd.146) 1196 : cluster 0 7.109s1 starting backfill to osd.7(3) from (0'0,0'0] MAX to 1773'58638 2024-06-03T02:15:56.207617+0000 osd.146 (osd.146) 1263 : cluster 0 7.109s1 starting backfill to osd.15(0) from (0'0,0'0] MAX to 1786'58641 2024-06-03T02:15:56.221911+0000 osd.146 (osd.146) 1268 : cluster 0 7.109s1 starting backfill to osd.125(3) from (1635'57260,1773'58638] MIN to 1786'58641 2024-06-03T02:16:10.467454+0000 osd.146 (osd.146) 1346 : cluster 0 7.109s1 starting backfill to osd.13(3) from (0'0,0'0] MAX to 1798'58643 2024-06-03T02:16:10.482341+0000 osd.146 (osd.146) 1349 : cluster 1 7.109s1 continuing backfill to osd.15(0) from (1635'57260,1798'58643] MIN to 1798'58643 2024-06-03T02:16:13.583865+0000 osd.146 (osd.146) 1363 : cluster 0 7.109s1 starting backfill to osd.13(3) from (0'0,0'0] MAX to 1798'58643 2024-06-03T02:16:13.590155+0000 osd.146 (osd.146) 1364 : cluster 0 7.109s1 starting backfill to osd.103(0) from (611'47500,611'49542] MAX to 1798'58643 2024-06-03T02:16:15.288379+0000 osd.146 (osd.146) 1378 : cluster 1 7.109s1 continuing backfill to osd.103(0) from (1635'57260,1802'58644] MIN to 1802'58644 2024-06-03T02:16:15.288394+0000 osd.146 (osd.146) 1379 : cluster 0 7.109s1 starting backfill to osd.156(3) from (615'50002,1471'51574] MAX to 1802'58644 2024-06-03T10:34:17.109297+0000 mgr.f28-h23-000-r630.nbspst (mgr.14283) 35064 : cluster 0 pgmap v36537: 4769 pgs: 1 active+undersized+degraded+remapped+backfilling, 221 active+remapped+backfill_wait, 3731 active+undersized+degraded+remapped+backfill_wait, 816 active+clean; 16 TiB data, 36 TiB used, 325 TiB / 361 TiB avail; 313071902/1477314627 objects degraded (21.192%); 38533716/1477314627 objects misplaced (2.608%) 2024-06-03T17:47:27.109839+0000 mgr.f28-h23-000-r630.nbspst (mgr.14283) 47921 : cluster 0 pgmap v49427: 4769 pgs: 1 active+undersized+degraded+remapped+backfilling, 221 active+remapped+backfill_wait, 3731 active+undersized+degraded+remapped+backfill_wait, 816 active+clean; 16 TiB data, 36 TiB used, 325 TiB / 361 TiB avail; 313071902/1477314630 objects degraded (21.192%); 38533716/1477314630 objects misplaced (2.608%) [5] Nothing stands out from debug logs collected for osd.146(1) but osd.146 only reports osd.103(0) and osd.156(3) missing all the 60223 objects to be backfilled osd.146 : 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 op_wq(0) _process 7.109s1 to_process <> waiting <> waiting_peering {} 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 op_wq(0) _process OpSchedulerItem(7.109s1 PGPeeringEvent(epoch_sent: 2291 epoch_requested: 2291 MLeaseAck epoch 2291 from osd.180 pg_lease_ack(ruub 106835.609375000s)) class_id 2 prio 255 cost 10 e2291) queued 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 op_wq(0) _process 7.109s1 to_process <OpSchedulerItem(7.109s1 PGPeeringEvent(epoch_sent: 2291 epoch_requested: 2291 MLeaseAck epoch 2291 from osd.180 pg_lease_ack(ruub 106835.609375000s)) class_id 2 prio 255 cost 10 e2291)> waiting <> waiting_peering {} 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 op_wq(0) _process OpSchedulerItem(7.109s1 PGPeeringEvent(epoch_sent: 2291 epoch_requested: 2291 MLeaseAck epoch 2291 from osd.180 pg_lease_ack(ruub 106835.609375000s)) class_id 2 prio 255 cost 10 e2291) pg 0x55eafb3f4000 2024-06-03T20:41:09.478+0000 7fd764cdd700 10 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] do_peering_event: epoch_sent: 2291 epoch_requested: 2291 MLeaseAck epoch 2291 from osd.180 pg_lease_ack(ruub 106835.609375000s) 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] recalc_readable_until peer osd.142 ruub 106835.609375000s 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] recalc_readable_until peer osd.180 ruub 106835.609375000s 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] recalc_readable_until peer osd.89 ruub 106835.609375000s 2024-06-03T20:41:09.478+0000 7fd764cdd700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] recalc_readable_until readable_until[_ub] 106835.609375000s (sent 106835.609375000s) 2024-06-03T20:41:11.198+0000 7fd77ff16700 1 -- 172.17.42.221:0/3633088982 <== osd.111 v2:172.17.42.172:6910/1246231084 20624 ==== osd_ping(ping_reply e2291 up_from 1791 ping_stamp 2024-06-03T20:41:11.191449+0000/106821.320312500s send_stamp 66337.109375000s delta_ub -40484.214843750s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55eafbcf1080 con 0x55eca23f0800 2024-06-03T20:41:11.198+0000 7fd77ff16700 1 -- 172.16.42.221:0/3633088982 <== osd.112 v2:172.16.43.90:6812/1726708053 23629 ==== osd_ping(ping_reply e2291 up_from 209 ping_stamp 2024-06-03T20:41:11.191449+0000/106821.320312500s send_stamp 106857.109375000s delta_ub 35.789901733s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55eafbcf1080 con 0x55eafe0f7400 2024-06-03T20:41:11.198+0000 7fd77f715700 1 -- 172.17.42.221:0/3633088982 <== osd.112 v2:172.17.43.90:6814/1726708053 23629 ==== osd_ping(ping_reply e2291 up_from 209 ping_stamp 2024-06-03T20:41:11.191449+0000/106821.320312500s send_stamp 106857.109375000s delta_ub 35.790058136s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55eafb3fa180 con 0x55eafe0f6800 2024-06-03T20:41:11.199+0000 7fd77f715700 1 -- 172.16.42.221:0/3633088982 <== osd.111 v2:172.16.42.172:6908/1246231084 20624 ==== osd_ping(ping_reply e2291 up_from 1791 ping_stamp 2024-06-03T20:41:11.191449+0000/106821.320312500s send_stamp 66337.109375000s delta_ub -40484.214843750s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55eafb3fa180 con 0x55eca23f1800 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] scrubber<NotActive>: update_scrub_stats: period: 15s/145s-> 145s last:2024-06-03T20:38:46 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats actingset 89(5),142(2),146(1),180(4) upset 89(5),103(0),142(2),146(1),156(3),180(4) acting_recovery_backfill 89(5),103(0),142(2),146(1),156(3),180(4) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 146(1) primary objects 60223 missing 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats no peer_missing found for 1(0) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 1(0) objects 53145 missing 7078 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats no peer_missing found for 7(3) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 7(3) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats no peer_missing found for 13(3) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 13(3) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats no peer_missing found for 15(0) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 15(0) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 89(5) objects 53150 missing 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 103(0) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats no peer_missing found for 125(3) 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 125(3) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 142(2) objects 53150 missing 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 156(3) objects 0 missing 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats shard 180(4) objects 53150 missing 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats object_location_counts {89(5),142(2),146(1),180(4)=60223} 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 89(5) missing= 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 142(2) missing= 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 146(1) missing= 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 180(4) missing= 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 103(0) missing= 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats missing shard 156(3) missing= 60223 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats degraded 120446 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] update_calc_stats misplaced 0 2024-06-03T20:41:11.667+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] prepare_stats_for_publish reporting purged_snaps [] 2024-06-03T20:41:11.667+0000 7fd770614700 15 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] publish_stats_to_osd 2291:157075 2024-06-03T20:41:13.022+0000 7fd77ff16700 1 -- [v2:172.17.42.221:6854/3633088982,v1:172.17.42.221:6855/3633088982] <== osd.69 172.17.43.86:0/1315089877 23513 ==== osd_ping(ping e2291 up_from 344 ping_stamp 2024-06-03T20:41:13.021291+0000/106717.109375000s send_stamp 106717.109375000s delta_ub -106.039131165s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55ecbb3b1b00 con 0x55eca23f1c00 2024-06-03T20:41:13.022+0000 7fd77ff16700 1 -- [v2:172.17.42.221:6854/3633088982,v1:172.17.42.221:6855/3633088982] --> 172.17.43.86:0/1315089877 -- osd_ping(ping_reply e2291 up_from 245 ping_stamp 2024-06-03T20:41:13.021291+0000/106717.109375000s send_stamp 106823.148437500s delta_ub 106.045753479s) v5 -- 0x55ecbb3b1080 con 0x55eca23f1c00 2024-06-03T20:41:13.022+0000 7fd77f715700 1 -- [v2:172.16.42.221:6852/3633088982,v1:172.16.42.221:6853/3633088982] <== osd.69 172.16.43.86:0/1315089877 23513 ==== osd_ping(ping e2291 up_from 344 ping_stamp 2024-06-03T20:41:13.021291+0000/106717.109375000s send_stamp 106717.109375000s delta_ub -106.039131165s) v5 ==== 2033+0+0 (crc 0 0 0) 0x55ecbb3b0480 con 0x55eafdd6f800 2024-06-03T20:41:13.022+0000 7fd77f715700 1 -- [v2:172.16.42.221:6852/3633088982,v1:172.16.42.221:6853/3633088982] --> 172.16.43.86:0/1315089877 -- osd_ping(ping_reply e2291 up_from 245 ping_stamp 2024-06-03T20:41:13.021291+0000/106717.109375000s send_stamp 106823.148437500s delta_ub 106.045982361s) v5 -- 0x55ecbb3b1680 con 0x55eafdd6f800 2024-06-03T20:41:16.668+0000 7fd770614700 20 osd.146 pg_epoch: 2291 pg[7.109s1( v 1907'68103 (1891'66760,1907'68103] local-lis/les=1803/1804 n=60223 ec=412/387 lis/c=1803/610 les/c/f=1804/611/0 sis=1803) [103,146,142,156,180,89]/[NONE,146,142,NONE,180,89]p146(1) backfill=[103(0),156(3)] r=1 lpr=1803 pi=[610,1803)/1 crt=1907'68103 lcod 1907'68102 mlcod 1907'68102 active+undersized+degraded+remapped+backfilling mbc={}] scrubber<NotActive>: update_scrub_stats: period: 15s/145s-> 145s last:2024-06-03T20:41:11 2024-06-03T20:41:16.975+0000 7fd780717700 1 -- [v2:172.16.42.221:6852/3633088982,v1:172.16.42.221:6853/3633088982] --> 172.16.43.67:0/2023387295 -- osd_ping(ping_reply e2291 up_from 245 ping_stamp 2024-06-03T20:41:16.974985+0000/106766.507812500s send_stamp 106827.109375000s delta_ub 60.594390869s) v5 -- 0x55eafddac480 con 0x55eafab5d000 2024-06-03T20:41:16.975+0000 7fd77ff16700 1 -- [v2:172.17.42.221:6854/3633088982,v1:172.17.42.221:6855/3633088982] --> 172.17.43.67:0/2023387295 -- osd_ping(ping_reply e2291 up_from 245 ping_stamp 2024-06-03T20:41:16.974985+0000/106766.507812500s send_stamp 106827.109375000s delta_ub 60.594425201s) v5 -- 0x55eafd2eb080 con 0x55eafbca1400 2024-06-03T20:41:17.477+0000 7fd765cdf700 15 osd.146 2291 enqueue_peering_evt 7.109s1 epoch_sent: 1803 epoch_requested: 1803 RenewLease 2024-06-03T20:41:17.477+0000 7fd765cdf700 20 osd.146 op_wq(0) _enqueue OpSchedulerItem(7.109s1 PGPeeringEvent(epoch_sent: 1803 epoch_requested: 1803 RenewLease) class_id 2 prio 255 cost 10 e1803) 2024-06-03T20:41:17.477+0000 7fd764cdd700 20 osd.146 op_wq(0) _process 7.109s1 to_process <> waiting <> waiting_peering {} We are yet to collect some additional debug logs with debug_osd=30, debug_ms=5 and debug_mclock=5 level from the up/acting set OSDs. Meanwhile I have asked Aishwarya to check for the possible deadlock from mclock side as I am suspecting a mclock issue something on a similare line compared to backfill stuck issue reported by some of the upstream users https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/IPHBE3DLW5ABCZHSNYOBUBSI3TLWVD22/#RBR4PIODUSHH3C3WE2BEAJ4UPSHHDO5X
*** Bug 2212945 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 8.1 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2025:9775