Bug 2231152

Summary: [EC2+2@4] PGs are wrongly marked backfill_toofull without any OSDs crossing the backfillfull ratio
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Pawan <pdhiran>
Component: RADOS
Assignee: Prashant Dhange <pdhange>
Status: ASSIGNED
QA Contact: Pawan <pdhiran>
Severity: high
Priority: unspecified
Version: 6.1
CC: bhubbard, ceph-eng-bugs, cephqe-warriors, nojha, pdhange, vumrao
Target Milestone: ---
Target Release: 7.1
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Type: Bug

Description Pawan 2023-08-10 18:25:47 UTC
Description of problem:
The cluster is displaying a health warning stating that PGs are in the backfill_toofull state, but none of the OSDs is more than about 55% full (the fullest, osd.11, is at 55.26%), well below the configured backfillfull_ratio of 0.9.

# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 3    hdd  0.40619   1.00000  416 GiB  212 GiB  196 GiB  178 MiB  1.3 GiB  204 GiB  50.90  1.22   55      up
 6    hdd  0.40619   1.00000  416 GiB  136 GiB  120 GiB   50 MiB  923 MiB  280 GiB  32.65  0.78   27      up
10    hdd  0.40619   1.00000  416 GiB  157 GiB  141 GiB   41 MiB  1.0 GiB  259 GiB  37.86  0.91   33      up
15    hdd  0.40619   1.00000  416 GiB  212 GiB  196 GiB   18 MiB  1.3 GiB  204 GiB  50.87  1.22   51      up
19    hdd  0.40619   1.00000  416 GiB  135 GiB  119 GiB  232 MiB  1.0 GiB  281 GiB  32.52  0.78   35      up
23    hdd  0.40619   1.00000  416 GiB  125 GiB  109 GiB   33 KiB  683 MiB  291 GiB  30.09  0.72   34      up
27    hdd  0.40619   1.00000  416 GiB  191 GiB  175 GiB  2.9 MiB  1.2 GiB  225 GiB  45.87  1.10   49      up
31    hdd  0.40619   1.00000  416 GiB  158 GiB  142 GiB  222 MiB  1.2 GiB  258 GiB  37.88  0.91   45      up
35    hdd  0.40619   1.00000  416 GiB  227 GiB  211 GiB  9.6 MiB  1.9 GiB  189 GiB  54.56  1.30   52      up
 0    hdd  0.40619   1.00000  416 GiB  146 GiB  130 GiB  274 MiB  1.1 GiB  270 GiB  35.16  0.84   39      up
 4    hdd  0.40619   1.00000  416 GiB  179 GiB  163 GiB  117 MiB  1.2 GiB  237 GiB  43.01  1.03   49      up
 8    hdd  0.40619   1.00000  416 GiB  136 GiB  120 GiB  153 MiB  998 MiB  280 GiB  32.61  0.78   39      up
12    hdd  0.40619   1.00000  416 GiB  168 GiB  152 GiB    1 KiB  951 MiB  248 GiB  40.45  0.97   37      up
16    hdd  0.40619   1.00000  416 GiB  169 GiB  153 GiB   31 MiB  1.1 GiB  247 GiB  40.69  0.97   43      up
20    hdd  0.40619   1.00000  416 GiB  219 GiB  203 GiB   29 MiB  1.7 GiB  197 GiB  52.53  1.26   54      up
24    hdd  0.40619   1.00000  416 GiB  190 GiB  174 GiB  101 MiB  1.1 GiB  226 GiB  45.60  1.09   45      up
28    hdd  0.40619   1.00000  416 GiB  125 GiB  109 GiB   90 MiB  776 MiB  291 GiB  30.14  0.72   25      up
32    hdd  0.40619   1.00000  416 GiB  212 GiB  196 GiB  186 MiB  1.4 GiB  204 GiB  50.91  1.22   52      up
 2    hdd  0.40619   1.00000  416 GiB  201 GiB  185 GiB   81 MiB  1.3 GiB  215 GiB  48.32  1.16   52      up
 7    hdd  0.40619   1.00000  416 GiB  136 GiB  120 GiB  4.7 MiB  851 MiB  280 GiB  32.69  0.78   38      up
11    hdd  0.40619   1.00000  416 GiB  230 GiB  214 GiB    1 KiB  2.1 GiB  186 GiB  55.26  1.32   51      up
14    hdd  0.40619   1.00000  416 GiB  125 GiB  109 GiB  5.1 MiB  799 MiB  291 GiB  29.94  0.72   25      up
18    hdd  0.40619   1.00000  416 GiB  190 GiB  174 GiB  117 MiB  1.3 GiB  226 GiB  45.56  1.09   52      up
22    hdd  0.40619   1.00000  416 GiB  125 GiB  109 GiB   77 MiB  822 MiB  291 GiB  30.00  0.72   34      up
26    hdd  0.40619   1.00000  416 GiB  201 GiB  185 GiB   21 MiB  1.1 GiB  215 GiB  48.21  1.15   44      up
30    hdd  0.40619   1.00000  416 GiB  169 GiB  153 GiB  190 MiB  1.2 GiB  247 GiB  40.61  0.97   45      up
34    hdd  0.40619   1.00000  416 GiB  179 GiB  163 GiB   53 MiB  1.2 GiB  237 GiB  43.10  1.03   48      up
 1    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
 5    hdd  0.40619   1.00000  416 GiB  205 GiB  189 GiB  112 MiB  1.4 GiB  211 GiB  49.33  1.18   56      up
 9    hdd  0.40619   1.00000  416 GiB  127 GiB  111 GiB   64 MiB  1.1 GiB  289 GiB  30.43  0.73   44      up
13    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
17    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
21    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
25    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
29    hdd  0.40619   1.00000  416 GiB  194 GiB  178 GiB  266 MiB  1.5 GiB  222 GiB  46.58  1.11   56      up
33    hdd  0.40619   1.00000  416 GiB  216 GiB  200 GiB   56 MiB  1.6 GiB  200 GiB  51.84  1.24   58      up
                       TOTAL   13 TiB  5.3 TiB  4.8 TiB  2.7 GiB   37 GiB  7.3 TiB  41.81
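
The check behind the statement above can be sketched in a few lines of Python. This is only an illustration: it parses rows pasted from the `ceph osd df` output in this report (a subset, for brevity) and compares %USE with the backfillfull_ratio from `ceph osd dump`. The function names `parse_osd_use` and `osds_at_or_over` are hypothetical helpers, not part of any Ceph tooling.

```python
# Rows copied from the `ceph osd df` output in this report (a subset).
OSD_DF_ROWS = """\
 3    hdd  0.40619   1.00000  416 GiB  212 GiB  196 GiB  178 MiB  1.3 GiB  204 GiB  50.90  1.22   55      up
35    hdd  0.40619   1.00000  416 GiB  227 GiB  211 GiB  9.6 MiB  1.9 GiB  189 GiB  54.56  1.30   52      up
11    hdd  0.40619   1.00000  416 GiB  230 GiB  214 GiB    1 KiB  2.1 GiB  186 GiB  55.26  1.32   51      up
 1    hdd  0.40619         0      0 B      0 B      0 B      0 B      0 B      0 B      0     0    0    down
 5    hdd  0.40619   1.00000  416 GiB  205 GiB  189 GiB  112 MiB  1.4 GiB  211 GiB  49.33  1.18   56      up
33    hdd  0.40619   1.00000  416 GiB  216 GiB  200 GiB   56 MiB  1.6 GiB  200 GiB  51.84  1.24   58      up
"""

BACKFILLFULL_RATIO = 0.9  # value shown by `ceph osd dump` in this report


def parse_osd_use(text):
    """Return {osd_id: percent_used} for each data row of `ceph osd df` text."""
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric OSD id; headers and TOTAL are skipped.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        # Index from the end (... %USE VAR PGS STATUS) so the variable number
        # of size/unit tokens in the middle does not matter.
        usage[int(fields[0])] = float(fields[-4])
    return usage


def osds_at_or_over(usage, ratio):
    """Return the OSD ids whose utilisation is at or above `ratio` (0..1)."""
    return sorted(osd for osd, pct in usage.items() if pct / 100.0 >= ratio)


usage = parse_osd_use(OSD_DF_ROWS)
fullest = max(usage, key=usage.get)
print(f"fullest OSD: osd.{fullest} at {usage[fullest]}%")
print("OSDs at or over backfillfull_ratio:",
      osds_at_or_over(usage, BACKFILLFULL_RATIO))
```

For the rows above, the fullest OSD is osd.11 and the list of OSDs at or over the ratio is empty.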
 
[root@argo012 ~]# ceph pg dump| grep backfill_toofull
6.fa       12033                   0     12033          0        0  11661737984            0           0  5126      5126  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.629659+0000   15495'32562  15495:112019   [2,33,28,10]           2   [2,NONE,28,10]               2   15282'30407  2023-08-09T22:37:38.489306+0000      13086'26911  2023-08-08T21:09:30.273125+0000              0                   99  queued for scrub                                                            22043                0
6.f2       12172                   0     12172          0        0  11571953664            0           0  5174      5174  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.368370+0000   15495'33236  15495:114972     [5,22,3,8]           5    [NONE,22,3,8]              22   15271'29802  2023-08-09T15:59:35.735286+0000      15271'29802  2023-08-09T15:59:35.735286+0000              0                  698  queued for scrub                                                            21064                0
6.e8       12182                   0     12182          0        0  11750047744            0           0  5095      5095  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.811080+0000   15495'33273  15495:111295   [30,27,16,5]          30  [30,27,16,NONE]              30   15273'29812  2023-08-09T16:14:33.353209+0000      11370'27174  2023-08-08T12:02:03.840314+0000              0                  106  queued for scrub                                                            21064                0
6.d1       12113                   0     12113          0        0  11579490304            0           0  4690      4690  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.973076+0000   15495'33014   15495:77594   [22,10,5,32]          22  [22,10,NONE,32]              22   15281'30514  2023-08-09T21:04:39.171251+0000      11843'26759  2023-08-08T13:36:55.184941+0000              0                   96  queued for scrub                                                            21891                0
6.c1       12105                   0     12105          0        0  11783143424            0           0  4456      4456  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.175027+0000   15495'32053   15495:86571   [5,11,15,20]           5  [NONE,11,15,20]              11   15234'28481  2023-08-09T14:40:23.005390+0000       2928'25276  2023-08-05T15:47:01.820495+0000              0                  111  queued for scrub                                                            20770                0
6.a7       12164                   0     12164          0        0  11856150528            0           0  4176      4176  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.441079+0000   15495'31787   15495:78543  [11,23,24,33]          11  [11,23,24,NONE]              11   14672'27334  2023-08-09T07:59:38.108345+0000      14672'27334  2023-08-09T07:59:38.108345+0000              0                  327  queued for scrub                                                             9874                0
6.4        12260                   0     12260          0        0  11732123648            0           0  3794      3794  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.972959+0000   15495'31479  15495:105034    [5,22,12,3]           5   [NONE,22,12,3]              22   14835'27336  2023-08-09T09:48:34.527402+0000       7419'25161  2023-08-06T00:49:04.630245+0000              0                   54  queued for scrub                                                            10031                0
6.25       12100                   0     12100          0        0  11746050048            0           0  4028      4028  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.683270+0000   15495'31974  15495:117496  [14,35,24,33]          14  [14,35,24,NONE]              14   15285'30354  2023-08-10T01:32:02.219802+0000      15285'30354  2023-08-10T01:32:02.219802+0000              0                  597  queued for scrub                                                            22697                0
6.27       12162                   0     12162          0        0  11862376448            0           0  4170      4170  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.128745+0000   15495'31781   15495:78538  [11,23,24,33]          11  [11,23,24,NONE]              11   14672'27334  2023-08-09T07:59:38.108345+0000      14672'27334  2023-08-09T07:59:38.108345+0000              0                  327  queued for scrub                                                             9874                0
6.2f       12055                   0     12055          0        0  11575820288            0           0  4190      4190  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.230112+0000   15495'32299   15495:83449    [7,5,10,28]           7   [7,NONE,10,28]               7   15288'30877  2023-08-10T02:48:00.901927+0000      15288'30877  2023-08-10T02:48:00.901927+0000              0                  532  queued for scrub                                                            22821                0
6.41       12113                   0     12113          0        0  11782094848            0           0  4469      4469  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.208852+0000   15495'32066   15495:86585   [5,11,15,20]           5  [NONE,11,15,20]              11   15234'28481  2023-08-09T14:40:23.005390+0000       2928'25276  2023-08-05T15:47:01.820495+0000              0                  111  queued for scrub                                                            20770                0
6.51       12118                   0     12118          0        0  11602362368            0           0  4698      4698  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.671645+0000   15495'33022   15495:77603   [22,10,5,32]          22  [22,10,NONE,32]              22   15281'30514  2023-08-09T21:04:39.171251+0000      11843'26759  2023-08-08T13:36:55.184941+0000              0                   96  queued for scrub                                                            21891                0
6.68       12183                   0     12183          0        0  11748999168            0           0  5097      5097  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.830281+0000   15495'33275  15495:111298   [30,27,16,5]          30  [30,27,16,NONE]              30   15273'29812  2023-08-09T16:14:33.353209+0000      11370'27174  2023-08-08T12:02:03.840314+0000              0                  106  queued for scrub                                                            21064                0
6.72       12178                   0     12178          0        0  11592007680            0           0  5179      5179  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:17.973003+0000   15495'33241  15495:114978     [5,22,3,8]           5    [NONE,22,3,8]              22   15271'29802  2023-08-09T15:59:35.735286+0000      15271'29802  2023-08-09T15:59:35.735286+0000              0                  698  queued for scrub                                                            21064                0
6.7a       12027                   0     12027          0        0  11654266880            0           0  5120      5120  active+undersized+degraded+remapped+backfill_toofull  2023-08-10T09:20:18.451102+0000   15495'32556  15495:112014   [2,33,28,10]           2   [2,NONE,28,10]               2   15282'30407  2023-08-09T22:37:38.489306+0000      13086'26911  2023-08-08T21:09:30.273125+0000              0                   99  queued for scrub                                                            22043                0
[root@argo012 ~]#
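
To see which OSDs the stuck PGs are waiting on, the UP and ACTING sets from the rows above can be compared position by position. The sketch below assumes that, for an erasure-coded PG, the OSD at position i of UP is the backfill target when position i of ACTING is NONE. The input is the PG id, UP and ACTING columns trimmed from the `ceph pg dump` rows in this report, and `backfill_targets` is a hypothetical helper name.

```python
import re
from collections import Counter

# PG id, UP set and ACTING set, trimmed from the `ceph pg dump` rows above.
PG_ROWS = """\
6.fa [2,33,28,10] [2,NONE,28,10]
6.f2 [5,22,3,8] [NONE,22,3,8]
6.e8 [30,27,16,5] [30,27,16,NONE]
6.d1 [22,10,5,32] [22,10,NONE,32]
6.c1 [5,11,15,20] [NONE,11,15,20]
6.a7 [11,23,24,33] [11,23,24,NONE]
6.4 [5,22,12,3] [NONE,22,12,3]
6.25 [14,35,24,33] [14,35,24,NONE]
6.27 [11,23,24,33] [11,23,24,NONE]
6.2f [7,5,10,28] [7,NONE,10,28]
6.41 [5,11,15,20] [NONE,11,15,20]
6.51 [22,10,5,32] [22,10,NONE,32]
6.68 [30,27,16,5] [30,27,16,NONE]
6.72 [5,22,3,8] [NONE,22,3,8]
6.7a [2,33,28,10] [2,NONE,28,10]
"""

ROW_RE = re.compile(r"(\S+)\s+\[([^\]]*)\]\s+\[([^\]]*)\]")


def backfill_targets(text):
    """Map each PG id to the UP OSDs whose ACTING slot is NONE."""
    targets = {}
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if not m:
            continue
        pgid = m.group(1)
        up = m.group(2).split(",")
        acting = m.group(3).split(",")
        targets[pgid] = [int(u) for u, a in zip(up, acting) if a == "NONE"]
    return targets


targets = backfill_targets(PG_ROWS)
per_osd = Counter(osd for osds in targets.values() for osd in osds)
for osd, count in per_osd.most_common():
    print(f"osd.{osd}: target of {count} backfill_toofull PGs")
```

With these rows, every stuck PG is waiting on either osd.5 (10 PGs) or osd.33 (5 PGs). Both sit at roughly 50% utilisation in the `ceph osd df` output.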
 
[root@argo012 ~]# ceph -s
  cluster:
    id:     66070a80-2f84-11ee-bc2c-0cc47af3ea56
    health: HEALTH_WARN
            Low space hindering backfill (add storage if this doesn't resolve itself): 15 pgs backfill_toofull
            Degraded data redundancy: 1603314/12392491 objects degraded (12.938%), 140 pgs degraded, 62 pgs undersized
 
  services:
    mon: 3 daemons, quorum argo012,argo013,argo014 (age 4h)
    mgr: argo013.akdhka(active, since 4h), standbys: argo014.xfhnzv, argo012.odttqx
    osd: 36 osds: 31 up (since 10m), 31 in (since 54s); 148 remapped pgs
    rgw: 4 daemons active (4 hosts, 1 zones)
 
  data:
    pools:   7 pools, 417 pgs
    objects: 3.10M objects, 2.7 TiB
    usage:   5.3 TiB used, 7.3 TiB / 13 TiB avail
    pgs:     1603314/12392491 objects degraded (12.938%)
             187344/12392491 objects misplaced (1.512%)
             269 active+clean
             64  active+undersized+degraded+remapped+backfilling
             61  active+undersized+degraded+remapped+backfill_wait
             15  active+undersized+degraded+remapped+backfill_toofull
             7   active+remapped+backfilling
             1   active+remapped+backfill_wait
 
  io:
    client:   18 KiB/s rd, 3.8 MiB/s wr, 25 op/s rd, 43 op/s wr
    recovery: 204 MiB/s, 224 objects/s
 
  progress:
    Global Recovery Event (10m)
      [==================..........] (remaining: 5m)
 
[root@argo012 ~]# ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    13 TiB  7.3 TiB  5.3 TiB   5.3 TiB      41.81
TOTAL  13 TiB  7.3 TiB  5.3 TiB   5.3 TiB      41.81
 
--- POOLS ---
POOL                       ID  PGS    STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                        1    1    11 MiB        3   32 MiB      0    1.9 TiB
.rgw.root                   2   32   1.3 KiB        4   48 KiB      0    1.9 TiB
default.rgw.log             3   32   4.3 KiB      209  464 KiB      0    2.0 TiB
default.rgw.control         4   32       0 B        9      0 B      0    1.9 TiB
default.rgw.meta            5   32   9.8 KiB       18  233 KiB      0    1.9 TiB
ec22-pool                   6  256   3.2 TiB    3.10M  5.5 TiB  48.58    3.3 TiB
default.rgw.buckets.index   7   32  1020 MiB       90  2.9 GiB   0.05    2.0 TiB

Version-Release number of selected component (if applicable):


[root@argo012 ~]# ceph health detail
HEALTH_WARN Low space hindering backfill (add storage if this doesn't resolve itself): 15 pgs backfill_toofull; Degraded data redundancy: 1600982/12392683 objects degraded (12.919%), 140 pgs degraded, 140 pgs undersized
[WRN] PG_BACKFILL_FULL: Low space hindering backfill (add storage if this doesn't resolve itself): 15 pgs backfill_toofull
    pg 6.4 is active+undersized+degraded+remapped+backfill_toofull, acting [NONE,22,12,3]
    pg 6.25 is active+undersized+degraded+remapped+backfill_toofull, acting [14,35,24,NONE]
    pg 6.27 is active+undersized+degraded+remapped+backfill_toofull, acting [11,23,24,NONE]
    pg 6.2f is active+undersized+degraded+remapped+backfill_toofull, acting [7,NONE,10,28]
    pg 6.41 is active+undersized+degraded+remapped+backfill_toofull, acting [NONE,11,15,20]
    pg 6.51 is active+undersized+degraded+remapped+backfill_toofull, acting [22,10,NONE,32]
    pg 6.68 is active+undersized+degraded+remapped+backfill_toofull, acting [30,27,16,NONE]
    pg 6.72 is active+undersized+degraded+remapped+backfill_toofull, acting [NONE,22,3,8]
    pg 6.7a is active+undersized+degraded+remapped+backfill_toofull, acting [2,NONE,28,10]
    pg 6.a7 is active+undersized+degraded+remapped+backfill_toofull, acting [11,23,24,NONE]
    pg 6.c1 is active+undersized+degraded+remapped+backfill_toofull, acting [NONE,11,15,20]
    pg 6.d1 is active+undersized+degraded+remapped+backfill_toofull, acting [22,10,NONE,32]
    pg 6.e8 is active+undersized+degraded+remapped+backfill_toofull, acting [30,27,16,NONE]
    pg 6.f2 is active+undersized+degraded+remapped+backfill_toofull, acting [NONE,22,3,8]
    pg 6.fa is active+undersized+degraded+remapped+backfill_toofull, acting [2,NONE,28,10]

# ceph osd dump
epoch 15495
fsid 66070a80-2f84-11ee-bc2c-0cc47af3ea56
created 2023-07-31T09:27:20.333211+0000
modified 2023-08-10T09:08:48.391487+0000
flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
crush_version 169
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client luminous
min_compat_client luminous
require_osd_release quincy
stretch_mode_enabled false
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 61 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 113 lfor 0/0/109 flags hashpspool stripe_width 0 application rgw
pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 198 lfor 0/0/109 flags hashpspool stripe_width 0 application rgw
pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 113 lfor 0/0/111 flags hashpspool stripe_width 0 application rgw
pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 113 lfor 0/0/111 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
pool 6 'ec22-pool' erasure profile ec22 size 4 min_size 3 crush_rule 1 object_hash rjenkins pg_num 256 pgp_num 128 pg_num_target 512 pgp_num_target 512 autoscale_mode on last_change 15465 lfor 0/15233/15465 flags hashpspool stripe_width 8192 application rgw
pool 7 'default.rgw.buckets.index' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 418 lfor 0/0/416 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
max_osd 36
osd.0 up   in  weight 1 up_from 15372 up_thru 15492 down_at 15364 last_clean_interval [329,15363) [v2:10.8.128.213:6800/1960352398,v1:10.8.128.213:6801/1960352398] [v2:10.8.128.213:6802/1960352398,v1:10.8.128.213:6803/1960352398] exists,up 6bfdc645-9cf9-4c5f-8227-621468bcd219
osd.1 down out weight 0 up_from 15421 up_thru 15421 down_at 15454 last_clean_interval [12539,15415) [v2:10.8.128.217:6800/2045021352,v1:10.8.128.217:6801/2045021352] [v2:10.8.128.217:6802/2045021352,v1:10.8.128.217:6803/2045021352] autoout,exists b631d67e-5687-472a-81f2-8060f8ac7f11
osd.2 up   in  weight 1 up_from 15400 up_thru 15492 down_at 15391 last_clean_interval [356,15390) [v2:10.8.128.214:6824/1267294927,v1:10.8.128.214:6825/1267294927] [v2:10.8.128.214:6826/1267294927,v1:10.8.128.214:6827/1267294927] exists,up 2e8bfd8a-c71f-4fea-9d89-de69f96d87a5
osd.3 up   in  weight 1 up_from 15305 up_thru 15491 down_at 15298 last_clean_interval [265,15297) [v2:10.8.128.212:6808/1674113188,v1:10.8.128.212:6809/1674113188] [v2:10.8.128.212:6842/1674113188,v1:10.8.128.212:6843/1674113188] exists,up ae682a2e-7c22-40c1-b0ca-755a30af6e8b
osd.4 up   in  weight 1 up_from 15364 up_thru 15492 down_at 15356 last_clean_interval [320,15355) [v2:10.8.128.213:6856/2877006602,v1:10.8.128.213:6857/2877006602] [v2:10.8.128.213:6858/2877006602,v1:10.8.128.213:6859/2877006602] exists,up 19ec550a-4b5a-40d3-a9c3-0683600bb820
osd.5 up   in  weight 1 up_from 15424 up_thru 15491 down_at 15418 last_clean_interval [12572,15417) [v2:10.8.128.217:6856/3439258045,v1:10.8.128.217:6857/3439258045] [v2:10.8.128.217:6858/3439258045,v1:10.8.128.217:6859/3439258045] exists,up 014ff108-8c85-46c6-9f6e-0c720f84eeac
osd.6 up   in  weight 1 up_from 15334 up_thru 15487 down_at 15328 last_clean_interval [296,15327) [v2:10.8.128.212:6864/2352429355,v1:10.8.128.212:6865/2352429355] [v2:10.8.128.212:6866/2352429355,v1:10.8.128.212:6867/2352429355] exists,up ece9252a-ef39-4af6-8b47-1d2088971c65
osd.7 up   in  weight 1 up_from 15389 up_thru 15494 down_at 15381 last_clean_interval [347,15380) [v2:10.8.128.214:6864/2436227132,v1:10.8.128.214:6865/2436227132] [v2:10.8.128.214:6866/2436227132,v1:10.8.128.214:6867/2436227132] exists,up 6ba5e8d2-72ed-4767-9b82-846ca0c8e3b3
osd.8 up   in  weight 1 up_from 15372 up_thru 15487 down_at 15369 last_clean_interval [333,15368) [v2:10.8.128.213:6864/2077144181,v1:10.8.128.213:6865/2077144181] [v2:10.8.128.213:6866/2077144181,v1:10.8.128.213:6867/2077144181] exists,up 7c5978b0-7d50-42d0-8c6d-8ea0d05c106d
osd.9 up   in  weight 1 up_from 15442 up_thru 15491 down_at 15435 last_clean_interval [12576,15434) [v2:10.8.128.217:6864/1859588829,v1:10.8.128.217:6865/1859588829] [v2:10.8.128.217:6866/1859588829,v1:10.8.128.217:6867/1859588829] exists,up 1d9aabdb-8286-44c3-8fc4-e1614e1bafdc
osd.10 up   in  weight 1 up_from 15301 up_thru 15487 down_at 15296 last_clean_interval [260,15295) [v2:10.8.128.212:6800/152068372,v1:10.8.128.212:6801/152068372] [v2:10.8.128.212:6802/152068372,v1:10.8.128.212:6803/152068372] exists,up fad0e532-cf28-4380-b7d7-badecc7a5507
osd.11 up   in  weight 1 up_from 15411 up_thru 15492 down_at 15402 last_clean_interval [369,15401) [v2:10.8.128.214:6800/1821456158,v1:10.8.128.214:6801/1821456158] [v2:10.8.128.214:6802/1821456158,v1:10.8.128.214:6803/1821456158] exists,up b51fc3b0-bcb5-459d-add5-fbd3ef38e996
osd.12 up   in  weight 1 up_from 15339 up_thru 15492 down_at 15336 last_clean_interval [308,15335) [v2:10.8.128.213:6808/3030502462,v1:10.8.128.213:6809/3030502462] [v2:10.8.128.213:6810/3030502462,v1:10.8.128.213:6811/3030502462] exists,up e4fbca6f-2cfd-4adf-aa76-934f1ee9b4de
osd.13 down out weight 0 up_from 15433 up_thru 15433 down_at 15456 last_clean_interval [12543,15423) [v2:10.8.128.217:6808/4066800377,v1:10.8.128.217:6809/4066800377] [v2:10.8.128.217:6810/4066800377,v1:10.8.128.217:6811/4066800377] autoout,exists 127b3545-1c25-4843-b8c0-58129b975744
osd.14 up   in  weight 1 up_from 15384 up_thru 15492 down_at 15376 last_clean_interval [342,15375) [v2:10.8.128.214:6808/1415681408,v1:10.8.128.214:6809/1415681408] [v2:10.8.128.214:6810/1415681408,v1:10.8.128.214:6811/1415681408] exists,up 59a6f548-842e-4ebb-9436-5a115b7a51a2
osd.15 up   in  weight 1 up_from 15322 up_thru 15492 down_at 15316 last_clean_interval [283,15315) [v2:10.8.128.212:6810/2593498333,v1:10.8.128.212:6811/2593498333] [v2:10.8.128.212:6812/2593498333,v1:10.8.128.212:6813/2593498333] exists,up 7c1d296d-f101-4419-a5dd-8c82d43497cf
osd.16 up   in  weight 1 up_from 15361 up_thru 15492 down_at 15348 last_clean_interval [315,15347) [v2:10.8.128.213:6816/2615629254,v1:10.8.128.213:6817/2615629254] [v2:10.8.128.213:6818/2615629254,v1:10.8.128.213:6819/2615629254] exists,up fa0b85ef-5abd-47d0-bb8f-d24f1d293bb7
osd.17 down out weight 0 up_from 15445 up_thru 15445 down_at 15458 last_clean_interval [12547,15438) [v2:10.8.128.217:6816/1188944202,v1:10.8.128.217:6817/1188944202] [v2:10.8.128.217:6818/1188944202,v1:10.8.128.217:6819/1188944202] autoout,exists b80610ca-0ab1-4fa8-87fc-8901fbe0d51d
osd.18 up   in  weight 1 up_from 15414 up_thru 15479 down_at 15407 last_clean_interval [380,15406) [v2:10.8.128.214:6816/2467598827,v1:10.8.128.214:6817/2467598827] [v2:10.8.128.214:6818/2467598827,v1:10.8.128.214:6819/2467598827] exists,up f109dde3-809a-4341-a4c3-ae76bc94f7a6
osd.19 up   in  weight 1 up_from 15309 up_thru 15493 down_at 15302 last_clean_interval [270,15301) [v2:10.8.128.212:6818/3366057300,v1:10.8.128.212:6819/3366057300] [v2:10.8.128.212:6820/3366057300,v1:10.8.128.212:6821/3366057300] exists,up 0289665b-cc42-48f7-a55f-bb673799c9ca
osd.20 up   in  weight 1 up_from 15351 up_thru 15490 down_at 15341 last_clean_interval [307,15340) [v2:10.8.128.213:6824/876450911,v1:10.8.128.213:6825/876450911] [v2:10.8.128.213:6826/876450911,v1:10.8.128.213:6827/876450911] exists,up bfa91559-6140-444c-849e-2a65a404d3d1
osd.21 down out weight 0 up_from 15429 up_thru 15429 down_at 15460 last_clean_interval [12552,15420) [v2:10.8.128.217:6824/1099962364,v1:10.8.128.217:6825/1099962364] [v2:10.8.128.217:6826/1099962364,v1:10.8.128.217:6827/1099962364] autoout,exists 346f3adc-7505-44c4-9c09-704a6e5d7090
osd.22 up   in  weight 1 up_from 15408 up_thru 15492 down_at 15397 last_clean_interval [365,15396) [v2:10.8.128.214:6832/1443432180,v1:10.8.128.214:6833/1443432180] [v2:10.8.128.214:6834/1443432180,v1:10.8.128.214:6835/1443432180] exists,up 60b77f38-edd4-41d1-80d6-5c0ba4b20246
osd.23 up   in  weight 1 up_from 15314 up_thru 15491 down_at 15306 last_clean_interval [274,15305) [v2:10.8.128.212:6826/3174798235,v1:10.8.128.212:6827/3174798235] [v2:10.8.128.212:6828/3174798235,v1:10.8.128.212:6829/3174798235] exists,up a2e1020f-497b-42a0-9c69-424907dfc60f
osd.24 up   in  weight 1 up_from 15367 up_thru 15492 down_at 15358 last_clean_interval [325,15357) [v2:10.8.128.213:6832/872631187,v1:10.8.128.213:6833/872631187] [v2:10.8.128.213:6834/872631187,v1:10.8.128.213:6835/872631187] exists,up 7dccce66-a574-4622-8866-fa6a68648c45
osd.25 down out weight 0 up_from 15439 up_thru 15439 down_at 15462 last_clean_interval [12557,15430) [v2:10.8.128.217:6832/1872376923,v1:10.8.128.217:6833/1872376923] [v2:10.8.128.217:6834/1872376923,v1:10.8.128.217:6835/1872376923] autoout,exists d3a9bd68-adf9-4094-b621-aa864475b2d0
osd.26 up   in  weight 1 up_from 15379 up_thru 15470 down_at 15374 last_clean_interval [337,15373) [v2:10.8.128.214:6840/4160146302,v1:10.8.128.214:6841/4160146302] [v2:10.8.128.214:6842/4160146302,v1:10.8.128.214:6843/4160146302] exists,up 5149280c-7369-41d8-ae6a-7bd5c816dacb
osd.27 up   in  weight 1 up_from 15327 up_thru 15491 down_at 15319 last_clean_interval [288,15318) [v2:10.8.128.212:6834/89520191,v1:10.8.128.212:6835/89520191] [v2:10.8.128.212:6836/89520191,v1:10.8.128.212:6837/89520191] exists,up 5b4ce98c-03d1-4d59-9945-801f82b80ce0
osd.28 up   in  weight 1 up_from 15354 up_thru 15492 down_at 15343 last_clean_interval [311,15342) [v2:10.8.128.213:6840/2973682997,v1:10.8.128.213:6841/2973682997] [v2:10.8.128.213:6842/2973682997,v1:10.8.128.213:6843/2973682997] exists,up 8bc5f3c5-4767-4944-8084-b3f26c2d5ef6
osd.29 up   in  weight 1 up_from 15436 up_thru 15491 down_at 15426 last_clean_interval [12561,15425) [v2:10.8.128.217:6840/3417836056,v1:10.8.128.217:6841/3417836056] [v2:10.8.128.217:6842/3417836056,v1:10.8.128.217:6843/3417836056] exists,up 5f589ff9-1f75-4f61-a626-5bfced1bcb99
osd.30 up   in  weight 1 up_from 15405 up_thru 15492 down_at 15393 last_clean_interval [360,15392) [v2:10.8.128.214:6848/2405030462,v1:10.8.128.214:6849/2405030462] [v2:10.8.128.214:6850/2405030462,v1:10.8.128.214:6851/2405030462] exists,up d9a2b3fa-adce-44a7-8157-39dcb9ada8fb
osd.31 up   in  weight 1 up_from 15331 up_thru 15493 down_at 15324 last_clean_interval [292,15323) [v2:10.8.128.212:6850/2562535729,v1:10.8.128.212:6851/2562535729] [v2:10.8.128.212:6852/2562535729,v1:10.8.128.212:6853/2562535729] exists,up ca0f750a-ba77-4107-ba10-ba6c389d5fc0
osd.32 up   in  weight 1 up_from 15346 up_thru 15492 down_at 15338 last_clean_interval [304,15337) [v2:10.8.128.213:6848/1342419880,v1:10.8.128.213:6849/1342419880] [v2:10.8.128.213:6850/1342419880,v1:10.8.128.213:6851/1342419880] exists,up 21287e7b-9429-472a-bdcb-9dc5e7aa3f47
osd.33 up   in  weight 1 up_from 15448 up_thru 15491 down_at 15441 last_clean_interval [12567,15440) [v2:10.8.128.217:6848/468425815,v1:10.8.128.217:6849/468425815] [v2:10.8.128.217:6850/468425815,v1:10.8.128.217:6851/468425815] exists,up d640f7a8-5ece-4942-b436-41d3446b4583
osd.34 up   in  weight 1 up_from 15395 up_thru 15480 down_at 15386 last_clean_interval [352,15385) [v2:10.8.128.214:6856/13339455,v1:10.8.128.214:6857/13339455] [v2:10.8.128.214:6858/13339455,v1:10.8.128.214:6859/13339455] exists,up 5a6db3ff-d874-4685-9563-d71746061363
osd.35 up   in  weight 1 up_from 15319 up_thru 15491 down_at 15311 last_clean_interval [562,15310) [v2:10.8.128.212:6848/2471831620,v1:10.8.128.212:6849/2471831620] [v2:10.8.128.212:6858/2471831620,v1:10.8.128.212:6859/2471831620] exists,up 5c883920-b9bb-44e7-bec5-8ca258d29722
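
The dump above also shows which OSDs share a host with the down ones. A minimal sketch, assuming one public IP address per host: it groups OSDs by the address in their first address vector. The input lines are copied from the dump above and cut after that first vector; `osds_by_host` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Lines from the `ceph osd dump` output above, cut after the first address
# vector (the remaining fields are not needed here).
OSD_DUMP_LINES = """\
osd.1 down out weight 0 up_from 15421 up_thru 15421 down_at 15454 last_clean_interval [12539,15415) [v2:10.8.128.217:6800/2045021352,v1:10.8.128.217:6801/2045021352]
osd.3 up   in  weight 1 up_from 15305 up_thru 15491 down_at 15298 last_clean_interval [265,15297) [v2:10.8.128.212:6808/1674113188,v1:10.8.128.212:6809/1674113188]
osd.5 up   in  weight 1 up_from 15424 up_thru 15491 down_at 15418 last_clean_interval [12572,15417) [v2:10.8.128.217:6856/3439258045,v1:10.8.128.217:6857/3439258045]
osd.9 up   in  weight 1 up_from 15442 up_thru 15491 down_at 15435 last_clean_interval [12576,15434) [v2:10.8.128.217:6864/1859588829,v1:10.8.128.217:6865/1859588829]
osd.13 down out weight 0 up_from 15433 up_thru 15433 down_at 15456 last_clean_interval [12543,15423) [v2:10.8.128.217:6808/4066800377,v1:10.8.128.217:6809/4066800377]
osd.17 down out weight 0 up_from 15445 up_thru 15445 down_at 15458 last_clean_interval [12547,15438) [v2:10.8.128.217:6816/1188944202,v1:10.8.128.217:6817/1188944202]
osd.21 down out weight 0 up_from 15429 up_thru 15429 down_at 15460 last_clean_interval [12552,15420) [v2:10.8.128.217:6824/1099962364,v1:10.8.128.217:6825/1099962364]
osd.25 down out weight 0 up_from 15439 up_thru 15439 down_at 15462 last_clean_interval [12557,15430) [v2:10.8.128.217:6832/1872376923,v1:10.8.128.217:6833/1872376923]
osd.29 up   in  weight 1 up_from 15436 up_thru 15491 down_at 15426 last_clean_interval [12561,15425) [v2:10.8.128.217:6840/3417836056,v1:10.8.128.217:6841/3417836056]
osd.33 up   in  weight 1 up_from 15448 up_thru 15491 down_at 15441 last_clean_interval [12567,15440) [v2:10.8.128.217:6848/468425815,v1:10.8.128.217:6849/468425815]
"""

# osd.<id> <up|down> <in|out> ... [v2:<ipv4>:<port>/<nonce>,...
LINE_RE = re.compile(
    r"^osd\.(\d+)\s+(up|down)\s+(in|out)\s.*?\[v2:(\d+(?:\.\d+){3}):"
)


def osds_by_host(text):
    """Group OSD ids by host address, split into 'up' and 'down' lists."""
    hosts = defaultdict(lambda: {"up": [], "down": []})
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            osd_id, state, _in_out, addr = m.groups()
            hosts[addr][state].append(int(osd_id))
    return hosts


hosts = osds_by_host(OSD_DUMP_LINES)
for addr, states in sorted(hosts.items()):
    print(addr, "up:", states["up"], "down:", states["down"])
```

On these lines, 10.8.128.217 carries the five down OSDs (1, 13, 17, 21, 25) and four up OSDs (5, 9, 29, 33), so the surviving OSDs on that host are the only local candidates for the remapped shards.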

How reproducible:
1/1

Steps to Reproduce:
1. Deploy an RHCS cluster with 4 hosts and 9 OSDs per host.
2. Fill the cluster to around 45% and keep I/O running.
3. Bring down a few OSDs (5) on one host. After 10 minutes, the down OSDs are marked "out".
4. Observe that once they are marked out, the PGs remap onto the other OSDs on that host.
5. Once the remap has started, PGs get stuck in backfill_toofull, although no OSD has crossed the backfillfull ratio.
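
As a rough sanity check on step 5, the headroom of one affected OSD can be estimated with simple arithmetic. The sketch below is not how the OSD decides backfill_toofull. It only asks how many additional shards of a typical stuck PG would fit on osd.5 before it reached backfillfull_ratio. It assumes that the BYTES column of `ceph pg dump` is the logical PG size and that, with k=2 data chunks, one shard holds about half of it. It also ignores other backfills in flight. `shards_until_ratio` is a hypothetical helper name.

```python
GIB = 1024 ** 3


def shards_until_ratio(size_gib, used_gib, ratio, shard_bytes):
    """Whole shards of `shard_bytes` that fit before used/size reaches `ratio`."""
    headroom_bytes = (ratio * size_gib - used_gib) * GIB
    if headroom_bytes <= 0:
        return 0
    return int(headroom_bytes // shard_bytes)


# osd.5 from `ceph osd df`: SIZE 416 GiB, RAW USE 205 GiB.
# PG 6.f2 from `ceph pg dump`: BYTES = 11571953664.
# Assumption: EC 2+2 stores about BYTES / 2 per shard.
shard_bytes = 11571953664 // 2
fit = shards_until_ratio(416, 205, 0.9, shard_bytes)
print(f"shards of PG 6.f2 that fit on osd.5 below 0.9: {fit}")
```

Under these assumptions about 31 such shards would fit, while osd.5 appears in the UP set of only ten of the fifteen backfill_toofull PGs listed in this report.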

Actual results:
PGs are wrongly marked backfill_toofull

Expected results:
No PGs should be stuck in the backfill_toofull state unless an OSD has actually crossed the backfillfull ratio.

Additional info:

The issue looks similar to upstream tracker https://tracker.ceph.com/issues/62248