Bug 1896959
| Field | Value |
|---|---|
| Summary | PG autoscaler did not respond to storage pool consuming space |
| Product | [Red Hat Storage] Red Hat OpenShift Container Storage |
| Component | ceph |
| Version | 4.2 |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED NOTABUG |
| Severity | high |
| Priority | unspecified |
| Reporter | Ben England <bengland> |
| Assignee | Neha Ojha <nojha> |
| QA Contact | Raz Tamir <ratamir> |
| CC | assingh, bbenshab, bniver, ebenahar, jdurgin, jhopper, kramdoss, madam, muagarwa, nojha, ocs-bugs, owasserm |
| Keywords | AutomationBackLog, Performance |
| Flags | kramdoss: needinfo+ |
| Target Milestone | --- |
| Target Release | --- |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Last Closed | 2021-01-22 09:01:12 UTC |
| Attachments | must_gather for comment#6 (attachment 1731025) |
Description (Ben England, 2020-11-11 22:44:44 UTC)
Most likely a Ceph issue that I don't see us fixing right now for 4.6.0 (and also unlikely to be a new issue) - deferring to 4.7.

This was already seen before and fixed as part of bug 1782756, so this is most likely a regression. In addition, the situation is exposed more easily for CNV - bug 1897351.

(In reply to Elad from comment #3)
> This was already seen before and fixed as part of 1782756.

The correct one is https://bugzilla.redhat.com/show_bug.cgi?id=1797918

I was able to reproduce this issue by performing the following steps:
1) On a 3-node OCS cluster with one 512 GB OSD per node, fill up the capacity.
2) Add capacity: 1 more OSD per node.
3) Follow https://access.redhat.com/solutions/3001761 so that recovery IOs can start.
4) Allow the rebalance to the new OSDs to complete. [At this point, we could see that the PGs are not quite equally distributed.]

cat ceph_osd_df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 1   ssd 0.50000  1.00000 512 GiB 380 GiB 379 GiB 43 KiB 1.5 GiB  132 GiB 74.27 1.49 189     up
 5   ssd 0.50000  1.00000 512 GiB 129 GiB 128 GiB 20 KiB 1024 MiB 383 GiB 25.15 0.50  78     up
 2   ssd 0.50000  1.00000 512 GiB 388 GiB 386 GiB 39 KiB 1.5 GiB  124 GiB 75.70 1.52 198     up
 4   ssd 0.50000  1.00000 512 GiB 129 GiB 128 GiB 27 KiB 1024 MiB 383 GiB 25.28 0.51  72     up
 0   ssd 0.50000  1.00000 512 GiB 366 GiB 364 GiB 47 KiB 1.5 GiB  146 GiB 71.43 1.43 167     up
 3   ssd 0.50000  1.00000 512 GiB 140 GiB 139 GiB 20 KiB 1024 MiB 372 GiB 27.35 0.55  97     up
                    TOTAL 3 TiB   1.5 TiB 1.5 TiB 199 KiB 7.5 GiB  1.5 TiB 49.86
MIN/MAX VAR: 0.50/1.52  STDDEV: 23.98

5) Write more data to fill up the cluster. [At this point, one of the OSDs hit the full ratio well ahead of the other (new) OSDs.]

Created attachment 1731025 [details]
must_gather for comment#6

(In reply to krishnaram Karthick from comment #7)
> Created attachment 1731025 [details]
> must_gather for comment#6

Thanks for the detailed data Karthick. In your case, the cluster was not rebalanced yet; there were still many backfilling pgs:

  data:
    pools:   3 pools, 288 pgs
    objects: 130.90k objects, 505 GiB
    usage:   1.5 TiB used, 1.5 TiB / 3 TiB avail
    pgs:     96496/392685 objects misplaced (24.573%)
             232 active+clean
             54  active+remapped+backfill_wait
             2   active+remapped+backfilling

  io:
    client:   853 B/s rd, 303 MiB/s wr, 1 op/s rd, 94 op/s wr
    recovery: 15 MiB/s, 3 objects/s

The balancer won't run until <5% of objects are misplaced. As you can see, at this point in time nearly 25% of the objects were still being rebalanced, so in this case the balancer hasn't run at all. You can verify this by observing that there are no upmaps in ceph_osd_dump, which is how the balancer redistributes pgs.

What happened after this must-gather was taken? I'd expect backfill to complete, and then the balancer to redistribute pgs as needed at that point.
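For reference, a minimal sketch of how this state can be checked from the Ceph CLI (for example from the rook-ceph toolbox pod), assuming a Nautilus-or-later release; the <5% threshold mentioned above corresponds to the mgr option target_max_misplaced_ratio:

# Overall recovery state and the fraction of misplaced objects; the balancer
# stays idle while this fraction is above target_max_misplaced_ratio (0.05 by default).
ceph status
ceph config get mgr target_max_misplaced_ratio

# Balancer state: whether it is enabled, its mode, and whether a plan is active.
ceph balancer status

# The upmap balancer works by installing pg_upmap_items entries in the OSD map;
# if nothing shows up here, the balancer has not redistributed any PGs yet.
ceph osd dump | grep upmap

While the misplaced ratio is still above the threshold, an empty grep result here is expected rather than a sign of a balancer failure.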
(In reply to Josh Durgin from comment #8)
> What happened after this must-gather was taken? I'd expect backfill to
> complete, and then the balancer to redistribute pgs as needed at that point.

Thanks Josh. I don't have the cluster anymore; QE's AWS clusters automatically get deleted after 12 hours. I'll rerun this test and update once I have the results.

I reran the test and waited for a long time. I see that this time the OSDs are more evenly distributed.

After expanding to 6 OSDs:

ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 2   ssd 0.50000  1.00000 512 GiB 216 GiB 215 GiB 72 KiB 1.5 GiB  296 GiB 42.24 0.96 133     up
 3   ssd 0.50000  1.00000 512 GiB 236 GiB 235 GiB 27 KiB 1024 MiB 276 GiB 46.01 1.04 155     up
 1   ssd 0.50000  1.00000 512 GiB 226 GiB 224 GiB 72 KiB 1.5 GiB  286 GiB 44.04 1.00 146     up
 4   ssd 0.50000  1.00000 512 GiB 226 GiB 225 GiB 27 KiB 1024 MiB 286 GiB 44.21 1.00 140     up
 0   ssd 0.50000  1.00000 512 GiB 239 GiB 238 GiB 75 KiB 1.6 GiB  273 GiB 46.75 1.06 149     up
 5   ssd 0.50000  1.00000 512 GiB 213 GiB 212 GiB 45 KiB 1024 MiB 299 GiB 41.51 0.94 139     up
                    TOTAL 3 TiB   1.3 TiB 1.3 TiB 321 KiB 7.6 GiB  1.7 TiB 44.13
MIN/MAX VAR: 0.94/1.06  STDDEV: 1.86

After expanding to 9 OSDs:

ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE  VAR  PGS STATUS
 2   ssd 0.50000  1.00000 512 GiB 293 GiB 292 GiB 42 KiB 1.4 GiB  219 GiB 57.25 0.96  93     up
 3   ssd 0.50000  1.00000 512 GiB 307 GiB 305 GiB 40 KiB 1.7 GiB  205 GiB 59.92 1.01 103     up
 6   ssd 0.50000  1.00000 512 GiB 313 GiB 312 GiB 32 KiB 1024 MiB 199 GiB 61.19 1.03  92     up
 1   ssd 0.50000  1.00000 512 GiB 279 GiB 277 GiB 55 KiB 1.8 GiB  233 GiB 54.54 0.92  93     up
 4   ssd 0.50000  1.00000 512 GiB 307 GiB 305 GiB 43 KiB 1.5 GiB  205 GiB 59.87 1.01  95     up
 7   ssd 0.50000  1.00000 512 GiB 328 GiB 327 GiB 35 KiB 1024 MiB 184 GiB 63.98 1.08  99     up
 0   ssd 0.50000  1.00000 512 GiB 328 GiB 327 GiB 51 KiB 1.4 GiB  184 GiB 64.05 1.08 101     up
 5   ssd 0.50000  1.00000 512 GiB 272 GiB 271 GiB 39 KiB 1.4 GiB  240 GiB 53.20 0.89  89     up
 8   ssd 0.50000  1.00000 512 GiB 313 GiB 312 GiB 24 KiB 1024 MiB 199 GiB 61.05 1.03  98     up
                    TOTAL 4.5 TiB 2.7 TiB 2.7 TiB 366 KiB 12 GiB   1.8 TiB 59.45
MIN/MAX VAR: 0.89/1.08  STDDEV: 3.58
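For reference, the evenness reported above (the MIN/MAX VAR and STDDEV figures) can be summarized without eyeballing the tables. A rough sketch, assuming jq is available and that the JSON output of ceph osd df exposes the .nodes[].utilization field (this may differ between Ceph releases):

# Current balancer score for the cluster (lower means a more even distribution).
ceph balancer eval

# Min, max and mean per-OSD utilization from the JSON form of 'ceph osd df';
# the .nodes[].utilization path is an assumption and may vary by release.
ceph osd df -f json | jq '[.nodes[].utilization] | min, max, (add/length)'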
(In reply to krishnaram Karthick from comment #10)
> I reran the test and waited for a long time.
> I see that this time the OSDs are more evenly distributed.

Good - so what's the next step?

> [ceph osd df output after expanding to 6 and 9 OSDs, quoted from comment #10]

As the balancer is working as expected, this is not a regression or blocker.

Removing the blocker flag as discussed in the OCS meeting yesterday; probably this should be closed as not a bug.

(In reply to Yaniv Kaul from comment #11)
> Good - so what's the next step?

Reaching out to the performance team running CNV workloads to see if this is seen on a scaled-up cluster with a CNV workload, as that is where the issue was originally seen. Moving out of 4.6; once we have the inputs from the perf team, we can move forward.
(In reply to krishnaram Karthick from comment #13)
> Reaching out to the performance team running CNV workloads to see if this is
> seen on a scaled-up cluster with a CNV workload, as that is where the issue
> was originally seen.

Any update on this?

(In reply to Josh Durgin from comment #15)
> Any update on this?

The last time I reached out, I couldn't get a CNV system that runs with the storage capacity described in the bug. But I'm retaining the needinfo to check once again, or maybe to see if there is an automated test that we could run on our test environments.

Please reopen if you see this again. Removing the needinfo flag. We weren't able to reproduce this scenario.

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.