Description of problem (please be as detailed as possible and provide log snippets): The ability to set the "osd crush reweight" value on an OSD would let us control how much data the system allocates to that OSD. This is well documented in https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/storage_strategies_guide/crush_administration#crush_weights
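For reference, the CRUSH weight and primary affinity can be adjusted at runtime from the toolbox pod with the standard Ceph CLI described in the linked guide (a sketch; osd.1 and the 0.16699 weight are illustrative values, not taken from a specific cluster):

sh-4.4# ceph osd crush reweight osd.1 0.16699   # CRUSH weight, in TiB units
sh-4.4# ceph osd primary-affinity osd.1 0       # never select this OSD as a primary

The ask here is to drive the same settings declaratively from OCS, so they survive redeployments instead of being applied by hand per OSD.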
Asking after internal discussion. This will likely involve not only ocs-operator but also rook work.
Raz - can you provide a QE ACK?
Acked
This is critically important for a strategic customer. Please prioritize for 4.8.
This has been done as https://issues.redhat.com/browse/KNIP-1616
@muagarwa Please review the revised doc text and share feedback.
LGTM, thanks
Test Environment:
-------------------
GS configuration:
-----------------
* Platform - BM
* Replica 2, compression enabled
* Root OSD weight 0.167 TiB
* Primary affinity for root disks 0
* RBD only enabled
* Total 6 OSDs in cluster (3 - master root disk, 3 - worker root disk)

Versions:
----------
OCP - 4.8.0-fc.8
OCS - ocs-operator.v4.8.0-450.ci

Observations:
------------------
* Set primary affinity to 0 and initial weight to 167 GiB on the root disks during deployment, in storagecluster.yaml for each of the storageDeviceSets (a sketch of these settings follows the console output below)
* Ran IOs and filled the cluster until it reached nearfull and one OSD was full
* The root-disk OSDs filled much less than the full-disk OSDs; the proportion of PGs assigned matches the weights, as expected

HENCE MOVING THE BZ TO VERIFIED STATE

Console Output:
-----------------
sh-4.4# ceph -s
  cluster:
    id:     601ba532-40f7-419e-bb30-0b6c995354aa
    health: HEALTH_ERR
            1 backfillfull osd(s)
            1 full osd(s)
            1 nearfull osd(s)
            1 pool(s) full
            1/3 mons down, quorum a,c

  services:
    mon: 3 daemons, quorum a,c (age 13m), out of quorum: b
    mgr: a(active, since 5d)
    osd: 6 osds: 6 up (since 29h), 6 in (since 5d)

  data:
    pools:   1 pools, 256 pgs
    objects: 766.29k objects, 2.9 TiB
    usage:   5.8 TiB used, 1.7 TiB / 7.5 TiB avail
    pgs:     256 active+clean

sh-4.4# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
 1  hdd    0.16699   1.00000  335 GiB  151 GiB  150 GiB   28 KiB  1024 MiB  183 GiB  45.23  0.59   13  up
 4  hdd    2.18120   1.00000  2.2 TiB  1.9 TiB  1.9 TiB  170 KiB   3.4 GiB  334 GiB  85.06  1.11  164  up
 0  hdd    0.16699   1.00000  335 GiB  105 GiB  104 GiB   16 KiB  1024 MiB  230 GiB  31.31  0.41    9  up
 3  hdd    2.18120   1.00000  2.2 TiB  1.8 TiB  1.8 TiB  171 KiB   3.2 GiB  355 GiB  84.11  1.09  163  up
 5  hdd    2.18120   1.00000  2.2 TiB  1.7 TiB  1.7 TiB  171 KiB   3.0 GiB  452 GiB  79.74  1.04  154  up
 2  hdd    0.16699   1.00000  335 GiB  105 GiB  104 GiB   64 KiB  1024 MiB  230 GiB  31.39  0.41    9  up
                     TOTAL    7.5 TiB  5.8 TiB  5.8 TiB  622 KiB    13 GiB  1.7 TiB  76.85
MIN/MAX VAR: 0.41/1.11  STDDEV: 29.63
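For reference, the storagecluster.yaml settings used above looked roughly like the sketch below. The initialWeight and primaryAffinity field names and value formats are assumptions based on the test notes, and the device set name and PVC sizing are illustrative only:

storageDeviceSets:
- name: ocs-deviceset-root          # illustrative name for the root-disk device set
  count: 1
  replica: 3
  initialWeight: "167GiB"           # assumed field: initial CRUSH weight for these OSDs
  primaryAffinity: "0"              # assumed field: keep root-disk OSDs from acting as primary
  dataPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 335Gi            # illustrative size matching the 335 GiB root-disk OSDs
      storageClassName: localblock
      volumeMode: Block

With these values in place at deployment time, the operator creates the root-disk OSDs with the lower CRUSH weight, which is what produces the 0.16699 vs 2.18120 WEIGHT split and the matching PG counts in the ceph osd df output above.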
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003