Description of problem (please be as detailed as possible and provide log snippets): The ability to set primary-affinity for an OSD lets us prevent a particular OSD from becoming the primary for any PG. This is achieved by setting that OSD's primary-affinity to 0.
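For illustration, this is how primary-affinity can be set and checked manually from the toolbox with the ceph CLI (a sketch; osd.1 is a placeholder id):

sh-4.4# ceph osd primary-affinity osd.1 0   # osd.1 will no longer be selected as the primary for any PG
sh-4.4# ceph osd tree                       # the PRI-AFF column should now show 0 for osd.1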
Use case: one OSD uses a partition on the root disk (which also hosts the RHCOS OS - yay!), so we'd like to ensure it gets stressed less than the other OSDs. See also https://bugzilla.redhat.com/show_bug.cgi?id=1924949
Acking after internal discussion.
Raz, can you provide your ACK?
Acked
This is critically important for a strategic customer. Please prioritize for 4.8.
This was mentioned in https://issues.redhat.com/browse/KNIP-1616 but didn't make the 4.8 feature freeze there. Checking where we are with it. As best I know, Shachar was working on it. Reassigning. Shachar, can you give us an update on where we are? (I.e., is there still a chance to get the code completed for the 4.8 dev freeze?) If you are not working on it, please reassign to me for redistribution.
Primary-affinity for OSDs is still a work in progress. Currently, I have preliminary prototype patches for Rook & OCS, and am still working on a few fixes and improvements.
Next steps:
1) Open Rook issue + detailed design doc
2) Review comments from the Rook team
3) Fixes, dev-testing and PR

Most likely it will be ready for a 4.8 z-stream.
(In reply to Shachar Sharon from comment #13)
> Primary-affinity for OSDs is still a work in progress. Currently, I have
> preliminary prototype patches for Rook & OCS, and am still working on a few
> fixes and improvements.
> Next steps:
> 1) Open Rook issue + detailed design doc
> 2) Review comments from the Rook team
> 3) Fixes, dev-testing and PR
>
> Most likely it will be ready for a 4.8 z-stream.

We are considering accepting this change even after the dev freeze. Please provide an estimated date for completion.
Rook's code is ready for review; I will submit a PR by the end of this workday (May 2nd, 2021). Expecting comments + fixes + repeated dev-testing to take a few days. If everything goes as expected, the code will be merged by the beginning of next week. The OCS code is rather trivial.
Upstream PRs are merged. What's the next step? (there hasn't been an update here for ~1 month, and this is a critical feature for 4.8)
PrimaryAffinity (and its sibling, InitialWeight) are part of the 4.8 release. Currently in QE testing.
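For reference, the runtime equivalents of these two settings can be applied manually from the toolbox with the ceph CLI (a sketch; osd.1 and the 0.167 CRUSH weight are illustrative values taken from the root-disk use case):

sh-4.4# ceph osd crush reweight osd.1 0.167   # InitialWeight equivalent: CRUSH weight, in TiB units
sh-4.4# ceph osd primary-affinity osd.1 0     # PrimaryAffinity equivalent: never pick osd.1 as primary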
Is anyone looking at the above comment?
Have we eliminated network issues as the cause of the primary affinity not being set correctly?
@ssharon just a short update: last week we redeployed OCS using a CI build with a fix for BZ 1970503 (the fix is good). Since then I have not been able to reproduce the primary-affinity issue in which only some of the OSDs get updated with the new value.
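In case it helps future triage, the applied values can be re-checked quickly per OSD from the toolbox (a sketch; relies on the PRI-AFF column of the tree output):

sh-4.4# ceph osd tree | grep 'osd\.'   # one line per OSD; the last column (PRI-AFF) should match the configured value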
Moving back to ON_QA based on comment 40
Please add doc text.
Mudit - please review the revised doc text and share feedback
Some modification:

.The overall IOPS on OSDs with a primary-affinity of less than one are reduced
This enhancement adds the ability to set primary-affinity on OSDs, which can help reduce the overall load on a subset of OSDs in a non-balanced cluster, in particular where an OSD shares its physical device with the host OS. Because the primary OSD of a replicated PG serves all client reads, lowering an OSD's primary-affinity shifts read traffic away from it.
Test Environment:
-----------------
GS configuration:
-----------------
* Platform - BM
* Replica 2, compression enabled
* Root OSD weight 0.167 TiB
* Primary affinity for root disks 0
* RBD only enabled
* Total 6 OSDs in cluster (3 - master root disk, 3 - worker root disk)

Versions:
---------
OCP - 4.8.0-fc.8
OCS - ocs-operator.v4.8.0-450.ci

Observations:
-------------
* Set initial weight and primary affinity on root-disk OSDs during deployment.
* The root disk size was 334 GiB, hence set initial weight as 0.167 TiB.
* Set primary affinity as 0.

We filled the cluster to almost 50%; so far we notice that root disk utilization is lower compared to the other OSDs, as expected due to the primary affinity we set. The root-disk OSDs are not primary OSDs, hence marking this BZ as Verified.

Console Output:
---------------
$ oc rsh -n openshift-storage rook-ceph-tools-64d88c9b9f-5kpxw
sh-4.4# ceph -s
  cluster:
    id:     601ba532-40f7-419e-bb30-0b6c995354aa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 19m)
    mgr: a(active, since 4d)
    osd: 6 osds: 6 up (since 7h), 6 in (since 4d)

  data:
    pools:   1 pools, 256 pgs
    objects: 519.43k objects, 2.0 TiB
    usage:   3.9 TiB used, 3.6 TiB / 7.5 TiB avail
    pgs:     256 active+clean

  io:
    client:   391 KiB/s rd, 633 KiB/s wr, 195 op/s rd, 234 op/s wr

sh-4.4# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                            STATUS  REWEIGHT  PRI-AFF
 -1         7.04457  root default
 -8         2.34819      rack rack0
 -7         0.16699          host dell-r640-013-dsal-lab-eng-rdu2-redhat-com
  1    hdd  0.16699              osd.1                                        up   1.00000        0
-17         2.18120          host dell-r730-040-dsal-lab-eng-rdu2-redhat-com
  4    hdd  2.18120              osd.4                                        up   1.00000  1.00000
 -4         2.34819      rack rack1
 -3         0.16699          host dell-r640-007-dsal-lab-eng-rdu2-redhat-com
  0    hdd  0.16699              osd.0                                        up   1.00000        0
-15         2.18120          host dell-r730-020-dsal-lab-eng-rdu2-redhat-com
  3    hdd  2.18120              osd.3                                        up   1.00000  1.00000
-12         2.34819      rack rack2
-19         2.18120          host dell-r640-012-dsal-lab-eng-rdu2-redhat-com
  5    hdd  2.18120              osd.5                                        up   1.00000  1.00000
-11         0.16699          host dell-r730-023-dsal-lab-eng-rdu2-redhat-com
  2    hdd  0.16699              osd.2                                        up   1.00000        0

sh-4.4# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
 1    hdd  0.16699   1.00000  335 GiB  104 GiB  103 GiB   16 KiB  1024 MiB  231 GiB  30.95  0.59   13      up
 4    hdd  2.18120   1.00000  2.2 TiB  1.3 TiB  1.3 TiB  123 KiB   2.6 GiB  944 GiB  57.73  1.11  164      up
 0    hdd  0.16699   1.00000  335 GiB   72 GiB   71 GiB    4 KiB  1024 MiB  263 GiB  21.38  0.41    9      up
 3    hdd  2.18120   1.00000  2.2 TiB  1.2 TiB  1.2 TiB  103 KiB   2.6 GiB  957 GiB  57.14  1.09  163      up
 5    hdd  2.18120   1.00000  2.2 TiB  1.2 TiB  1.2 TiB   83 KiB   2.5 GiB  1.0 TiB  54.06  1.04  154      up
 2    hdd  0.16699   1.00000  335 GiB   72 GiB   71 GiB    4 KiB  1024 MiB  262 GiB  21.61  0.41    9      up
                       TOTAL  7.5 TiB  3.9 TiB  3.9 TiB  335 KiB    11 GiB  3.6 TiB  52.18
MIN/MAX VAR: 0.41/1.11  STDDEV: 19.97

ceph pg dump output:
--------------------
https://privatebin-it-iso.int.open.paas.redhat.com/?2c2368c42e18088c#GuPXVokeRx1yALmd1BibfV7qEFPapPaD8LzvszjibC3Z
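For completeness, a quick way to confirm from the PG map that none of the affinity-0 OSDs (0, 1, 2) act as primary (a sketch; assumes the acting primary is the last column of the pgs_brief output):

sh-4.4# ceph pg dump pgs_brief | awk 'NR>1 {print $NF}' | sort -n | uniq -c
  # counts PGs per acting-primary OSD id; osds 0, 1 and 2 should not appear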
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days