Bug 1924946 - [RFE] Add ability to set primary-affinity on OSDs
Summary: [RFE] Add ability to set primary-affinity on OSDs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Shachar Sharon
QA Contact: Shrivaibavi Raghaventhiran
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-02-03 23:56 UTC by Neha Ojha
Modified: 2023-09-15 01:00 UTC
CC List: 21 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
.The overall IOPS on OSDs with primary-affinity less than one is reduced
This enhancement adds the ability to set primary-affinity on OSDs, which can help in reducing the overall load on a subset of OSDs in a non-balanced cluster, in particular where an OSD shares its physical device with another.
Clone Of:
Environment:
Last Closed: 2021-08-03 18:15:14 UTC
Embargoed:


Attachments


Links
Github openshift/ocs-operator pull 1178 (closed): Allow passing explicit PrimaryAffinity (last updated 2021-05-10 13:13:40 UTC)
Github rook/rook pull 7807 (closed): ceph: allow setting primary-affinity to osd (last updated 2021-05-10 13:37:38 UTC)
Red Hat Product Errata RHBA-2021:3003 (last updated 2021-08-03 18:15:58 UTC)

Description Neha Ojha 2021-02-03 23:56:47 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The ability to set primary-affinity for an OSD will help us disallow a particular OSD from becoming the primary for any PG. This can be achieved by setting primary-affinity to 0 for that OSD.
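For illustration, a minimal sketch of the underlying ceph CLI commands this RFE would automate, run from the toolbox pod (osd.1 is just an example id):

sh-4.4# ceph osd primary-affinity osd.1 0   # osd.1 is no longer eligible to act as primary for any PG
sh-4.4# ceph osd tree                       # the PRI-AFF column should now show 0 for osd.1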

Comment 2 Yaniv Kaul 2021-02-04 07:58:36 UTC
Use case: using a partition on the root disk (which hosts the RHCOS OS - YAY!) - so we'd like to ensure it gets less stressed than the other OSDs.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1924949

Comment 3 Michael Adam 2021-02-15 08:04:48 UTC
acking after internal discussion

Comment 4 Yaniv Kaul 2021-02-15 08:50:42 UTC
Raz, can you provide your ACK?

Comment 5 Raz Tamir 2021-02-17 07:20:40 UTC
Acked

Comment 10 Yaniv Kaul 2021-04-13 16:15:43 UTC
This is critically important for a strategic customer. Please prioritize for 4.8.

Comment 12 Michael Adam 2021-04-27 15:28:23 UTC
This was mentioned in https://issues.redhat.com/browse/KNIP-1616 but didn't make it into 4.8 feature freeze for that.
Checking where we are with it.
Best I know, Shachar was working on it. Reassigning.

Shachar, can you give us an update on where we are?
(I.e. if there is a chance to still get the code completed for 4.8 dev freeze.)
If you are not working on it, please reassign to me for redistribution.

Comment 13 Shachar Sharon 2021-04-28 07:41:21 UTC
Primary-affinity for OSD is still a work-in-progress. Currently, I have preliminary prototype patches for rook & OCS, still working on a few fixes and improvements.
Next steps:
1) Open ROOK issue + detailed design doc
2) Review comments from rook team
3) Fixes, dev-testing and PR

Most likely will be ready for 4.8 z-stream

Comment 14 Anat Eyal 2021-05-02 08:17:14 UTC
(In reply to Shachar Sharon from comment #13)
> Primary-affinity for OSD is still a work-in-progress. Currently, I have
> preliminary prototype patches for rook & OCS, still working on a few fixes and
> improvements.
> Next steps:
> 1) Open ROOK issue + detailed design doc
> 2) Review comments from rook team
> 3) Fixes, dev-testing and PR
> 
> Most likely will be ready for 4.8 z-stream

We are considering accepting this change, even after the dev freeze. Please provide an estimated date for completion.

Comment 15 Shachar Sharon 2021-05-02 08:31:18 UTC
ROOK's code is ready for review, will submit a PR by the end of this work day (May 2nd 2021).
Expecting comments + fixes + repeated dev-testing to take a few days. If everything goes as expected, the code will be merged by the beginning of next week.

The OCS code is rather trivial.

Comment 16 Yaniv Kaul 2021-06-02 08:48:19 UTC
Upstream PRs are merged. What's the next step? (there hasn't been an update here for ~1 month, and this is a critical feature for 4.8)

Comment 17 Shachar Sharon 2021-06-02 09:01:25 UTC
PrimaryAffinity (and its sibling, InitialWeight) are part of the 4.8 release. Currently in QE testing.
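For reference, a sketch of where the two settings surface at the Rook level; this assumes the storage config keys primaryAffinity and initialWeight from the linked rook PR and the default CephCluster name ocs-storagecluster-cephcluster (how they are plumbed through the StorageCluster CR is not shown here):

$ oc -n openshift-storage get cephcluster ocs-storagecluster-cephcluster \
    -o jsonpath='{.spec.storage.config}{"\n"}'
# expected to include primaryAffinity and initialWeight entries once configured
# (field names taken from the rook PR; exact placement is an assumption)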

Comment 21 Yaniv Kaul 2021-06-16 08:36:59 UTC
Is anyone looking at the above comment?

Comment 39 Sahina Bose 2021-07-05 09:19:39 UTC
Have we eliminated network issues as cause for the primary affinity not being set correctly?

Comment 40 Boaz 2021-07-05 09:50:36 UTC
@ssharon just a short update:
last week we redeployed OCS using a CI build with a fix for BZ1970503 (the fix is good). Since then I have not been able to reproduce the primary-affinity issue in which only some of the OSDs get updated with the new value.

Comment 41 Sahina Bose 2021-07-05 11:50:51 UTC
Moving back to ON_QA based on comment 40

Comment 42 Mudit Agarwal 2021-07-05 14:50:17 UTC
Please add doc text

Comment 49 Olive Lakra 2021-07-09 04:47:53 UTC
Mudit - please review the revised doc text and share feedback

Comment 50 Mudit Agarwal 2021-07-09 08:34:24 UTC
Some modifications:

.The overall IOPS on OSDs with primary-affinity less than one is reduced
This enhancement adds the ability to set primary-affinity on OSDs, which can help in reducing the overall load on a subset of OSDs in a non-balanced cluster, in particular where an OSD shares its physical device with another.

Comment 51 Shrivaibavi Raghaventhiran 2021-07-21 16:11:53 UTC
Test Environment:
-------------------
GS configuration:
-----------------
* Platform - BM
* Replica 2, compression enabled (see the pool check sketched below)
* Root OSD weight 0.167 TiB
* Primary affinity 0 for root disks
* RBD only enabled
* Total 6 OSDs in cluster (3 on master root disks, 3 on worker root disks)
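A quick sketch of how the pool settings above can be confirmed from the toolbox pod, assuming the default OCS RBD pool name ocs-storagecluster-cephblockpool:

sh-4.4# ceph osd pool get ocs-storagecluster-cephblockpool size               # expect: size: 2
sh-4.4# ceph osd pool get ocs-storagecluster-cephblockpool compression_mode   # shows the compression mode enabled on the pool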

Versions:
----------
OCP - 4.8.0-fc.8
OCS - ocs-operator.v4.8.0-450.ci

Observations:
--------------
* Set the initial weight and primary affinity on the root-disk OSDs during deployment.
* The root disk size was 334 GiB, hence the initial weight was set to 0.167 TiB.
* Set primary affinity to 0 (the manual equivalent is sketched below).
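A sketch of the roughly equivalent manual ceph commands (osd.1 is one of the root-disk OSDs in the tree below):

sh-4.4# ceph osd crush reweight osd.1 0.167   # CRUSH weight in TiB units, matching the 0.167 TiB initial weight
sh-4.4# ceph osd primary-affinity osd.1 0     # keep the root-disk OSD out of the primary role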

We had filled almost 50% of the cluster; so far we notice that the root disk utilization is lower compared to the other OSDs, as expected due to the primary affinity we set.

The root disk OSDs are not primary OSDs, hence marking this BZ as Verified.

Console Output :
-----------------

$ oc rsh -n openshift-storage rook-ceph-tools-64d88c9b9f-5kpxw
sh-4.4# ceph -s
  cluster:
    id:     601ba532-40f7-419e-bb30-0b6c995354aa
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 19m)
    mgr: a(active, since 4d)
    osd: 6 osds: 6 up (since 7h), 6 in (since 4d)
 
  data:
    pools:   1 pools, 256 pgs
    objects: 519.43k objects, 2.0 TiB
    usage:   3.9 TiB used, 3.6 TiB / 7.5 TiB avail
    pgs:     256 active+clean
 
  io:
    client:   391 KiB/s rd, 633 KiB/s wr, 195 op/s rd, 234 op/s wr
 
sh-4.4# ceph osd tree
ID  CLASS WEIGHT  TYPE NAME                                               STATUS REWEIGHT PRI-AFF 
 -1       7.04457 root default                                                                    
 -8       2.34819     rack rack0                                                                  
 -7       0.16699         host dell-r640-013-dsal-lab-eng-rdu2-redhat-com                         
  1   hdd 0.16699             osd.1                                           up  1.00000       0 
-17       2.18120         host dell-r730-040-dsal-lab-eng-rdu2-redhat-com                         
  4   hdd 2.18120             osd.4                                           up  1.00000 1.00000 
 -4       2.34819     rack rack1                                                                  
 -3       0.16699         host dell-r640-007-dsal-lab-eng-rdu2-redhat-com                         
  0   hdd 0.16699             osd.0                                           up  1.00000       0 
-15       2.18120         host dell-r730-020-dsal-lab-eng-rdu2-redhat-com                         
  3   hdd 2.18120             osd.3                                           up  1.00000 1.00000 
-12       2.34819     rack rack2                                                                  
-19       2.18120         host dell-r640-012-dsal-lab-eng-rdu2-redhat-com                         
  5   hdd 2.18120             osd.5                                           up  1.00000 1.00000 
-11       0.16699         host dell-r730-023-dsal-lab-eng-rdu2-redhat-com                         
  2   hdd 0.16699             osd.2                                           up  1.00000       0 
sh-4.4# 
sh-4.4# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE  VAR  PGS STATUS 
 1   hdd 0.16699  1.00000 335 GiB 104 GiB 103 GiB  16 KiB 1024 MiB 231 GiB 30.95 0.59  13     up 
 4   hdd 2.18120  1.00000 2.2 TiB 1.3 TiB 1.3 TiB 123 KiB  2.6 GiB 944 GiB 57.73 1.11 164     up 
 0   hdd 0.16699  1.00000 335 GiB  72 GiB  71 GiB   4 KiB 1024 MiB 263 GiB 21.38 0.41   9     up 
 3   hdd 2.18120  1.00000 2.2 TiB 1.2 TiB 1.2 TiB 103 KiB  2.6 GiB 957 GiB 57.14 1.09 163     up 
 5   hdd 2.18120  1.00000 2.2 TiB 1.2 TiB 1.2 TiB  83 KiB  2.5 GiB 1.0 TiB 54.06 1.04 154     up 
 2   hdd 0.16699  1.00000 335 GiB  72 GiB  71 GiB   4 KiB 1024 MiB 262 GiB 21.61 0.41   9     up 
                    TOTAL 7.5 TiB 3.9 TiB 3.9 TiB 335 KiB   11 GiB 3.6 TiB 52.18                 
MIN/MAX VAR: 0.41/1.11  STDDEV: 19.97

ceph pg dump output:
-----------------------
https://privatebin-it-iso.int.open.paas.redhat.com/?2c2368c42e18088c#GuPXVokeRx1yALmd1BibfV7qEFPapPaD8LzvszjibC3Z

Comment 53 errata-xmlrpc 2021-08-03 18:15:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003

Comment 54 Red Hat Bugzilla 2023-09-15 01:00:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

