Description of problem (please be as detailed as possible and provide log snippets): The ability to set primary-affinity for an OSD lets us prevent a particular OSD from becoming the primary for any PG. This is achieved by setting that OSD's primary-affinity to 0.
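For illustration, this is how primary-affinity can be set and checked manually from the toolbox with the ceph CLI (a sketch; osd.1 is a placeholder id):

sh-4.4# ceph osd primary-affinity osd.1 0   # osd.1 will no longer be selected as the primary for any PG
sh-4.4# ceph osd tree                       # the PRI-AFF column should now show 0 for osd.1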
Use case: one OSD uses a partition on the root disk (which also hosts the RHCOS OS - yay!), so we'd like to ensure it gets stressed less than the other OSDs. See also https://bugzilla.redhat.com/show_bug.cgi?id=1924949
Acking after internal discussion.
Raz, can you provide your ACK?
Acked
This is critically important for a strategic customer. Please prioritize for 4.8.
This was mentioned in https://issues.redhat.com/browse/KNIP-1616 but didn't make the 4.8 feature freeze there. Checking where we are with it. As best I know, Shachar was working on it. Reassigning. Shachar, can you give us an update on where we are? (I.e., is there still a chance to get the code completed for the 4.8 dev freeze?) If you are not working on it, please reassign to me for redistribution.
Primary-affinity for OSDs is still a work in progress. Currently, I have preliminary prototype patches for Rook & OCS, and am still working on a few fixes and improvements.
Next steps:
1) Open Rook issue + detailed design doc
2) Review comments from the Rook team
3) Fixes, dev-testing and PR

Most likely it will be ready for a 4.8 z-stream.
(In reply to Shachar Sharon from comment #13)
> Primary-affinity for OSDs is still a work in progress. Currently, I have
> preliminary prototype patches for Rook & OCS, and am still working on a few
> fixes and improvements.
> Next steps:
> 1) Open Rook issue + detailed design doc
> 2) Review comments from the Rook team
> 3) Fixes, dev-testing and PR
>
> Most likely it will be ready for a 4.8 z-stream.

We are considering accepting this change even after the dev freeze. Please provide an estimated date for completion.
Rook's code is ready for review; I will submit a PR by the end of this workday (May 2nd, 2021). Expecting comments + fixes + repeated dev-testing to take a few days. If everything goes as expected, the code will be merged by the beginning of next week. The OCS code is rather trivial.
Upstream PRs are merged. What's the next step? (there hasn't been an update here for ~1 month, and this is a critical feature for 4.8)
PrimaryAffinity (and its sibling, InitialWeight) are part of the 4.8 release. Currently in QE testing.
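For reference, the runtime equivalents of these two settings can be applied manually from the toolbox with the ceph CLI (a sketch; osd.1 and the 0.167 CRUSH weight are illustrative values taken from the root-disk use case):

sh-4.4# ceph osd crush reweight osd.1 0.167   # InitialWeight equivalent: CRUSH weight, in TiB units
sh-4.4# ceph osd primary-affinity osd.1 0     # PrimaryAffinity equivalent: never pick osd.1 as primary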
Is anyone looking at the above comment?
Have we eliminated network issues as the cause of the primary affinity not being set correctly?
@ssharon just a short update: last week we redeployed OCS using a CI build with a fix for BZ 1970503 (the fix is good). Since then I have not been able to reproduce the primary-affinity issue in which only some of the OSDs get updated with the new value.
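In case it helps future triage, the applied values can be re-checked quickly per OSD from the toolbox (a sketch; relies on the PRI-AFF column of the tree output):

sh-4.4# ceph osd tree | grep 'osd\.'   # one line per OSD; the last column (PRI-AFF) should match the configured value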
Moving back to ON_QA based on comment 40
Please add doc text.
Mudit - please review the revised doc text and share feedback
Some modification:

.The overall IOPS on OSDs with a primary-affinity of less than one are reduced
This enhancement adds the ability to set primary-affinity on OSDs, which can help reduce the overall load on a subset of OSDs in a non-balanced cluster, in particular where an OSD shares its physical device with the host OS. Because the primary OSD of a replicated PG serves all client reads, lowering an OSD's primary-affinity shifts read traffic away from it.
Test Environment:
-----------------
GS configuration:
-----------------
* Platform - BM
* Replica 2, compression enabled
* Root OSD weight 0.167 TiB
* Primary affinity for root disks 0
* RBD only enabled
* Total 6 OSDs in cluster (3 - master root disk, 3 - worker root disk)

Versions:
---------
OCP - 4.8.0-fc.8
OCS - ocs-operator.v4.8.0-450.ci

Observations:
-------------
* Set initial weight and primary affinity on root-disk OSDs during deployment.
* The root disk size was 334 GiB, hence set initial weight as 0.167 TiB.
* Set primary affinity as 0.

We filled the cluster to almost 50%; so far we notice that root disk utilization is lower compared to the other OSDs, as expected due to the primary affinity we set. The root-disk OSDs are not primary OSDs, hence marking this BZ as Verified.

Console Output:
---------------
$ oc rsh -n openshift-storage rook-ceph-tools-64d88c9b9f-5kpxw
sh-4.4# ceph -s
  cluster:
    id:     601ba532-40f7-419e-bb30-0b6c995354aa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 19m)
    mgr: a(active, since 4d)
    osd: 6 osds: 6 up (since 7h), 6 in (since 4d)

  data:
    pools:   1 pools, 256 pgs
    objects: 519.43k objects, 2.0 TiB
    usage:   3.9 TiB used, 3.6 TiB / 7.5 TiB avail
    pgs:     256 active+clean

  io:
    client:   391 KiB/s rd, 633 KiB/s wr, 195 op/s rd, 234 op/s wr

sh-4.4# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                            STATUS  REWEIGHT  PRI-AFF
 -1         7.04457  root default
 -8         2.34819      rack rack0
 -7         0.16699          host dell-r640-013-dsal-lab-eng-rdu2-redhat-com
  1    hdd  0.16699              osd.1                                        up   1.00000        0
-17         2.18120          host dell-r730-040-dsal-lab-eng-rdu2-redhat-com
  4    hdd  2.18120              osd.4                                        up   1.00000  1.00000
 -4         2.34819      rack rack1
 -3         0.16699          host dell-r640-007-dsal-lab-eng-rdu2-redhat-com
  0    hdd  0.16699              osd.0                                        up   1.00000        0
-15         2.18120          host dell-r730-020-dsal-lab-eng-rdu2-redhat-com
  3    hdd  2.18120              osd.3                                        up   1.00000  1.00000
-12         2.34819      rack rack2
-19         2.18120          host dell-r640-012-dsal-lab-eng-rdu2-redhat-com
  5    hdd  2.18120              osd.5                                        up   1.00000  1.00000
-11         0.16699          host dell-r730-023-dsal-lab-eng-rdu2-redhat-com
  2    hdd  0.16699              osd.2                                        up   1.00000        0

sh-4.4# ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE   VAR   PGS  STATUS
 1    hdd  0.16699   1.00000  335 GiB  104 GiB  103 GiB   16 KiB  1024 MiB  231 GiB  30.95  0.59   13      up
 4    hdd  2.18120   1.00000  2.2 TiB  1.3 TiB  1.3 TiB  123 KiB   2.6 GiB  944 GiB  57.73  1.11  164      up
 0    hdd  0.16699   1.00000  335 GiB   72 GiB   71 GiB    4 KiB  1024 MiB  263 GiB  21.38  0.41    9      up
 3    hdd  2.18120   1.00000  2.2 TiB  1.2 TiB  1.2 TiB  103 KiB   2.6 GiB  957 GiB  57.14  1.09  163      up
 5    hdd  2.18120   1.00000  2.2 TiB  1.2 TiB  1.2 TiB   83 KiB   2.5 GiB  1.0 TiB  54.06  1.04  154      up
 2    hdd  0.16699   1.00000  335 GiB   72 GiB   71 GiB    4 KiB  1024 MiB  262 GiB  21.61  0.41    9      up
                       TOTAL  7.5 TiB  3.9 TiB  3.9 TiB  335 KiB    11 GiB  3.6 TiB  52.18
MIN/MAX VAR: 0.41/1.11  STDDEV: 19.97

ceph pg dump output:
--------------------
https://privatebin-it-iso.int.open.paas.redhat.com/?2c2368c42e18088c#GuPXVokeRx1yALmd1BibfV7qEFPapPaD8LzvszjibC3Z
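For completeness, a quick way to confirm from the PG map that none of the affinity-0 OSDs (0, 1, 2) act as primary (a sketch; assumes the acting primary is the last column of the pgs_brief output):

sh-4.4# ceph pg dump pgs_brief | awk 'NR>1 {print $NF}' | sort -n | uniq -c
  # counts PGs per acting-primary OSD id; osds 0, 1 and 2 should not appear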
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days