Bug 1836359 - Update OSD requests and limits
Summary: Update OSD requests and limits
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.5.0
Assignee: Jose A. Rivera
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-05-15 17:17 UTC by Kyle Bader
Modified: 2020-09-23 09:05 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-15 10:17:07 UTC
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift ocs-operator pull 599 | 0 | None | closed | Bug 1836359: Update OSD QoS to Guaranteed | 2020-11-19 14:19:26 UTC
Red Hat Product Errata | RHBA-2020:3754 | 0 | None | None | None | 2020-09-15 10:17:34 UTC

Comment 2 Yaniv Kaul 2020-06-24 08:34:59 UTC
This looks like an important item for performance/sizing. What's the status?

Comment 3 Jose A. Rivera 2020-06-29 15:26:14 UTC
I guess we can do this for OCS 4.5, though there's the question of whether we want to have OSDs with Guaranteed QoS or not (see https://bugzilla.redhat.com/show_bug.cgi?id=1781785). If so, we'd have to make sure the CPU limits and requests match. Kyle?
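
[Editor's note] To illustrate why the limits and requests have to match, here is a minimal sketch of the Kubernetes QoS classification rules for CPU and memory. It is not code from ocs-operator; the function name `qos_class` and the dict layout are assumptions made for illustration, and quantities are compared as plain strings.

```python
def qos_class(containers):
    """Classify a pod per the Kubernetes QoS rules (cpu and memory only).

    `containers` is a list of dicts shaped like a container's `resources`
    field: {"requests": {...}, "limits": {...}}. Quantities are compared as
    strings, so "2" and "2000m" would NOT be treated as equal here.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # Kubernetes defaults a missing request to the limit, if one is set.
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in requests or name in limits:
                any_set = True
            # Guaranteed needs a limit for both resources, equal to the request.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

A pod whose containers all set equal requests and limits for both CPU and memory comes out as Guaranteed. A single mismatch, such as a CPU request lower than the CPU limit, drops it to Burstable.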

Comment 4 Orit Wasserman 2020-06-30 08:01:02 UTC
Let's make it Guaranteed QoS:
Memory 5G
CPU: 2

Users that will want more performance will need to change the cpu requests/limits in the CRD.
We can improve it in 4.6.
Kyle, does that work for you?
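
[Editor's note] A minimal sketch of what the proposed values look like as a Kubernetes-style resources stanza with identical requests and limits. The helper names, the device set name, and the `name`/`resources` layout of the override are hypothetical and should be checked against the actual StorageCluster CRD. The comment says "5G"; writing it as the binary quantity `5Gi` is also an assumption.

```python
import json


def guaranteed_resources(cpu="2", memory="5Gi"):
    """Return a resources stanza whose requests and limits are identical."""
    values = {"cpu": str(cpu), "memory": str(memory)}
    # Separate copies so a later edit to one side cannot silently change the other.
    return {"requests": dict(values), "limits": dict(values)}


def device_set_override(name, cpu):
    """Hypothetical per-device-set override that raises CPU for more performance."""
    return {"name": name, "resources": guaranteed_resources(cpu=cpu)}


# Example: a user who wants more performance raises CPU on both sides at once.
print(json.dumps(device_set_override("example-deviceset", 4), indent=2))
```

Deriving both sides from one set of values keeps requests and limits in step, so the Guaranteed QoS class is preserved when a user raises the CPU setting.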

Comment 5 Michael Adam 2020-06-30 21:39:21 UTC
There was a PR for this (https://github.com/openshift/ocs-operator/pull/521) which has been closed again.
It was raised against release-4.5, and at least to my perception the suggested changes seemed to come a bit out of the blue.
But the performance test results mentioned in the description of this BZ give a good indication.

It seems that we first need a bit more discussion about changing these defaults.
And the patch needs to be done against master first.
If we can reach consensus and explain the changes in the PR, I agree that we could take it into 4.5.

Comment 6 Michael Adam 2020-06-30 21:47:22 UTC
I think as far as robustness is concerned, this is also somewhat related to the topic of adding priority classes: 

https://bugzilla.redhat.com/show_bug.cgi?id=1776876

Comment 7 Jose A. Rivera 2020-07-01 13:53:12 UTC
We don't have time to resolve this before this Thursday. If we're going to take it in OCS 4.5 we'll need an exception.

Comment 8 Kyle Bader 2020-07-01 16:41:14 UTC
I think Orit's proposal is a great compromise, short of doing something dynamically.

Comment 9 Michael Adam 2020-07-01 16:42:56 UTC
(In reply to Jose A. Rivera from comment #7)
> We don't have time to resolve this before this Thursday. If we're going to
> take it in OCS 4.5 we'll need an exception.

My point was that *if* we reach consensus, then it'll be easy to include it in 4.5 :-)

(In reply to Kyle Bader from comment #8)
> I think Orit's proposal is a great compromise, short of doing something
> dynamically.

It seems we got the consensus now :-D

Comment 10 Jose A. Rivera 2020-07-01 16:48:02 UTC
PR is up: https://github.com/openshift/ocs-operator/pull/597

It should also be noted that this will impact this Jira: https://issues.redhat.com/browse/RHSTOR-967

Comment 11 Elad 2020-07-01 17:27:58 UTC
Will require full regression testing, with an emphasis on performance, for verification.

Comment 12 Jose A. Rivera 2020-07-01 19:31:22 UTC
PR merged.

Comment 13 Michael Adam 2020-07-02 06:49:10 UTC
bot not working... adding missing ACKs.


POST is the correct status - that PR was the master PR

Comment 15 Michael Adam 2020-07-02 06:53:38 UTC
backport PR https://github.com/openshift/ocs-operator/pull/599

Comment 16 Michael Adam 2020-07-02 07:17:56 UTC
d/s patch merged

Comment 19 krishnaram Karthick 2020-08-18 13:38:30 UTC
With the 4.5.0-54.ci build, no memory- or CPU-related issues were seen in the performance automation run.
Also, tests around heavy IO + OSD failures were run. Failure and recovery were seamless.

Build used to verify:

oc get csv -n openshift-storage
NAME                        DISPLAY                       VERSION       REPLACES   PHASE
ocs-operator.v4.5.0-54.ci   OpenShift Container Storage   4.5.0-54.ci              Succeeded

I'll wait for tier1 & scale test automation analysis to ensure no issues are seen around this change before moving this bug to verified.

Comment 20 krishnaram Karthick 2020-08-21 03:46:49 UTC
No issues were seen around OCS related resources with other automation tiers. Moving the bug to verified.

Scale analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00456.html
Performance analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00427.html
tier1 analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00425.html

Comment 22 errata-xmlrpc 2020-09-15 10:17:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3754

