This looks like an important item for performance/sizing. What's the status?
I guess we can do this for OCS 4.5, though there's the question of whether we want to have OSDs with Guaranteed QoS or not (see https://bugzilla.redhat.com/show_bug.cgi?id=1781785). If so, we'd have to make sure the CPU limits and requests match. Kyle?
Let's make it Guaranteed QoS: Memory: 5G, CPU: 2. Users who want more performance will need to change the CPU requests/limits in the CRD. We can improve it in 4.6. Kyle, does that work for you?
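For reference, a minimal sketch of the rule behind "Guaranteed QoS": Kubernetes assigns that class only when the CPU and memory requests equal the limits. The `qos_class` helper and the `osd_resources` dict below are hypothetical illustrations, not ocs-operator code. The values mirror the proposal above, with the memory unit written as `5Gi` as an assumption. The comparison is on plain strings, so equivalent quantities such as `2` and `2000m` would not match in this simplified version.

```python
def qos_class(resources):
    """Classify a single-container pod's QoS class from its resources stanza.

    Simplified: quantities are compared as strings, not parsed.
    """
    limits = resources.get("limits", {})
    # Kubernetes defaults a missing request to the corresponding limit.
    requests = {**limits, **resources.get("requests", {})}
    if not limits and not requests:
        return "BestEffort"
    guaranteed = all(
        key in limits and requests.get(key) == limits[key]
        for key in ("cpu", "memory")
    )
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical OSD resources matching the proposal (unit "5Gi" is an assumption).
osd_resources = {
    "requests": {"cpu": "2", "memory": "5Gi"},
    "limits": {"cpu": "2", "memory": "5Gi"},
}
print(qos_class(osd_resources))
```

Raising only the CPU limit without raising the request would drop the pod to Burstable, which is why both values have to be changed together in the CRD.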
There was a PR for this (https://github.com/openshift/ocs-operator/pull/521), which has since been closed. It was raised against release-4.5, and at least to my perception the suggested changes seemed to come a bit out of the blue. But the performance test results mentioned in the description of this BZ give a good indication. It seems that we first need a bit more discussion about changing these defaults, and the patch needs to be done against master first. If we can reach consensus and explain the changes in the PR, I agree that we could take it into 4.5.
I think as far as robustness is concerned, this is also somewhat related to the topic of adding priority classes: https://bugzilla.redhat.com/show_bug.cgi?id=1776876
We don't have time to resolve this before this Thursday. If we're going to take it in OCS 4.5 we'll need an exception.
I think Orit's proposal is a great compromise, short of doing something dynamically.
(In reply to Jose A. Rivera from comment #7)
> We don't have time to resolve this before this Thursday. If we're going to
> take it in OCS 4.5 we'll need an exception.
My point was that *if* we reach consensus, then it'll be easy to include it in 4.5 :-)
(In reply to Kyle Bader from comment #8)
> I think Orit's proposal is a great compromise, short of doing something
> dynamically.
It seems we got the consensus now :-D
PR is up: https://github.com/openshift/ocs-operator/pull/597
It should also be noted that this will impact this Jira: https://issues.redhat.com/browse/RHSTOR-967
Will require full regression testing, with an emphasis on performance, for verification.
PR merged.
bot not working... adding missing ACKs. POST is the correct status - that PR was the master PR
backport PR https://github.com/openshift/ocs-operator/pull/599
d/s patch merged
With the 4.5.0-54.ci build, no memory or CPU related issues were seen in the performance automation run. Tests around heavy IO + OSD failures were also run; failure and recovery were seamless.
Build used to verify:
oc get csv -n openshift-storage
NAME                        DISPLAY                       VERSION       REPLACES   PHASE
ocs-operator.v4.5.0-54.ci   OpenShift Container Storage   4.5.0-54.ci              Succeeded
I'll wait for tier1 & scale test automation analysis to ensure no issues are seen around this change before moving this bug to verified.
No issues were seen around OCS related resources with other automation tiers.
Scale analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00456.html
Performance analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00427.html
tier1 analysis thread: http://post-office.corp.redhat.com/archives/ocs-ci/2020-August/msg00425.html
Moving the bug to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.5.0 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3754