The following pods run in the BestEffort QoS class with no resource requests:

openshift-operator-lifecycle-manager/catalog-operator
openshift-operator-lifecycle-manager/olm-operator
openshift-operator-lifecycle-manager/olm-operators
openshift-operator-lifecycle-manager/packageserver

https://github.com/openshift/origin/pull/22787

This can cause eviction, OOM killing, and CPU starvation. Please add the following resource requests to the pods in this component:

Memory:
olm-operator 160Mi
catalog-operator 80Mi
olm-operators 50Mi
packageserver 50Mi

CPU:
10m for all
Additionally:

openshift-marketplace/certified-operators
openshift-marketplace/community-operators
openshift-marketplace/marketplace-operator
openshift-marketplace/redhat-operators

Memory:
certified-operators 80Mi
community-operators 80Mi
marketplace-operator 50Mi
redhat-operators 50Mi

CPU:
10m for all

A sketch of the corresponding stanza is below.
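For reference, a minimal sketch of what is being asked for, using the values listed above; the excerpt and container layout are illustrative, not copied from the actual manifests. Setting requests without limits is enough to move these pods from BestEffort to Burstable:

  # Illustrative excerpt from the olm-operator Deployment's pod template;
  # only the resources block changes, everything else stays as-is.
  containers:
  - name: olm-operator
    # ...image, command, args unchanged...
    resources:
      requests:
        cpu: 10m        # proposed CPU request
        memory: 160Mi   # proposed memory request for olm-operator
      # no limits block, so the pod can still burst above these values

The other deployments would get the same shape with their respective memory values.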
Could I get some feedback on this bug? This effort targets 4.2 and blocks https://jira.coreos.com/browse/POD-144 in Jira. These are the last components to bring into compliance.
https://jira.coreos.com/browse/OLM-1130 is slated for this sprint, so it should be done soon.
https://github.com/operator-framework/operator-lifecycle-manager/pull/955
Looks good for the OLM component, and the `QoS Class` of the pods is now `Burstable`.

mac:~ jianzhang$ oc get pods
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-548956f758-vhldk   1/1     Running   0          24h
olm-operator-85f7475cf-kqb49        1/1     Running   0          23h
packageserver-7c6b67fc64-sh8td      1/1     Running   0          5h44m
packageserver-7c6b67fc64-z7mhq      1/1     Running   0          5h44m

mac:~ jianzhang$ oc describe pods |grep Requests: -A 2
  Requests:
    cpu:     10m
    memory:  80Mi
--
  Requests:
    cpu:     10m
    memory:  160Mi
--
  Requests:
    cpu:     10m
    memory:  50Mi
--
  Requests:
    cpu:     10m
    memory:  50Mi

mac:~ jianzhang$ oc describe pods |grep "QoS Class"
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable

But for the marketplace part, the pods still use `BestEffort` and no requests are specified. Changing status to `ASSIGNED` and moving on to the marketplace sub-component.

mac:~ jianzhang$ oc describe pods -n openshift-marketplace | grep "Requests"
mac:~ jianzhang$ oc describe pods -n openshift-marketplace | grep "QoS Class"
QoS Class:       BestEffort
QoS Class:       BestEffort
QoS Class:       BestEffort
QoS Class:       BestEffort
QoS Class:       BestEffort
QoS Class:       BestEffort
(In reply to Seth Jennings from comment #1)
> Additionally
>
> openshift-marketplace/certified-operators
> openshift-marketplace/community-operators
> openshift-marketplace/marketplace-operator
> openshift-marketplace/redhat-operators
>
> Memory:
> certified-operators 80Mi
> community-operators 80Mi
> marketplace-operator 50Mi
> redhat-operators 50Mi
>
> CPU:
> 10m for all

I'm not sure this is a good way to handle these. We can certainly add these constraints to the openshift-marketplace/marketplace-operator pod, but the other three are operands that happen to be created by default, and their needs will change over time as the content they host changes. In fact, the suggestion that certified-operators and community-operators should request different amounts of memory implies exactly that: those pods are identical aside from the external content they are serving.

How were these numbers obtained? Do you have a better suggestion than using BestEffort on these pods, given that they will definitely have different resource constraints depending on what content they are serving from quay?
https://github.com/operator-framework/operator-marketplace/pull/229
[scolange@scolange operators]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-08-27-072819

Now we have:

[scolange@scolange operators]$ oc describe pods community-operators-69ff689f5d-rwpwk -n openshift-marketplace | grep "QoS Class"
QoS Class:       BestEffort
[scolange@scolange operators]$ oc describe pods certified-operators-64bc446dcf-9r4zb -n openshift-marketplace | grep "QoS Class"
QoS Class:       BestEffort
[scolange@scolange operators]$ oc describe pods redhat-operators-7b9cd9c994-9cqts -n openshift-marketplace | grep "QoS Class"
QoS Class:       Burstable
[scolange@scolange operators]$ oc describe pods marketplace-operator-df8d68d67-xddq9 -n openshift-marketplace | grep "QoS Class"
QoS Class:       Burstable

I think these should be set to Burstable to manage any change in the future (see the sketch below). What do you think?
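For context, this is standard Kubernetes QoS behavior rather than anything marketplace-specific: a pod whose containers declare requests but no limits is classified as Burstable, so it gets a floor for scheduling and eviction decisions while still being allowed to use more than it requested when the node has spare capacity, which suits catalog pods whose footprint depends on the content they serve. A minimal sketch, with illustrative values and container name:

  # Requests without limits => QoS class Burstable.
  # No requests and no limits at all => BestEffort (evicted first under node pressure).
  containers:
  - name: registry-server   # illustrative container name
    resources:
      requests:
        cpu: 10m
        memory: 50Mi
      # intentionally no limits, so the pod may exceed the request when capacity allows

Put differently, the requests act as a baseline for the scheduler rather than a cap tied to whatever catalog content the pod happens to be serving.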
Hi Salvatore,

I'm not totally sure how that is possible, given that the pods `community-operators-*`, `certified-operators-*`, and `redhat-operators-*` are generated in code in exactly the same way. Can you give me some context on this cluster? Is it an upgrade from a previous version? Can you try killing the pods and recreating them, then checking again? Can you also get all of the deployments to see whether spec.resources.requests is specified in each of them, and what the values are if so? Thanks!
Hi Kevin, you are right, it was probably the wrong cluster, sorry! Below are the steps:

[scolange@scolange ]$ oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.ci-2019-08-28-103038   True        False         153m    Cluster version is 4.2.0-0.ci-2019-08-28-103038

[scolange@scolange ]$ oc get pods -n openshift-marketplace
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-77c9c6b9c9-znx47    1/1     Running   0          165m
community-operators-d5cb7dbf4-hgbzp     1/1     Running   0          165m
marketplace-operator-65d498f785-kvbqp   1/1     Running   0          165m
redhat-operators-66fdd79ff5-4h8mm       1/1     Running   0          104m

[scolange@scolange ]$ oc describe pods -n openshift-marketplace | grep "QoS Class"
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable
QoS Class:       Burstable

[scolange@scolange ]$ oc describe pods -n openshift-marketplace | grep "Requests"
  Requests:
  Requests:
  Requests:
  Requests:
I updated openshift-tests to no longer exclude openshift-marketplace, and it does indeed pass. Closing the gate to prevent regressions: https://github.com/openshift/origin/pull/23690
Based on your comments (and the fact that I deployed a 4.2 cluster today and saw the same behavior), I am going to mark this as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922