1938493 – machine-api-operator declares restrictive cpu and memory limits where it should not

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cloud Compute
Sub Component:
Version:	4.8
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	4.8.0
Assignee:	Alexander Demicev
QA Contact:	sunzhaohua
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1938580
Depends On:
Blocks:

Reported:	2021-03-13 22:32 UTC by Clayton Coleman
Modified:	2021-07-27 22:53 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-27 22:53:17 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift machine-api-operator pull 827	0	None	open	Bug 1938493: Revert "Add resource limit to pods"	2021-03-15 09:53:16 UTC
Red Hat Product Errata	RHSA-2021:2438	0	None	None	None	2021-07-27 22:53:36 UTC

Description Clayton Coleman 2021-03-13 22:32:44 UTC

All payload components should request a reasonable minimum CPU and their p90 memory usage, and should avoid setting limits on components whose workload scales.

https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#resources-and-limits

https://github.com/openshift/machine-api-operator/pull/825 added a number of limits which require exceptions, and in several cases those limits fail the "only components that have completely deterministic workload regardless of cluster scale may set limits on memory and CPU" rule.

For now, all the limits should be removed and reintroduced later, with approval, for the specific workload containers that can justify their use.

This bug is referenced from the new e2e test, which gates components that lack resource requests and enforces the resource conventions.
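
The convention described above (CPU and memory requests set, no limits) can be illustrated with a minimal Python sketch. This is not the e2e test itself, whose implementation is not shown in this bug. The function name check_resource_conventions is hypothetical, the pod spec is assumed to be a plain dictionary shaped like the Kubernetes Pod "spec" field, and the limit values in the second example are invented for illustration rather than taken from pull 825.

```python
from typing import Dict, List


def check_resource_conventions(pod_spec: Dict) -> List[str]:
    """Return human-readable violations for one pod spec.

    Hypothetical helper: flags containers that lack a cpu or memory
    request, and containers that set a cpu or memory limit.
    """
    violations = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        requests = resources.get("requests") or {}
        limits = resources.get("limits") or {}
        for resource in ("cpu", "memory"):
            if resource not in requests:
                violations.append(f"{name}: missing {resource} request")
            if resource in limits:
                violations.append(f"{name}: sets a {resource} limit")
    return violations


# Request values match those reported for the fixed pod in this bug.
fixed = {
    "containers": [
        {"name": "kube-rbac-proxy",
         "resources": {"requests": {"cpu": "10m", "memory": "20Mi"}}},
        {"name": "machine-api-operator",
         "resources": {"requests": {"cpu": "10m", "memory": "50Mi"}}},
    ]
}

# Limit values here are made up for illustration only.
with_limits = {
    "containers": [
        {"name": "machine-api-operator",
         "resources": {"requests": {"cpu": "10m", "memory": "50Mi"},
                       "limits": {"cpu": "20m", "memory": "100Mi"}}},
    ]
}

print(check_resource_conventions(fixed))
print(check_resource_conventions(with_limits))
```

A real check would also need to honor approved exceptions for containers with a deterministic workload, which this sketch deliberately ignores.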

Comment 1 Joel Speed 2021-03-15 09:54:31 UTC

*** Bug 1938580 has been marked as a duplicate of this bug. ***

Comment 3 pdsilva 2021-03-18 10:29:04 UTC

Verified installation on Power(ppc64le). No errors seen.

# oc version
Client Version: 4.8.0-0.nightly-ppc64le-2021-03-18-074956
Server Version: 4.8.0-0.nightly-ppc64le-2021-03-18-074956
Kubernetes Version: v1.20.0+e1bc274

# oc get co machine-api
NAME          VERSION                                     AVAILABLE   PROGRESSING   DEGRADED   SINCE
machine-api   4.8.0-0.nightly-ppc64le-2021-03-18-074956   True        False         False      92m

# oc get pods -A | grep machine-api
openshift-machine-api                              cluster-autoscaler-operator-57748cbb-95pct                2/2     Running     0          99m
openshift-machine-api                              cluster-baremetal-operator-6b5466c885-qs92l               2/2     Running     0          99m
openshift-machine-api                              machine-api-operator-6889c85fbc-bg8rd                     2/2     Running     0          99m

# oc describe pod machine-api-operator-6889c85fbc-bg8rd -n openshift-machine-api
Name:                 machine-api-operator-6889c85fbc-bg8rd
Namespace:            openshift-machine-api
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 master-2/192.168.26.251
Start Time:           Thu, 18 Mar 2021 04:51:54 -0400
Labels:               k8s-app=machine-api-operator
                      pod-template-hash=6889c85fbc
Annotations:          k8s.v1.cni.cncf.io/network-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.7"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      k8s.v1.cni.cncf.io/networks-status:
                        [{
                            "name": "",
                            "interface": "eth0",
                            "ips": [
                                "10.130.0.7"
                            ],
                            "default": true,
                            "dns": {}
                        }]
                      openshift.io/scc: restricted
Status:               Running
IP:                   10.130.0.7
IPs:
  IP:           10.130.0.7
Controlled By:  ReplicaSet/machine-api-operator-6889c85fbc
Containers:
  kube-rbac-proxy:
    Container ID:  cri-o://039cb1791a45939d1595be2093495aa44636fa86cb62e9dd7fe2bfe05644cbbf
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://localhost:8080/
      --tls-cert-file=/etc/tls/private/tls.crt
      --tls-private-key-file=/etc/tls/private/tls.key
      --config-file=/etc/kube-rbac-proxy/config-file.yaml
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
      --logtostderr=true
      --v=3
    State:          Running
      Started:      Thu, 18 Mar 2021 04:53:01 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        10m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/kube-rbac-proxy from config (rw)
      /etc/tls/private from machine-api-operator-tls (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-ps8tq (ro)
  machine-api-operator:
    Container ID:  cri-o://eccfa36b5831afe2dc252b0649f125f2d07614ab7916265797d5f9df3c07c274
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe
    Port:          <none>
    Host Port:     <none>
    Command:
      /machine-api-operator
    Args:
      start
      --images-json=/etc/machine-api-operator-config/images/images.json
      --alsologtostderr
      --v=3
    State:          Running
      Started:      Thu, 18 Mar 2021 04:53:16 -0400
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  50Mi
    Environment:
      RELEASE_VERSION:      4.8.0-0.nightly-ppc64le-2021-03-18-074956
      COMPONENT_NAMESPACE:  openshift-machine-api (v1:metadata.namespace)
      METRICS_PORT:         8080
    Mounts:
      /etc/machine-api-operator-config/images from images (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from machine-api-operator-token-ps8tq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      kube-rbac-proxy
    Optional:  false
  images:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      machine-api-operator-images
    Optional:  false
  machine-api-operator-tls:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-tls
    Optional:    false
  machine-api-operator-token-ps8tq:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  machine-api-operator-token-ps8tq
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  node-role.kubernetes.io/master=
Tolerations:     node-role.kubernetes.io/master:NoSchedule op=Exists
                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 120s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 120s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  100m               default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  100m               default-scheduler  no nodes available to schedule pods
  Warning  FailedScheduling  94m                default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Warning  FailedScheduling  94m                default-scheduler  0/3 nodes are available: 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
  Normal   Scheduled         93m                default-scheduler  Successfully assigned openshift-machine-api/machine-api-operator-6889c85fbc-bg8rd to master-2
  Warning  FailedMount       92m (x7 over 93m)  kubelet            MountVolume.SetUp failed for volume "machine-api-operator-tls" : secret "machine-api-operator-tls" not found
  Normal   AddedInterface    92m                multus             Add eth0 [10.130.0.7/23]
  Normal   Pulled            92m                kubelet            Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:feabd53e4be03a277277f08412bd8ea0a3caf0c63c3276fd301d2409647b4fb7" already present on machine
  Normal   Created           92m                kubelet            Created container kube-rbac-proxy
  Normal   Started           92m                kubelet            Started container kube-rbac-proxy
  Normal   Pulling           92m                kubelet            Pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe"
  Normal   Pulled            91m                kubelet            Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:959e83ba3b9024f9cc06f13f10ef70fee5cebd9c773469878e9820c72c7a2efe" in 14.520331526s
  Normal   Created           91m                kubelet            Created container machine-api-operator
  Normal   Started           91m                kubelet            Started container machine-api-operator
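
In the describe output above, both containers list a Requests block and no Limits block, which is the state this bug asks for. The following Python sketch shows one way to pull that information out of such text. It assumes the indentation layout seen above (container names at two spaces, Requests/Limits at four, values at six); that layout is an observation from this transcript, not a stable interface, and the function name resources_from_describe is hypothetical. Init containers are not handled.

```python
from typing import Dict


def resources_from_describe(text: str) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Map container name -> {"Requests": {...}, "Limits": {...}}.

    Hypothetical helper that relies on the indentation seen in the
    describe output in this bug.
    """
    result: Dict[str, Dict[str, Dict[str, str]]] = {}
    in_containers = False
    container = None
    section = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        stripped = line.strip()
        if indent == 0:
            # Top-level key; only "Containers:" opens the block we parse.
            in_containers = stripped == "Containers:"
            container = section = None
        elif not in_containers:
            continue
        elif indent == 2 and stripped.endswith(":"):
            # A new container entry.
            container = stripped[:-1]
            result[container] = {"Requests": {}, "Limits": {}}
            section = None
        elif indent == 4 and container is not None:
            # Track only the two resource sections; any other key closes them.
            section = stripped[:-1] if stripped in ("Requests:", "Limits:") else None
        elif indent >= 6 and section is not None:
            key, _, value = stripped.partition(":")
            result[container][section][key.strip()] = value.strip()
    return result


# Excerpt of the describe output above, trimmed to the relevant lines.
sample = "\n".join([
    "Containers:",
    "  kube-rbac-proxy:",
    "    Port:          8443/TCP",
    "    Requests:",
    "      cpu:        10m",
    "      memory:     20Mi",
    "    Environment:  <none>",
    "  machine-api-operator:",
    "    Requests:",
    "      cpu:     10m",
    "      memory:  50Mi",
    "Conditions:",
    "  Type              Status",
])

parsed = resources_from_describe(sample)
for name, res in parsed.items():
    print(name, "requests:", res["Requests"], "limits:", res["Limits"])
```

For scripted checks, structured output (for example the pod object as JSON) is likely a more robust input than describe text.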

Comment 4 sunzhaohua 2021-03-23 03:11:53 UTC

Thanks, pdsilva, for verifying this. Moving to VERIFIED.

Comment 5 Joel Speed 2021-03-24 11:31:29 UTC

*** Bug 1942161 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2021-07-27 22:53:17 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
