Bug 1897313 - HPA monitoring cpu utilization fails for deployments which have init containers
Summary: HPA monitoring cpu utilization fails for deployments which have init containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.5
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Joel Smith
QA Contact: Weinan Liu
URL:
Whiteboard:
Depends On: 1895532
Blocks: 1899649
Reported: 2020-11-12 18:41 UTC by OpenShift BugZilla Robot
Modified: 2020-12-17 18:58 UTC
CC List: 22 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-01 10:48:49 UTC
Target Upstream Version:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift k8s-prometheus-adapter pull 37 | 0 | None | closed | [release-4.5] Bug 1897313: Populate both CPU and Memory resource container metrics if one is specified | 2021-02-01 10:17:02 UTC
Red Hat Knowledge Base (Solution) | 5428871 | 0 | None | None | None | 2020-12-17 18:58:25 UTC
Red Hat Product Errata | RHSA-2020:5194 | 0 | None | None | None | 2020-12-01 10:49:02 UTC

Comment 1 Joel Smith 2020-11-13 17:27:03 UTC
We won't be able to fix this bug until the 4.6 fix has been verified. Hopefully we can add it very soon.

Comment 3 Weinan Liu 2020-11-20 08:59:08 UTC
Test failed on:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-11-18-080032   True        False         30m     Cluster version is 4.5.0-0.nightly-2020-11-18-080032

This build does not appear to include the fix.

$ oc get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods | jq .
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
  },
  "items": []
}

...

Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: did not receive metrics for any ready pods

...
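
The failing state above can also be detected programmatically. The following is a minimal sketch, not part of the original report: it assumes the output of the `oc get --raw` call has been captured as a string, and the helper name `pods_with_metrics` is hypothetical. An empty result corresponds to the "did not receive metrics for any ready pods" condition.

```python
import json
from typing import List


def pods_with_metrics(payload: str) -> List[str]:
    """Return the names of pods listed in a PodMetricsList payload.

    Hypothetical helper: an empty list means the metrics API returned
    no pod metrics for the namespace.
    """
    doc = json.loads(payload)
    if doc.get("kind") != "PodMetricsList":
        raise ValueError("unexpected kind: %r" % doc.get("kind"))
    return [item["metadata"]["name"] for item in doc.get("items", [])]


# Payload copied from the failing run above.
failing_payload = """
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
  },
  "items": []
}
"""

if not pods_with_metrics(failing_payload):
    print("no pod metrics returned; the HPA cannot compute a replica count")
```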

Comment 4 Weinan Liu 2020-11-23 06:40:49 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-11-22-160319   True        False         117m    Cluster version is 4.5.0-0.nightly-2020-11-22-160319

The build is ready for test.

Comment 5 Weinan Liu 2020-11-23 06:43:01 UTC
Issue verified to be fixed.

ScalingActive is now True.

...
Deployment pods:       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
...


$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods | jq .
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "gerbil-54cdbb9fc-fkr5n",
        "namespace": "weinliu",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods/gerbil-54cdbb9fc-fkr5n",
        "creationTimestamp": "2020-11-23T06:36:47Z"
      },
      "timestamp": "2020-11-23T06:36:47Z",
      "window": "5m0s",
      "containers": [
        {
          "name": "gerbil",
          "usage": {
            "cpu": "992m",
            "memory": "1328Ki"
          }
        },
        {
          "name": "gerbil-init",
          "usage": {
            "cpu": "0",       #<== Issue fixed
            "memory": "0"
          }
        }
      ]
    }
  ]
}
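
The check performed by eye above (every container, including the init container, reports both `cpu` and `memory`) can be sketched in code. This is an illustrative sketch only: the helper names `containers_missing_usage` and `cpu_millicores` are hypothetical, the quantity parser is deliberately simplified (it handles only plain cores and the `m` suffix), and the `partial_payload` sample is invented to show a container that lacks one metric; it is not a capture of the pre-fix output. The inline `#<== Issue fixed` annotation is omitted from the sample because it is not valid JSON.

```python
import json
from typing import Dict, List

REQUIRED_RESOURCES = ("cpu", "memory")


def containers_missing_usage(payload: str) -> Dict[str, List[str]]:
    """Map "pod/container" to the required resources absent from its usage."""
    doc = json.loads(payload)
    missing: Dict[str, List[str]] = {}
    for item in doc.get("items", []):
        pod = item["metadata"]["name"]
        for container in item.get("containers", []):
            usage = container.get("usage", {})
            absent = [r for r in REQUIRED_RESOURCES if r not in usage]
            if absent:
                missing["%s/%s" % (pod, container["name"])] = absent
    return missing


def cpu_millicores(quantity: str) -> int:
    """Simplified CPU quantity parser: "992m" -> 992, "0" -> 0, "1" -> 1000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


# Container usage copied from the verified run above.
fixed_payload = json.dumps({
    "kind": "PodMetricsList",
    "items": [{
        "metadata": {"name": "gerbil-54cdbb9fc-fkr5n", "namespace": "weinliu"},
        "containers": [
            {"name": "gerbil", "usage": {"cpu": "992m", "memory": "1328Ki"}},
            {"name": "gerbil-init", "usage": {"cpu": "0", "memory": "0"}},
        ],
    }],
})

# Invented sample (assumption): an init container reporting only memory.
partial_payload = json.dumps({
    "kind": "PodMetricsList",
    "items": [{
        "metadata": {"name": "example-pod"},
        "containers": [{"name": "example-init", "usage": {"memory": "0"}}],
    }],
})

print(containers_missing_usage(fixed_payload))
print(containers_missing_usage(partial_payload))
```

An empty mapping for the verified payload matches the observation that both containers now carry both resource values.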

Comment 8 errata-xmlrpc 2020-12-01 10:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.21 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5194

