Bug 1897313

Summary: HPA monitoring cpu utilization fails for deployments which have init containers
Product: OpenShift Container Platform Reporter: OpenShift BugZilla Robot <openshift-bugzilla-robot>
Component: Node Assignee: Joel Smith <joelsmith>
Node sub component: Autoscaler (HPA, VPA) QA Contact: Weinan Liu <weinliu>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: akaris, andbartl, aos-bugs, christopher.obrien, ddelcian, fshaikh, jeder, jlee, joelsmith, jokerman, ksathe, mfiedler, nagrawal, nmaynard, oarribas, ocasalsa, openshift-bugs-escalate, pbergene, pkanthal, rpalathi, skrenger, tmckay, tsweeney
Version: 4.5 Keywords: ServiceDeliveryImpact
Target Milestone: ---   
Target Release: 4.5.z   
Hardware: x86_64   
OS: Linux   
Last Closed: 2020-12-01 10:48:49 UTC Type: ---
Bug Depends On: 1895532    
Bug Blocks: 1899649    

Comment 1 Joel Smith 2020-11-13 17:27:03 UTC
We won't be able to fix this bug until the 4.6 fix has been verified. Hopefully we can add it very soon.

Comment 3 Weinan Liu 2020-11-20 08:59:08 UTC
The test failed on this build:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-11-18-080032   True        False         30m     Cluster version is 4.5.0-0.nightly-2020-11-18-080032

This build does not appear to include the fix.

$ oc get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods | jq .
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
  },
  "items": []
}

...

Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: did not receive metrics for any ready pods

...
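
The empty "items" list in the metrics response above goes hand in hand with the FailedGetResourceMetric condition: with no pod metrics returned, the HPA has nothing to compute utilization from. The following is a minimal diagnostic sketch, not the HPA controller's own logic. It assumes the JSON printed by the oc get --raw command has been captured as a string; the helper name pods_with_metrics is hypothetical.

```python
import json
from typing import List


def pods_with_metrics(pod_metrics_list: dict) -> List[str]:
    """Return names of pods that carry at least one container usage entry."""
    return [
        item["metadata"]["name"]
        for item in pod_metrics_list.get("items", [])
        if item.get("containers")
    ]


# Response captured in this comment on the 4.5.0-0.nightly-2020-11-18-080032 build.
empty_response = json.loads("""
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"
  },
  "items": []
}
""")

names = pods_with_metrics(empty_response)
if not names:
    print("no pod metrics returned for this namespace")
else:
    print("pods with metrics:", ", ".join(names))
```

An empty result from this check on a namespace that has running pods would point at the metrics pipeline rather than at the HPA object itself.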

Comment 4 Weinan Liu 2020-11-23 06:40:49 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-11-22-160319   True        False         117m    Cluster version is 4.5.0-0.nightly-2020-11-22-160319

The build is ready for test.

Comment 5 Weinan Liu 2020-11-23 06:43:01 UTC
Issue verified to be fixed.

ScalingActive now reports True:

...
Deployment pods:       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from cpu resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
...
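
For context on the ValidMetricFound condition, here is a simplified sketch of the replica calculation described in the upstream Kubernetes HPA documentation: the desired count is the current count scaled by the ratio of current to target utilization, rounded up, with no change when the ratio is within a tolerance (0.1 by default upstream). The utilization figures in the example are hypothetical and are not taken from this bug. The sketch ignores min/max replica bounds and the handling of not-ready pods.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: ceil(current_replicas * current / target)."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band the current size is kept as-is.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical figures: one replica running at its 50% target stays at one.
print(desired_replicas(1, 50, 50))
# Hypothetical figures: one replica at 100% against a 50% target.
print(desired_replicas(1, 100, 50))
```

A ratio close to 1 corresponds to the "recommended size matches current size" message in the output above.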


$ kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods | jq .
{
  "kind": "PodMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods"
  },
  "items": [
    {
      "metadata": {
        "name": "gerbil-54cdbb9fc-fkr5n",
        "namespace": "weinliu",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/namespaces/weinliu/pods/gerbil-54cdbb9fc-fkr5n",
        "creationTimestamp": "2020-11-23T06:36:47Z"
      },
      "timestamp": "2020-11-23T06:36:47Z",
      "window": "5m0s",
      "containers": [
        {
          "name": "gerbil",
          "usage": {
            "cpu": "992m",
            "memory": "1328Ki"
          }
        },
        {
          "name": "gerbil-init",
          "usage": {
            "cpu": "0",       #<== Issue fixed
            "memory": "0"
          }
        }
      ]
    }
  ]
}
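
To illustrate what the fixed response means for the pod total, the sketch below parses the CPU quantities from the PodMetricsList item above and sums them per pod. The function names are hypothetical. The sample JSON is trimmed from the output in this comment, with the inline "#<== Issue fixed" annotation removed because it is not valid JSON. The quantity parser covers only the n, u and m suffixes and plain core counts, which is an assumption about the values the metrics API returns for CPU.

```python
import json
from fractions import Fraction

# Multipliers that turn a suffixed CPU quantity into whole cores.
CPU_SUFFIXES = {
    "n": Fraction(1, 10**9),
    "u": Fraction(1, 10**6),
    "m": Fraction(1, 10**3),
}


def cpu_to_millicores(quantity: str) -> float:
    """Convert a CPU quantity such as "992m", "0" or "1.5" to millicores."""
    suffix = quantity[-1]
    if suffix in CPU_SUFFIXES:
        cores = Fraction(quantity[:-1]) * CPU_SUFFIXES[suffix]
    else:
        cores = Fraction(quantity)
    return float(cores * 1000)


def pod_cpu_millicores(pod: dict) -> float:
    """Sum CPU usage over every container listed for the pod."""
    return sum(cpu_to_millicores(c["usage"]["cpu"]) for c in pod["containers"])


# Trimmed copy of the pod entry shown above.
pod = json.loads("""
{
  "metadata": {"name": "gerbil-54cdbb9fc-fkr5n", "namespace": "weinliu"},
  "containers": [
    {"name": "gerbil", "usage": {"cpu": "992m", "memory": "1328Ki"}},
    {"name": "gerbil-init", "usage": {"cpu": "0", "memory": "0"}}
  ]
}
""")

total = pod_cpu_millicores(pod)
print(f'{pod["metadata"]["name"]}: {total:.0f}m')
```

For the sample, the init container contributes 0 and the pod total is 992m, all of it from the gerbil container.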

Comment 8 errata-xmlrpc 2020-12-01 10:48:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.21 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5194