Bug 1419481

Summary: HPA unable to get metrics for resource cpu: failed to unmarshal heapster response
Product: OpenShift Container Platform
Reporter: DeShuai Ma <dma>
Component: Node
Assignee: Solly Ross <sross>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.0
CC: aos-bugs, decarr, jokerman, mmccomas, mwringe, penli, tdawson
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-04-12 19:11:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description DeShuai Ma 2017-02-06 10:10:26 UTC
Description of problem:
After creating an HPA for a scalable resource, the HPA cannot get metrics and always shows the error "unable to get metrics for resource cpu: failed to unmarshal heapster response: json: cannot unmarshal array into Go value of type v1alpha1.PodMetricsList"

Version-Release number of selected component (if applicable):
openshift v3.5.0.16+a26133a
kubernetes v1.5.2+43a9be4
etcd 3.1.0

registry.ops.openshift.com/openshift3/metrics-hawkular-metrics:3.5.0   imageid: fc0e50112581 
registry.ops.openshift.com/openshift3/metrics-cassandra:3.5.0          imageid: aa7e5b2b7210
registry.ops.openshift.com/openshift3/metrics-heapster:3.5.0           imageid: b2cb3298b3db

How reproducible:
Always

Steps to Reproduce:
1. Create a scalable resource
$ oc run resource-consumer --image=gcr.io/google_containers/resource_consumer:beta --expose --port 8080 --requests='cpu=100m,memory=256Mi' -n dma1

2. Create an HPA for the scalable resource
$ oc autoscale dc resource-consumer --min=1 --max=5 -n dma1

3. Check the hpa status
[root@ip-172-18-11-11 ~]# oc describe hpa/resource-consumer -n dma1
Name:                resource-consumer
Namespace:            dma1
Labels:                <none>
Annotations:            <none>
CreationTimestamp:        Mon, 06 Feb 2017 04:32:18 -0500
Reference:            DeploymentConfig/resource-consumer
Target CPU utilization:        80%
Current CPU utilization:    <unset>
Min replicas:            1
Max replicas:            5
Events:
  FirstSeen    LastSeen    Count    From                SubObjectPath    Type        Reason            Message
  ---------    --------    -----    ----                -------------    --------    ------            -------
  12m        9m        8    {horizontal-pod-autoscaler }            Normal        MetricsNotAvailableYet    unable to get metrics for resource cpu: failed to unmarshal heapster response: json: cannot unmarshal array into Go value of type v1alpha1.PodMetricsList
  9m        18s        19    {horizontal-pod-autoscaler }            Warning        FailedGetMetrics    unable to get metrics for resource cpu: failed to unmarshal heapster response: json: cannot unmarshal array into Go value of type v1alpha1.PodMetricsList

Actual results:

Expected results:

Additional info:

Comment 1 Solly Ross 2017-02-06 16:30:12 UTC
There was a change in the type of PodMetricsList at some point to be more in line with the rest of the *List types in Kubernetes (before, it was just an array).  It looks like our version of Heapster is too old to work with our version of Kubernetes.

We'll need a newer version of Heapster.

Comment 5 Troy Dawson 2017-02-07 20:58:50 UTC
It looks like this was my fault, but it's cleared up now.
The original OCP 3.5 repos that we had were set up incorrectly and pulled in a 1.1.0 version of heapster.  They were fixed fairly quickly, but we didn't update the heapster image.

I have rebuilt the 3.5 metrics-heapster image, and verified that it has heapster-1.2.0 in it.

openshift3/metrics-heapster:3.5.0-2

That image is in the usual OCP 3.5 testing areas.

Comment 6 DeShuai Ma 2017-02-09 08:56:42 UTC
Verified on the latest 3.5 heapster image (image tag: 7799362af752). The image has been updated and the HPA can get metrics now.

[root@dhcp-128-7 dma]# oc run resource-consumer --image=gcr.io/google_containers/resource_consumer:beta --expose --port 8080 --requests='cpu=100m,memory=256Mi'
service "resource-consumer" created
deploymentconfig "resource-consumer" created
[root@dhcp-128-7 dma]# oc autoscale dc resource-consumer --min=1 --max=5
deploymentconfig "resource-consumer" autoscaled
[root@dhcp-128-7 dma]# oc get hpa
NAME                REFERENCE                            TARGET    CURRENT   MINPODS   MAXPODS   AGE
resource-consumer   DeploymentConfig/resource-consumer   80%       0%        1         5         <invalid>
[root@dhcp-128-7 dma]# oc describe hpa resource-consumer
Name:				resource-consumer
Namespace:			dma1
Labels:				<none>
Annotations:			<none>
CreationTimestamp:		Thu, 09 Feb 2017 16:52:19 +0800
Reference:			DeploymentConfig/resource-consumer
Target CPU utilization:		80%
Current CPU utilization:	0%
Min replicas:			1
Max replicas:			5
Events:
  FirstSeen	LastSeen	Count	From				SubObjectPath	Type		Reason			Message
  ---------	--------	-----	----				-------------	--------	------			-------
  <invalid>	<invalid>	3	{horizontal-pod-autoscaler }			Normal		MetricsNotAvailableYet	unable to get metrics for resource cpu: no metrics returned from heapster
  <invalid>	<invalid>	2	{horizontal-pod-autoscaler }			Normal		DesiredReplicasComputed	Computed the desired num of replicas: 0 (avgCPUutil: 0, current replicas: 1)
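For reference, the "desired num of replicas: 0" in the last event is consistent with the proportional scaling rule documented for the HPA, desired = ceil(current replicas * current utilization / target utilization). The sketch below is an illustration under that assumption, not the controller's actual code: the function names are hypothetical, the tolerance window is ignored, and clamping to the min/max replica bounds is shown as a separate step.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the proportional rule
// ceil(current * currentUtil / targetUtil).
// Hypothetical helper: the real controller also applies a tolerance
// window, which is omitted here.
func desiredReplicas(current int, currentUtil, targetUtil float64) int {
	return int(math.Ceil(float64(current) * currentUtil / targetUtil))
}

// clampReplicas keeps a computed replica count within [min, max].
func clampReplicas(n, min, max int) int {
	if n < min {
		return min
	}
	if n > max {
		return max
	}
	return n
}

func main() {
	// Values from the output above: 1 replica, 0% CPU, 80% target,
	// --min=1 --max=5.
	raw := desiredReplicas(1, 0, 80)
	fmt.Println("computed:", raw, "after bounds:", clampReplicas(raw, 1, 5))
}
```

Under these assumptions the computed value is 0, and the bounds keep the deployment at the minimum of 1 replica.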

Comment 8 errata-xmlrpc 2017-04-12 19:11:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884