Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1610669

Summary: oc adm top node can not get metrics info after metrics-server deployed
Product: OpenShift Container Platform
Reporter: Weinan Liu <weinliu>
Component: Monitoring
Assignee: Frederic Branczyk <fbranczy>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Priority: medium
Version: 3.11.0
CC: aos-bugs, jokerman, mmccomas, weinliu, wsun
Target Release: 3.11.0
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Last Closed: 2018-10-11 07:23:00 UTC
Type: Bug

Description Weinan Liu 2018-08-01 07:49:40 UTC
Description of problem:
oc adm top node panics and returns no metrics info after metrics-server is deployed


Version-Release number of selected component (if applicable):
[root@qe-weinliu-311-master-etcd-1 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)
[root@qe-weinliu-311-master-etcd-1 ~]# oc version
oc v3.11.0-0.10.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-311-master-etcd-1:8443
openshift v3.11.0-0.10.0
kubernetes v1.11.0+d4cacc0

[nathan@localhost openshift-ansible]$ rpm -qa|grep ansible-2.6.2-1.el7.ans.noarch
ansible-2.6.2-1.el7.ans.noarch

[nathan@localhost openshift-ansible]$ git branch
* (HEAD detached at openshift-ansible-3.11.0-0.9.0)
  master
  release-3.10
  release-3.9

How reproducible:
always

Steps to Reproduce:
1. Update qe-inventory-host-file to include metrics install parameters:
  openshift_metrics_install_metrics=True

2. Install metrics by running:
$ ansible-playbook -i qe-inventory-host-file playbooks/openshift-metrics/config.yml 

3. Check the metrics-server pod status
[root@qe-weinliu-311-master-etcd-1 ~]# oc get pod -n openshift-monitoring
NAME                              READY     STATUS    RESTARTS   AGE
metrics-server-7675bf5d75-qr69d   1/1       Running   0          11m

4. Check node status with oc adm top node

Actual results:
[root@qe-weinliu-311-master-etcd-1 ~]# oc adm top node
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: tabwriter: panic during Flush
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xfd4f8d]

goroutine 1 [running]:
text/tabwriter.handlePanic(0xc4215c2d88, 0x2bc6f23, 0x5)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:458 +0x111
panic(0x27a8ce0, 0x47321e0)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/runtime/panic.go:502 +0x229
text/tabwriter.(*Writer).write0(0xc420956c00, 0xc420b12140, 0x4, 0x13c)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:230 +0x2d
text/tabwriter.(*Writer).writeLines(0xc420956c00, 0x0, 0x0, 0x4, 0x20)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:297 +0x16c
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x0, 0x4, 0x5)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:385 +0x2ac
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x4, 0x4, 0x0)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:379 +0x1b4
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x4, 0x4, 0x3)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:379 +0x1b4
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x4, 0x4, 0x2)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:379 +0x1b4
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x4, 0x4, 0x1)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:379 +0x1b4
text/tabwriter.(*Writer).format(0xc420956c00, 0x0, 0x4, 0x5, 0x5)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:379 +0x1b4
text/tabwriter.(*Writer).flush(0xc420956c00, 0x0, 0x0)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:484 +0x156
text/tabwriter.(*Writer).Flush(0xc420956c00, 0xc42004c6c0, 0xc420a32000)
        /opt/rh/go-toolset-7/root/usr/lib/go-toolset-7-golang/src/text/tabwriter/tabwriter.go:467 +0x2b
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/metricsutil.(*TopCmdPrinter).PrintNodeMetrics(0xc42119b910, 0xc4211a0500, 0x3, 0x4, 0xc4215c30b0, 0x0, 0x0)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/metricsutil/metrics_printer.go:80 +0x3ab
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.TopNodeOptions.RunTopNode(0x0, 0x0, 0x0, 0x0, 0x3158f00, 0xc42119a920, 0x2bdfc02, 0xf, 0x2bce13c, 0x8, ...)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/top_node.go:219 +0x49d
github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd.NewCmdTopNode.func1(0xc421432000, 0x49d61a8, 0x0, 0x0)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/top_node.go:114 +0x165
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).execute(0xc421432000, 0x49d61a8, 0x0, 0x0, 0xc421432000, 0x49d61a8)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:760 +0x2c1
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc4212cfb80, 0xc42000e010, 0xc42000e020, 0xc4212cfb80)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:846 +0x30a
github.com/openshift/origin/vendor/github.com/spf13/cobra.(*Command).Execute(0xc4212cfb80, 0x2, 0xc4212cfb80)
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/vendor/github.com/spf13/cobra/command.go:794 +0x2b
main.main()
        /builddir/build/BUILD/atomic-openshift-git-0.766dbc4/_output/local/go/src/github.com/openshift/origin/cmd/oc/oc.go:49 +0x365

Expected results:
oc adm top node
NAME                                    CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
qe-weinliu-gce-master-etcd-1            305m         30%       2131Mi          62%       
qe-weinliu-gce-node-1                   63m          6%        1017Mi          29%       
qe-weinliu-gce-node-registry-router-1   60m          6%        1006Mi          29%  

Additional info:
[root@qe-weinliu-311-master-etcd-1 ~]# oc get --raw /apis/metrics.k8s.io/v1beta1/nodes?pretty=true
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "qe-weinliu-311-master-etcd-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/qe-weinliu-311-master-etcd-1",
        "creationTimestamp": "2018-07-31T09:13:52Z"
      },
      "timestamp": "2018-07-31T09:13:30Z",
      "window": "1m0s",
      "usage": {
        "cpu": "409m",
        "memory": "3215900Ki"
      }
    },
    {
      "metadata": {
        "name": "qe-weinliu-311-node-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/qe-weinliu-311-node-1",
        "creationTimestamp": "2018-07-31T09:13:52Z"
      },
      "timestamp": "2018-07-31T09:13:30Z",
      "window": "1m0s",
      "usage": {
        "cpu": "89m",
        "memory": "1792784Ki"
      }
    },
    {
      "metadata": {
        "name": "qe-weinliu-311-node-registry-router-1",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/qe-weinliu-311-node-registry-router-1",
        "creationTimestamp": "2018-07-31T09:13:52Z"
      },
      "timestamp": "2018-07-31T09:13:30Z",
      "window": "1m0s",
      "usage": {
        "cpu": "104m",
        "memory": "1804444Ki"
      }
    }
  ]
}
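
The raw response above already holds the values that oc adm top node is expected to print, so the data can be tabulated by hand while the command is broken. The following is only a minimal sketch under stated assumptions: it assumes the JSON shape shown above, `nodeMetricsList` and `renderNodeMetrics` are hypothetical names (not part of oc or kubectl), quantities are printed exactly as returned with no unit conversion, and the CPU% and MEMORY% columns are omitted because they need node capacity, which is not in this response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
	"text/tabwriter"
)

// nodeMetricsList mirrors only the fields of the NodeMetricsList response
// that are needed for a table. The type name is hypothetical.
type nodeMetricsList struct {
	Items []struct {
		Metadata struct {
			Name string `json:"name"`
		} `json:"metadata"`
		Usage struct {
			CPU    string `json:"cpu"`
			Memory string `json:"memory"`
		} `json:"usage"`
	} `json:"items"`
}

// renderNodeMetrics decodes a raw NodeMetricsList document and returns a
// column-aligned table with one row per node.
func renderNodeMetrics(raw []byte) (string, error) {
	var list nodeMetricsList
	if err := json.Unmarshal(raw, &list); err != nil {
		return "", err
	}
	var buf bytes.Buffer
	w := tabwriter.NewWriter(&buf, 0, 8, 3, ' ', 0)
	fmt.Fprintln(w, "NAME\tCPU\tMEMORY")
	for _, item := range list.Items {
		fmt.Fprintf(w, "%s\t%s\t%s\n", item.Metadata.Name, item.Usage.CPU, item.Usage.Memory)
	}
	if err := w.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// sample is a trimmed copy of two items from the response in this report.
const sample = `{
  "kind": "NodeMetricsList",
  "items": [
    {"metadata": {"name": "qe-weinliu-311-master-etcd-1"}, "usage": {"cpu": "409m", "memory": "3215900Ki"}},
    {"metadata": {"name": "qe-weinliu-311-node-1"}, "usage": {"cpu": "89m", "memory": "1792784Ki"}}
  ]
}`

func main() {
	table, err := renderNodeMetrics([]byte(sample))
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Print(table)
}
```

To use it against a live cluster, the embedded sample could be replaced with the saved output of the oc get --raw command above.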

[root@qe-weinliu-311-master-etcd-1 ~]# oc describe apiservices v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:    
Labels:       kubernetes.io/cluster-service=true
              metrics-server-infra=support
Annotations:  kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"apiregistration.k8s.io/v1beta1","kind":"APIService","metadata":{"annotations":{},"labels":{"kubernetes.io/cluster-service":"true","metri...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2018-07-31T08:59:49Z
  Resource Version:    20286
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 1468952a-94a0-11e8-b194-42010af00009
Spec:
  Ca Bundle:               LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM1akNDQWM2Z0F3SUJBZ0lCQVRBTkJna3Foa2lHOXcwQkFRc0ZBREFrTVNJd0lBWURWUVFEREJsdFpYUnkKYVdOekxYTnBaMjVsY2tBeE5UTXpNREkzTlRBNU1CNFhEVEU0TURjek1UQTROVGd6TUZvWERUSXpNRGN6TURBNApOVGd6TVZvd0pERWlNQ0FHQTFVRUF3d1piV1YwY21samN5MXphV2R1WlhKQU1UVXpNekF5TnpVd09UQ0NBU0l3CkRRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFKKzZHQkRBaU9JYldCRkRNSTFMeDM3cVpnMUsKUHhtNXFjUXA2M3YrY2dNbmZvR3F1dEFXcWQ5aHRualV3bHgrZ1B1c0U1QU9vYU9JZE9vaW0wNUNiYklMVTVJTApHRjJ5YjRqWkw0MmM3WXZERlZQZFlqRzBlbG9yUnJSeUVFZDc1Uk5wcjk1RTNHdzk3N1MrdGhWMndWOWxLZmlrCndGNmh1ZEZ4T0U3dWFRb0dPaVlTMlVRUnV6YmdjMmlqV1dOOGVNbHFHdEIvVHdpMUI4MmtXbVVBR0FpQVhFUGIKTS9RRDNIYzNQMHhFWHVKcy9HWUprelBzeFdaQnlpUTNCWnlzd00wTDBMYm4yNFplNjBxdXU2QnhTZHFjbnZyTwprbmx2dVpQaDRNNTJ3WHU3VEYvMDl3cVU4U0JMQ09QT1BLV3BYNGJ0WmZ4K3RuNy9nY0YxaDZmQ3B4OENBd0VBCkFhTWpNQ0V3RGdZRFZSMFBBUUgvQkFRREFnS2tNQThHQTFVZEV3RUIvd1FGTUFNQkFmOHdEUVlKS29aSWh2Y04KQVFFTEJRQURnZ0VCQUZqSUkxQVhzV1cxbG1seVNrWjJ0T2k5U3k4MmZMRE5YdWNNV2VOZlF0RkNBWGxuV21hMgpSZEROME52WWJxN3VQMDVYeTJSTUdtK3VGTzEwb3VyWWhRbmxuNFkwaTcvajhkem9XRDIzdkZ0N1JDRFliWnowCmZydXF6RGJCbWY1R2VvOWgxNDNwS2RYdlR0aThxQmVoTkpsYXo5MkFCZVU5aktPbkFHUGVTNHdSMkJIS09yNGQKNkgvLzZVS3NHL0ZaT2ZhMW51MzFjSW9NTW02Z0RmWGJUSmJwc1BOK1FNTzYydC9TMVV0dkhYcEI3Q1NXM0ZhTwo0ZVMzUmUyRlFSY1hMSGs3TEZjeGJPQXE2QlRJNUVNU3lFUU5ZNHVjbUFxeHFtaEE3LzRVa3BZbDRtZ3YxWGJaCktIM28xYXFwRHlaOStnem54MDFETytiaFBuYkMzbGlUZldrPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  Group:                   metrics.k8s.io
  Group Priority Minimum:  100
  Service:
    Name:            metrics-server
    Namespace:       openshift-monitoring
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2018-07-31T09:00:27Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>

Current CPU usage cannot be retrieved either:
[root@qe-weinliu-311-master-etcd-1 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/autoscaling/hpa-v2beta1/resource-metrics-cpu.yaml -n dma
horizontalpodautoscaler.autoscaling/resource-cpu created
[root@qe-weinliu-311-master-etcd-1 ~]# oc get hpa.v2beta1.autoscaling resource-cpu -n dma
NAME           REFERENCE                               TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
resource-cpu   ReplicationController/hello-openshift   <unknown>/80%   2         10        0          10s
[root@qe-weinliu-311-master-etcd-1 ~]# oc describe hpa.v2beta1.autoscaling resource-cpu -n dma
Name:                                                  resource-cpu
Namespace:                                             dma
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Tue, 31 Jul 2018 05:34:56 -0400
Reference:                                             ReplicationController/hello-openshift
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 80%
Min replicas:                                          2
Max replicas:                                          10
ReplicationController pods:                            0 current / 0 desired
Events:                                                <none>

Comment 2 Solly Ross 2018-08-02 19:21:38 UTC
Fix PR: https://github.com/openshift/origin/pull/20529.

This is not necessarily a metrics-server issue. The kubectl code was refactored to change how IO streams are passed in, and the way the `kubectl top` command was adapted made it easy to accidentally ignore them and never set the streams.

Comment 3 Weinan Liu 2018-08-07 11:07:51 UTC
Still fails on:
[root@qe-weinliu-311-master-etcd-1 ~]# oc version
oc v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-311-master-etcd-1:8443
openshift v3.11.0-0.11.0
kubernetes v1.11.0+d4cacc0
[root@qe-weinliu-311-master-etcd-1 ~]#

Comment 4 Solly Ross 2018-08-08 14:40:42 UTC
Just confirmed that this works on master.  Please make sure that you're using the latest code before retesting.

Comment 6 Weinan Liu 2018-08-16 06:03:04 UTC
Verified to be fixed.

[root@qe-weinliu-round3-private-2-me-1 ~]# oc adm top node
NAME                                 CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%   
qe-weinliu-round3-private-2-me-1     543m         13%       3510Mi          23%       
qe-weinliu-round3-private-2-node-1   172m         4%        2284Mi          15%       
qe-weinliu-round3-private-2-nrr-1    138m         3%        1872Mi          12%       
[root@qe-weinliu-round3-private-2-me-1 ~]# oc version
oc v3.11.0-0.16.0
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://qe-weinliu-round3-private-2-me-1:8443
openshift v3.11.0-0.16.0
kubernetes v1.11.0+d4cacc0
[root@qe-weinliu-round3-private-2-me-1 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Atomic Host release 7.4

Comment 8 errata-xmlrpc 2018-10-11 07:23:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2652