Bug 1674341

Summary: "error: metrics not available yet" for `oc adm top node`
Product: OpenShift Container Platform
Component: Monitoring
Version: 4.1.0
Target Release: 4.1.z
Reporter: Junqi Zhao <juzhao>
Assignee: Frederic Branczyk <fbranczy>
QA Contact: Junqi Zhao <juzhao>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Regression, Reopened
CC: aos-bugs, jokerman, lcosic, mloibl, mmccomas, sjenning, surbania, weinliu
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Type: Bug
Last Closed: 2019-09-25 07:27:53 UTC

Description Junqi Zhao 2019-02-11 03:19:12 UTC
Description of problem:
$ oc adm top node
error: metrics not available yet


Version-Release number of selected component (if applicable):
$ oc version
oc v4.0.0-0.168.0
kubernetes v1.12.4+bdfe8e3f3a
features: Basic-Auth GSSAPI Kerberos SPNEGO


How reproducible:
Always

Steps to Reproduce:
1. `oc adm top node`

Actual results:
error: metrics not available yet

Expected results:
Should show node metrics

Additional info:
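For context, `oc adm top node` reads node usage from the Kubernetes resource metrics API (metrics.k8s.io), which OpenShift 4 serves through the prometheus-adapter. A quick way to confirm that the API is registered (a diagnostic sketch, assuming cluster-admin access):

$ oc get apiservice v1beta1.metrics.k8s.io
# Should report Available=True, backed by openshift-monitoring/prometheus-adapter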

Comment 1 Seth Jennings 2019-02-15 15:47:06 UTC

*** This bug has been marked as a duplicate of bug 1674372 ***

Comment 2 Junqi Zhao 2019-02-27 01:59:03 UTC
This is not a duplicate of bug 1674372; bug 1674372 has been fixed, but the error still occurs:
$ oc adm top node
error: metrics not available yet

with 
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-26-125216   True        False         48m     Cluster version is 4.0.0-0.nightly-2019-02-26-125216

Reopening it.

Comment 3 Seth Jennings 2019-02-27 19:22:40 UTC
Sending to Monitoring. This worked in 4.0 once the prometheus-adapter was added, but it appears to have regressed.

$ oc adm top node
error: metrics not available yet

$ oc get pod -n openshift-monitoring
NAME                                          READY   STATUS    RESTARTS   AGE
alertmanager-main-0                           3/3     Running   0          25m
alertmanager-main-1                           3/3     Running   0          25m
alertmanager-main-2                           3/3     Running   0          24m
cluster-monitoring-operator-cf6bc5fc9-jmgvn   1/1     Running   0          139m
grafana-848dfcfcbd-qlhbr                      2/2     Running   0          136m
kube-state-metrics-7897b8c589-z5g6v           3/3     Running   0          136m
node-exporter-2j7qg                           2/2     Running   0          136m
node-exporter-4r76t                           2/2     Running   0          136m
node-exporter-6cszg                           2/2     Running   0          136m
node-exporter-c6rbn                           2/2     Running   0          136m
node-exporter-w84fn                           2/2     Running   0          136m
node-exporter-xbn4c                           2/2     Running   0          136m
prometheus-adapter-94f874779-tzjhj            1/1     Running   0          25m
prometheus-adapter-94f874779-zvf7h            1/1     Running   0          25m
prometheus-k8s-0                              6/6     Running   1          25m
prometheus-k8s-1                              6/6     Running   1          25m
prometheus-operator-7db4b8db8c-qkbcv          1/1     Running   0          139m
telemeter-client-56784bb4f6-ljffr             3/3     Running   0          136m

All targets are being scraped successfully.

$ oc get svc -n openshift-monitoring
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
alertmanager-main             ClusterIP   172.30.4.3      <none>        9094/TCP            136m
alertmanager-operated         ClusterIP   None            <none>        9093/TCP,6783/TCP   25m
cluster-monitoring-operator   ClusterIP   None            <none>        8080/TCP            139m
grafana                       ClusterIP   172.30.10.233   <none>        3000/TCP            136m
kube-state-metrics            ClusterIP   None            <none>        8443/TCP,9443/TCP   136m
node-exporter                 ClusterIP   None            <none>        9100/TCP            136m
prometheus-adapter            ClusterIP   172.30.16.141   <none>        443/TCP             136m
prometheus-k8s                ClusterIP   172.30.98.20    <none>        9091/TCP,9092/TCP   136m
prometheus-operated           ClusterIP   None            <none>        9090/TCP            25m
prometheus-operator           ClusterIP   None            <none>        8080/TCP            139m
telemeter-client              ClusterIP   None            <none>        8443/TCP            136m
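
Since the adapter pods and the prometheus-adapter service are up, the remaining checks would be whether the aggregated metrics API actually returns node data and whether the adapter logs show query errors; a minimal sketch using the resources listed above:

$ oc get --raw /apis/metrics.k8s.io/v1beta1/nodes
# An empty "items" list here reproduces the "metrics not available yet" error
$ oc -n openshift-monitoring logs deploy/prometheus-adapter --tail=20
# Errors about failed Prometheus queries would point at the adapter's node
# CPU/memory rules rather than at scraping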

Comment 5 Frederic Branczyk 2019-02-28 08:31:49 UTC
I noticed this as well while working on https://github.com/openshift/cluster-monitoring-operator/pull/272, this should be fixed with that PR.

Comment 6 Junqi Zhao 2019-02-28 13:53:44 UTC
The issue is fixed with:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-28-054829   True        False         33m     Cluster version is 4.0.0-0.nightly-2019-02-28-054829

$ oc adm top node
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ip-10-0-130-134.us-east-2.compute.internal   550m         15%    2640Mi          17%       
ip-10-0-143-87.us-east-2.compute.internal    219m         14%    1472Mi          19%       
ip-10-0-147-116.us-east-2.compute.internal   294m         19%    873Mi           11%       
ip-10-0-159-148.us-east-2.compute.internal   484m         13%    2594Mi          16%       
ip-10-0-167-249.us-east-2.compute.internal   191m         12%    1376Mi          18%       
ip-10-0-173-248.us-east-2.compute.internal   611m         17%    2827Mi          18%

Comment 7 Seth Jennings 2019-02-28 16:08:47 UTC
Confirmed this is working this morning on 4.0.0-0.alpha-2019-02-28-133250

$ oc adm top node
NAME                                         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
ip-10-0-130-226.us-west-1.compute.internal   645m         18%    2500Mi          16%       
ip-10-0-134-63.us-west-1.compute.internal    408m         27%    1196Mi          16%       
ip-10-0-137-162.us-west-1.compute.internal   441m         12%    2308Mi          14%       
ip-10-0-143-91.us-west-1.compute.internal    83m          5%     693Mi           9%        
ip-10-0-145-115.us-west-1.compute.internal   193m         12%    1195Mi          16%       
ip-10-0-147-157.us-west-1.compute.internal   609m         17%    2464Mi          15%

Comment 9 errata-xmlrpc 2019-09-25 07:27:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2820