Bug 1867034 - Filesystem and Pod count are No datapoints found for node
Summary: Filesystem and Pod count are No datapoints found for node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Management Console
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: ralpert
QA Contact: Yanping Zhang
URL:
Whiteboard:
Duplicates: 1865741 1873044
Depends On:
Blocks:
 
Reported: 2020-08-07 08:23 UTC by Junqi Zhao
Modified: 2020-10-27 16:26 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 16:25:54 UTC
Target Upstream Version:
Embargoed:


Attachments
Filesystem and Pod count are No datapoints found for node (235.41 KB, image/png)
2020-08-07 08:23 UTC, Junqi Zhao


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift console pull 6419 | 0 | None | closed | Bug 1867034: Fix no-show pods and filesystem queries | 2020-10-05 14:44:10 UTC
Github | openshift console pull 6536 | 0 | None | closed | Bug 1867034: Updated cluster dashboard queries | 2020-10-05 14:44:10 UTC
Github | openshift console pull 6599 | 0 | None | closed | Bug 1867034: follow on fix cluster dashboard queries | 2020-10-05 14:44:09 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:26:16 UTC

Description Junqi Zhao 2020-08-07 08:23:33 UTC
Created attachment 1710759
Filesystem and Pod count are No datapoints found for node

Description of problem:
As cluster admin, log in to the console, go to "Compute -> Nodes", and select a node to view its overview.
As the attached screenshot shows, the Filesystem and Pod count charts display "No datapoints found" for the node.
The queries behind these charts, checked from the API:
Filesystem: instance:node_filesystem_usage:sum{instance='qe-anusaxen10-nhx4f-master-0'}
Pod count: kubelet_running_pod_count{instance=~'139.178.76.49:.*'}

These metrics are no longer in Prometheus, so the console should use other metrics:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "kubelet_running_pod_count"
no result

# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep "instance:node_filesystem_usage:sum"
no result
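
The two checks above can be folded into one reusable helper. This is a minimal sketch and not part of the original report: the function name `metric_exists` is hypothetical, and it assumes the response from `/api/v1/label/__name__/values` lists each metric name as a quoted JSON string.

```shell
# metric_exists NAME
# Reads a label-values JSON response on stdin and succeeds only if NAME
# appears as a complete quoted string, so a substring of a longer metric
# name does not count as a match.
# Hypothetical helper; not part of the original report.
metric_exists() {
  grep -q -F "\"$1\""
}
```

Piping the curl output from either command above into `metric_exists kubelet_running_pod_count` would be expected to return a non-zero exit status on this cluster, matching the "no result" seen with plain grep.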

Version-Release number of selected component (if applicable):
4.6.0-0.nightly-2020-08-06-131904

How reproducible:
always

Steps to Reproduce:
1. see the description
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Junqi Zhao 2020-08-07 08:33:46 UTC
https://github.com/kubernetes/sig-release/blob/858fc2d68c731352df9ab94b5160436deccf5eab/releases/release-1.19/release-notes-draft.md
Kubelet: following metrics have been renamed:
kubelet_running_container_count --> kubelet_running_containers
kubelet_running_pod_count --> kubelet_running_pods
(#92407, @RainbowMango) [SIG API Machinery, Cluster Lifecycle, Instrumentation and Node]
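
The rename quoted above can be expressed as a small lookup. This is only an illustrative sketch: the function name `renamed_kubelet_metric` is hypothetical, and it covers just the two metrics named in the release note.

```shell
# renamed_kubelet_metric NAME
# Prints the Kubernetes 1.19 name for a kubelet metric. Only the two renames
# quoted from the release notes are covered; any other name is printed
# unchanged.
# Hypothetical helper; not part of the original report.
renamed_kubelet_metric() {
  case "$1" in
    kubelet_running_container_count) echo "kubelet_running_containers" ;;
    kubelet_running_pod_count)       echo "kubelet_running_pods" ;;
    *)                               echo "$1" ;;
  esac
}
```

For example, `renamed_kubelet_metric kubelet_running_pod_count` prints the name that a query written for the older kubelet would need to use after the upgrade.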

Comment 2 Jakub Hadvig 2020-08-07 10:18:08 UTC
*** Bug 1865741 has been marked as a duplicate of this bug. ***

Comment 3 Jakub Hadvig 2020-08-07 10:21:51 UTC
Renaming kubelet_running_pod_count --> kubelet_running_pods fixes the issue with the pod counts.
The other issue is the Filesystem usage, since `instance:node_filesystem_usage:sum` was removed completely
from the Prometheus Operator in https://github.com/prometheus-operator/kube-prometheus/pull/617
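
With the aggregated metric gone, one possible way to approximate filesystem usage is to compute it from node_exporter's raw `node_filesystem_size_bytes` and `node_filesystem_avail_bytes` metrics. The sketch below only builds such a query string. It is an assumption for illustration, not the query the fix adopted; the function name `filesystem_usage_query` is hypothetical, and it assumes both metrics carry an `instance` label equal to the value passed in.

```shell
# filesystem_usage_query INSTANCE
# Prints a PromQL expression that sums (size - available) bytes over all
# filesystems reported for INSTANCE.
# Assumption: node_filesystem_size_bytes and node_filesystem_avail_bytes are
# scraped and carry an "instance" label equal to INSTANCE.
# Hypothetical helper; not the query used by the console fix.
filesystem_usage_query() {
  printf 'sum(node_filesystem_size_bytes{instance="%s"} - node_filesystem_avail_bytes{instance="%s"})\n' "$1" "$1"
}
```

A production query would probably also filter by `fstype` or `mountpoint` so that temporary filesystems are not counted; the right filter is a question for the monitoring team.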

Comment 4 ralpert 2020-08-20 15:28:49 UTC
I have fixes ready for the pods issues, but I'm waiting to hear back from the monitoring team on what query we should be using for the filesystem issue.

Comment 5 Jakub Hadvig 2020-08-27 09:34:06 UTC
*** Bug 1873044 has been marked as a duplicate of this bug. ***

Comment 9 Yadan Pei 2020-08-31 06:25:06 UTC
The latest accepted 4.6 nightly, 4.6.0-0.nightly-2020-08-27-005538, doesn't include the fix PR:

# export PAYLOAD=registry.svc.ci.openshift.org/ocp/release:4.6.0-0.nightly-2020-08-27-005538
[root@preserved-qe-ui-rhel-1 console]# oc adm release info $PAYLOAD --pullspecs | grep console
  console                                        quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a3ccd44b0545258785c52d90d43a2bebc80365f52a7f2eb28601a2957020310
  console-operator                               quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:741428633e21bddb5f6b86970a6d62f9ed2b762bafd074f2bb7a09aa3aaf5d0a
[root@preserved-qe-ui-rhel-1 console]# export CONSOLE_IMAGE=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a3ccd44b0545258785c52d90d43a2bebc80365f52a7f2eb28601a2957020310
[root@preserved-qe-ui-rhel-1 console]# oc image info $CONSOLE_IMAGE | grep commit
             io.openshift.build.commit.id=9cc959b2ec10e83dd0f9c6dc0e70c0ae4fb03daf
             io.openshift.build.commit.url=https://github.com/openshift/console/commit/9cc959b2ec10e83dd0f9c6dc0e70c0ae4fb03daf
[root@preserved-qe-ui-rhel-1 console]# git log 9cc959b2ec10e83dd0f9c6dc0e70c0ae4fb03daf | grep '#6419'   # returns nothing
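
The final grep in the transcript above would also match longer PR numbers that begin with the same digits. A slightly stricter check could look like the sketch below; the function name `log_mentions_pr` is hypothetical, and it assumes the PR is referenced in commit messages as `#NUMBER`.

```shell
# log_mentions_pr NUMBER
# Reads `git log` output on stdin and succeeds if it references pull request
# NUMBER as "#NUMBER" not followed by another digit, so "#6419" is not
# matched inside "#64190".
# Hypothetical helper; not part of the original report.
log_mentions_pr() {
  grep -q -E "#$1([^0-9]|\$)"
}
```

Running `git log 9cc959b2ec10e83dd0f9c6dc0e70c0ae4fb03daf | log_mentions_pr 6419` in the console checkout would then signal through its exit status whether the image's commit history includes the fix.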

Comment 10 Yadan Pei 2020-08-31 08:04:03 UTC
Nightly build 4.6.0-0.nightly-2020-08-31-012413 contains the fix.

In the Node Utilization charts, Filesystem and Pods now use the queries from the PR, but the Cluster Utilization charts still have some issues:
1) Filesystem -> By Node shows no data.
2) Pods -> By Node still uses 'topk(25, sort_desc(sum(avg_over_time(kube_pod_info[5m])) BY (node)))', which differs from the query used in the Node Utilization charts, 'kubelet_running_pods{instance=~'10.0.150.132:.*'}'.

Assigning back for another fix.

Comment 11 ralpert 2020-08-31 13:30:22 UTC
Thanks for flagging this; I'll work on it today.

Comment 12 ralpert 2020-09-04 20:47:21 UTC
See https://github.com/openshift/console/pull/6536 for updated queries.

Comment 14 Yanping Zhang 2020-09-11 02:58:12 UTC
Checked on an OCP 4.6 cluster with payload 4.6.0-0.nightly-2020-09-10-195619.
The fix PR is included.
Checked Overview -> Cluster Utilization:
Filtering "Filesystem" by node shows "Not Available".
Filtering "Pod count" by node and clicking "View more" shows the query "topk(25, sort_desc(sum(avg_over_time(kubelet_running_pods{instance=~"<%= ipAddress %>:.*"}[5m])) BY (node)))" with "No datapoints found".
The issue in Comment 10 is not fixed.
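
The query quoted in this comment still contains the literal placeholder `<%= ipAddress %>`. If that text reached Prometheus unrendered, the `instance` matcher could not match any node, which would be consistent with "No datapoints found"; the report itself does not confirm this as the cause. A quick check for leftover placeholders might look like the sketch below, where the function name `has_unrendered_placeholder` is hypothetical.

```shell
# has_unrendered_placeholder QUERY
# Succeeds if QUERY still contains a "<%= ... %>" template placeholder,
# meaning it was not filled in before being sent to Prometheus.
# Hypothetical helper; not part of the original report.
has_unrendered_placeholder() {
  printf '%s\n' "$1" | grep -q -E '<%=[^%]*%>'
}
```

Passing the query from this comment makes the function succeed, while the node-level query from Comment 10, which has a concrete IP address, makes it fail.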

Comment 15 ralpert 2020-09-11 14:47:40 UTC
Let me bounce this back to monitoring; they said the queries were correct.

Comment 17 Yanping Zhang 2020-09-14 07:08:32 UTC
Checked on an OCP 4.6 cluster with payload 4.6.0-0.nightly-2020-09-12-230035.
Checked Overview -> Cluster Utilization:
Filtering "Filesystem" by node and clicking "View more" opens the metrics page, and correct data is shown.
Filtering "Pod count" by node and clicking "View more" opens the metrics page, and correct data is shown.

Comment 19 errata-xmlrpc 2020-10-27 16:25:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

