Bug 1950810 - zero for container_network_tcp_usage_total and container_network_udp_usage_total
Summary: zero for container_network_tcp_usage_total and container_network_udp_usage_total
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 3.11.0
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Arunprasad Rajkumar
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-04-18 22:27 UTC by Venkata Tadimarri
Modified: 2021-09-20 13:23 UTC
CC List: 14 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-07-07 11:01:35 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1251 | 0 | None | open | Bug 1950810: Remove high cardinality metrics from cadvisor and apiserver | 2021-06-28 07:30:55 UTC
Github | openshift cluster-monitoring-operator pull 1255 | 0 | None | open | Bug 1950810: [kube-apiserver/servicemonitor] metricRelabelings must be a property inside spec.endpoints | 2021-06-29 10:34:09 UTC
Red Hat Product Errata | RHBA-2021:2639 | 0 | None | None | None | 2021-07-07 11:02:03 UTC

Description Venkata Tadimarri 2021-04-18 22:27:23 UTC
Description of problem:

Cloned from : https://bugzilla.redhat.com/show_bug.cgi?id=1668315


Version-Release number of selected component (if applicable):
 3.11.306


Secure environment.


The customer is seeing non-zero values for container_network_tcp_usage_total and container_network_udp_usage_total.

As per the bug mentioned earlier (1668315) and https://github.com/google/cadvisor/issues/1925, these values are supposed to be zero and disabled. However, this doesn't seem to be the case.

Example 1: 

[openshift@master-1 ~]$ server=app-node-0.openshift.mydomain
[openshift@master-1 ~]$ curl -s -X GET -H "Authorization: Bearer $(oc whoami -t)" https://$server:10250/metrics/cadvisor |egrep '(container_network_tcp_usage_total|container_network_udp_usage_total)'  |wc -l
3694
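
For a per-metric breakdown rather than one combined line count, the saved output of the curl command above can be tallied offline. This is only a minimal sketch: the function name count_series and the sample lines (labels, values, help text) are made up for illustration and are not taken from the customer's environment. Unlike the egrep pipeline, it skips the "# HELP" and "# TYPE" comment lines, so its totals can be slightly lower than the wc -l figure.

```python
from collections import Counter

TARGET_METRICS = (
    "container_network_tcp_usage_total",
    "container_network_udp_usage_total",
)


def count_series(exposition_text, names=TARGET_METRICS):
    """Count sample lines per metric name in Prometheus text-format output."""
    counts = Counter({name: 0 for name in names})
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name in counts:
            counts[name] += 1
    return dict(counts)


# Illustrative sample only; real cadvisor output carries many more labels.
sample = """\
# HELP container_network_tcp_usage_total example help text
# TYPE container_network_tcp_usage_total gauge
container_network_tcp_usage_total{id="/",tcp_state="close"} 0
container_network_tcp_usage_total{id="/",tcp_state="listen"} 0
container_network_udp_usage_total{id="/",udp_state="listen"} 0
container_cpu_usage_seconds_total{id="/"} 12.5
"""

print(count_series(sample))
```

In practice the curl output would be written to a file first and its contents passed to count_series.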

Example 2:

The way to check is by running the following query in the Prometheus UI:


URL: https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/graph?g0.range_input=1h&g0.expr=topk(10%2C%20count%20by%20(__name__)(%7B__name__%3D~%22.%2B%22%7D))&g0.tab=1


Query: topk(10, count by (__name__)({__name__=~".+"}))


Results: the query reports 175450 series for container_network_tcp_usage_total, when the count is supposed to be zero, and this is creating extra load on the monitoring solution.
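
To make clear what that number means: the query groups every stored series by its metric name, counts each group, and keeps the ten largest groups. The sketch below mimics that on an in-memory list of label sets. The function name top_metric_names and the sample data are hypothetical, and it is not how Prometheus evaluates the query internally.

```python
from collections import Counter


def top_metric_names(series_labels, k=10):
    """Return the k metric names with the most series, largest first.

    series_labels is a list of label dictionaries, one per series,
    each carrying the metric name under the "__name__" key.
    """
    counts = Counter(labels["__name__"] for labels in series_labels)
    return counts.most_common(k)


# Illustrative data: three series of one metric, one series of another.
example_series = [
    {"__name__": "container_network_tcp_usage_total", "tcp_state": "close"},
    {"__name__": "container_network_tcp_usage_total", "tcp_state": "listen"},
    {"__name__": "container_network_tcp_usage_total", "tcp_state": "established"},
    {"__name__": "container_cpu_usage_seconds_total", "id": "/"},
]

for name, series_count in top_metric_names(example_series, k=10):
    print(name, series_count)
```

A high count for a single name, as in the result above, indicates a high-cardinality metric: many label combinations, each stored as its own series.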

cAdvisor is producing these metrics even though it is not supposed to, which causes performance problems and in turn affects the customer's ability to monitor their environments effectively.

Comment 4 Mohammad 2021-04-26 20:39:38 UTC
Sorry, re-opening this as we need the fix that was made for https://bugzilla.redhat.com/show_bug.cgi?id=1668315 in OCP 4.1 backported to OCP 3.11.

Basically, the stats (as per https://bugzilla.redhat.com/show_bug.cgi?id=1668315#c3) should be zero, but they are not. Happy to provide more info.
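
The titles of the linked pull requests suggest the fix removes these metrics at scrape time through metricRelabelings. The exact rule is not quoted in this bug, so the following is only a sketch of how a "drop" rule on the metric name behaves, assuming a regex that lists the two metrics. The names DROP_REGEX and apply_drop are hypothetical. Prometheus anchors relabeling regexes at both ends, which re.fullmatch imitates here.

```python
import re

# Assumed regex; the rule actually shipped in the fix may differ.
DROP_REGEX = (
    "container_network_tcp_usage_total|container_network_udp_usage_total"
)


def apply_drop(series_labels, regex=DROP_REGEX, source_label="__name__"):
    """Return only the series whose source label does NOT fully match regex."""
    pattern = re.compile(regex)
    kept = []
    for labels in series_labels:
        value = labels.get(source_label, "")
        # A drop rule discards the series when the whole value matches.
        if pattern.fullmatch(value):
            continue
        kept.append(labels)
    return kept


# Illustrative scrape result, not real data.
scraped = [
    {"__name__": "container_network_tcp_usage_total", "tcp_state": "close"},
    {"__name__": "container_network_udp_usage_total", "udp_state": "listen"},
    {"__name__": "container_cpu_usage_seconds_total", "id": "/"},
]

print([labels["__name__"] for labels in apply_drop(scraped)])
```

Because the match is anchored, a metric whose name merely contains one of the two names as a substring would be kept.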

Comment 18 Junqi Zhao 2021-06-30 13:14:59 UTC
Tested with ose-cluster-monitoring-operator:v3.11.463; the container_network_tcp_usage_total and container_network_udp_usage_total metrics are removed.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "container_network_tcp_usage_total|container_network_udp_usage_total"
no result
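
The same check can be scripted against the JSON body returned by the label-values endpoint used above. This is a minimal sketch, assuming the body has already been saved as a string. The function name remaining_metrics and the sample response are illustrative. An empty list corresponds to the "no result" outcome.

```python
import json

REMOVED_METRICS = (
    "container_network_tcp_usage_total",
    "container_network_udp_usage_total",
)


def remaining_metrics(label_values_json, names=REMOVED_METRICS):
    """Return the names from `names` still listed in a label-values response."""
    payload = json.loads(label_values_json)
    if payload.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")
    present = set(payload.get("data", []))
    return sorted(name for name in names if name in present)


# Illustrative response body; a real one lists every metric name.
sample_response = json.dumps(
    {"status": "success", "data": ["container_cpu_usage_seconds_total", "up"]}
)

print(remaining_metrics(sample_response))
```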

Comment 22 errata-xmlrpc 2021-07-07 11:01:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.465 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2639

