1950810 – zero for container_network_tcp_usage_total and container_network_udp_usage_total

Summary: zero for container_network_tcp_usage_total and container_network_udp_usage_total

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Monitoring
Sub Component:
Version:	3.11.0
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Arunprasad Rajkumar
QA Contact:	Junqi Zhao
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2021-04-18 22:27 UTC by Venkata Tadimarri
Modified:	2024-10-01 17:57 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-07-07 11:01:35 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-monitoring-operator pull 1251	None	open	Bug 1950810: Remove high cardinality metrics from cadvisor and apiserver	2021-06-28 07:30:55 UTC
Github	openshift cluster-monitoring-operator pull 1255	None	open	Bug 1950810: [kube-apiserver/servicemonitor] metricRelabelings must be a property inside spec.endpoints	2021-06-29 10:34:09 UTC
Red Hat Product Errata	RHBA-2021:2639	None	None	None	2021-07-07 11:02:03 UTC

Description Venkata Tadimarri 2021-04-18 22:27:23 UTC

Description of problem:

Cloned from: https://bugzilla.redhat.com/show_bug.cgi?id=1668315


Version-Release number of selected component (if applicable):
 3.11.306


Secure environment.


The customer is seeing non-zero values for container_network_tcp_usage_total and container_network_udp_usage_total.

As per the bug mentioned earlier (1668315) and https://github.com/google/cadvisor/issues/1925, these metrics are supposed to be disabled and report zero. However, this does not seem to be the case.

Example 1: 

[openshift@master-1 ~]$ server=app-node-0.openshift.mydomain
[openshift@master-1 ~]$ curl -s -X GET -H "Authorization: Bearer $(oc whoami -t)" https://$server:10250/metrics/cadvisor |egrep '(container_network_tcp_usage_total|container_network_udp_usage_total)'  |wc -l
3694
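
The single count above mixes both metrics together. A minimal sketch for splitting it per metric name follows; the function name count_network_series is hypothetical, and it assumes the endpoint returns the usual Prometheus text format. Unlike the egrep above, it skips the "# HELP" and "# TYPE" comment lines, so the totals can come out slightly lower.

```shell
# Count series per metric name in Prometheus text-format input read from stdin.
# Only lines that start with one of the two metric names are kept, so
# "# HELP" / "# TYPE" comment lines are ignored.
# (count_network_series is a hypothetical helper name, not part of any tool.)
count_network_series() {
  grep -E '^container_network_(tcp|udp)_usage_total[{ ]' \
    | sed -E 's/[{ ].*$//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}

# Usage against the same kubelet endpoint as in the example above:
# curl -s -H "Authorization: Bearer $(oc whoami -t)" \
#   "https://$server:10250/metrics/cadvisor" | count_network_series
```

Each output line is a metric name followed by its number of series on that node.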

Example 2:

To check, run the following query in the Prometheus UI:


URL: https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/graph?g0.range_input=1h&g0.expr=topk(10%2C%20count%20by%20(__name__)(%7B__name__%3D~%22.%2B%22%7D))&g0.tab=1


Query: topk(10, count by (__name__)({__name__=~".+"}))


Results: container_network_tcp_usage_total has a non-zero series count of 175450, when it is supposed to be zero, and this puts extra load on the monitoring solution.

cAdvisor is producing these metrics even though it is not supposed to. This causes performance problems and, over time, affects the customer's ability to monitor their environments effectively.
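
The same query can probably be run without the UI through the Prometheus HTTP API (/api/v1/query). This is only a sketch: the function name top_metric_names is hypothetical, and it assumes the endpoint accepts a bearer token and that skipping TLS verification (-k) is acceptable, as in the in-cluster check later in this report.

```shell
# Run the cardinality query through the Prometheus HTTP API.
# $1: base URL of Prometheus (assumption: reachable and token-authenticated)
# $2: bearer token
# (top_metric_names is a hypothetical helper name.)
top_metric_names() {
  # -G sends the url-encoded data as a GET query string.
  curl -s -k -G \
    -H "Authorization: Bearer $2" \
    --data-urlencode 'query=topk(10, count by (__name__)({__name__=~".+"}))' \
    "$1/api/v1/query"
}

# Usage (the URL is the route from the example above):
# top_metric_names https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain "$(oc whoami -t)"
```

The response is JSON; each result entry carries a metric name and its series count.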

Comment 4 Mohammad 2021-04-26 20:39:38 UTC

Sorry, re-opening this as we need the fix that was made for https://bugzilla.redhat.com/show_bug.cgi?id=1668315 in OCP 4.1 to be backported to OCP 3.11.

Basically, the stats (as per https://bugzilla.redhat.com/show_bug.cgi?id=1668315#c3) should be zero, but they are not. Happy to provide more info.

Comment 18 Junqi Zhao 2021-06-30 13:14:59 UTC

Tested with ose-cluster-monitoring-operator:v3.11.463; the container_network_tcp_usage_total and container_network_udp_usage_total metrics are removed.
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "container_network_tcp_usage_total|container_network_udp_usage_total"
no result
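
For scripted verification, the grep above can be turned into a pass/fail check. A minimal sketch, with the hypothetical function name network_metrics_absent; it reads whatever list of metric names is piped in and does not depend on jq.

```shell
# Exit 0 only when neither metric name appears anywhere in stdin.
# (network_metrics_absent is a hypothetical helper name.)
network_metrics_absent() {
  if grep -Eq 'container_network_(tcp|udp)_usage_total'; then
    return 1   # at least one of the metrics is still present
  fi
  return 0     # both metrics are gone
}

# Usage with the same in-cluster request as above:
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#   curl -s -k -H "Authorization: Bearer $token" \
#   'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' \
#   | network_metrics_absent && echo "metrics removed"
```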

Comment 22 errata-xmlrpc 2021-07-07 11:01:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.465 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2639
