Description of problem:
Cloned from: https://bugzilla.redhat.com/show_bug.cgi?id=1668315

Version-Release number of selected component (if applicable):
3.11.306

Secure environment. The customer is seeing non-zero values for container_network_tcp_usage_total and container_network_udp_usage_total. As per the bug mentioned above (1668315) and https://github.com/google/cadvisor/issues/1925, these metrics are supposed to be disabled and report zero. However, this does not seem to be the case.

Example 1:
[openshift@master-1 ~]$ server=app-node-0.openshift.mydomain
[openshift@master-1 ~]$ curl -s -X GET -H "Authorization: Bearer $(oc whoami -t)" https://$server:10250/metrics/cadvisor |egrep '(container_network_tcp_usage_total|container_network_udp_usage_total)' |wc -l
3694

Example 2:
The way to check is by running the following query in the Prometheus UI:
URL: https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/graph?g0.range_input=1h&g0.expr=topk(10%2C%20count%20by%20(__name__)(%7B__name__%3D~%22.%2B%22%7D))&g0.tab=1
Query: topk(10, count by (__name__)({__name__=~".+"}))

Results:
container_network_tcp_usage_total has a non-zero value of 175450 when it is supposed to be zero, and this is creating extra load on the monitoring solution. cAdvisor is producing these metrics even though it is not supposed to, which causes performance problems and later affects the customer's ability to monitor their environments effectively.
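The wc -l count in Example 1 does not show how many of those samples are actually non-zero. A minimal sketch of a helper that separates the two, assuming the endpoint returns the standard Prometheus text format (metric name, optional labels, value, optional timestamp); the function name summarize_network_usage is hypothetical and not part of any OpenShift tooling:

```shell
#!/bin/sh
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints "<metric> total=<n> nonzero=<m>" for the two cAdvisor metrics.
summarize_network_usage() {
  # Keep only sample lines for the two metrics (skips # HELP / # TYPE lines).
  grep -E '^container_network_(tcp|udp)_usage_total[{ ]' |
    # Drop the label set so the line becomes "<metric> <value> [timestamp]".
    sed -E 's/^([^{ ]+)(\{.*\})? +/\1 /' |
    # Count all samples and the non-zero ones per metric name.
    awk '{ total[$1]++; if ($2 + 0 != 0) nonzero[$1]++ }
         END { for (m in total)
                 printf "%s total=%d nonzero=%d\n", m, total[m], nonzero[m] }' |
    sort
}
```

With the function defined, the curl command from Example 1 could be piped into summarize_network_usage instead of egrep and wc -l.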
Sorry, re-opening this as we need the fix that was done for https://bugzilla.redhat.com/show_bug.cgi?id=1668315 in OCP 4.1 backported to OCP 3.11. Basically, the stats (as per https://bugzilla.redhat.com/show_bug.cgi?id=1668315#c3) should be zero, but they are not. Happy to provide more info.
Tested with ose-cluster-monitoring-operator:v3.11.463; the container_network_tcp_usage_total and container_network_udp_usage_total metrics are removed.

# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep -E "container_network_tcp_usage_total|container_network_udp_usage_total"
no result
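For repeated verification, the same check can be turned into a pass/fail exit status. A rough sketch, assuming the label values endpoint returns its usual JSON body with the metric names as quoted strings in the "data" array; the function name metrics_absent is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON response of
# /api/v1/label/__name__/values on stdin. Returns 0 when neither metric
# name is present, 1 when at least one of them is still listed.
metrics_absent() {
  if grep -qE '"container_network_(tcp|udp)_usage_total"'; then
    return 1
  fi
  return 0
}
```

The curl output from the verification command above could be piped into metrics_absent in place of jq and grep, with the exit status indicating whether the metrics are gone.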
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.465 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2639