Description of problem:
In the monitoring operator, the network interface for components is set to record traffic from iface eth0. This is a static setting that will break some monitors when the iface name is ens192.

Version-Release number of selected component (if applicable):
3.11.z

How reproducible:
Every install

Steps to Reproduce:
1. Install OCP with the Monitoring stack on a platform where the iface name is not eth0
2. Check stats for network in Grafana / Prometheus

Actual results:
node:node_net_utilisation:sum_irate is missing in Prometheus because the network interface name is not eth0.

Expected results:
Network graphs are populated.

Additional info:
Changed the following records from device="eth0" to device="ens192" for my current cluster, which caused network monitoring to start working as intended.

  record: node:node_disk_saturation:avg_irate
- expr: |
    sum(irate(node_network_receive_bytes{job="node-exporter",device="eth0"}[1m])) +
    sum(irate(node_network_transmit_bytes{job="node-exporter",device="eth0"}[1m]))
  record: :node_net_utilisation:sum_irate
- expr: |
    sum by (node) (
      (irate(node_network_receive_bytes{job="node-exporter",device="eth0"}[1m]) +
      irate(node_network_transmit_bytes{job="node-exporter",device="eth0"}[1m]))
    * on (namespace, pod) group_left(node)
      node_namespace_pod:kube_pod_info:
    )
  record: node:node_net_utilisation:sum_irate
- expr: |
    sum(irate(node_network_receive_drop{job="node-exporter",device="eth0"}[1m])) +
    sum(irate(node_network_transmit_drop{job="node-exporter",device="eth0"}[1m]))
  record: :node_net_saturation:sum_irate
- expr: |
    sum by (node) (
      (irate(node_network_receive_drop{job="node-exporter",device="eth0"}[1m]) +
      irate(node_network_transmit_drop{job="node-exporter",device="eth0"}[1m]))
    * on (namespace, pod) group_left(node)
      node_namespace_pod:kube_pod_info:
    )
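To confirm the failure mode on an affected node, one can query the raw interface metric directly in Prometheus; a minimal sketch, assuming ens192 is the primary interface name on that node:

```promql
# Returns series on a node whose primary interface is ens192:
node_network_receive_bytes{job="node-exporter",device="ens192"}

# Returns no data on such a node, so every recording rule above that
# filters on device="eth0" produces no result:
node_network_receive_bytes{job="node-exporter",device="eth0"}
```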
The network interface selector is already configurable in the kubernetes-mixin project [1], defaulting to `eth0` [2]. This gives us the possibility of adjusting the interface selector, but at the wrong stage: at cluster-monitoring-operator compile time, not at run time. Maybe mloibl knows of a case where we have templated rule values at run time before? [1] https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/rules/rules.libsonnet#L328 [2] https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/master/config.libsonnet#L15
After talking to Matthias, Casey and Frederic, we can change the default value 'device="eth0"' to a regex that ignores the interfaces we don't want. In the long term, the network operator could expose the interface names for us, which could then be templated into the rules manifest by the cluster monitoring operator. Assigning to Matthias for now. Let me know if you want me to look into this further.
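As a sketch of the regex approach for one of the affected rules (the exact selector merged in the PR may differ; `device!~"veth.+"` is an assumption here, chosen because later verification reported that "veth.+" devices are excluded):

```yaml
# Negative-match selector: keep all interfaces regardless of name,
# excluding only the virtual veth pairs created for containers.
- expr: |
    sum(irate(node_network_receive_bytes{job="node-exporter",device!~"veth.+"}[1m])) +
    sum(irate(node_network_transmit_bytes{job="node-exporter",device!~"veth.+"}[1m]))
  record: :node_net_utilisation:sum_irate
```

Because the selector excludes rather than includes, it works unchanged whether the host interface is eth0, ens192, or anything else.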
https://github.com/openshift/cluster-monitoring-operator/pull/226 has merged, so this fix will soon be available in OpenShift 4.0. Thanks for the report, and thanks Matthias for looking into this.
This appears to be the same issue as bug 1654907.
*** Bug 1654907 has been marked as a duplicate of this bug. ***
Tested with 4.0.0-0.nightly-2019-03-06-074438. The device name is no longer restricted to eth0; "veth.+" devices are excluded, and network stats are shown in Grafana / Prometheus.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758