Bug 1984753 - Node exporter veth optimizations do not work if the network type is OVN
Summary: Node exporter veth optimizations do not work if the network type is OVN
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: Philip Gough
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1973491
Blocks:
Reported: 2021-07-22 07:23 UTC by Simon Pasquier
Modified: 2021-09-13 15:21 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1973491
Environment:
Last Closed: 2021-09-07 04:14:05 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1264 | 0 | None | open | Bug 1978208: Sync dependencies for 4.8 release backports | 2021-08-12 08:14:24 UTC
Github | openshift cluster-monitoring-operator pull 1323 | 0 | None | None | None | 2021-08-16 11:00:51 UTC
Red Hat Product Errata | RHBA-2021:3299 | 0 | None | None | None | 2021-09-07 04:14:18 UTC

Comment 2 Junqi Zhao 2021-08-16 06:33:17 UTC
With https://github.com/openshift/cluster-monitoring-operator/pull/1264, the node-exporter DaemonSet has the following args:
# oc -n openshift-monitoring get ds node-exporter -oyaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
...
    spec:
      containers:
      - args:
        - --web.listen-address=127.0.0.1:9100
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --no-collector.wifi
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        - --collector.netclass.ignored-devices=^(veth.*|[a-z0-9]+@if\d+)$
        - --collector.netdev.device-exclude=^(veth.*|[a-z0-9]+@if\d+)$
        - --collector.cpu.info
        - --collector.textfile.directory=/var/node_exporter/textfile
        - --no-collector.cpufreq
        image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9de21a6522cb774d0a0c86c5471b8246cca940859ee27f0a9b36560753d772bb
        imagePullPolicy: IfNotPresent
        name: node-exporter
...
The right fix should be:
        - --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$
        - --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$
See https://bugzilla.redhat.com/show_bug.cgi?id=1973491#c13
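
To illustrate why the pattern needs to change, here is a minimal sketch that runs both patterns against a few interface names. It uses Python's re module as a stand-in for the Go regexp engine used by node_exporter, on the assumption that these particular patterns behave the same way in both. The 15-character name is taken from the dmesg output in this bug; eth0 and br-ex are illustrative names only.

```python
import re

# Patterns copied from the node-exporter args quoted in this bug.
OLD_PATTERN = re.compile(r"^(veth.*|[a-z0-9]+@if\d+)$")
NEW_PATTERN = re.compile(r"^(veth.*|[a-f0-9]{15})$")

# "e6439c25e8c2ca6" is a renamed veth from the dmesg output in this bug;
# "eth0" and "br-ex" are illustrative host interface names (assumptions).
devices = ["vetheb2ef091", "e6439c25e8c2ca6", "eth0", "br-ex"]


def excluded(pattern, name):
    """Return True if a collector using this pattern would skip the device."""
    return pattern.match(name) is not None


for name in devices:
    old = excluded(OLD_PATTERN, name)
    new = excluded(NEW_PATTERN, name)
    print(f"{name:16} old={old} new={new}")
```

The old pattern only catches names that still start with veth or carry an @if suffix, so a renamed interface slips through. The new pattern would also exclude any other interface whose name is exactly 15 lowercase hex characters, which seems an acceptable trade-off but is worth keeping in mind.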

Comment 3 Junqi Zhao 2021-08-16 14:12:54 UTC
Tested with openshift/cluster-monitoring-operator/pull/1323 on a bare-metal OVN cluster:
# oc -n openshift-monitoring get ds node-exporter -oyaml
...
    spec:
      containers:
      - args:
        - --web.listen-address=127.0.0.1:9100
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --no-collector.wifi
        - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        - --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15})$
        - --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15})$
        - --collector.cpu.info
        - --collector.textfile.directory=/var/node_exporter/textfile
        - --no-collector.cpufreq
...

# oc get infrastructures/cluster -o jsonpath="{..status.platform}"
None

# oc get network/cluster -oyaml
...
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
...


# oc debug node/worker-0.ci-ln-h1s0ngb-86010.origin-ci-int-aws.dev.rhcloud.com
sh-4.4# chroot /host
sh-4.4# dmesg | grep 'renamed from veth'
[  140.926085] e6439c25e8c2ca6: renamed from vetheb2ef091
[  140.938519] 284c1f65fb6066f: renamed from vethd0d2c727
[  145.340974] 521ef85a66cdc65: renamed from vethfb46bb33
[  145.353725] 65bc4aa2d50c0ef: renamed from veth7b885cc3
[  152.245913] bcec5e4bb6abefc: renamed from veth7a7b3026
[  152.270706] 745c167f0ab1d3a: renamed from veth6751b160
[  152.296985] f92b4417038d5d5: renamed from vethd5973676
[  165.004155] a6cc4e149aaaa8b: renamed from veth1684c118
[  165.391228] 9d91b177119d720: renamed from veth14dc08ad
[  165.407889] 48ae68c9545e985: renamed from vethaf841d72
[  165.438373] 0058da4ad2d0e6f: renamed from veth9ab46033
[  208.033108] aad741b2fd097a3: renamed from veth4276e458
[  248.532324] a9c0f728185d91b: renamed from veth90fa735b
[  860.631500] 71f32b0a9d7e0c2: renamed from veth707d94ef
[  910.526090] 0082d05b3de07e0: renamed from vethfe275372
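
The same check can be scripted rather than done by eye. The sketch below parses "renamed from veth" lines in the format shown above and reports any new name that the fixed exclusion pattern would not cover. The helper name renamed_devices and the three-line sample are illustrative; the sample lines are copied from the output above.

```python
import re

# Matches lines such as "[  140.926085] e6439c25e8c2ca6: renamed from vetheb2ef091".
RENAME_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\S+): renamed from (veth\S+)$")
# Exclusion pattern from the fixed node-exporter args.
EXCLUDE_RE = re.compile(r"^(veth.*|[a-f0-9]{15})$")


def renamed_devices(dmesg_text):
    """Return (new_name, old_name) pairs found in dmesg output."""
    pairs = []
    for line in dmesg_text.splitlines():
        match = RENAME_RE.match(line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


sample = """\
[  140.926085] e6439c25e8c2ca6: renamed from vetheb2ef091
[  152.270706] 745c167f0ab1d3a: renamed from veth6751b160
[  910.526090] 0082d05b3de07e0: renamed from vethfe275372
"""

pairs = renamed_devices(sample)
# Names that the exclusion pattern would fail to filter out.
not_excluded = [new for new, _old in pairs if not EXCLUDE_RE.match(new)]
print(f"{len(pairs)} renamed devices, {len(not_excluded)} not covered by the pattern")
```

An empty not_excluded list means every renamed interface in the sample falls under the new pattern.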

Checked through the Prometheus API: there is no node_network_info series for any device that was renamed from 'veth*'.
Example:
# token=`oc sa get-token prometheus-k8s -n openshift-monitoring`
# oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://prometheus-k8s.openshift-monitoring.svc:9091/api/v1/query?query=node_network_info' | jq | grep 745c167f0ab1d3a
No result is returned.
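
Grepping for a single device name only covers one interface. As a rough sketch, the whole response could be scanned for any device label that the exclusion pattern should have dropped. The code assumes the standard Prometheus instant-query JSON layout (status, data.result[].metric); the function name leaked_devices and the sample payloads are hypothetical and not taken from the cluster.

```python
import json
import re

# Exclusion pattern from the fixed node-exporter args.
EXCLUDE_RE = re.compile(r"^(veth.*|[a-f0-9]{15})$")


def leaked_devices(response_text):
    """Return device labels in the response that the pattern should have excluded."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("query did not succeed")
    devices = {
        series["metric"].get("device", "")
        for series in body["data"]["result"]
    }
    return sorted(d for d in devices if EXCLUDE_RE.match(d))


def make_response(device_names):
    """Build a hypothetical instant-query response for the given device labels."""
    result = [
        {"metric": {"__name__": "node_network_info", "device": name},
         "value": [0, "1"]}
        for name in device_names
    ]
    return json.dumps(
        {"status": "success", "data": {"resultType": "vector", "result": result}}
    )


clean = make_response(["eth0", "lo", "br-ex"])
dirty = make_response(["eth0", "745c167f0ab1d3a"])

print(leaked_devices(clean))
print(leaked_devices(dirty))
```

In practice the curl output from the command above would be passed to leaked_devices in place of the sample payloads; an empty list would then correspond to the "no result" observation for all interfaces at once.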

Comment 7 Junqi Zhao 2021-08-30 02:00:52 UTC
Moving to VERIFIED based on Comment 6.

Comment 12 errata-xmlrpc 2021-09-07 04:14:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.10 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3299

