OSD: 4.5.13

Description of problem:
I ran iperf3, arranging the deployments so that the server and client pods were on separate nodes and a known amount of network traffic was generated (4.95Gbit/s, 618Mi/s). Examining the "USE Method / Node" and "USE Method / Cluster" dashboards, I noticed that the Network Utilisation for Received traffic is twice the expected value (1.227Gi/s). The Transmit figures are as expected (611Mi/s).

By querying Prometheus directly, I see that instance:node_network_receive_bytes_excluding_lo:rate1m is reporting a doubled value. instance:node_network_transmit_bytes_excluding_lo:rate1m, node_network_receive_bytes_total and node_network_transmit_bytes_total are unaffected. They give figures that tally with the work generated by iperf3.

Version-Release number of selected component (if applicable):
4.5.11 / 4.5.13

How reproducible:
100%

Steps to Reproduce:
1. Deploy iperf3 server/client (https://github.com/k-wall/iperf3-yamls)
2. Use OpenShift Console to view the Grafana dashboard.

Actual results:
Received traffic is reported at twice the expected value.

Expected results:
Network utilisation to be reported faithfully.

Additional info:
Created attachment 1721546 [details] NetUtilisatiuonUseMethodCluster
Created attachment 1721547 [details] NetUtilisationUseMethodNode
Created attachment 1721548 [details] instance_node_network_receive_bytes_excluding_lo_rate
Created attachment 1721550 [details] iperf-yamls.tar.gz
Can you run the following queries in the Prometheus UI?

  sum by(device,instance) (rate(node_network_receive_bytes_total{job="node-exporter",instance="xxx",device!="lo"}[1m]))

  sum by(device,instance) (rate(node_network_transmit_bytes_total{job="node-exporter",node="xxx",device!="lo"}[1m]))

That should help to see whether the network traffic is being accounted for on 2 devices.
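For anyone who would rather inspect the result of the per-device queries programmatically than by eye, here is a minimal Python sketch. It assumes the result has already been fetched as the `result` list of a Prometheus instant-query response (vector type). The device names and values in `sample_result` are invented for illustration and are not taken from the affected cluster; `per_device_rates` and `busy_devices` are hypothetical helper names.

```python
from typing import Dict, List


def per_device_rates(result: List[dict]) -> Dict[str, float]:
    """Map device name -> bytes/s from a Prometheus instant-query vector result."""
    rates: Dict[str, float] = {}
    for sample in result:
        device = sample["metric"].get("device", "<none>")
        # Prometheus encodes each sample value as [timestamp, "value-as-string"].
        rates[device] = rates.get(device, 0.0) + float(sample["value"][1])
    return rates


def busy_devices(rates: Dict[str, float], min_share: float = 0.25) -> List[str]:
    """Return devices that carry at least min_share of the summed traffic."""
    total = sum(rates.values())
    if total == 0:
        return []
    return sorted(dev for dev, rate in rates.items() if rate / total >= min_share)


# Invented sample data (assumption): two devices each seeing roughly the full
# iperf3 stream, plus one nearly idle device.
sample_result = [
    {"metric": {"device": "eth0", "instance": "node-a"}, "value": [1603000000, "618000000"]},
    {"metric": {"device": "tun0", "instance": "node-a"}, "value": [1603000000, "610000000"]},
    {"metric": {"device": "veth0", "instance": "node-a"}, "value": [1603000000, "1200"]},
]

rates = per_device_rates(sample_result)
print(busy_devices(rates))
```

If more than one device shows up as busy for a single stream, a rule that sums over all non-loopback devices would count the same traffic more than once. Whether that is what happens on this cluster is exactly what the queries above are meant to show.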
I cleaned up my yamls to reproduce the problem. Use these rather than the zip: https://github.com/k-wall/iperf3-yamls

So running with these files, I have:

kwall@ovpn-113-108 iperf3-yamls % KUBECONFIG=~/src/mk-performance-tests/kafka-config oc get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE     IP           NODE                           NOMINATED NODE   READINESS GATES
iperf3-client-6bbd6759f7-fxfl2   1/1     Running   0          8m21s   10.128.2.9   ip-10-0-182-239.ec2.internal   <none>           <none>
iperf3-server-749d7b7468-kwdgb   1/1     Running   0          8m20s   10.128.6.7   ip-10-0-143-179.ec2.internal   <none>           <none>

Producing this work:

[  5] 346.00-347.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 347.00-348.00 sec   589 MBytes  4.94 Gbits/sec    0   2.41 MBytes
[  5] 348.00-349.00 sec   589 MBytes  4.94 Gbits/sec   14   2.41 MBytes
[  5] 349.00-350.00 sec   590 MBytes  4.95 Gbits/sec   14   2.41 MBytes
[  5] 350.00-351.00 sec   589 MBytes  4.94 Gbits/sec    0   2.41 MBytes
[  5] 351.00-352.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 352.00-353.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 353.00-354.00 sec   589 MBytes  4.94 Gbits/sec    0   2.41 MBytes
[  5] 354.00-355.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 355.00-356.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 356.00-357.00 sec   589 MBytes  4.94 Gbits/sec    0   2.41 MBytes
[  5] 357.00-358.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 358.00-359.00 sec   590 MBytes  4.95 Gbits/sec    0   2.41 MBytes
[  5] 359.00-360.00 sec   589 MBytes  4.94 Gbits/sec    0   2.41 MBytes

I attach the screenshots of the queries you requested (with the instance being the server pod).
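As a sanity check on the expected figures, the iperf3 bitrate can be converted to bytes per second by hand. The sketch below is only illustrative arithmetic. It assumes that the Gbits/sec figure is decimal (10^9 bits) and that iperf3's MBytes column is binary (2^20 bytes), which is not stated in this report.

```python
GBIT = 1_000_000_000   # decimal gigabit, in bits (assumption about iperf3's unit)
MIB = 1024 * 1024      # binary mebibyte, in bytes


def gbit_to_bytes_per_s(gbit_per_s: float) -> float:
    """Convert a bitrate in Gbit/s to bytes per second."""
    return gbit_per_s * GBIT / 8


expected = gbit_to_bytes_per_s(4.95)   # rate generated by iperf3
doubled = 2 * expected                 # what a double-counted rate would look like

print(f"expected: {expected / 1e6:.2f} MB/s ({expected / MIB:.2f} MiB/s)")
print(f"doubled:  {doubled / 1e9:.4f} GB/s")
```

Under those assumptions, 4.95 Gbit/s is roughly 619 MB/s, or about 590 MiB/s, which lines up with the 590 MBytes transferred per one-second interval above. A doubled receive rate would then be around 1.24 GB/s, in the same region as the figure shown on the dashboards.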
Comment on attachment 1721550 [details] iperf-yamls.tar.gz Please use https://github.com/k-wall/iperf3-yamls instead.
Created attachment 1721788 [details] Capture requested by Simon 1
Created attachment 1721789 [details] Capture requested by Simon 2
Created attachment 1721790 [details] Capture requested by Simon 3
Followed the steps in Comment 0 and checked on 4.7.0-0.nightly-2020-12-20-055006. For the same node, the "Net Utilisation (Bytes Receive)" result is the same on both the "USE Method / Cluster" and "USE Method / Node" dashboards, and almost the same as the result from the Prometheus query.
Created attachment 1741291 [details] USE Method / Node dashboard
Created attachment 1741292 [details] USE Method / Cluster dashboard
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633