Bug 2181255

Summary: Insights-client harms network performance on Ice-Lake systems
Product: Red Hat Enterprise Linux 9    Reporter: Adam Okuliar <aokuliar>
Component: insights-client    Assignee: CSI Client Tools Bugs <csi-client-tools-bugs>
Status: CLOSED MIGRATED    QA Contact: Red Hat subscription-manager QE Team <rhsm-qe>
Severity: unspecified    Docs Contact:
Priority: unspecified
Version: 9.3    CC: cmarinea, fjansen, jhladky, redakkan, stomsa
Target Milestone: rc    Keywords: MigratedToJIRA, Performance, Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:    Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2023-09-07 13:25:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Adam Okuliar 2023-03-23 13:01:29 UTC
Description of problem:
Installing insights-client on RHEL-9.3 has a negative effect on throughput and CPU utilisation on Intel Ice Lake systems with a Mellanox ConnectX-6 NIC.

Version-Release number of selected component (if applicable):
insights-client-3.1.7-12.el9.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHEL-9.3 and run 16 parallel iperf3 instances to saturate the full 200 Gbit/s link bandwidth (a loop that launches all 16 clients is sketched after the command list):

iperf3 --json --client 172.16.1.26 --time 30 --port 5201  --affinity 0,0 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5202  --affinity 1,1 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5203  --affinity 2,2 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5204  --affinity 3,3 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5205  --affinity 4,4 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5206  --affinity 5,5 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5207  --affinity 6,6 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5208  --affinity 7,7 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5209  --affinity 8,8 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5210  --affinity 9,9 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5211  --affinity 10,10 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5212  --affinity 11,11 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5213  --affinity 12,12 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5214  --affinity 13,13 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5215  --affinity 14,14 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5216  --affinity 15,15 --parallel 8
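
For convenience, the 16 clients can be launched with a small loop (a sketch equivalent to the commands above; it assumes iperf3 servers are already listening on ports 5201-5216 of 172.16.1.26):

# Launch one 8-stream iperf3 client per CPU/port pair, in parallel,
# mirroring the 16 commands above:
for i in $(seq 0 15); do
    iperf3 --json --client 172.16.1.26 --time 30 --port $((5201 + i)) \
        --affinity "$i,$i" --parallel 8 &
done
wait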

total_throughput=126744.93
efficiency:
  sender 38495.04
  receiver 8083.17
streams:
  receiver:mlx5_core_1:inet4:0:5201 throughput=1631.35 retransmits=5202.0
  receiver:mlx5_core_1:inet4:0:5202 throughput=14599.66 retransmits=23966.0
  receiver:mlx5_core_1:inet4:0:5203 throughput=1808.24 retransmits=6069.0
  receiver:mlx5_core_1:inet4:0:5204 throughput=1217.48 retransmits=6514.0
  receiver:mlx5_core_1:inet4:0:5205 throughput=2048.64 retransmits=9765.0
  receiver:mlx5_core_1:inet4:0:5206 throughput=1728.49 retransmits=4694.0
  receiver:mlx5_core_1:inet4:0:5207 throughput=2703.95 retransmits=8629.0
  receiver:mlx5_core_1:inet4:0:5208 throughput=7688.01 retransmits=12064.0
  receiver:mlx5_core_1:inet4:0:5209 throughput=1671.44 retransmits=4469.0
  receiver:mlx5_core_1:inet4:0:5210 throughput=15176.81 retransmits=19171.0
  receiver:mlx5_core_1:inet4:0:5211 throughput=17138.93 retransmits=12477.0
  receiver:mlx5_core_1:inet4:0:5212 throughput=20369.80 retransmits=10101.0
  receiver:mlx5_core_1:inet4:0:5213 throughput=17628.61 retransmits=20087.0
  receiver:mlx5_core_1:inet4:0:5214 throughput=1695.05 retransmits=4854.0
  receiver:mlx5_core_1:inet4:0:5215 throughput=13487.04 retransmits=13873.0
  receiver:mlx5_core_1:inet4:0:5216 throughput=6151.43 retransmits=8465.0
sender:
  sender cpu  0 total=13.33 usr= 0.07 sys= 3.80 irq= 1.03 soft= 8.44
  sender cpu  1 total=30.32 usr= 0.30 sys=23.21 irq= 0.90 soft= 5.91
  sender cpu  2 total=12.54 usr= 0.07 sys= 3.75 irq= 0.93 soft= 7.79
  sender cpu  3 total= 8.16 usr= 0.07 sys= 2.94 irq= 0.63 soft= 4.53
  sender cpu  4 total= 8.66 usr= 0.07 sys= 3.53 irq= 0.57 soft= 4.50
  sender cpu  5 total=11.77 usr= 0.03 sys= 3.48 irq= 0.96 soft= 7.30
  sender cpu  6 total=11.34 usr= 0.07 sys= 4.53 irq= 0.77 soft= 5.97
  sender cpu  7 total=16.27 usr= 0.13 sys=11.85 irq= 0.57 soft= 3.71
  sender cpu  8 total= 8.77 usr= 0.03 sys= 3.12 irq= 0.73 soft= 4.88
  sender cpu  9 total=33.26 usr= 0.23 sys=23.45 irq= 1.04 soft= 8.53
  sender cpu 10 total=39.45 usr= 0.30 sys=26.60 irq= 1.37 soft=11.18
  sender cpu 11 total=37.25 usr= 0.33 sys=31.14 irq= 0.80 soft= 4.98
  sender cpu 12 total=37.65 usr= 0.33 sys=27.68 irq= 1.10 soft= 8.54
  sender cpu 13 total= 9.59 usr= 0.07 sys= 3.35 irq= 0.76 soft= 5.41
  sender cpu 14 total=29.88 usr= 0.30 sys=20.92 irq= 0.94 soft= 7.72
  sender cpu 15 total=19.85 usr= 0.13 sys= 9.86 irq= 1.47 soft= 8.39
receiver:
  receiver cpu  0 total=99.40 usr= 0.03 sys= 5.77 irq= 0.23 soft=93.37
  receiver cpu  1 total=75.41 usr= 0.92 sys=55.28 irq= 1.03 soft=18.18
  receiver cpu  2 total=99.43 usr= 0.07 sys= 6.74 irq= 0.23 soft=92.40
  receiver cpu  3 total=99.43 usr= 0.00 sys= 5.93 irq= 0.37 soft=93.13
  receiver cpu  4 total=99.43 usr= 0.03 sys=10.73 irq= 0.50 soft=88.17
  receiver cpu  5 total=99.40 usr= 0.03 sys= 5.96 irq= 0.23 soft=93.17
  receiver cpu  6 total=99.40 usr= 0.03 sys=13.60 irq= 0.43 soft=85.34
  receiver cpu  7 total=99.40 usr= 0.10 sys=32.87 irq= 0.70 soft=65.73
  receiver cpu  8 total=99.43 usr= 0.00 sys= 5.97 irq= 0.20 soft=93.26
  receiver cpu  9 total=99.40 usr= 0.27 sys=53.80 irq= 0.97 soft=44.37
  receiver cpu 10 total=99.43 usr= 0.30 sys=59.75 irq= 0.97 soft=38.41
  receiver cpu 11 total=99.40 usr= 0.33 sys=69.07 irq= 0.97 soft=29.03
  receiver cpu 12 total=99.40 usr= 0.33 sys=58.47 irq= 0.87 soft=39.73
  receiver cpu 13 total=99.43 usr= 0.03 sys= 6.17 irq= 0.23 soft=93.00
  receiver cpu 14 total=99.43 usr= 0.17 sys=48.78 irq= 1.07 soft=49.42
  receiver cpu 15 total=99.43 usr= 0.07 sys=27.38 irq= 1.00 soft=70.99

* We get only ~126 Gbit/s of throughput over the 200 Gbit/s network link; the bottleneck is the receiver, which has all of its CPUs utilised at nearly 100%. *
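
The per-CPU utilisation above was collected with mpstat (the harness log in step 3 mentions it); a manual equivalent over the 30-second test window would be:

# Sample per-CPU utilisation once per second for the 30 s run;
# %soft is the softirq column that saturates on the receiver:
mpstat -P ALL 1 30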

2. dnf -y remove insights-client && reboot
3. run the same test scenario again:

iperf3 --json --client 172.16.1.26 --time 30 --port 5201  --affinity 0,0 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5202  --affinity 1,1 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5203  --affinity 2,2 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5204  --affinity 3,3 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5205  --affinity 4,4 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5206  --affinity 5,5 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5207  --affinity 6,6 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5208  --affinity 7,7 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5209  --affinity 8,8 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5210  --affinity 9,9 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5211  --affinity 10,10 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5212  --affinity 11,11 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5213  --affinity 12,12 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5214  --affinity 13,13 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5215  --affinity 14,14 --parallel 8
iperf3 --json --client 172.16.1.26 --time 30 --port 5216  --affinity 15,15 --parallel 8
mpstats and iperfs started
iperfs results stored
mpstats results stored
total_throughput=169832.03
efficiency:
  sender 31424.19
  receiver 14189.21
streams:
  receiver:mlx5_core_1:inet4:0:5201 throughput=9843.62 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5202 throughput=10758.80 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5203 throughput=8418.24 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5204 throughput=9284.23 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5205 throughput=7561.73 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5206 throughput=7176.14 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5207 throughput=10880.63 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5208 throughput=15293.55 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5209 throughput=13089.26 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5210 throughput=12018.62 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5211 throughput=11918.24 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5212 throughput=12073.78 retransmits=0.0
  receiver:mlx5_core_1:inet4:0:5213 throughput=11426.15 retransmits=326.0
  receiver:mlx5_core_1:inet4:0:5214 throughput=13600.04 retransmits=127.0
  receiver:mlx5_core_1:inet4:0:5215 throughput=11189.18 retransmits=73.0
  receiver:mlx5_core_1:inet4:0:5216 throughput=5299.83 retransmits=0.0
sender:
  sender cpu  0 total=34.26 usr= 0.20 sys=18.04 irq= 1.81 soft=14.21
  sender cpu  1 total=34.52 usr= 0.20 sys=19.46 irq= 1.68 soft=13.18
  sender cpu  2 total=27.13 usr= 0.17 sys=15.72 irq= 1.14 soft=10.10
  sender cpu  3 total=28.55 usr= 0.20 sys=16.64 irq= 1.27 soft=10.43
  sender cpu  4 total=26.56 usr= 0.13 sys=13.33 irq= 1.34 soft=11.75
  sender cpu  5 total=27.34 usr= 0.23 sys=13.19 irq= 1.51 soft=12.42
  sender cpu  6 total=35.61 usr= 0.23 sys=19.95 irq= 1.94 soft=13.49
  sender cpu  7 total=44.25 usr= 0.27 sys=27.46 irq= 1.68 soft=14.85
  sender cpu  8 total=41.92 usr= 0.23 sys=23.61 irq= 1.74 soft=16.33
  sender cpu  9 total=37.92 usr= 0.27 sys=21.68 irq= 1.91 soft=14.07
  sender cpu 10 total=37.65 usr= 0.24 sys=21.11 irq= 1.95 soft=14.35
  sender cpu 11 total=36.53 usr= 0.20 sys=21.44 irq= 1.65 soft=13.24
  sender cpu 12 total=32.74 usr= 0.20 sys=20.27 irq= 1.37 soft=10.89
  sender cpu 13 total=40.21 usr= 0.23 sys=24.66 irq= 1.64 soft=13.67
  sender cpu 14 total=32.72 usr= 0.23 sys=20.02 irq= 1.27 soft=11.20
  sender cpu 15 total=21.21 usr= 0.17 sys= 9.70 irq= 1.54 soft= 9.80
receiver:
  receiver cpu  0 total=92.86 usr= 0.44 sys=36.63 irq= 2.59 soft=53.20
  receiver cpu  1 total=83.81 usr= 0.59 sys=36.86 irq= 2.36 soft=44.00
  receiver cpu  2 total=62.56 usr= 0.91 sys=31.71 irq= 1.96 soft=27.98
  receiver cpu  3 total=58.83 usr= 1.10 sys=34.14 irq= 1.83 soft=21.76
  receiver cpu  4 total=65.92 usr= 0.63 sys=28.11 irq= 2.18 soft=35.00
  receiver cpu  5 total=66.68 usr= 0.60 sys=27.38 irq= 2.18 soft=36.52
  receiver cpu  6 total=70.85 usr= 1.01 sys=37.31 irq= 2.12 soft=30.41
  receiver cpu  7 total=92.89 usr= 0.82 sys=49.44 irq= 2.32 soft=40.31
  receiver cpu  8 total=79.83 usr= 0.95 sys=44.38 irq= 1.94 soft=32.56
  receiver cpu  9 total=86.96 usr= 0.70 sys=41.02 irq= 2.41 soft=42.84
  receiver cpu 10 total=76.49 usr= 1.03 sys=41.05 irq= 2.06 soft=32.35
  receiver cpu 11 total=67.88 usr= 1.16 sys=39.81 irq= 1.63 soft=25.28
  receiver cpu 12 total=87.55 usr= 0.69 sys=40.29 irq= 2.07 soft=44.50
  receiver cpu 13 total=85.56 usr= 0.97 sys=45.82 irq= 2.26 soft=36.50
  receiver cpu 14 total=77.51 usr= 0.74 sys=37.87 irq= 1.88 soft=37.03
  receiver cpu 15 total=37.97 usr= 0.68 sys=21.07 irq= 1.89 soft=14.33

* We achieve ~170 Gbit/s of throughput, with some receiver CPU time to spare. *

Actual results:
Network performance on RHEL-9.3 is worse with insights-client installed than without it.

Expected results:
Insights-client does not have a negative impact on network throughput or CPU utilisation.

Additional info:
We believe this regression is caused by insights-client enabling the cgroup v2 cpu controller. There is a very similar BZ related to systemd:
https://bugzilla.redhat.com/show_bug.cgi?id=2173996
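
A quick way to check whether the cpu controller has been enabled (a sketch using the standard cgroup v2 interface; the unit names are our assumption about how insights-client triggers it):

# Controllers delegated to children of the root cgroup; the regression
# correlates with "cpu" showing up in this list:
cat /sys/fs/cgroup/cgroup.subtree_control

# See whether the insights-client units request CPU accounting, which
# would make systemd enable the cpu controller (unit names assumed):
systemctl show insights-client.timer insights-client.service --property=Id,CPUAccounting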

Comment 2 Adam Okuliar 2023-05-31 11:17:21 UTC
RHEL-8.8.0 seems to be affected in the same way. Performance without insights-client:

2023-05-30 15:27:17,961 [INFO] Experiment: ipv4_mlx5_sixteen_stream_best_node
2023-05-30 15:27:17,961 [INFO]   throughput_mean = 155326.48 stdev_pct =  8.50 stdev = 13203.68
2023-05-30 15:27:17,961 [INFO] formir1 (1984 cpus):
2023-05-30 15:27:17,961 [INFO]   used cpus       = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
2023-05-30 15:27:17,961 [INFO]   used cpus count =        16
2023-05-30 15:27:17,961 [INFO]   cpu_mean        =    466.59 stdev_pct = 17.52 stdev =   81.73
2023-05-30 15:27:17,961 [INFO]   efficiency_mean =  33755.22 stdev_pct =  8.79 stdev = 2967.72
2023-05-30 15:27:17,961 [INFO] formir2 (1984 cpus):
2023-05-30 15:27:17,961 [INFO]   used cpus       = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 54]
2023-05-30 15:27:17,962 [INFO]   used cpus count =        17
2023-05-30 15:27:17,962 [INFO]   cpu_mean        =   1396.08 stdev_pct = 17.04 stdev =  237.89
2023-05-30 15:27:17,962 [INFO]   efficiency_mean =  11598.33 stdev_pct = 26.61 stdev = 3086.80

Performance with insights-client installed:

2023-05-30 16:08:42,084 [INFO] Experiment: ipv4_mlx5_sixteen_stream_best_node
2023-05-30 16:08:42,084 [INFO]   throughput_mean = 129759.08 stdev_pct = 11.54 stdev = 14973.78
2023-05-30 16:08:42,084 [INFO] formir1 (1984 cpus):
2023-05-30 16:08:42,084 [INFO]   used cpus       = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
2023-05-30 16:08:42,084 [INFO]   used cpus count =        16
2023-05-30 16:08:42,084 [INFO]   cpu_mean        =    345.81 stdev_pct = 22.94 stdev =   79.32
2023-05-30 16:08:42,084 [INFO]   efficiency_mean =  38119.77 stdev_pct =  7.60 stdev = 2898.78
2023-05-30 16:08:42,084 [INFO] formir2 (1984 cpus):
2023-05-30 16:08:42,084 [INFO]   used cpus       = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 28]
2023-05-30 16:08:42,084 [INFO]   used cpus count =        17
2023-05-30 16:08:42,084 [INFO]   cpu_mean        =   1542.91 stdev_pct =  9.95 stdev =  153.46
2023-05-30 16:08:42,084 [INFO]   efficiency_mean =   8620.58 stdev_pct = 27.62 stdev = 2381.04

Comment 3 Pino Toscano 2023-06-01 14:03:36 UTC
Hi Adam,

thanks for the feedback about RHEL 8. One question we had when reviewing this: is there a way to reproduce it easily without any particular hardware? Maybe in a VM, even with a generic enough config to notice some small difference?

Comment 4 Adam Okuliar 2023-06-06 13:00:04 UTC
Hello Pino,

unfortunately, the short answer is no. Reproducing anything performance-related in a virtualised environment is notoriously hard; that is the main reason why we in the kernel team still go through the pain of using physical hardware. Even if there were a reliable reproducer in a virtualised environment, it would require a hypervisor with 8-16 CPU cores, so there is no chance of reproducing this on any kind of laptop.

This may sound a bit disappointing, and I understand the pain a developer may feel when debugging remotely on some particular system. I will try to be as helpful as I can and will happily provide you access to the affected system any time you want. Also, if you need any kind of assistance (reboot, re-installation), do not hesitate to contact me.

Thanks for understanding.
Adam

Comment 8 RHEL Program Management 2023-09-07 13:22:24 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 9 RHEL Program Management 2023-09-07 13:25:51 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues.