Bug 2173996

Summary: systemd update causes network performance regression
Product: Red Hat Enterprise Linux 9 Reporter: Adam Okuliar <aokuliar>
Component: systemd Assignee: Michal Sekletar <msekleta>
Status: CLOSED ERRATA QA Contact: Frantisek Sumsal <fsumsal>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 9.2 CC: aquini, dtardon, fweimer, jamacku, jhladky, jmario, msekleta, pvlasin, systemd-maint-list, systemd-maint
Target Milestone: rc Keywords: Performance, Regression, Triaged
Target Release: --- Flags: pm-rhel: mirror+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: systemd-252-10.el9_2 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2176899 (view as bug list) Environment:
Last Closed: 2023-05-09 08:22:35 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2176899    

Comment 1 Adam Okuliar 2023-02-28 16:27:27 UTC
Please note that this regression is visible only under specific conditions.

Hardware used:
Intel Ice Lake CPUs (EPYCs are unaffected)
Mellanox ConnectX-6 200 Gbit NIC (100G NICs are not affected)

The regression appears only when running 16 parallel iperf streams on 16-core CPUs. IRQs are pinned. The exact command sequence used:

# tuna --irqs=mlx5* --cpus=0-15 --spread

# iperf3 --json --client 172.16.1.26 --time 30 --port 5201  --affinity 0,0 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5202  --affinity 1,1 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5203  --affinity 2,2 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5204  --affinity 3,3 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5205  --affinity 4,4 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5206  --affinity 5,5 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5207  --affinity 6,6 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5208  --affinity 7,7 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5209  --affinity 8,8 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5210  --affinity 9,9 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5211  --affinity 10,10 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5212  --affinity 11,11 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5213  --affinity 12,12 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5214  --affinity 13,13 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5215  --affinity 14,14 --parallel 8
# iperf3 --json --client 172.16.1.26 --time 30 --port 5216  --affinity 15,15 --parallel 8
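For anyone re-running this, the 16 client invocations above follow a regular pattern (port 5201 + N, affinity N,N), so they can be generated with a loop. This is only a convenience sketch: the function name gen_iperf_cmds and the SERVER variable are made up for illustration, and the loop only prints the commands. How the clients are launched (for example in the background, in parallel) is not stated above and is left to the reader.

```shell
#!/bin/sh
# Sketch: print the 16 iperf3 client commands listed above.
# SERVER and gen_iperf_cmds are illustrative names, not part of the report.
SERVER=172.16.1.26

gen_iperf_cmds() {
    i=0
    while [ "$i" -lt 16 ]; do
        port=$((5201 + i))
        # One client per CPU: port 5201+i, client and server affinity i,i
        echo "iperf3 --json --client $SERVER --time 30 --port $port --affinity $i,$i --parallel 8"
        i=$((i + 1))
    done
}

gen_iperf_cmds
```

The output can be reviewed first and then fed to a shell once the matching iperf3 servers are listening on ports 5201-5216.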

Comment 4 Michal Sekletar 2023-03-03 09:17:22 UTC
The performance regression is caused by ksoftirqd consuming far more CPU time than in the case where NIC bandwidth is as expected. A systemd change introduced between v250 and v252 has this side effect on ksoftirqd:

https://github.com/systemd/systemd/commit/b8df7f8629cb310beac982a4779b27eabe5362c6

After reverting the change, the performance recovers. The change effectively enables the CPU cgroup controller globally, which adds some overhead on the kernel side, and that overhead shows up in the test case. I have some intuitive understanding of why this happens, but more explanation from a kernel cgroup expert would be welcome. On the systemd side we will revert the change until we have a full understanding of the performance regression and maybe even some fixes on the kernel side.
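A quick way to see whether the CPU controller is enabled at the top of the hierarchy on a given machine is to look at the root cgroup.subtree_control file. This is a minimal sketch, assuming a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup. The helper name cpu_controller_enabled is made up for illustration, and the check was not part of the original analysis.

```shell
#!/bin/sh
# Sketch: report whether "cpu" is listed in a cgroup.subtree_control file.
# Assumes cgroup v2; cpu_controller_enabled is an illustrative helper name.
cpu_controller_enabled() {
    # $1: path to a cgroup.subtree_control file (space-separated controllers)
    # -x matches whole lines only, so "cpuset" does not count as "cpu"
    tr ' ' '\n' < "$1" | grep -qx cpu
}

ROOT_CTL=/sys/fs/cgroup/cgroup.subtree_control
if [ -r "$ROOT_CTL" ]; then
    if cpu_controller_enabled "$ROOT_CTL"; then
        echo "cpu controller enabled for children of the root cgroup"
    else
        echo "cpu controller not enabled for children of the root cgroup"
    fi
else
    echo "$ROOT_CTL not readable (cgroup v2 may not be in use)"
fi
```

Comparing the output before and after installing the fixed package gives a rough indication of whether the revert is in effect on the system under test.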

Comment 11 errata-xmlrpc 2023-05-09 08:22:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemd bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:2531