Bug 2219830 - irqbalance: silently failing to enforce IRQBALANCE_BANNED_CPULIST
Summary: irqbalance: silently failing to enforce IRQBALANCE_BANNED_CPULIST
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Robin Jarry
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On: 2184735
Blocks:
 
Reported: 2023-07-05 14:52 UTC by Robin Jarry
Modified: 2024-03-26 16:59 UTC (History)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
In RHOSP 17.1, there is a known issue of transient packet loss where hardware interrupt requests (IRQs) are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications.
+
This issue is the result of provisioning large numbers of VFs during deployment. VFs need IRQs, each of which must be bound to a physical CPU. When there are not enough housekeeping CPUs to handle the capacity of IRQs, `irqbalance` fails to bind all of them and the IRQs overspill on isolated CPUs.
+
Workaround: You can try one or more of these actions:
* Reduce the number of provisioned VFs to avoid unused VFs remaining bound to their default Linux driver.
* Increase the number of housekeeping CPUs to handle all IRQs.
* Force unused VF network interfaces down to avoid IRQs from interrupting isolated CPUs.
* Disable multicast and broadcast traffic on unused, down VF network interfaces to avoid IRQs from interrupting isolated CPUs.
Clone Of: 2184735
Environment:
Last Closed:
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-26344 0 None None None 2023-07-05 15:37:56 UTC

Description Robin Jarry 2023-07-05 14:52:06 UTC
+++ This bug was initially created as a clone of Bug #2184735 +++

Hi folks,

When there is not enough room in the APICs of the non-banned CPUs, irqbalance seems to silently let IRQ affinities overspill onto banned CPUs.

Here are a few traces to highlight the problem. The tool I am using (irqstat) is available here: https://pypi.org/project/linux-tools/

> # rpm -qa irqbalance
> irqbalance-1.9.0-3.el9.x86_64
>
> ~# grep ^IRQBALANCE_BANNED_CPU /etc/sysconfig/irqbalance
> IRQBALANCE_BANNED_CPULIST=2-19,22-39
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
> 0              114             114
> 1               43              34
> 20             184             179
> 21              61              51
>
> ~# echo 10 > /sys/class/net/ens2f0np0/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/ens2f1np1/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/eno3/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/eno4/device/sriov_numvfs
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
> 0              224             201
> 1               52              43
> 2               97              78
> 4               12              11
> 6               12              11
> 8               13              12
> 10               6               5
> 12              12              11
> 14              12              11
> 16              13              12
> 18              13              12
> 20             234             201
> 21              52              42
> 22              81              68
>
> ~# irqstat -c 2
> IRQ        AFFINITY  EFFECTIVE-CPU  DESCRIPTION
> 47                2              2  IR-PCI-MSI 12582920-edge i40e-eno1-TxRx-7
> 61                2              2  IR-PCI-MSI 12582934-edge i40e-eno1-TxRx-21
> 62                2              2  IR-PCI-MSI 12582935-edge i40e-eno1-TxRx-22
> 63                2              2  IR-PCI-MSI 12582936-edge i40e-eno1-TxRx-23
> 64                2              2  IR-PCI-MSI 12582937-edge i40e-eno1-TxRx-24
> 66                2              2  IR-PCI-MSI 12582939-edge i40e-eno1-TxRx-26
> 68                2              2  IR-PCI-MSI 12582941-edge i40e-eno1-TxRx-28
> 70                2              2  IR-PCI-MSI 12582943-edge i40e-eno1-TxRx-30
> 77                2              2  IR-PCI-MSI 12582950-edge i40e-eno1-TxRx-37
> 92                2              2  IR-PCI-MSI 12584960-edge i40e-0000:18:00.1:misc
> 97                2              2  IR-PCI-MSI 12584965-edge i40e-eno2-TxRx-4
> 102               2              2  IR-PCI-MSI 12584970-edge i40e-eno2-TxRx-9
> 128               2              2  IR-PCI-MSI 12584988-edge i40e-eno2-TxRx-27
> 133               2              2  IR-PCI-MSI 12584993-edge i40e-eno2-TxRx-32
> 134               2              2  IR-PCI-MSI 12584994-edge i40e-eno2-TxRx-33
> 139               2              2  IR-PCI-MSI 12584999-edge i40e-eno2-TxRx-38
> 141               2              2  IR-PCI-MSI 12585001-edge i40e-0000:18:00.1:fdir-TxRx-0
> 157               2              2  IR-PCI-MSI 49285123-edge ens3f1-rx-3
> 168               2              2  IR-PCI-MSI 49289220-edge ens3f3-rx-4
> 251               2              2  IR-PCI-MSI 12587020-edge i40e-eno3-TxRx-11
> 253               2              2  IR-PCI-MSI 12587022-edge i40e-eno3-TxRx-13
> 254               2              2  IR-PCI-MSI 12587023-edge i40e-eno3-TxRx-14
> 256               2              2  IR-PCI-MSI 12587025-edge i40e-eno3-TxRx-16
> 257               2              2  IR-PCI-MSI 12587026-edge i40e-eno3-TxRx-17
> 260               2              2  IR-PCI-MSI 12587029-edge i40e-eno3-TxRx-20
> 269               2              2  IR-PCI-MSI 12587038-edge i40e-eno3-TxRx-29
> 316       0,2,20,22              2  IR-PCI-MSI 49827840-edge mlx5_comp0@pci:0000:5f:01.2
> 317               2              2  IR-PCI-MSI 49827841-edge mlx5_comp1@pci:0000:5f:01.2
> 329               2              2  IR-PCI-MSI 49829889-edge mlx5_comp1@pci:0000:5f:01.3
> 352       0,2,20,22              2  IR-PCI-MSI 49833984-edge mlx5_comp0@pci:0000:5f:01.5
> 353               2              2  IR-PCI-MSI 49833985-edge mlx5_comp1@pci:0000:5f:01.5
> 364       0,2,20,22              2  IR-PCI-MSI 49836032-edge mlx5_comp0@pci:0000:5f:01.6
> 365               2              2  IR-PCI-MSI 49836033-edge mlx5_comp1@pci:0000:5f:01.6
> 380               2              2  IR-PCI-MSI 49807363-edge mlx5_comp3@pci:0000:5f:00.0
> 382               2              2  IR-PCI-MSI 49807365-edge mlx5_comp5@pci:0000:5f:00.0
> 384               2              2  IR-PCI-MSI 49807367-edge mlx5_comp7@pci:0000:5f:00.0
> 386               2              2  IR-PCI-MSI 49807369-edge mlx5_comp9@pci:0000:5f:00.0
> 387               2              2  IR-PCI-MSI 49807370-edge mlx5_comp10@pci:0000:5f:00.0
> 406               2              2  IR-PCI-MSI 49807389-edge mlx5_comp29@pci:0000:5f:00.0
> 407               2              2  IR-PCI-MSI 49807390-edge mlx5_comp30@pci:0000:5f:00.0
> 408               2              2  IR-PCI-MSI 49807391-edge mlx5_comp31@pci:0000:5f:00.0
> 413               2              2  IR-PCI-MSI 49807396-edge mlx5_comp36@pci:0000:5f:00.0
> 420               2              2  IR-PCI-MSI 12589058-edge i40e-eno4-TxRx-1
> 421               2              2  IR-PCI-MSI 12589059-edge i40e-eno4-TxRx-2
> 423               2              2  IR-PCI-MSI 12589061-edge i40e-eno4-TxRx-4
> 425               2              2  IR-PCI-MSI 12589063-edge i40e-eno4-TxRx-6
> 427               2              2  IR-PCI-MSI 12589065-edge i40e-eno4-TxRx-8
> 436               2              2  IR-PCI-MSI 12589074-edge i40e-eno4-TxRx-17
> 441               2              2  IR-PCI-MSI 12589079-edge i40e-eno4-TxRx-22
> 447               2              2  IR-PCI-MSI 12589085-edge i40e-eno4-TxRx-28
> 450               2              2  IR-PCI-MSI 12589088-edge i40e-eno4-TxRx-31
> 452               2              2  IR-PCI-MSI 12589090-edge i40e-eno4-TxRx-33
> 454               2              2  IR-PCI-MSI 12589092-edge i40e-eno4-TxRx-35
> 457               2              2  IR-PCI-MSI 12589095-edge i40e-eno4-TxRx-38
> 530               2              2  IR-PCI-MSI 49809418-edge mlx5_comp10@pci:0000:5f:00.1
> 531               2              2  IR-PCI-MSI 49809419-edge mlx5_comp11@pci:0000:5f:00.1
> 532               2              2  IR-PCI-MSI 49809420-edge mlx5_comp12@pci:0000:5f:00.1
> 533               2              2  IR-PCI-MSI 49809421-edge mlx5_comp13@pci:0000:5f:00.1
> 534               2              2  IR-PCI-MSI 49809422-edge mlx5_comp14@pci:0000:5f:00.1
> 535               2              2  IR-PCI-MSI 49809423-edge mlx5_comp15@pci:0000:5f:00.1
> 536               2              2  IR-PCI-MSI 49809424-edge mlx5_comp16@pci:0000:5f:00.1
> 537               2              2  IR-PCI-MSI 49809425-edge mlx5_comp17@pci:0000:5f:00.1
> 538               2              2  IR-PCI-MSI 49809426-edge mlx5_comp18@pci:0000:5f:00.1
> 539               2              2  IR-PCI-MSI 49809427-edge mlx5_comp19@pci:0000:5f:00.1
> 611               2              2  IR-PCI-MSI 49838081-edge mlx5_comp1@pci:0000:5f:01.7
> 622       0,2,20,22              2  IR-PCI-MSI 49840128-edge mlx5_comp0@pci:0000:5f:02.0
> 623               2              2  IR-PCI-MSI 49840129-edge mlx5_comp1@pci:0000:5f:02.0
> 635               2              2  IR-PCI-MSI 49842177-edge mlx5_comp1@pci:0000:5f:02.1
> 647               2              2  IR-PCI-MSI 49844225-edge mlx5_comp1@pci:0000:5f:02.2
> 659               2              2  IR-PCI-MSI 49846273-edge mlx5_comp1@pci:0000:5f:02.3
> 732       0,2,20,22              2  IR-PCI-MSI 12756995-edge iavf-eno3v5-TxRx-2
> 734       0,2,20,22              2  IR-PCI-MSI 12759040-edge iavf-0000:18:0a.6:mbx
> 762       0,2,20,22              2  IR-PCI-MSI 12824579-edge iavf-eno4v6-TxRx-2
> 767       0,2,20,22              2  IR-PCI-MSI 12814339-edge iavf-eno4v1-TxRx-2
> 772       0,2,20,22              2  IR-PCI-MSI 12826627-edge iavf-eno4v7-TxRx-2
> 784       0,2,20,22              2  IR-PCI-MSI 12830720-edge iavf-0000:18:0f.1:mbx
> 787       0,2,20,22              2  IR-PCI-MSI 12830723-edge iavf-eno4v9-TxRx-2
> 792       0,2,20,22              2  IR-PCI-MSI 12816387-edge iavf-eno4v2-TxRx-2

The IRQ affinities are overspilling onto banned CPUs. This is probably because the APICs of CPUs 0 and 20 are full, which is purely a hardware limitation.
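
For anyone who wants to check a host for this condition without irqstat, here is a minimal sketch. It assumes the kernel exposes /proc/irq/<N>/effective_affinity_list; the helper names (parse_cpulist, find_spilled, read_effective) are made up for illustration and are not part of irqbalance or irqstat.

```python
from pathlib import Path


def parse_cpulist(text):
    """Parse a kernel-style CPU list such as '2-19,22-39' into a set of ints."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        first, _, last = chunk.partition("-")
        # A single CPU ("4") has no upper bound, so reuse the lower one.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def find_spilled(effective, banned):
    """Return {irq: offending_cpus} for IRQs effectively served by banned CPUs."""
    return {irq: cpus & banned for irq, cpus in effective.items() if cpus & banned}


def read_effective(proc_irq="/proc/irq"):
    """Read the effective CPU set of every IRQ (assumed procfs layout)."""
    result = {}
    root = Path(proc_irq)
    if not root.is_dir():
        return result
    for entry in root.iterdir():
        path = entry / "effective_affinity_list"
        if entry.name.isdigit() and path.is_file():
            try:
                result[int(entry.name)] = parse_cpulist(path.read_text())
            except (OSError, ValueError):
                continue  # IRQ vanished or unreadable content: skip it
    return result
```

Calling find_spilled(read_effective(), parse_cpulist("2-19,22-39")) on the machine above should list the same IRQs that `irqstat -c 2` reports, plus those on the other banned CPUs.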

Reducing the span of banned cpus fixes the issue:

> # grep ^IRQBALANCE_BANNED_CPU /etc/sysconfig/irqbalance
> IRQBALANCE_BANNED_CPULIST=4-19,24-39
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
> 0              160             164
> 1               31              22
> 2              162             153
> 3               30              21
> 20             164             159
> 21              31              21
> 22             162             157
> 23              30              21

However, it would be nice if irqbalance could at least log a warning or an error that it failed to set the affinity for a specific irq.

> ~# echo 0,20 > /proc/irq/47/smp_affinity_list
> -bash: echo: write error: No space left on device
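
To illustrate the kind of warning I have in mind, here is a rough sketch of the same write done from Python with the error reported instead of swallowed. This is not irqbalance code; the function names are hypothetical, and the wording of the ENOSPC message is my interpretation of what the error means in this report (target CPU APICs full).

```python
import errno
import os


def explain_affinity_error(irq, cpulist, exc):
    """Turn an OSError from an affinity write into a log-friendly message."""
    if exc.errno == errno.ENOSPC:
        # Interpretation used in this report: the target CPUs have no room left.
        reason = "no free interrupt vector on the target CPU(s)"
    else:
        reason = os.strerror(exc.errno) if exc.errno else str(exc)
    return f"IRQ {irq}: cannot set affinity to {cpulist}: {reason}"


def set_affinity(irq, cpulist, proc_irq="/proc/irq"):
    """Write cpulist to smp_affinity_list; return None or a warning string."""
    path = os.path.join(proc_irq, str(irq), "smp_affinity_list")
    try:
        # The write may only fail when the buffer is flushed on close,
        # which still happens inside this try block.
        with open(path, "w") as f:
            f.write(cpulist + "\n")
    except OSError as exc:
        return explain_affinity_error(irq, cpulist, exc)
    return None
```

With this, set_affinity(47, "0,20") would return a message to log rather than failing silently when the kernel rejects the write.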

For the record, here is the platform overview:

> NUMA 0
> ======
> 
> Memory: 187GB
> 2MB hugepages: 0
> 1GB hugepages: 32
> 
> CPUs
> ----
> 
> Model name:                      Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> Cores IDs:
> 0,20    2,22    4,24    6,26    8,28    10,30   12,32   14,34   16,36   18,38
> 
> NICs
> ----
> 
> SLOT          DRIVER     IFNAME     MAC                LINK/STATE  SPEED   DEVICE
> 0000:18:00.0  i40e       eno1       e4:43:4b:48:6a:20  1/up        1Gb/s   Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.1  i40e       eno2       e4:43:4b:48:6a:21  1/up        1Gb/s   Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.2  i40e       eno3       e4:43:4b:48:6a:22  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.3  i40e       eno4       e4:43:4b:48:6a:23  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:3b:00.0  vfio-pci   -          -                  -/-         -       Ethernet Controller E810-C for QSFP
> 0000:3b:00.1  vfio-pci   -          -                  -/-         -       Ethernet Controller E810-C for QSFP
> 0000:5e:00.0  tg3        ens3f0     00:0a:f7:d9:e4:14  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.1  tg3        ens3f1     00:0a:f7:d9:e4:15  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.2  tg3        ens3f2     00:0a:f7:d9:e4:16  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.3  tg3        ens3f3     00:0a:f7:d9:e4:17  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5f:00.0  mlx5_core  ens2f0np0  04:3f:72:b8:be:6a  1/up        10Gb/s  MT27800 Family [ConnectX-5]
> 0000:5f:00.1  mlx5_core  ens2f1np1  04:3f:72:b8:be:6b  1/up        10Gb/s  MT27800 Family [ConnectX-5]
> 
> NUMA 1
> ======
> 
> Memory: 188GB
> 2MB hugepages: 0
> 1GB hugepages: 32
> 
> CPUs
> ----
> 
> Model name:                      Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> Cores IDs:
> 1,21    3,23    5,25    7,27    9,29    11,31   13,33   15,35   17,37   19,39
> 
> NICs
> ----
> 
> SLOT          DRIVER    IFNAME  MAC                LINK/STATE  SPEED   DEVICE
> 0000:af:00.0  i40e      ens4f0  f8:f2:1e:42:5d:70  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.1  i40e      ens4f1  f8:f2:1e:42:5d:70  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.2  vfio-pci  -       -                  -/-         -       Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.3  vfio-pci  -       -                  -/-         -       Ethernet Controller X710 for 10GbE SFP+

--- Additional comment from Tao Liu on 2023-06-19 07:10:43 UTC ---

(In reply to Robin Jarry from comment #0)

Hi Robin,

Thanks for reporting the issue. I think it is reasonable to have a notification when irqbalance lets IRQs overspill onto banned CPUs. Just out of curiosity, which Beaker machine did you use for the testing? I failed to simulate a system that has enough device IRQs to overspill using QEMU.

Thanks,
Tao Liu

--- Additional comment from Robin Jarry on 2023-06-19 07:37:08 UTC ---

Hi there, any machine with SRIOV capable PCI devices should be enough to produce more than 224 IRQs. I don't think you will be able to test this in QEMU.

--- Additional comment from  on 2023-07-04 02:34:39 UTC ---

Patch[1] posted upstream

[1]: https://github.com/Irqbalance/irqbalance/pull/265

--- Additional comment from Robin Jarry on 2023-07-05 14:47:01 UTC ---

I just realized that this issue actually hides another regression.

This patch https://github.com/Irqbalance/irqbalance/commit/55c5c321c73e4c9b54e041ba8c7d542598685bae (included in irqbalance 1.7.0) causes any failure to enforce smp_affinity to ban the IRQ for the whole life of the process. APIC being out of space is a transient issue but irqbalance will never try again to move the interrupt to another CPU unless it is restarted.
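
To make the expected behaviour concrete, here is a small sketch of the policy I would like to see: only ban an IRQ for errors that look permanent, and keep it eligible for a later retry otherwise. This is an illustration only, not the code from any irqbalance commit or pull request. The IrqState class and the exact set of errno values treated as transient are assumptions made for this example.

```python
import errno

# Assumption for this sketch: errors that may go away on their own.
# ENOSPC is the one observed in this report; EAGAIN and EBUSY are guesses.
TRANSIENT_ERRNOS = {errno.ENOSPC, errno.EAGAIN, errno.EBUSY}


class IrqState:
    """Hypothetical per-IRQ bookkeeping kept by a balancer."""

    def __init__(self, irq):
        self.irq = irq
        self.banned = False
        self.failures = 0


def record_failure(state, err):
    """Update state after a failed affinity write.

    Returns True if the IRQ may be retried on a later balancing pass,
    False if it is now banned for the life of the process.
    """
    state.failures += 1
    if err in TRANSIENT_ERRNOS:
        return True
    state.banned = True
    return False
```

With such a policy, an IRQ that hits ENOSPC once would be moved off the banned CPU as soon as room frees up, without restarting the daemon.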

I have submitted another pull request here https://github.com/Irqbalance/irqbalance/pull/266.

Waiting for feedback.

