+++ This bug was initially created as a clone of Bug #2184735 +++

Hi folks,

When there is not enough room in the non-banned CPUs' APIC, irqbalance seems to silently let irq affinities overspill on banned CPUs.

Here are a few traces to highlight the problem. The tool I am using (irqstat) is available here: https://pypi.org/project/linux-tools/

> # rpm -qa irqbalance
> irqbalance-1.9.0-3.el9.x86_64
>
> ~# grep ^IRQBALANCE_BANNED_CPU /etc/sysconfig/irqbalance
> IRQBALANCE_BANNED_CPULIST=2-19,22-39
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
>   0            114             114
>   1             43              34
>  20            184             179
>  21             61              51
>
> ~# echo 10 > /sys/class/net/ens2f0np0/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/ens2f1np1/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/eno3/device/sriov_numvfs
> ~# echo 10 > /sys/class/net/eno4/device/sriov_numvfs
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
>   0            224             201
>   1             52              43
>   2             97              78
>   4             12              11
>   6             12              11
>   8             13              12
>  10              6               5
>  12             12              11
>  14             12              11
>  16             13              12
>  18             13              12
>  20            234             201
>  21             52              42
>  22             81              68
>
> ~# irqstat -c 2
> IRQ  AFFINITY   EFFECTIVE-CPU  DESCRIPTION
>  47  2          2              IR-PCI-MSI 12582920-edge i40e-eno1-TxRx-7
>  61  2          2              IR-PCI-MSI 12582934-edge i40e-eno1-TxRx-21
>  62  2          2              IR-PCI-MSI 12582935-edge i40e-eno1-TxRx-22
>  63  2          2              IR-PCI-MSI 12582936-edge i40e-eno1-TxRx-23
>  64  2          2              IR-PCI-MSI 12582937-edge i40e-eno1-TxRx-24
>  66  2          2              IR-PCI-MSI 12582939-edge i40e-eno1-TxRx-26
>  68  2          2              IR-PCI-MSI 12582941-edge i40e-eno1-TxRx-28
>  70  2          2              IR-PCI-MSI 12582943-edge i40e-eno1-TxRx-30
>  77  2          2              IR-PCI-MSI 12582950-edge i40e-eno1-TxRx-37
>  92  2          2              IR-PCI-MSI 12584960-edge i40e-0000:18:00.1:misc
>  97  2          2              IR-PCI-MSI 12584965-edge i40e-eno2-TxRx-4
> 102  2          2              IR-PCI-MSI 12584970-edge i40e-eno2-TxRx-9
> 128  2          2              IR-PCI-MSI 12584988-edge i40e-eno2-TxRx-27
> 133  2          2              IR-PCI-MSI 12584993-edge i40e-eno2-TxRx-32
> 134  2          2              IR-PCI-MSI 12584994-edge i40e-eno2-TxRx-33
> 139  2          2              IR-PCI-MSI 12584999-edge i40e-eno2-TxRx-38
> 141  2          2              IR-PCI-MSI 12585001-edge i40e-0000:18:00.1:fdir-TxRx-0
> 157  2          2              IR-PCI-MSI 49285123-edge ens3f1-rx-3
> 168  2          2              IR-PCI-MSI 49289220-edge ens3f3-rx-4
> 251  2          2              IR-PCI-MSI 12587020-edge i40e-eno3-TxRx-11
> 253  2          2              IR-PCI-MSI 12587022-edge i40e-eno3-TxRx-13
> 254  2          2              IR-PCI-MSI 12587023-edge i40e-eno3-TxRx-14
> 256  2          2              IR-PCI-MSI 12587025-edge i40e-eno3-TxRx-16
> 257  2          2              IR-PCI-MSI 12587026-edge i40e-eno3-TxRx-17
> 260  2          2              IR-PCI-MSI 12587029-edge i40e-eno3-TxRx-20
> 269  2          2              IR-PCI-MSI 12587038-edge i40e-eno3-TxRx-29
> 316  0,2,20,22  2              IR-PCI-MSI 49827840-edge mlx5_comp0@pci:0000:5f:01.2
> 317  2          2              IR-PCI-MSI 49827841-edge mlx5_comp1@pci:0000:5f:01.2
> 329  2          2              IR-PCI-MSI 49829889-edge mlx5_comp1@pci:0000:5f:01.3
> 352  0,2,20,22  2              IR-PCI-MSI 49833984-edge mlx5_comp0@pci:0000:5f:01.5
> 353  2          2              IR-PCI-MSI 49833985-edge mlx5_comp1@pci:0000:5f:01.5
> 364  0,2,20,22  2              IR-PCI-MSI 49836032-edge mlx5_comp0@pci:0000:5f:01.6
> 365  2          2              IR-PCI-MSI 49836033-edge mlx5_comp1@pci:0000:5f:01.6
> 380  2          2              IR-PCI-MSI 49807363-edge mlx5_comp3@pci:0000:5f:00.0
> 382  2          2              IR-PCI-MSI 49807365-edge mlx5_comp5@pci:0000:5f:00.0
> 384  2          2              IR-PCI-MSI 49807367-edge mlx5_comp7@pci:0000:5f:00.0
> 386  2          2              IR-PCI-MSI 49807369-edge mlx5_comp9@pci:0000:5f:00.0
> 387  2          2              IR-PCI-MSI 49807370-edge mlx5_comp10@pci:0000:5f:00.0
> 406  2          2              IR-PCI-MSI 49807389-edge mlx5_comp29@pci:0000:5f:00.0
> 407  2          2              IR-PCI-MSI 49807390-edge mlx5_comp30@pci:0000:5f:00.0
> 408  2          2              IR-PCI-MSI 49807391-edge mlx5_comp31@pci:0000:5f:00.0
> 413  2          2              IR-PCI-MSI 49807396-edge mlx5_comp36@pci:0000:5f:00.0
> 420  2          2              IR-PCI-MSI 12589058-edge i40e-eno4-TxRx-1
> 421  2          2              IR-PCI-MSI 12589059-edge i40e-eno4-TxRx-2
> 423  2          2              IR-PCI-MSI 12589061-edge i40e-eno4-TxRx-4
> 425  2          2              IR-PCI-MSI 12589063-edge i40e-eno4-TxRx-6
> 427  2          2              IR-PCI-MSI 12589065-edge i40e-eno4-TxRx-8
> 436  2          2              IR-PCI-MSI 12589074-edge i40e-eno4-TxRx-17
> 441  2          2              IR-PCI-MSI 12589079-edge i40e-eno4-TxRx-22
> 447  2          2              IR-PCI-MSI 12589085-edge i40e-eno4-TxRx-28
> 450  2          2              IR-PCI-MSI 12589088-edge i40e-eno4-TxRx-31
> 452  2          2              IR-PCI-MSI 12589090-edge i40e-eno4-TxRx-33
> 454  2          2              IR-PCI-MSI 12589092-edge i40e-eno4-TxRx-35
> 457  2          2              IR-PCI-MSI 12589095-edge i40e-eno4-TxRx-38
> 530  2          2              IR-PCI-MSI 49809418-edge mlx5_comp10@pci:0000:5f:00.1
> 531  2          2              IR-PCI-MSI 49809419-edge mlx5_comp11@pci:0000:5f:00.1
> 532  2          2              IR-PCI-MSI 49809420-edge mlx5_comp12@pci:0000:5f:00.1
> 533  2          2              IR-PCI-MSI 49809421-edge mlx5_comp13@pci:0000:5f:00.1
> 534  2          2              IR-PCI-MSI 49809422-edge mlx5_comp14@pci:0000:5f:00.1
> 535  2          2              IR-PCI-MSI 49809423-edge mlx5_comp15@pci:0000:5f:00.1
> 536  2          2              IR-PCI-MSI 49809424-edge mlx5_comp16@pci:0000:5f:00.1
> 537  2          2              IR-PCI-MSI 49809425-edge mlx5_comp17@pci:0000:5f:00.1
> 538  2          2              IR-PCI-MSI 49809426-edge mlx5_comp18@pci:0000:5f:00.1
> 539  2          2              IR-PCI-MSI 49809427-edge mlx5_comp19@pci:0000:5f:00.1
> 611  2          2              IR-PCI-MSI 49838081-edge mlx5_comp1@pci:0000:5f:01.7
> 622  0,2,20,22  2              IR-PCI-MSI 49840128-edge mlx5_comp0@pci:0000:5f:02.0
> 623  2          2              IR-PCI-MSI 49840129-edge mlx5_comp1@pci:0000:5f:02.0
> 635  2          2              IR-PCI-MSI 49842177-edge mlx5_comp1@pci:0000:5f:02.1
> 647  2          2              IR-PCI-MSI 49844225-edge mlx5_comp1@pci:0000:5f:02.2
> 659  2          2              IR-PCI-MSI 49846273-edge mlx5_comp1@pci:0000:5f:02.3
> 732  0,2,20,22  2              IR-PCI-MSI 12756995-edge iavf-eno3v5-TxRx-2
> 734  0,2,20,22  2              IR-PCI-MSI 12759040-edge iavf-0000:18:0a.6:mbx
> 762  0,2,20,22  2              IR-PCI-MSI 12824579-edge iavf-eno4v6-TxRx-2
> 767  0,2,20,22  2              IR-PCI-MSI 12814339-edge iavf-eno4v1-TxRx-2
> 772  0,2,20,22  2              IR-PCI-MSI 12826627-edge iavf-eno4v7-TxRx-2
> 784  0,2,20,22  2              IR-PCI-MSI 12830720-edge iavf-0000:18:0f.1:mbx
> 787  0,2,20,22  2              IR-PCI-MSI 12830723-edge iavf-eno4v9-TxRx-2
> 792  0,2,20,22  2              IR-PCI-MSI 12816387-edge iavf-eno4v2-TxRx-2

The irq affinities are overspilling on banned cpus. This is probably because the APICs of cpus 0 and 20 are full, which is only a hardware limitation.
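The overspill visible above can be checked mechanically: expand the banned cpulist (the same `2-19,22-39` syntax used by `IRQBALANCE_BANNED_CPULIST`) and intersect it with each IRQ's affinity. The following sketch is not irqbalance or irqstat code, just a minimal illustration using sample data taken from the tables above; `parse_cpulist` and `overspilled` are hypothetical helper names.

```python
def parse_cpulist(spec: str) -> set[int]:
    """Expand a kernel-style cpulist such as "2-19,22-39" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def overspilled(affinities: dict[int, set[int]], banned: set[int]) -> dict[int, set[int]]:
    """Return {irq: banned CPUs present in its affinity} for offending IRQs."""
    return {irq: aff & banned for irq, aff in affinities.items() if aff & banned}

banned = parse_cpulist("2-19,22-39")
# Sample affinities mimicking the report: IRQ 47 pinned to banned CPU 2,
# IRQ 316 spread over 0,2,20,22 (2 and 22 are banned), IRQ 99 only on free CPUs.
sample = {47: {2}, 316: {0, 2, 20, 22}, 99: {0, 20}}
print(overspilled(sample, banned))  # flags IRQs 47 and 316, not 99
```

On a live system the per-IRQ affinities would come from `/proc/irq/<n>/smp_affinity_list`, which uses the same cpulist format.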
Reducing the span of banned cpus fixes the issue:

> # grep ^IRQBALANCE_BANNED_CPU /etc/sysconfig/irqbalance
> IRQBALANCE_BANNED_CPULIST=4-19,24-39
>
> ~# irqstat -n | grep -v '\<0\>$'
> CPU  AFFINITY-IRQs  EFFECTIVE-IRQs
>   0            160             164
>   1             31              22
>   2            162             153
>   3             30              21
>  20            164             159
>  21             31              21
>  22            162             157
>  23             30              21

However, it would be nice if irqbalance could at least log a warning or an error that it failed to set the affinity for a specific irq.

> ~# echo 0,20 > /proc/irq/47/smp_affinity_list
> -bash: echo: write error: No space left on device

For the record, here is the platform overview:

> NUMA 0
> ======
>
> Memory: 187GB
> 2MB hugepages: 0
> 1GB hugepages: 32
>
> CPUs
> ----
>
> Model name: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> Cores IDs:
> 0,20 2,22 4,24 6,26 8,28 10,30 12,32 14,34 16,36 18,38
>
> NICs
> ----
>
> SLOT          DRIVER     IFNAME     MAC                LINK/STATE  SPEED   DEVICE
> 0000:18:00.0  i40e       eno1       e4:43:4b:48:6a:20  1/up        1Gb/s   Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.1  i40e       eno2       e4:43:4b:48:6a:21  1/up        1Gb/s   Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.2  i40e       eno3       e4:43:4b:48:6a:22  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:18:00.3  i40e       eno4       e4:43:4b:48:6a:23  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:3b:00.0  vfio-pci   -          -                  -/-         -       Ethernet Controller E810-C for QSFP
> 0000:3b:00.1  vfio-pci   -          -                  -/-         -       Ethernet Controller E810-C for QSFP
> 0000:5e:00.0  tg3        ens3f0     00:0a:f7:d9:e4:14  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.1  tg3        ens3f1     00:0a:f7:d9:e4:15  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.2  tg3        ens3f2     00:0a:f7:d9:e4:16  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5e:00.3  tg3        ens3f3     00:0a:f7:d9:e4:17  0/down      -       NetXtreme BCM5719 Gigabit Ethernet PCIe
> 0000:5f:00.0  mlx5_core  ens2f0np0  04:3f:72:b8:be:6a  1/up        10Gb/s  MT27800 Family [ConnectX-5]
> 0000:5f:00.1  mlx5_core  ens2f1np1  04:3f:72:b8:be:6b  1/up        10Gb/s  MT27800 Family [ConnectX-5]
>
> NUMA 1
> ======
>
> Memory: 188GB
> 2MB hugepages: 0
> 1GB hugepages: 32
>
> CPUs
> ----
>
> Model name: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz
> Cores IDs:
> 1,21 3,23 5,25 7,27 9,29 11,31 13,33 15,35 17,37 19,39
>
> NICs
> ----
>
> SLOT          DRIVER     IFNAME     MAC                LINK/STATE  SPEED   DEVICE
> 0000:af:00.0  i40e       ens4f0     f8:f2:1e:42:5d:70  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.1  i40e       ens4f1     f8:f2:1e:42:5d:70  1/up        10Gb/s  Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.2  vfio-pci   -          -                  -/-         -       Ethernet Controller X710 for 10GbE SFP+
> 0000:af:00.3  vfio-pci   -          -                  -/-         -       Ethernet Controller X710 for 10GbE SFP+

--- Additional comment from Tao Liu on 2023-06-19 07:10:43 UTC ---

(In reply to Robin Jarry from comment #0)
> When there is not enough room in the non-banned CPUs APIC, irqbalance seems
> to silently let irq affinities overspill on banned CPUs.
> [...]
> However, it would be nice if irqbalance could at least log a warning or an
> error that it failed to set the affinity for a specific irq.

Hi Robin,

Thanks for reporting the issue. I think it is reasonable to have a notification when irqbalance overspills IRQs on banned cpus.

Just out of curiosity, which beaker machine did you use for the testing? I failed to simulate a system that has enough device IRQs to overspill using qemu.

Thanks,
Tao Liu

--- Additional comment from Robin Jarry on 2023-06-19 07:37:08 UTC ---

Hi there,

Any machine with SR-IOV capable PCI devices should be enough to produce more than 224 IRQs. I don't think you will be able to test this in QEMU.

--- Additional comment from on 2023-07-04 02:34:39 UTC ---

Patch [1] posted upstream.

[1]: https://github.com/Irqbalance/irqbalance/pull/265

--- Additional comment from Robin Jarry on 2023-07-05 14:47:01 UTC ---

I just realized that this issue actually hides another regression. This patch https://github.com/Irqbalance/irqbalance/commit/55c5c321c73e4c9b54e041ba8c7d542598685bae (included in irqbalance 1.7.0) causes any failure to enforce smp_affinity to ban the IRQ for the whole life of the process. The APIC being out of space is a transient issue, but irqbalance will never try again to move the interrupt to another CPU unless it is restarted.

I have submitted another pull request here: https://github.com/Irqbalance/irqbalance/pull/266. Waiting for feedback.
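The regression described in the last comment boils down to error classification: the affinity write can fail for transient reasons (APIC vector space exhausted, as in the `No space left on device` error above) or for permanent ones, and only the latter should ban the IRQ for the life of the process. The sketch below is purely illustrative, not irqbalance code; `apply_affinity`, `write_affinity`, and the errno set are assumptions, with the writer injected as a callable standing in for writing `/proc/irq/<n>/smp_affinity_list`.

```python
import errno

# errno values treated as transient: leave the IRQ eligible for a retry
# on the next balancing pass instead of banning it permanently.
TRANSIENT = {errno.ENOSPC, errno.EBUSY, errno.EAGAIN}

def apply_affinity(irq: int, cpulist: str, write_affinity, banned_irqs: set[int]) -> bool:
    """Try to move `irq` to `cpulist`; ban it only on permanent errors."""
    try:
        write_affinity(irq, cpulist)
        return True
    except OSError as e:
        if e.errno in TRANSIENT:
            # e.g. APIC vector space full: vectors may be freed later,
            # so a subsequent pass should be allowed to retry this IRQ.
            return False
        banned_irqs.add(irq)  # permanent failure: stop touching this IRQ
        return False

def enospc_writer(irq, cpulist):
    # Simulates the kernel rejecting the write, as in the report:
    # "echo: write error: No space left on device"
    raise OSError(errno.ENOSPC, "No space left on device")

banned: set[int] = set()
apply_affinity(47, "0,20", enospc_writer, banned)
print(47 in banned)  # ENOSPC is transient, so IRQ 47 stays eligible
```

This mirrors the intent of the second pull request: a failed `smp_affinity` write caused by vector exhaustion should not permanently exclude the IRQ from balancing.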