Bug 1844520
| Field | Value |
| --- | --- |
| Summary | Incorrect pinning of IRQ threads on isolated CPUs by drivers that use cpumask_local_spread() |
| Product | Red Hat Enterprise Linux 8 |
| Component | kernel |
| Kernel sub component | Kernel-Core |
| Version | 8.3 |
| Target Release | 8.3 |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Keywords | ZStream |
| Reporter | Nitesh Narayan Lal <nilal> |
| Assignee | Nitesh Narayan Lal <nilal> |
| QA Contact | Pei Zhang <pezhang> |
| Docs Contact | Jaroslav Klech <jklech> |
| CC | bhu, broskos, jklech, jlelli, kcarcia, lcapitulino, linville, marjones, mschmidt, mstowell, mtosatti, network-qe, peterx, pezhang, smeisner, sthennak, williams |
| Fixed In Version | kernel-4.18.0-229.el8 |
| Doc Type | If docs needed, set a value |
| Type | Bug |
| Cloned to | 1867174 (view as bug list) |
| Bug Blocks | 1807069, 1817732, 1867174, 1868433 |
| Last Closed | 2020-11-04 01:20:59 UTC |
Description
Nitesh Narayan Lal
2020-06-05 15:20:10 UTC
I am assigning the bug to myself, as I am already working on a fix that will ensure that cpumask_local_spread() only uses housekeeping CPUs. The fix is derived from one of the patches in the task isolation patch series [1], which is currently under discussion. I will post the fix upstream with some changes and share a link here.

[1] https://lkml.org/lkml/2020/4/9/530

Patches have been posted upstream:
https://lore.kernel.org/lkml/20200610161226.424337-1-nitesh@redhat.com/

Patch(es) available on kernel-4.18.0-229.el8
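
For reference, the approach described above (restricting cpumask_local_spread() to housekeeping CPUs) looks roughly like the sketch below. It is an illustration of the idea based on the posted series, not the exact RHEL backport, and it assumes the upstream housekeeping_cpumask() helper and the HK_FLAG_DOMAIN/HK_FLAG_MANAGED_IRQ flags.

/* Sketch only: cpumask_local_spread() limited to housekeeping CPUs. */
#include <linux/cpumask.h>
#include <linux/numa.h>
#include <linux/topology.h>
#include <linux/sched/isolation.h>

unsigned int cpumask_local_spread(unsigned int i, int node)
{
    const struct cpumask *mask;
    int cpu;

    /* Only consider CPUs that are not isolated (domain/managed-IRQ housekeeping). */
    mask = housekeeping_cpumask(HK_FLAG_DOMAIN | HK_FLAG_MANAGED_IRQ);
    i %= cpumask_weight(mask);          /* wrap so some CPU is always returned */

    if (node == NUMA_NO_NODE) {
        for_each_cpu(cpu, mask)
            if (i-- == 0)
                return cpu;
    } else {
        /* Prefer housekeeping CPUs local to the requested NUMA node. */
        for_each_cpu_and(cpu, cpumask_of_node(node), mask)
            if (i-- == 0)
                return cpu;

        /* Then fall back to remote housekeeping CPUs. */
        for_each_cpu(cpu, mask) {
            if (cpumask_test_cpu(cpu, cpumask_of_node(node)))
                continue;
            if (i-- == 0)
                return cpu;
        }
    }
    BUG();
}

With a change of this shape, drivers that call cpumask_local_spread() for their IRQ affinity hints can no longer pick an isolated CPU, which is the behaviour verified later in this bug.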
It seems this issue is hardware related. Following Nitesh's instructions, I cannot reproduce it on the two servers below, one with XXV710 NICs and one with XL710 NICs. Next, I'll try to reproduce and verify on Nitesh's server.

(1) dell-per730-27.lab.eng.pek2.redhat.com

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2299.763
BogoMIPS: 4599.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 40960K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

# lspci | grep Eth
...
82:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
82:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)

(2) dell-per430-11.lab.eng.pek2.redhat.com

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2297.239
BogoMIPS: 4594.46
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

# lspci | grep Eth
...
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
06:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
06:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)

Update on my testing:

Version 4.18.0-193.rt13.51.el8.x86_64: fails to reproduce this issue.

1. Create 2 VFs.

echo 2 > /sys/bus/pci/devices/0000\:82\:00.0/sriov_numvfs

2. Find the iavf IRQs.

# find /proc/irq/ -name "*iav*"
/proc/irq/361/iavf-0000:82:02.0:mbx
/proc/irq/362/iavf-enp130s0f0v0-TxRx-0
/proc/irq/363/iavf-enp130s0f0v0-TxRx-1
/proc/irq/364/iavf-enp130s0f0v0-TxRx-2
/proc/irq/365/iavf-enp130s0f0v0-TxRx-3
/proc/irq/366/iavf-0000:82:02.1:mbx
/proc/irq/367/iavf-enp130s0f0v1-TxRx-0
/proc/irq/368/iavf-enp130s0f0v1-TxRx-1
/proc/irq/369/iavf-enp130s0f0v1-TxRx-2
/proc/irq/370/iavf-enp130s0f0v1-TxRx-3

3. Check the CPU affinity of the above threads. They are pinned to housekeeping cores.

[root@dell-per730-27 ~]# cat /proc/irq/36*/smp_affinity_list
22
28
2
6
0
4
30
14
20
22
22
[root@dell-per730-27 ~]# cat /proc/irq/370/smp_affinity_list
12

Reference host info:

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-193.rt13.51.el8.x86_64 root=/dev/mapper/rhel_dell--per730--27-root ro crashkernel=auto resume=/dev/mapper/rhel_dell--per730--27-swap rd.lvm.lv=rhel_dell-per730-27/root rd.lvm.lv=rhel_dell-per730-27/swap console=ttyS0,115200n81 skew_tick=1 isolcpus=managed_irq,domain,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 rcu_nocbs=1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

[root@dell-per730-27 ~]# lspci | grep Eth
...
82:00.0 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
82:00.1 Ethernet controller: Intel Corporation Ethernet Controller XXV710 for 25GbE SFP28 (rev 02)
82:02.0 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)
82:02.1 Ethernet controller: Intel Corporation Ethernet Virtual Function 700 Series (rev 02)

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 1
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2300.009
BogoMIPS: 4599.62
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 40960K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

Pei,

A couple of suggestions to help see if you can get a reproducer:
- raise the number of VFs per port (go to 8 or maybe even higher; the card supports it)
- create VFs for every port on every NIC. I see 2 NICs with 2 ports each in one of your tests, so that should result in 32 or more VFs for testing
- reduce the number of housekeeping CPUs; a typical NFV server would only allocate core 0 of each NUMA node to housekeeping.

(In reply to broskos from comment #21)
> Pei,
>
> A couple of suggestions to help see if you can get a reproducer:
> - raise the number of VFs per port (go to 8 or maybe even higher; the card
> supports it)
> - create VFs for every port on every NIC. I see 2 NICs with 2 ports each in
> one of your tests, so that should result in 32 or more VFs for testing
> - reduce the number of housekeeping CPUs; a typical NFV server would
> only allocate core 0 of each NUMA node to housekeeping.

Thank you Brent for the suggestions. Now this issue can be reproduced on my setup after creating 32 VFs on both XL710 NICs and isolating CPUs 2-19 (leaving only 0 and 1 as housekeeping cores).

Best regards,
Pei
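
To make the mechanism behind these suggestions explicit: each VF queue vector gets its affinity hint from cpumask_local_spread(), so once the number of vectors grows past the CPU count the unpatched round-robin walks onto the isolated CPUs, and shrinking the housekeeping set makes that happen much sooner. A simplified, hypothetical driver-side loop (modelled on what drivers such as iavf/i40e do, not the literal driver code) would look like this:

/* Hypothetical example of per-queue-vector affinity hinting. */
#include <linux/cpumask.h>
#include <linux/interrupt.h>

static void example_set_queue_affinity(int base_irq, int num_vectors, int node)
{
    int v, cpu;

    for (v = 0; v < num_vectors; v++) {
        /* Unpatched, this round-robins over all online CPUs, isolated ones included. */
        cpu = cpumask_local_spread(v, node);
        irq_set_affinity_hint(base_irq + v, get_cpu_mask(cpu));
    }
}

With 64 VFs worth of vectors and only CPUs 0-1 left for housekeeping, the unfixed spread inevitably hands out isolated CPUs, which matches the reproduction below.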
Steps (following Nitesh's reproducer in the Description):

1. Set up the RT host, leaving core 0 of each NUMA node as a housekeeping core and isolating all other cores. In this setup, CPUs 2-19 are isolated (0 and 1 are the housekeeping cores).

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Thread(s) per core: 1
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 2297.583
BogoMIPS: 4594.85
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 25600K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm arat pln pts md_clear flush_l1d

# cat /proc/cmdline
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-... isolcpus=managed_irq,domain,2-19 intel_pstate=disable nosoftlockup tsc=nowatchdog nohz=on nohz_full=2-19 rcu_nocbs=2-19

2. Create 32 VFs per PF. The NIC is an XL710.

# lspci | grep Eth
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
04:00.1 Ethernet controller: Intel Corporation Ethernet Controller XL710 for 40GbE QSFP+ (rev 02)
...

# echo 32 > /sys/bus/pci/devices/0000\:04\:00.0/sriov_numvfs
# echo 32 > /sys/bus/pci/devices/0000\:04\:00.1/sriov_numvfs

3. Check the iavf IRQs.
# find /proc/irq/ -name "*iav*"
/proc/irq/202/iavf-0000:04:02.0:mbx
/proc/irq/203/iavf-enp4s0f0v0-TxRx-0
/proc/irq/204/iavf-enp4s0f0v0-TxRx-1
/proc/irq/205/iavf-enp4s0f0v0-TxRx-2
/proc/irq/206/iavf-enp4s0f0v0-TxRx-3
/proc/irq/207/iavf-0000:04:02.1:mbx
/proc/irq/212/iavf-0000:04:02.4:mbx
/proc/irq/213/iavf-enp4s0f0v4-TxRx-0
/proc/irq/214/iavf-enp4s0f0v4-TxRx-1
/proc/irq/215/iavf-enp4s0f0v4-TxRx-2
/proc/irq/216/iavf-enp4s0f0v4-TxRx-3
/proc/irq/217/iavf-0000:04:02.5:mbx
/proc/irq/222/iavf-0000:04:03.2:mbx
/proc/irq/227/iavf-0000:04:02.7:mbx
/proc/irq/232/iavf-0000:04:03.1:mbx
/proc/irq/237/iavf-0000:04:03.3:mbx
/proc/irq/242/iavf-0000:04:04.2:mbx
/proc/irq/247/iavf-0000:04:05.2:mbx
/proc/irq/252/iavf-0000:04:04.3:mbx
/proc/irq/257/iavf-0000:04:04.1:mbx
/proc/irq/262/iavf-0000:04:05.1:mbx
/proc/irq/267/iavf-0000:04:03.4:mbx
/proc/irq/272/iavf-0000:04:04.0:mbx
/proc/irq/277/iavf-0000:04:05.0:mbx
/proc/irq/282/iavf-0000:04:03.6:mbx
/proc/irq/287/iavf-0000:04:05.5:mbx
/proc/irq/292/iavf-0000:04:04.6:mbx
/proc/irq/297/iavf-0000:04:03.7:mbx
/proc/irq/302/iavf-0000:04:05.6:mbx
/proc/irq/307/iavf-0000:04:05.3:mbx
/proc/irq/312/iavf-0000:04:04.4:mbx
/proc/irq/317/iavf-0000:04:03.5:mbx
/proc/irq/322/iavf-0000:04:05.4:mbx
/proc/irq/327/iavf-0000:04:04.5:mbx
/proc/irq/332/iavf-0000:04:04.7:mbx
/proc/irq/337/iavf-0000:04:05.7:mbx
/proc/irq/342/iavf-0000:04:0b.5:mbx
/proc/irq/347/iavf-0000:04:0a.3:mbx
/proc/irq/352/iavf-0000:04:0c.4:mbx
/proc/irq/357/iavf-0000:04:0c.5:mbx
/proc/irq/362/iavf-0000:04:0c.6:mbx
/proc/irq/367/iavf-0000:04:0a.0:mbx
/proc/irq/372/iavf-0000:04:0c.1:mbx
/proc/irq/377/iavf-0000:04:0a.1:mbx
/proc/irq/382/iavf-0000:04:0c.2:mbx
/proc/irq/387/iavf-0000:04:0a.2:mbx
/proc/irq/392/iavf-0000:04:0c.3:mbx
/proc/irq/397/iavf-0000:04:0b.6:mbx
/proc/irq/402/iavf-0000:04:0b.7:mbx
/proc/irq/407/iavf-0000:04:0c.0:mbx
/proc/irq/412/iavf-0000:04:0d.0:mbx
/proc/irq/417/iavf-0000:04:0d.1:mbx
/proc/irq/422/iavf-0000:04:0d.2:mbx
/proc/irq/427/iavf-0000:04:0c.7:mbx
/proc/irq/432/iavf-0000:04:0a.4:mbx
/proc/irq/437/iavf-0000:04:0d.3:mbx
/proc/irq/442/iavf-0000:04:0d.4:mbx
/proc/irq/447/iavf-0000:04:0a.5:mbx
/proc/irq/452/iavf-0000:04:0a.6:mbx
/proc/irq/457/iavf-0000:04:0d.5:mbx
/proc/irq/462/iavf-0000:04:0a.7:mbx
/proc/irq/467/iavf-0000:04:0d.6:mbx
/proc/irq/472/iavf-0000:04:0b.0:mbx
/proc/irq/477/iavf-0000:04:0d.7:mbx
/proc/irq/482/iavf-0000:04:0b.1:mbx
/proc/irq/487/iavf-0000:04:0b.2:mbx
/proc/irq/492/iavf-0000:04:0b.4:mbx
/proc/irq/497/iavf-0000:04:0b.3:mbx

4. Check the CPU pinning of each iavf IRQ.

# for i in `seq 202 497`;do cat /proc/irq/$i/smp_affinity_list; done

Reproduced with 4.18.0-228.rt7.40.el8.x86_64: after step 4, some iavf IRQ threads are pinned to isolated cores, such as 2 and 3, so this issue is reproduced.

# for i in `seq 202 497`;do cat /proc/irq/$i/smp_affinity_list; done
0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0-1 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3 0 0 1 2 3

So this issue has been reproduced.
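
A check like step 4 can also be automated. The small userspace helper below is a hypothetical sketch (not part of the original verification): it scans /proc/irq/<start>../<end>/smp_affinity_list and reports any IRQ whose affinity list contains a CPU outside the housekeeping set, which is CPUs 0-1 in this setup.

/* irqcheck.c: report IRQs whose affinity includes CPUs above a housekeeping limit.
 * Hypothetical helper; the defaults mirror the IRQ range used in this reproducer.
 * Build: gcc -o irqcheck irqcheck.c   Run: ./irqcheck 202 497
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if an affinity list such as "0", "0-1" or "0,2-3" contains a CPU > max_hk. */
static int has_isolated_cpu(const char *list, long max_hk)
{
    char *p = (char *)list;

    while (*p) {
        long end = strtol(p, &p, 10);
        if (*p == '-')
            end = strtol(p + 1, &p, 10);
        if (end > max_hk)
            return 1;
        if (*p == ',')
            p++;
        else
            break;
    }
    return 0;
}

int main(int argc, char **argv)
{
    int start = argc > 1 ? atoi(argv[1]) : 202;
    int end = argc > 2 ? atoi(argv[2]) : 497;
    char path[64], buf[256];
    int irq, bad = 0;

    for (irq = start; irq <= end; irq++) {
        FILE *f;

        snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity_list", irq);
        f = fopen(path, "r");
        if (!f)
            continue;               /* IRQ not present, skip it */
        if (fgets(buf, sizeof(buf), f)) {
            buf[strcspn(buf, "\n")] = '\0';
            if (has_isolated_cpu(buf, 1)) {     /* housekeeping CPUs are 0-1 here */
                printf("IRQ %d: affinity %s includes isolated CPUs\n", irq, buf);
                bad = 1;
            }
        }
        fclose(f);
    }
    return bad;
}

On the unfixed kernel above this flags the IRQs whose affinity lists show 2 or 3; on the fixed kernel it reports nothing.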
Verified with 4.18.0-232.rt7.44.el8.x86_64: after step 4, all iavf IRQ threads are pinned to housekeeping cores (in this example, the housekeeping cores are 0 and 1).

# for i in `seq 202 497`;do cat /proc/irq/$i/smp_affinity_list; done
0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0 0-1 1 0-1 1 0

So this issue has been fixed. Moving to 'VERIFIED'.

I am migrating this bug's doc text to bz#1867174, as the problem needs to be published for 8.2 and, in my opinion, 1867174 suits that purpose better.

Regards,
Jaroslav

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4431