Bug 2052947

Summary: better handling of NIC irqs
Product: Red Hat Enterprise Linux 8
Version: 8.5
Component: irqbalance
Reporter: Paolo Abeni <pabeni>
Assignee: ltao
QA Contact: Jiri Dluhos <jdluhos>
CC: danw, jbainbri, jeder, jmario, jshortt, ruyang, rvr
Status: NEW
Severity: high
Priority: medium
Keywords: Triaged
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Paolo Abeni 2022-02-10 10:22:59 UTC
Due to the NAPI infrastructure, accounting for the interrupts generated by network drivers is problematic: under low load the interrupt count is proportional to network traffic, but as the network load increases, more and more of the interrupts generated by the NICs are mitigated in software. Under very high network load it is common that no interrupts are generated at all.

As a consequence, actions taken by irqbalance based on NIC interrupt counts are more often than not incorrect, especially in the most relevant scenario: when the network load is high.

The proposed solution is to limit irqbalance to spreading the NIC IRQs across the available CPUs once, and then leaving those IRQs alone.
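For illustration, a minimal sketch of that one-shot spread, assuming the NIC's IRQ numbers and the list of local CPUs are already known (both lists below are made-up examples); a real deployment would also have to tell irqbalance to leave those IRQs alone afterwards, e.g. via its --banirq option:

# One-shot spread: pin each NIC IRQ to one local CPU, round-robin, then stop managing it.
nic_irqs = [65, 66, 67, 68]    # example values, e.g. read from /proc/interrupts
local_cpus = [0, 1, 2, 3]      # example values: CPUs in the NIC-local NUMA node

for i, irq in enumerate(nic_irqs):
    cpu = local_cpus[i % len(local_cpus)]
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
        f.write(str(cpu))      # needs root; the kernel rejects offline/invalid CPUs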

Comment 1 Paolo Abeni 2022-02-10 10:30:33 UTC
Added Dan to the CC list for OCP-team awareness.

Comment 5 Jamie Bainbridge 2022-08-29 06:53:51 UTC
Paolo and I were discussing this via email.

As I understand it, the optimal configuration for IRQs is to have the number of NIC channels equal to the number of real cores (not HyperThreads) in the NUMA node local to the NIC, and to not handle multiple IRQs for the same device on the same CPU.

Crossing a NUMA node is definitely not good; performance can be as bad as half wirespeed. irqbalance has handled NUMA-local placement properly since irqbalance-1.0.4-10.el6, so that's good:

 Why is irqbalance not balancing interrupts?
 https://access.redhat.com/solutions/677073
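For reference, which node is local to a given (physical) NIC can be read straight from sysfs; a small sketch, with the device name as a placeholder:

# Which NUMA node is the NIC attached to, and which CPUs belong to that node?
dev = "eth0"   # placeholder NIC name
with open(f"/sys/class/net/{dev}/device/numa_node") as f:
    node = int(f.read())
if node < 0:
    print(f"{dev} reports no NUMA locality")           # common for virtual NICs
else:
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        print(f"{dev} is local to node {node}, CPUs {f.read().strip()}")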

We often get customers to ban thread siblings from irqbalance:

 How to calculate hexadecimal bit mask value for "IRQBALANCE_BANNED_CPUS" parameter
 https://access.redhat.com/solutions/3152271
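The value is just a hex bitmask with one bit per banned CPU number; a minimal sketch of the calculation the article describes, with an example banned list (systems with more than 32 CPUs write the mask as comma-separated 32-bit words, which this sketch glosses over):

# IRQBALANCE_BANNED_CPUS is a hex bitmask of CPUs irqbalance must not use.
banned_cpus = [4, 5, 6, 7]     # example: the HT siblings of cores 0-3
mask = 0
for cpu in banned_cpus:
    mask |= 1 << cpu
print(f"IRQBALANCE_BANNED_CPUS={mask:x}")   # -> f0 for CPUs 4-7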

We often get customers to change the number of IRQ channels to the number of real cores:

 How should I configure network interface IRQ channels?
 https://access.redhat.com/solutions/4367191
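A sketch of making that change from a script, shelling out to ethtool ("ethtool -L" is the short form of "--set-channels"; the device name and core count are placeholders):

import subprocess

dev = "eth0"      # placeholder NIC name
real_cores = 4    # placeholder: real cores in the NIC-local NUMA node

# Equivalent to: ethtool -L eth0 combined 4
# (some drivers expose separate rx/tx queue counts instead of "combined")
subprocess.run(["ethtool", "-L", dev, "combined", str(real_cores)], check=True)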

Some thoughts around irqbalance improvements here:

1) My understanding is that there is no advantage to spreading IRQ channels from the same device across HyperThreads, presumably because disabling IRQs is local to the physical core (two HyperThreads), not to the logical core (one HyperThread). I forget where I read this, but it was at least 7 years ago. It would be good to confirm whether this is still the situation on modern CPUs, or whether some behave differently. This might be particularly complex with AMD's "configurable NUMA" features within a single socket on some models, which change the specifics from model to model.

2) If the above holds true, irqbalance could learn the sibling topology from paths like "/sys/devices/system/cpu/cpuX/topology/{core,thread}_siblings_list" and stop considering a sibling as valid for balancing when the first thread of that core is already handling an interrupt, effectively banning HT from irqbalance automatically (a rough sketch of this detection is at the end of this comment). For the additional complexity that depends on CPU brand/family, there are ways to detect those too. Defining the right/wrong way to handle IRQs in code like irqbalance is much better than re-applying internet knowledge I heard in the middle of last decade, which might have changed with new CPU families.

3) irqbalance should not "double up" IRQs for the same device on CPU cores, but it is common for NIC drivers to create "nr_cpus" IRQ channels. On a NUMA system with HT, this results in many more IRQs than really should be created. irqbalance could detect the number of useful cores in a NUMA node and (where possible) issue commands to change the number of channels on a device to the optimal number (see the sketch at the end of this comment). This would have to be "min(real CPUs in the NUMA node, NIC max IRQ channels)", because some devices place a hard limit on IRQ channels, like vmxnet3's maximum of 8 channels. For networking, the ethtool channels interface (the ETHTOOL_SCHANNELS ioctl and its netlink successor, used by "ethtool --set-channels") is the standard.

3a) Some storage HBA drivers also allow SMP affinity to be disabled so that their interrupts can be managed, however this appears to be mostly controlled by module options (megaraid_sas has smp_affinity_enable, qla2xxx has ql2xuctrlirq). Unsure if there is a generic interface for these like netlink. Maybe we need the storage maintainers to develop a generic interface like networking's netlink (or just use netlink).

4) Items 2 and 3 - automatically changing the number of cores/channels to be considered - move irqbalance from an "interrupt balancer" to an "interrupt manager". This is a welcome change: it would proactively solve a number of performance problems that customers contact us about, and it is in line with other auto-performance-tuning tools that Red Hat supplies, like numad. However, this might be outside the scope of what upstream irqbalance wants to do. In that case, it would be necessary to fork irqbalance or develop a new solution.
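For concreteness, a rough sketch of the detection items 2 and 3 describe - not an actual implementation - assuming the standard sysfs topology files and naive parsing of "ethtool -l" output; the device name is a placeholder:

import re, subprocess

dev = "eth0"   # placeholder NIC name

# NUMA node the NIC is attached to, and that node's CPU list.
with open(f"/sys/class/net/{dev}/device/numa_node") as f:
    node = max(int(f.read()), 0)        # treat -1 (no locality) as node 0 here
node_cpus = set()
with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
    for part in f.read().strip().split(","):          # e.g. "0-7,16-23"
        lo, _, hi = part.partition("-")
        node_cpus.update(range(int(lo), int(hi or lo) + 1))

# Count physical cores: HT siblings share the same thread_siblings_list content.
cores = set()
for cpu in node_cpus:
    with open(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list") as f:
        cores.add(f.read().strip())
real_cores = len(cores)

# Device hard limit on combined channels, from the "Pre-set maximums" of ethtool -l.
out = subprocess.run(["ethtool", "-l", dev], capture_output=True, text=True).stdout
m = re.search(r"Combined:\s*(\d+)", out)    # first "Combined:" line is the pre-set maximum
nic_max = int(m.group(1)) if m else real_cores

print(f"{dev}: node {node}, {real_cores} real cores, NIC max {nic_max} "
      f"-> suggest {min(real_cores, nic_max)} channels")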

Comment 6 Jamie Bainbridge 2022-10-28 01:54:05 UTC
Marc and I are discussing updating the old performance tuning whitepaper Jon and I wrote, and moving it into the product documentation, which could address this through manual customer action:

 Red Hat Enterprise Linux Network Performance Tuning Guide
 https://access.redhat.com/articles/1391433

 Create a RHEL network performance tuning guide
 https://issues.redhat.com/browse/RHELPLAN-137653

Comment 7 ltao 2023-02-28 08:39:02 UTC
Hi Jamie,

Currently we are discussing the rebase planning of irqbalance for the upcoming RHEL 9.3 and RHEL 8.9. I see there are no code updates for this bug, only the performance tuning guide you mentioned in comment 6. I don't know whether the documentation is considered to solve the issue; should I close the bug?

Thanks,
Tao Liu

Comment 8 Jamie Bainbridge 2023-02-28 21:35:35 UTC
(In reply to ltao from comment #7)
> should I close the bug?

No, this is a bug for Paolo (and other network developers) to consider long-term improvement of irqbalance.

Please leave this bug as it is.

Comment 9 ltao 2023-03-01 01:09:32 UTC
(In reply to Jamie Bainbridge from comment #8)
> (In reply to ltao from comment #7)
> > should I close the bug?
> 
> No, this is a bug for Paolo (and other network developers) to consider
> long-term improvement of irqbalance.
> 
> Please leave this bug as it is.

OK, Thanks!

Comment 10 Jamie Bainbridge 2023-04-20 09:27:36 UTC
The kernel has since grown an attempt at spreading MSI/MSI-X vectors across CPUs at creation time, in v4.8 (July 2016), starting with:

 genirq: Add a helper to spread an affinity mask for MSI/MSI-X vectors
 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5e385a6ef31f

The state of that can be seen in the git log for kernel/irq/affinity.c.

At least in RHEL 9, we can see this kernel-side spreading happen with irqbalance disabled:

rhel9 ~]# systemctl status irqbalance
○ irqbalance.service - irqbalance daemon
     Loaded: loaded (/usr/lib/systemd/system/irqbalance.service; disabled; vendor preset: enabled)
     Active: inactive (dead)
       Docs: man:irqbalance(1)
             https://github.com/Irqbalance/irqbalance

rhel9 ~]# grep virtio7 /proc/interrupts 
 64:          0          0          0          0   PCI-MSI 4718592-edge      virtio7-config
 65:      26744          0          1          0   PCI-MSI 4718593-edge      virtio7-input.0
 66:      23243          0          0          1   PCI-MSI 4718594-edge      virtio7-output.0
 67:          1     225172          0          0   PCI-MSI 4718595-edge      virtio7-input.1
 68:          0     132020          0          0   PCI-MSI 4718596-edge      virtio7-output.1
 69:          0          0     119062          0   PCI-MSI 4718597-edge      virtio7-input.2
 70:          0          0      82214          1   PCI-MSI 4718598-edge      virtio7-output.2
 71:          1          0          0      77294   PCI-MSI 4718599-edge      virtio7-input.3
 72:          0          1          0      39300   PCI-MSI 4718600-edge      virtio7-output.3
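As a quick check, the spread above can be summarised by tallying which CPU takes most of each queue's interrupts; a small sketch matching the virtio7 device in the output above:

# Show the busiest CPU for each virtio7 IRQ in /proc/interrupts.
with open("/proc/interrupts") as f:
    cpus = f.readline().split()          # header row: CPU0 CPU1 ...
    for line in f:
        if "virtio7" not in line:
            continue
        fields = line.split()
        irq = fields[0].rstrip(":")
        counts = [int(x) for x in fields[1:1 + len(cpus)]]
        busiest = counts.index(max(counts))
        print(f"IRQ {irq}: mostly on {cpus[busiest]} ({max(counts)} interrupts)")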

Upstream also had a discussion in Nov 2017 about doing the balancing in the kernel:

 Implementing irqbalance into the Linux Kernel #59
 https://github.com/Irqbalance/irqbalance/issues/59

Ironically, PJ and Neil said it's not the kernel's job to enforce policy like IRQ affinity, which makes me wonder how the above genirq/affinity patches got in then.