Bug 2219830
| Summary: | irqbalance: silently failing to enforce IRQBALANCE_BANNED_CPULIST | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Robin Jarry <rjarry> |
| Component: | os-net-config | Assignee: | Miguel Angel Nieto <mnietoji> |
| Status: | CLOSED ERRATA | QA Contact: | Miguel Angel Nieto <mnietoji> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.1 (Wallaby) | CC: | apevec, atenart, bfournie, cfontain, chrisw, dmarchan, dsneddon, ekuris, gregraka, hakhande, jdluhos, jeder, jshortt, jslagle, ltao, mariel, mburns, mleitner, pgrist, ralonsoh, rrubins, ruyang, smooney, supadhya, vcandapp |
| Target Milestone: | z4 | Keywords: | Triaged |
| Target Release: | 17.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | os-net-config-14.2.1-17.1.20240917140802.61d7bd7.el9ost | Doc Type: | Known Issue |
| Doc Text: |
In RHOSP 17.1, there is a known issue of transient packet loss where hardware interrupt requests (IRQs) cause involuntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications.
+
This issue is the result of provisioning large numbers of VFs during deployment. Each VF needs IRQs, and each IRQ must be bound to a physical CPU. When there are not enough housekeeping CPUs to handle all of the IRQs, `irqbalance` fails to bind them all, and the excess IRQs spill over onto isolated CPUs.
+
Workaround: You can try one or more of these actions:
* Reduce the number of provisioned VFs so that unused VFs do not remain bound to their default Linux driver.
* Increase the number of housekeeping CPUs so that they can handle all IRQs.
* Force unused VF network interfaces down to prevent their IRQs from interrupting isolated CPUs.
* Disable multicast and broadcast traffic on unused, downed VF network interfaces to prevent their IRQs from interrupting isolated CPUs.
|
Story Points: | --- |
| Clone Of: | 2184735 | Environment: | |
| Last Closed: | 2024-11-21 09:38:27 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 2184735 | ||
| Bug Blocks: | 2034801, 2274492 | ||
|
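The workarounds in the Doc Text above can be sketched as shell commands. This is a minimal sketch, not taken from the bug itself: the `ISOLATED` cpulist value and the interface name `enp4s0v3` are hypothetical examples, and the commented `ip` commands must be adapted before use.

```shell
#!/bin/sh
# Sketch: report where IRQs are currently bound, relative to an isolated
# cpulist. ISOLATED and the commented `ip` commands are examples only.

# Expand a kernel-style cpulist such as "0-2,5" into individual CPU numbers.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
        seq "$lo" "${hi:-$lo}"
    done
}

ISOLATED="2-7"   # example: cores reserved for OVS-DPDK PMD threads
echo "Isolated CPUs: $(expand_cpulist "$ISOLATED" | tr '\n' ' ')"

# List each IRQ's current CPU affinity (on a Linux host).
for irq_dir in /proc/irq/[0-9]*; do
    aff=$(cat "$irq_dir/smp_affinity_list" 2>/dev/null) || continue
    echo "IRQ ${irq_dir##*/} -> CPUs $aff"
done

# Workarounds 3 and 4: force an unused VF interface down and disable
# multicast on it (hypothetical interface name; do not run as-is):
# ip link set dev enp4s0v3 down
# ip link set dev enp4s0v3 multicast off
```

Any IRQ whose affinity list overlaps the isolated cpulist is a candidate for the packet-loss symptom described above.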
Description
Robin Jarry
2023-07-05 14:52:06 UTC
I have configured 96 VFs (64 on Intel NICs and 32 on Mellanox NICs) and compared results with `drivers_autoprobe` set to true and to false. I obtained good performance results for DPDK/SR-IOV in both cases, and all test cases in the basic regression ran successfully. Somehow, in the latest puddle I am not seeing performance degradation even when I increase the number of VFs a lot with `drivers_autoprobe` set to true. There are other improvements in this build that may be helping (nothing writes to the serial/VGA console anymore). In any case, I have tested that setting `drivers_autoprobe: false` does not break anything in a DPDK/SR-IOV environment.

```
(undercloud) [stack@undercloud-0 ~]$ cat core_puddle_version
RHOS-17.1-RHEL-9-20241014.n.1
[root@compute-0 tripleo-admin]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.2 (Plow)
[root@compute-0 tripleo-admin]# uname -a
Linux compute-0 5.14.0-284.86.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Sep 23 12:42:39 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
```

```yaml
- type: sriov_pf
  name: nic9
  mtu: 9000
  numvfs: 32
  use_dhcp: false
  defroute: false
  nm_controlled: true
  hotplug: true
  promisc: false
  drivers_autoprobe: false
- type: sriov_pf
  name: nic10
  mtu: 9000
  numvfs: 32
  use_dhcp: false
  defroute: false
  nm_controlled: true
  hotplug: true
  promisc: false
  drivers_autoprobe: false
- type: sriov_pf
  name: nic11
  mtu: 9000
  numvfs: 16
  use_dhcp: false
  defroute: false
  nm_controlled: true
  hotplug: true
  promisc: false
  drivers_autoprobe: false
- type: sriov_pf
  name: nic12
  mtu: 9000
  numvfs: 16
  use_dhcp: false
  defroute: false
  nm_controlled: true
  hotplug: true
  promisc: false
  drivers_autoprobe: false
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHOSP 17.1.4 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:9974
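For reference, the `drivers_autoprobe: false` setting that os-net-config applies in the templates above corresponds to a standard Linux sysfs attribute on the PF's PCI device. A minimal manual sketch follows, assuming a hypothetical PF interface named `enp4s0` (not taken from this bug); the key detail is that autoprobing must be disabled before the VFs are created.

```shell
#!/bin/sh
# Manual equivalent of os-net-config's `drivers_autoprobe: false`:
# disable VF driver autoprobing *before* creating the VFs, so unused VFs
# are not bound to their default kernel driver (and allocate no IRQs).
PF=enp4s0   # hypothetical PF interface name
AUTOPROBE="/sys/class/net/$PF/device/sriov_drivers_autoprobe"
NUMVFS="/sys/class/net/$PF/device/sriov_numvfs"

if [ -w "$AUTOPROBE" ]; then
    echo 0 > "$AUTOPROBE"   # must happen before sriov_numvfs is written
    echo 32 > "$NUMVFS"     # create 32 VFs, left unbound to any driver
else
    echo "No writable SR-IOV PF at $AUTOPROBE; skipping" >&2
fi
```

With autoprobing disabled, only the VFs that are explicitly bound to a driver (for example `vfio-pci` for DPDK guests) consume IRQs, which is why the setting helps with the overspill described in this bug.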