History for bug 2219830
| Who | When | What | Removed | Added |
|---|---|---|---|---|
| RHEL Program Management | 2023-07-05 14:52:15 UTC | Target Release | 17.1 | --- |
| Eran Kuris | 2023-07-05 15:37:24 UTC | Severity | high | medium |
| Red Hat One Jira (issues.redhat.com) | 2023-07-05 15:37:56 UTC | Link ID | Red Hat Issue Tracker OSP-26344 | |
| Eran Kuris | 2023-07-05 15:45:20 UTC | Doc Type | If docs needed, set a value | Known Issue |
| Eran Kuris | 2023-07-05 15:47:26 UTC | Version | 17.0 (Wallaby) | 17.1 (Wallaby) |
| Eran Kuris | 2023-07-05 15:48:17 UTC | Status | NEW | ASSIGNED |
| RHEL Program Management | 2023-07-05 15:48:27 UTC | Target Release | --- | 17.1 |
| Robin Jarry | 2023-07-06 10:50:04 UTC | Doc Text | | Cause ===== Every provisioned VF is automatically bound to its default linux driver (different depending on the PF model). The default driver will create a corresponding network interface with its own RX queues for every VF. Each RX queue corresponds to a hardware IRQ. If the VFs remain "unused" (i.e. not assigned to a VM), they will remain bound to the default linux driver and their corresponding network device will require IRQs for every RX queue. Provisioning a large number of VFs during deployment can cause a large number of IRQs to be created. Every IRQ needs to be bound to a physical CPU. For NFV deployments, all IRQs should be bound only to housekeeping CPUs to avoid packet loss. On x86_64, a single CPU can only handle 224 IRQs. When the number of housekeeping CPUs is not big enough to handle all IRQs, irqbalance will fail to bind them and the IRQs will overspill on isolated CPUs. Consequence =========== This isolation issue may cause transient packet loss if IRQs are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. Workaround (if any) =================== One or several of: 1) Reduce the number of provisioned VFs to avoid unused VFs to remain bound to their default linux driver. 2) Increase the number of housekeeping CPUs to handle all IRQs. 3) Force unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. 4) Disable multicast and broadcast traffic on unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. Result ====== Transient packet loss should not occur anymore. |
| Robin Jarry | 2023-07-06 10:50:49 UTC | Doc Text | Cause ===== Every provisioned VF is automatically bound to its default linux driver (different depending on the PF model). The default driver will create a corresponding network interface with its own RX queues for every VF. Each RX queue corresponds to a hardware IRQ. If the VFs remain "unused" (i.e. not assigned to a VM), they will remain bound to the default linux driver and their corresponding network device will require IRQs for every RX queue. Provisioning a large number of VFs during deployment can cause a large number of IRQs to be created. Every IRQ needs to be bound to a physical CPU. For NFV deployments, all IRQs should be bound only to housekeeping CPUs to avoid packet loss. On x86_64, a single CPU can only handle 224 IRQs. When the number of housekeeping CPUs is not big enough to handle all IRQs, irqbalance will fail to bind them and the IRQs will overspill on isolated CPUs. Consequence =========== This isolation issue may cause transient packet loss if IRQs are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. Workaround (if any) =================== One or several of: 1) Reduce the number of provisioned VFs to avoid unused VFs to remain bound to their default linux driver. 2) Increase the number of housekeeping CPUs to handle all IRQs. 3) Force unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. 4) Disable multicast and broadcast traffic on unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. Result ====== Transient packet loss should not occur anymore. | Cause ===== Every provisioned VF is automatically bound to its default linux driver (different depending on the PF model). The default driver will create a corresponding network interface with its own RX queues for every VF. Each RX queue corresponds to a hardware IRQ. If the VFs remain "unused" (i.e. not assigned to a VM), they will remain bound to the default linux driver and their corresponding network device will require IRQs for every RX queue. Provisioning a large number of VFs during deployment can cause a large number of IRQs to be created. Every IRQ needs to be bound to a physical CPU. For NFV deployments, all IRQs should be bound only to housekeeping CPUs to avoid packet loss. On x86_64, a single CPU can only handle 224 IRQs. When the number of housekeeping CPUs is not big enough to handle all IRQs, irqbalance will fail to bind them and the IRQs will overspill on isolated CPUs. Consequence =========== This isolation issue may cause transient packet loss if IRQs are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. Workaround (if any) =================== One or several of: 1) Reduce the number of provisioned VFs to avoid unused VFs to remain bound to their default linux driver. 2) Increase the number of housekeeping CPUs to handle all IRQs. 3) Force unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. 4) Disable multicast and broadcast traffic on unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. Result ====== Transient packet loss should not occur anymore. |
| Robin Jarry | 2023-08-02 09:26:21 UTC | CC | smooney | |
| | | Flags | | needinfo?(smooney) needinfo?(ralonso) |
| | | CC | | ralonso |
| Robin Jarry | 2023-08-02 09:30:15 UTC | Assignee | rhosp-nfv-int | rjarry |
| Ricardo Alonso | 2023-08-02 09:33:55 UTC | Flags | needinfo?(ralonso) | needinfo- |
| Robin Jarry | 2023-08-02 10:04:47 UTC | Flags | needinfo?(ralonsoh) | |
| | | CC | | ralonsoh |
| Robin Jarry | 2023-08-02 10:05:15 UTC | CC | ralonso | |
| Ian Frangs | 2023-08-03 15:46:23 UTC | Flags | needinfo?(rjarry) | |
| Robin Jarry | 2023-08-04 13:46:40 UTC | Flags | needinfo?(smooney) needinfo?(ralonsoh) needinfo?(rjarry) | |
| Greg Rakauskas | 2023-08-09 20:46:06 UTC | Flags | needinfo?(rjarry) | |
| | | CC | | gregraka |
| | | Doc Text | Cause ===== Every provisioned VF is automatically bound to its default linux driver (different depending on the PF model). The default driver will create a corresponding network interface with its own RX queues for every VF. Each RX queue corresponds to a hardware IRQ. If the VFs remain "unused" (i.e. not assigned to a VM), they will remain bound to the default linux driver and their corresponding network device will require IRQs for every RX queue. Provisioning a large number of VFs during deployment can cause a large number of IRQs to be created. Every IRQ needs to be bound to a physical CPU. For NFV deployments, all IRQs should be bound only to housekeeping CPUs to avoid packet loss. On x86_64, a single CPU can only handle 224 IRQs. When the number of housekeeping CPUs is not big enough to handle all IRQs, irqbalance will fail to bind them and the IRQs will overspill on isolated CPUs. Consequence =========== This isolation issue may cause transient packet loss if IRQs are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. Workaround (if any) =================== One or several of: 1) Reduce the number of provisioned VFs to avoid unused VFs to remain bound to their default linux driver. 2) Increase the number of housekeeping CPUs to handle all IRQs. 3) Force unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. 4) Disable multicast and broadcast traffic on unused VF network interfaces DOWN to avoid IRQs from interrupting isolated CPUs. Result ====== Transient packet loss should not occur anymore. | In RHOSP 17.1 GA there is a known issue of transient packet loss where hardware interrupt requests (IRQs) are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. + This issue is the result of provisioning large numbers of VFs during deployment. VFs need IRQs, each of which must be bound to a physical CPU. When there are not enough housekeeping CPUs to handle the capacity of IRQs, `irqbalance` fails to bind all of them and the IRQs overspill on isolated CPUs. + Workaround: You can try one or more of these actions: * Reduce the number of provisioned VFs to avoid unused VFs remaining bound to their default Linux driver. * Increase the number of housekeeping CPUs to handle all IRQs. * Force unused VF network interfaces down to avoid IRQs from interrupting isolated CPUs. * Disable multicast and broadcast traffic on unused, down VF network interfaces to avoid IRQs from interrupting isolated CPUs. |
| Greg Rakauskas | 2023-08-09 20:51:47 UTC | Flags | needinfo- | |
| Mike Burns | 2023-08-11 13:59:33 UTC | Target Milestone | z1 | z2 |
| Jenny-Anne Lynch | 2023-08-17 09:42:29 UTC | CC | jelynch | |
| | | Doc Text | In RHOSP 17.1 GA there is a known issue of transient packet loss where hardware interrupt requests (IRQs) are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. + This issue is the result of provisioning large numbers of VFs during deployment. VFs need IRQs, each of which must be bound to a physical CPU. When there are not enough housekeeping CPUs to handle the capacity of IRQs, `irqbalance` fails to bind all of them and the IRQs overspill on isolated CPUs. + Workaround: You can try one or more of these actions: * Reduce the number of provisioned VFs to avoid unused VFs remaining bound to their default Linux driver. * Increase the number of housekeeping CPUs to handle all IRQs. * Force unused VF network interfaces down to avoid IRQs from interrupting isolated CPUs. * Disable multicast and broadcast traffic on unused, down VF network interfaces to avoid IRQs from interrupting isolated CPUs. | In RHOSP 17.1, there is a known issue of transient packet loss where hardware interrupt requests (IRQs) are causing non-voluntary context switches on OVS-DPDK PMD threads or in guests running DPDK applications. + This issue is the result of provisioning large numbers of VFs during deployment. VFs need IRQs, each of which must be bound to a physical CPU. When there are not enough housekeeping CPUs to handle the capacity of IRQs, `irqbalance` fails to bind all of them and the IRQs overspill on isolated CPUs. + Workaround: You can try one or more of these actions: * Reduce the number of provisioned VFs to avoid unused VFs remaining bound to their default Linux driver. * Increase the number of housekeeping CPUs to handle all IRQs. * Force unused VF network interfaces down to avoid IRQs from interrupting isolated CPUs. * Disable multicast and broadcast traffic on unused, down VF network interfaces to avoid IRQs from interrupting isolated CPUs. |
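
The workaround items in the doc text above (forcing unused VF network interfaces down and disabling multicast traffic on them) can be scripted. The following is a minimal sketch, not part of the bug report: it assumes a VF interface can be recognised by the `device/physfn` symlink under `/sys/class/net`, that every VF still bound to its default Linux driver counts as "unused" (adjust that filter for your deployment), and that the standard iproute2 `ip link set` commands are available. It only toggles the multicast flag and the administrative state; how broadcast traffic is suppressed beyond bringing the interface down depends on the driver and is not covered here.

```python
#!/usr/bin/env python3
"""Sketch: quiesce unused SR-IOV VF interfaces so their RX queues stop
raising IRQs that could land on isolated CPUs.

Assumptions (not taken from the bug report):
- A VF network interface is identified by the presence of a
  'device/physfn' symlink under /sys/class/net/<iface>.
- Every VF interface still visible in /sys/class/net is treated as
  "unused"; narrow this filter to match your own deployment.
"""
import os
import subprocess

SYS_NET = "/sys/class/net"


def vf_interfaces():
    """Yield interface names that look like SR-IOV VFs."""
    for iface in sorted(os.listdir(SYS_NET)):
        if os.path.islink(os.path.join(SYS_NET, iface, "device", "physfn")):
            yield iface


def quiesce(iface):
    """Disable multicast on the VF interface and force it DOWN."""
    subprocess.run(["ip", "link", "set", "dev", iface, "multicast", "off"],
                   check=True)
    subprocess.run(["ip", "link", "set", "dev", iface, "down"], check=True)


if __name__ == "__main__":
    for iface in vf_interfaces():
        print(f"quiescing unused VF interface {iface}")
        quiesce(iface)
```

A sketch like this would need to run as root on the compute node after VF provisioning; the effect can be checked by comparing the per-CPU counters in /proc/interrupts for the isolated CPUs before and after.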