Bug 2175041
| Summary: | Slowness due to high number nftables rules causes CNV to pile up cnv-bridge commands on VM shutdown | ||
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | Networking | Assignee: | Edward Haas <edwardh> |
| Status: | CLOSED ERRATA | QA Contact: | Yossi Segev <ysegev> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.12.5 | CC: | edwardh, mduarted, phoracek, psutter, rgertzbe |
| Target Milestone: | --- | ||
| Target Release: | 4.13.1 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | cnv-containernetworking-plugins-rhel9 v4.13.1-2 | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-06-20 13:41:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
Germano Veit Michel
2023-03-03 01:12:58 UTC
It also causes further performance issues for any remaining workload.
SystemMemoryExceedsReservation is firing.
About 1h later my system CPU is at 90% sys and becoming unresponsive... will need to reboot.
Tasks: 420 total, 70 running, 349 sleeping, 0 stopped, 1 zombie
%Cpu(s): 9.1 us, 90.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.4 hi, 0.5 si, 0.0 st
MiB Mem : 63676.7 total, 44272.9 free, 10140.4 used, 9263.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 52758.4 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2676831 root 20 0 90548 45984 3752 R 14.6 0.1 1:56.92 nft
2688369 root 20 0 90548 46048 3812 R 14.3 0.1 1:17.33 nft
2640644 root 20 0 96752 46128 3796 R 14.0 0.1 4:49.54 nft
2635746 root 20 0 100876 46192 3800 R 12.0 0.1 5:47.65 nft
2629093 root 20 0 100984 46420 3908 R 11.3 0.1 7:03.59 nft
2642279 root 20 0 96752 46096 3764 R 11.0 0.1 4:47.25 nft
2691595 root 20 0 90548 45976 3740 R 11.0 0.1 1:10.51 nft
2701454 root 20 0 90548 45852 3620 R 9.6 0.1 0:45.91 nft
2632472 root 20 0 102328 47596 3908 R 9.0 0.1 6:17.97 nft
2673566 root 20 0 90548 45996 3764 R 8.3 0.1 2:05.88 nft
2662077 root 20 0 90548 46000 3764 R 8.0 0.1 2:47.36 nft
11778 nfsnobo+ 20 0 18.9g 3.5g 593068 S 7.6 5.7 416:48.34 prometheus
2670315 root 20 0 90548 45920 3620 R 6.3 0.1 2:16.28 nft
2719477 root 20 0 77788 45252 3572 R 6.3 0.1 0:06.07 nft
2630836 root 20 0 102328 47532 3840 R 6.0 0.1 6:42.57 nft
2652173 root 20 0 96752 46160 3836 R 6.0 0.1 3:40.10 nft
2660435 root 20 0 90548 46004 3772 R 5.6 0.1 2:50.65 nft
2618679 root 20 0 102572 47980 3880 R 5.3 0.1 11:31.10 nft
2665356 root 20 0 90548 46096 3860 R 5.3 0.1 2:30.95 nft
2712927 root 20 0 90548 46000 3764 R 5.3 0.1 0:18.51 nft
2616988 root 20 0 103296 48624 3800 R 5.0 0.1 12:27.95 nft
2625519 root 20 0 102244 47616 3840 R 5.0 0.1 8:13.98 nft
2650542 root 20 0 90548 46068 3836 R 5.0 0.1 3:46.58 nft
2694870 root 20 0 90548 46068 3836 R 5.0 0.1 1:00.83 nft
2707978 root 20 0 90532 46068 3836 R 5.0 0.1 0:27.24 nft
2620410 root 20 0 102304 47640 3808 R 4.7 0.1 10:31.64 nft
2622067 root 20 0 103632 49000 3836 R 4.7 0.1 9:35.03 nft
2637397 root 20 0 100876 46180 3796 R 4.7 0.1 5:28.79 nft
2647257 root 20 0 96752 46112 3780 R 4.7 0.1 4:01.83 nft
2648912 root 20 0 96752 46072 3752 R 4.7 0.1 4:08.45 nft
2657156 root 20 0 90548 46060 3808 R 4.7 0.1 3:15.75 nft
2658799 root 20 0 90548 45972 3740 R 4.7 0.1 3:06.88 nft
2668673 root 20 0 90548 46008 3772 R 4.7 0.1 2:22.91 nft
2686731 root 20 0 90548 46068 3836 R 4.7 0.1 1:19.20 nft
2689983 root 20 0 90524 46044 3812 R 4.7 0.1 1:13.95 nft
2704714 root 20 0 90548 46000 3764 R 4.7 0.1 0:35.78 nft
2709687 root 20 0 90548 45988 3756 R 4.7 0.1 0:26.04 nft
2711271 root 20 0 90548 46044 3812 R 4.7 0.1 0:22.96 nft
2623742 root 20 0 102388 47724 3808 R 4.3 0.1 8:56.21 nft
2634094 root 20 0 100876 46164 3776 R 4.3 0.1 5:57.05 nft
2667021 root 20 0 90548 45852 3620 R 4.3 0.1 2:23.67 nft
2671934 root 20 0 90548 46068 3836 R 4.3 0.1 2:14.16 nft
2685089 root 20 0 90548 45984 3752 R 4.3 0.1 1:25.33 nft
2706343 root 20 0 90548 45976 3740 R 4.3 0.1 0:32.82 nft
2627304 root 20 0 102684 47968 3812 R 4.0 0.1 7:25.48 nft
2653850 root 20 0 90548 46072 3836 R 4.0 0.1 3:26.21 nft
2663720 root 20 0 90548 46016 3784 R 4.0 0.1 2:42.30 nft
2680153 root 20 0 90548 45988 3752 R 4.0 0.1 1:36.95 nft
2681807 root 20 0 90548 46044 3812 R 4.0 0.1 1:35.82 nft
2683425 root 20 0 90548 46068 3836 R 4.0 0.1 1:33.73 nft
2693226 root 20 0 90548 46000 3768 R 4.0 0.1 1:04.78 nft
2696494 root 20 0 90548 46000 3764 R 4.0 0.1 0:56.00 nft
2699824 root 20 0 90548 46072 3836 R 4.0 0.1 0:48.42 nft
2714559 root 20 0 77260 44712 3564 R 4.0 0.1 0:15.21 nft
2639024 root 20 0 96752 46088 3764 R 3.7 0.1 4:57.81 nft
2643983 root 20 0 96752 46064 3748 R 3.7 0.1 4:34.95 nft
2645632 root 20 0 90548 46072 3836 R 3.7 0.1 4:21.11 nft
2675236 root 20 0 90548 46044 3808 R 3.7 0.1 1:57.68 nft
2678502 root 20 0 90548 46000 3768 R 3.7 0.1 1:49.10 nft
2698181 root 20 0 90548 46056 3824 R 3.7 0.1 0:52.71 nft
2703075 root 20 0 90548 46044 3808 R 3.7 0.1 0:38.60 nft
Apologies for the trouble. Edy, would you please explore the potential solutions you had in mind? At the moment, this is what we are planning: - Introduce a timeout to the `nft` calls, which should assure the CNI command fails if `nft` is not responsive. - Investigate the option of moving the nftables rules into the pod network namespace. I would update on the investigation results soon. Verified on
CNV 4.13.1
cnv-containernetworking-plugins-rhel9:v4.13.1-2
OS: Red Hat Enterprise Linux CoreOS 413.92.202305231734-0 (RHEL 9.2 based)
1. Create a linux bridge interface on a single node using this policy:
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
name: linux-bridge-ens11
spec:
desiredState:
interfaces:
- name: test-br
type: linux-bridge
state: up
ipv4:
dhcp: true
enabled: true
bridge:
options:
stp:
enabled: false
port:
- name: ens11
nodeSelector:
kubernetes.io/hostname: c01-n-ys-4131o-k59g2-worker-0-lc5wc
2. Create NetworkAttachmentDefinition that utilizes the bridge interface:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
annotations:
k8s.v1.cni.cncf.io/resourceName: bridge.network.kubevirt.io/test-br
name: test-br-nad
namespace: yoss-ns
spec:
config: '{"cniVersion": "0.3.1", "name": "test-br", "type":
"cnv-bridge", "bridge": "test-br", "macspoofchk":true,"ipam":{}}'
3. Create 14 similar VMs on the same node (the one where the bridge was created on section 1), with a secondary NIC backed by the NAD created in the previous section.
All VMs were created by applying the attached VM spec yaml, with the only difference of VM name in each spec.
$ oc apply -f vm1.yaml
virtualmachine.kubevirt.io/vm1 created
$ oc apply -f vm2.yaml
virtualmachine.kubevirt.io/vm2 created
...
4. Start all VMs
$ virtctl start vm1
VM vm14 was scheduled to start
$ virtctl start vm2
VM vm2 was scheduled to start
...
Note: After each VM start, I waited for the VM to be in running mode before running the next VM, for example:
$ virtctl start vm5
VM vm5 was scheduled to start
$
$ oc get vmi -w
NAME AGE PHASE IP NODENAME READY
vm1 8m57s Running 10.129.2.70 c01-n-ys-4131o-k59g2-worker-0-lc5wc True
vm2 8m44s Running 10.129.2.72 c01-n-ys-4131o-k59g2-worker-0-lc5wc True
vm3 8m31s Running 10.129.2.73 c01-n-ys-4131o-k59g2-worker-0-lc5wc True
vm4 8m18s Running 10.129.2.75 c01-n-ys-4131o-k59g2-worker-0-lc5wc True
vm5 2s Scheduling False
vm5 4s Scheduled c01-n-ys-4131o-k59g2-worker-0-lc5wc False
vm5 6s Scheduled c01-n-ys-4131o-k59g2-worker-0-lc5wc False
vm5 6s Running 10.129.2.76 c01-n-ys-4131o-k59g2-worker-0-lc5wc False
$
$ virtctl start vm6
VM vm6 was scheduled to start
5. On the node where all VMs run - run the script from the bug description in the background (. nft.sh &):
for a in {1..500}
do
nft add table ip table$a
for b in {1..500}
do
nft add chain ip table$a chain$b
done
done
6 After 10-15 minutes - I started shuttoing down all VMIs:
$ virtctl stop vm1
VM vm1 was scheduled to stop
$
$ virtctl stop vm2
VM vm2 was scheduled to stop
...
7. Every few minutes, for the next ~30 minutes, I checked top for:
a. nft instances
b. cnv-bridge instances
c. CPU usage
nft was seens every few socnds, in respect to the new nft trigger from the running script, but no "pile" os instances was obverved (I also verified it by checking for the nft in ps, and finding only one such process every few seconds).
cnv-bridge was not seen at all.
CPU usage remained solid, not increasing to the levels described in the BZ description (90%).
For example - looking at it ~25-30 after shutting down VMs (while the nft script still runs), the CPU usage is ~16%.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 4.13.1 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:3686 The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days |