After discussing with Dumitru, here is the situation of this bug: PSI deployed a very complicated, high-load setup, and OVS ended up saturating the CPU. That is expected with OVN because of the different processing model between ML2/OVS and ML2/OVN.

The solution is as described in comment 28.

The simpler reproducer is the one described in comment 22, but it depends on a specific db file, and that db file does not apply to every release of OVN, so I don't see a good way to automate the case for the moment.
set coverage-.
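(Side note, not from comment 22: one rough way to check whether such a saved SB db file is usable with a given OVN release is to compare it against the installed schema before loading it; the file names and schema path below are placeholders.)

$ # Placeholder paths; the real db file is the one attached in comment 22.
$ ovsdb-tool db-version ./ovnsb_db.db                           # schema version stored in the db file
$ ovsdb-tool schema-version /usr/share/ovn/ovn-sb.ovsschema     # schema shipped with this OVN release
$ ovsdb-tool needs-conversion ./ovnsb_db.db /usr/share/ovn/ovn-sb.ovsschema
$ # If the answer is "yes", the db can usually be converted before reuse:
$ ovsdb-tool convert ./ovnsb_db.db /usr/share/ovn/ovn-sb.ovsschema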
Looking at the perf traces, nothing really stands out; it looks like an OVS system that is really busy processing upcalls/OpenFlow updates (see some outputs below). The logs for this time period show this:

2021-04-06T13:33:02.819Z|14137|poll_loop(revalidator42699)|INFO|Dropped 75 log messages in last 87 seconds (most recently, 84 seconds ago) due to excessive rate
2021-04-06T13:33:02.819Z|14138|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:02.824Z|14139|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:02.824Z|14140|poll_loop(revalidator42699)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:988 (67% CPU usage)
2021-04-06T13:33:02.905Z|14141|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:02.912Z|14142|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:02.912Z|14143|poll_loop(revalidator42699)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:988 (67% CPU usage)
2021-04-06T13:33:02.993Z|14144|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:03.001Z|14145|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:03.001Z|14146|poll_loop(revalidator42699)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:988 (67% CPU usage)
2021-04-06T13:33:03.077Z|14147|poll_loop(revalidator42699)|INFO|wakeup due to [POLLIN] on fd 21 (FIFO pipe:[3523694259]) at ../lib/ovs-thread.c:341 (67% CPU usage)
2021-04-06T13:33:43.790Z|330917|connmgr|INFO|br-int<->unix#624: 28968 flow_mods in the 10 s starting 47 s ago (24010 adds, 2740 deletes, 2218 modifications)
2021-04-06T13:34:43.790Z|330918|connmgr|INFO|br-int<->unix#624: 2526 flow_mods in the 27 s starting 54 s ago (1283 adds, 1243 deletes)

Why are there 30K+ flow changes? This will definitely impact performance.

Looking at the coverage counters, for the last hour there is an average of ~750/s dpif flow updates, and in the last minute ~1K/s OpenFlow messages were received (278/s over the last hour).

Coverage counters:

                        5 seconds     last minute     last hour
dpif_flow_put           177.4/sec     276.383/sec     392.3108/sec    total: 1628897666
dpif_flow_del           180.4/sec     231.450/sec     377.9153/sec    total: 1620546339
dpif_execute            178.0/sec     219.600/sec     384.0822/sec    total: 1566116832

This results in a lot of slow-path processing:

                        5 seconds     last minute     last hour
xlate_actions           8806.6/sec    15325.233/sec   17096.9067/sec  total: 70982257405

There are also a lot of OpenFlow messages:

                        5 seconds     last minute     last hour
ofproto_recv_openflow   131.8/sec     995.283/sec     278.8836/sec    total: 43660153

So from the OVS side I see nothing worrying for now, but I guess we need to find out why there are so many OpenFlow updates?
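(For anyone re-checking on a live node: the rates above come from the running totals in the coverage counters, so two samples taken some time apart are enough to confirm whether the churn is still happening. This is only a sketch, not part of the original analysis; "br-int" is assumed to be the integration bridge as in the connmgr logs, and the 60 s interval is arbitrary.)

$ # Sample the relevant counters twice and compare the per-second rates.
$ ovs-appctl coverage/show | grep -E 'dpif_flow_put|dpif_flow_del|ofproto_recv_openflow|xlate_actions'
$ sleep 60
$ ovs-appctl coverage/show | grep -E 'dpif_flow_put|dpif_flow_del|ofproto_recv_openflow|xlate_actions'
$ # Total flow count on br-int, to see whether the table itself keeps growing:
$ ovs-ofctl dump-aggregate br-int
$ # Revalidator / upcall pressure:
$ ovs-appctl upcall/show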
========== perf dumps ==========

$ grep "01:33:0" ovs_pidstat.txt | sort -b -k 10,10 -n -r
01:33:03 PM   995      2838         -  424.60   0.00   0.00   0.80  424.60   58  ovs-vswitchd
01:33:08 PM   995      2838         -  345.60   0.00   0.00   0.40  345.60   36  ovs-vswitchd
01:33:03 PM   995         -    345050   53.80   3.60   0.00   0.00   57.40   56  |__ovs-vswitchd
01:33:08 PM   995         -    345050   26.00   2.20   0.00   0.20   28.20   34  |__ovs-vswitchd
01:33:03 PM   995         -    345061   24.00   1.40   0.00   0.20   25.40    0  |__ovs-vswitchd
01:33:03 PM   995         -    345072   23.60   1.40   0.00   0.00   25.00   50  |__ovs-vswitchd
01:33:03 PM   995         -      2838   18.80   4.60   0.00   0.80   23.40   31  |__ovs-vswitchd
01:33:03 PM   995         -    345067   21.60   1.20   0.00   0.00   22.80   68  |__ovs-vswitchd
01:33:03 PM   995         -    345060   21.60   1.00   0.00   0.20   22.60   66  |__ovs-vswitchd
01:33:03 PM   995         -    345074   20.20   1.00   0.00   0.20   21.20   26  |__ovs-vswitchd
01:33:08 PM   995         -    345053   19.80   1.20   0.00   0.00   21.00   48  |__ovs-vswitchd
01:33:03 PM   995         -    345068   19.20   1.00   0.00   0.00   20.20   24  |__ovs-vswitchd
01:33:03 PM   995         -    345054   19.00   1.00   0.00   0.20   20.00   52  |__ovs-vswitchd
01:33:08 PM   995         -    345061   15.40   1.40   0.00   0.00   16.80    0  |__ovs-vswitchd
01:33:08 PM   995         -    345074   14.20   0.80   0.00   0.00   15.00   80  |__ovs-vswitchd
01:33:08 PM   995         -      2838   10.80   2.40   0.00   0.40   13.20   36  |__ovs-vswitchd
01:33:03 PM   995         -    345053   11.80   1.40   0.00   0.20   13.20   28  |__ovs-vswitchd
01:33:08 PM   995         -    345054   12.00   1.00   0.00   0.00   13.00   40  |__ovs-vswitchd
01:33:08 PM   995         -    345072   11.60   0.80   0.00   0.00   12.40   64  |__ovs-vswitchd
01:33:08 PM   995         -    345056   11.20   1.00   0.00   0.00   12.20   60  |__ovs-vswitchd
01:33:03 PM   995         -    345065   10.60   0.80   0.00   0.20   11.40   16  |__ovs-vswitchd
01:33:03 PM   995         -    345056    9.80   1.00   0.00   0.20   10.80   12  |__ovs-vswitchd
01:33:08 PM   995         -    345067    9.80   0.80   0.00   0.20   10.60    6  |__ovs-vswitchd
01:33:08 PM   995         -    345060    9.60   0.60   0.00   0.00   10.20   12  |__ovs-vswitchd

$ grep "01:33:0" ovs_pidstat.txt | sort -b -k 10,10 -n -r | awk '{print $5}' | grep -o '[0-9]*' | head -10 | uniq
345050
345061
345072
2838
345067
345060
345074
345053
345068

$ show_tid 345050; show_tid 345061; show_tid 345072; show_tid 2838; show_tid 345067; show_tid 345060; show_tid 345074; show_tid 345053; show_tid 345068
# Overhead  Command       Shared Object       Symbol
# ........  ............  ..................  ...............................................
#
    55.88%  ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup__
    31.55%  ovs-vswitchd  ovs-vswitchd        [.] ccmap_find
     4.12%  ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
     1.63%  ovs-vswitchd  [kernel.kallsyms]   [k] _raw_spin_lock_bh
     1.33%  ovs-vswitchd  ovs-vswitchd        [.] clone_xlate_actions
     1.25%  ovs-vswitchd  ovs-vswitchd        [.] mf_set_flow_value
     1.23%  ovs-vswitchd  ovs-vswitchd        [.] patch_port_output.isra.97

# Overhead  Command       Shared Object       Symbol
# ........  ............  ..................  .........................................
#
    30.03%  ovs-vswitchd  libpthread-2.28.so  [.] __pthread_mutex_lock
    22.67%  ovs-vswitchd  libc-2.28.so        [.] realloc
    17.78%  ovs-vswitchd  ovs-vswitchd        [.] ovs_mutex_lock_at
     3.76%  ovs-vswitchd  ovs-vswitchd        [.] mbundle_lookup
     3.67%  ovs-vswitchd  ovs-vswitchd        [.] commit_odp_actions
     3.56%  ovs-vswitchd  ovs-vswitchd        [.] xlate_push_stats_entry
     3.46%  ovs-vswitchd  ovs-vswitchd        [.] xlate_cache_clear

# Overhead  Command       Shared Object       Symbol
# ........  ............  ..................  .............................................
#
    16.88%  ovs-vswitchd  libc-2.28.so        [.] __memset_avx2_unaligned_erms
    16.71%  ovs-vswitchd  ovs-vswitchd        [.] get_ofp_port
    11.91%  ovs-vswitchd  ovs-vswitchd        [.] compose_output_action__.constprop.103
    11.51%  ovs-vswitchd  ovs-vswitchd        [.] ofpbuf_put_zeros
     8.05%  ovs-vswitchd  ovs-vswitchd        [.] xlate_actions
     7.73%  ovs-vswitchd  libc-2.28.so        [.] unlink_chunk.isra.2
     5.69%  ovs-vswitchd  ovs-vswitchd        [.] netdev_ref

head: cannot open 'xx2' for reading: No such file or directory

# Overhead  Command       Shared Object      Symbol
# ........  ............  .................  ........................................
#
    19.01%  ovs-vswitchd  [kernel.kallsyms]  [k] ovs_nla_put_identifier
    13.86%  ovs-vswitchd  [kernel.kallsyms]  [k] masked_flow_lookup
    13.67%  ovs-vswitchd  [kernel.kallsyms]  [k] ovs_flow_tbl_dump_next
    10.07%  ovs-vswitchd  [kernel.kallsyms]  [k] __update_load_avg_se
     4.53%  ovs-vswitchd  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
     4.50%  ovs-vswitchd  ovs-vswitchd       [.] eth_addr_is_reserved
     3.26%  ovs-vswitchd  ovs-vswitchd       [.] frozen_state_hash

# Overhead  Command       Shared Object      Symbol
# ........  ............  .................  ..................................
#
    18.14%  ovs-vswitchd  [kernel.kallsyms]  [k] perf_event_update_userpage
    12.62%  ovs-vswitchd  [kernel.kallsyms]  [k] run_local_timers
    10.20%  ovs-vswitchd  ovs-vswitchd       [.] tnl_find_exact
     7.73%  ovs-vswitchd  [kernel.kallsyms]  [k] perf_swevent_add
     7.25%  ovs-vswitchd  ovs-vswitchd       [.] ofpbuf_put_uninit
     6.04%  ovs-vswitchd  ovs-vswitchd       [.] finish_freezing__
     3.71%  ovs-vswitchd  [kernel.kallsyms]  [k] tick_program_event

# Overhead  Command       Shared Object      Symbol
# ........  ............  .................  .........................................
#
    38.93%  ovs-vswitchd  [kernel.kallsyms]  [k] __update_load_avg_cfs_rq
    10.84%  ovs-vswitchd  [kernel.kallsyms]  [k] available_idle_cpu
     6.36%  ovs-vswitchd  ovs-vswitchd       [.] ovs_mutex_trylock_at
     4.11%  ovs-vswitchd  [kernel.kallsyms]  [k] __local_bh_enable_ip
     3.55%  ovs-vswitchd  [vdso]             [.] __vdso_clock_gettime
     3.31%  ovs-vswitchd  [kernel.kallsyms]  [k] switch_fpu_return
     2.96%  ovs-vswitchd  ovs-vswitchd       [.] 0x000000000003f734

# Overhead  Command       Shared Object      Symbol
# ........  ............  .................  ..............................................
#
    73.47%  ovs-vswitchd  [kernel.kallsyms]  [k] syscall_return_via_sysret
     9.91%  ovs-vswitchd  [kernel.kallsyms]  [k] __x64_sys_futex
     2.34%  ovs-vswitchd  [kernel.kallsyms]  [k] __accumulate_pelt_segments
     2.24%  ovs-vswitchd  ovs-vswitchd       [.] seq_read
     1.86%  ovs-vswitchd  ovs-vswitchd       [.] time_poll
     1.72%  ovs-vswitchd  ovs-vswitchd       [.] netdev_vport_inc_tx
     1.42%  ovs-vswitchd  ovs-vswitchd       [.] free_conjunctive_matches.constprop.18

# Overhead  Command       Shared Object       Symbol
# ........  ............  ..................  ........................................
#
    34.53%  ovs-vswitchd  libpthread-2.28.so  [.] __pthread_mutex_trylock
    19.86%  ovs-vswitchd  [kernel.kallsyms]   [k] cpumask_next
    13.16%  ovs-vswitchd  [kernel.kallsyms]   [k] __calc_delta
     7.54%  ovs-vswitchd  [kernel.kallsyms]   [k] dequeue_task_fair
     6.38%  ovs-vswitchd  ovs-vswitchd        [.] 0x000000000003f164
     2.65%  ovs-vswitchd  ovs-vswitchd        [.] xvlan_input_translate
     1.36%  ovs-vswitchd  [kernel.kallsyms]   [k] ret_from_intr
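(For context: show_tid above appears to be a local helper. Something along these lines would produce an equivalent per-thread perf report, using one of the busy TIDs from the pidstat output; the output file name and the 10 s window are assumptions, not taken from this report.)

$ # Record a short sample for one ovs-vswitchd thread, then summarize it.
$ perf record -t 345050 -o perf-345050.data -- sleep 10
$ perf report -i perf-345050.data --stdio | head -20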