Bug 2217871 - OVS dumps flows on its own when sending traffic with 10k flows
Summary: OVS dumps flows on its own when sending traffic with 10k flows
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch
Version: RHEL 8.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Eelco Chaudron
QA Contact: mhou
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-06-27 10:29 UTC by mhou
Modified: 2023-07-13 10:30 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-13 10:30:10 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-2983 0 None None None 2023-06-27 10:32:01 UTC

Description mhou 2023-06-27 10:29:06 UTC
Description of problem:
This issue occurs only on kernel-rt: with mac-table-size and the upcall flow limit pre-configured, sending traffic triggers the automatic flow dump.

Version-Release number of selected component (if applicable):
kernel version: 4.18.0-497.rt7.286.el8.x86_64
ovs version: openvswitch2.17-2.17.0-105.el8fdp.x86_64

How reproducible: 100%
# ethtool -i ens1f0
driver: mlx5_core
version: 4.18.0-497.rt7.286.el8.x86_64
firmware-version: 16.34.1002 (MT_0000000080)
expansion-rom-version: 
bus-info: 0000:17:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes


Steps to Reproduce:
0. Enable the realtime-virtual-host profile and keep the CPUs of core 0 (cpu0, cpu1) as housekeeping CPUs (a sketch of applying the profile follows the numactl output below).
# cat /etc/tuned/realtime-virtual-host-variables.conf 
isolated_cores=2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55
isolate_managed_irq=Y
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
node 0 size: 31214 MB
node 0 free: 1255 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55
node 1 size: 31222 MB
node 1 free: 2237 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10
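
For reference, the tuned profile is typically applied with tuned-adm and needs a reboot before the isolcpus/irqaffinity kernel arguments take effect (a minimal sketch, assuming the realtime-virtual-host profile packages are already installed):

# Apply the profile after editing the variables file, then reboot.
tuned-adm profile realtime-virtual-host
reboot
# After the reboot, confirm the active profile and the isolated CPU set.
tuned-adm active
cat /sys/devices/system/cpu/isolated
cat /proc/cmdline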

1. Configure the offload PVP test scenario (an equivalent manual configuration sketch follows the ovs-vsctl show output).
setup_ovs_offload.sh ovsbr0 0000:17:00.0 --numvfs=1 --vf-macs='00:de:ad:01:01:01' --spoofchk=off --trust=on \
--hw-offload=on --tc-policy=none --steering-mode='smfs'
setup_ovs_offload.sh ovsbr1 0000:17:00.1 --numvfs=1 --vf-macs='00:de:ad:02:02:02' --spoofchk=off --trust=on \
--hw-offload=on --tc-policy=none --steering-mode='smfs'
# ovs-vsctl show
d9d543af-7fc1-44f0-9c20-5a81e695c354
    Bridge ovsbr1
        Port ens1f1
            Interface ens1f1
        Port eth1
            Interface eth1
        Port ovsbr1
            Interface ovsbr1
                type: internal
    Bridge ovsbr0
        Port ens1f0
            Interface ens1f0
        Port eth0
            Interface eth0
        Port ovsbr0
            Interface ovsbr0
                type: internal
    ovs_version: "2.17.7"
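setup_ovs_offload.sh is a local test helper; the manual configuration it stands in for on ens1f0 would look roughly like the sketch below. This is an assumption based on the script's command-line options, not its actual contents (the devlink flow_steering_mode parameter and switchdev mode correspond to --steering-mode=smfs on mlx5):

# Create one VF and set its MAC, spoof checking and trust (--numvfs/--vf-macs/--spoofchk/--trust).
echo 1 > /sys/class/net/ens1f0/device/sriov_numvfs
ip link set ens1f0 vf 0 mac 00:de:ad:01:01:01 spoofchk off trust on
# Select SMFS steering and switchdev mode on the PF (--steering-mode=smfs).
devlink dev param set pci/0000:17:00.0 name flow_steering_mode value smfs cmode runtime
devlink dev eswitch set pci/0000:17:00.0 mode switchdev
# Enable TC hardware offload in OVS (--hw-offload=on --tc-policy=none).
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
ovs-vsctl set Open_vSwitch . other_config:tc-policy=none
systemctl restart openvswitch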
The current handler and revalidator threads are shown below:
# pidstat -t -p $(pidof ovs-vswitchd)
Linux 4.18.0-497.rt7.286.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	06/27/2023 	_x86_64_	(56 CPU)

06:35:56 AM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
06:35:56 AM   994     13802         -    0.34    0.45    0.00    0.00    0.79     0  ovs-vswitchd
06:35:56 AM   994         -     13802    0.00    0.01    0.00    0.00    0.01     0  |__ovs-vswitchd
06:35:56 AM   994         -     13804    0.32    0.41    0.00    0.37    0.73     0  |__handler3
06:35:56 AM   994         -     13805    0.00    0.00    0.00    0.00    0.00     1  |__handler2
06:35:56 AM   994         -     13806    0.00    0.00    0.00    0.00    0.00     0  |__handler1
06:35:56 AM   994         -     13807    0.02    0.03    0.00    0.05    0.05     1  |__revalidator4
06:35:56 AM   994         -     13808    0.00    0.00    0.00    0.00    0.00     1  |__urcu5

# ovsdb-tool show-log -m
record 0: "Open_vSwitch" schema, version="8.3.0", cksum="3781850481 26690"

record 1: 2023-06-26 15:36:14.712 "ovs-vsctl (invoked by /bin/sh): ovs-vsctl --no-wait -- init -- set Open_vSwitch . db-version=8.3.0"
  table Open_vSwitch insert row a3d2ef9e:

record 2: 2023-06-26 15:36:14.896 "ovs-vsctl (invoked by /bin/sh): ovs-vsctl --no-wait set Open_vSwitch . ovs-version=2.17.7 "external-ids:system-id=\"fd78e6ac-7940-4e77-812e-c761c72ad319\"" "external-ids:rundir=\"/var/run/openvswitch\"" "system-type=\"rhel\"" "system-version=\"8.9\"""
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 3: 2023-06-26 15:36:14.906 "ovs-vsctl (invoked by /bin/sh): ovs-vsctl --no-wait add Open_vSwitch . external-ids hostname=dell-per750-03.rhts.eng.pek2.redhat.com"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 4: 2023-06-26 15:36:15.251
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 5: 2023-06-26 15:36:15.252
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 6: 2023-06-26 15:37:04.909 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl --may-exist add-br ovsbr0"
  table Interface insert row "ovsbr0" (da3692bf):
  table Port insert row "ovsbr0" (db051bf5):
  table Bridge insert row "ovsbr0" (197157d7):
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 7: 2023-06-26 15:37:04.950
  table Interface row "ovsbr0" (da3692bf) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 8: 2023-06-26 15:37:04.951
  table Bridge row "ovsbr0" (197157d7) diff:

record 9: 2023-06-26 15:37:05.014 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl add-port ovsbr0 ens1f0"
  table Port insert row "ens1f0" (b6d21b34):
  table Interface insert row "ens1f0" (6ba2a06a):
  table Bridge row "ovsbr0" (197157d7) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 10: 2023-06-26 15:37:05.025
  table Interface row "ens1f0" (6ba2a06a) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 11: 2023-06-26 15:37:05.281 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl add-port ovsbr0 eth0"
  table Port insert row "eth0" (70e89e5d):
  table Interface insert row "eth0" (b03fc2c1):
  table Bridge row "ovsbr0" (197157d7) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 12: 2023-06-26 15:37:05.287
  table Interface row "eth0" (b03fc2c1) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 13: 2023-06-26 15:37:05.307 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set Open_vSwitch . other_config:hw-offload=false"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 14: 2023-06-26 15:37:05.308
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 15: 2023-06-26 15:37:05.315 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set Open_vSwitch . other_config:tc-policy=none"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 16: 2023-06-26 15:37:05.316
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 17: 2023-06-26 15:37:05.323 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set Open_vSwitch . other_config:max-idle=180000"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 18: 2023-06-26 15:37:05.324
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 19: 2023-06-26 15:37:50.968 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl --may-exist add-br ovsbr1"
  table Interface insert row "ovsbr1" (2ff16c11):
  table Port insert row "ovsbr1" (3f2c03d6):
  table Bridge insert row "ovsbr1" (58b87ac8):
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 20: 2023-06-26 15:37:50.979
  table Interface row "ovsbr1" (2ff16c11) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 21: 2023-06-26 15:37:50.979
  table Bridge row "ovsbr1" (58b87ac8) diff:

record 22: 2023-06-26 15:37:51.043 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl add-port ovsbr1 ens1f1"
  table Interface insert row "ens1f1" (620277d0):
  table Port insert row "ens1f1" (74709db5):
  table Bridge row "ovsbr1" (58b87ac8) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 23: 2023-06-26 15:37:51.050
  table Interface row "ens1f1" (620277d0) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 24: 2023-06-26 15:37:51.302 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl add-port ovsbr1 eth1"
  table Port insert row "eth1" (56dcd8b9):
  table Interface insert row "eth1" (8f08ceec):
  table Bridge row "ovsbr1" (58b87ac8) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 25: 2023-06-26 15:37:51.308
  table Interface row "eth1" (8f08ceec) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 26: 2023-06-26 15:37:57.043 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set Open_vSwitch . other_config:max-idle=3600000"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 27: 2023-06-26 15:37:57.044
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 28: 2023-06-26 15:37:57.085 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set Open_vSwitch . other_config:max-revalidator=3600000"
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 29: 2023-06-26 15:37:57.086
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 30: 2023-06-26 15:37:57.127 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set bridge ovsbr0 other-config:mac-aging-time=3600"
  table Bridge row "ovsbr0" (197157d7) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 31: 2023-06-26 15:37:57.128
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 32: 2023-06-26 15:37:57.169 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set bridge ovsbr0 other-config:mac-table-size=48192"
  table Bridge row "ovsbr0" (197157d7) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 33: 2023-06-26 15:37:57.170
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 34: 2023-06-26 15:37:57.212 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set bridge ovsbr1 other-config:mac-aging-time=3600"
  table Bridge row "ovsbr1" (58b87ac8) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 35: 2023-06-26 15:37:57.212
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 36: 2023-06-26 15:37:57.253 "ovs-vsctl (invoked by /bin/bash): ovs-vsctl set bridge ovsbr1 other-config:mac-table-size=48192"
  table Bridge row "ovsbr1" (58b87ac8) diff:
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:

record 37: 2023-06-26 15:37:57.254
  table Open_vSwitch row a3d2ef9e (a3d2ef9e) diff:
2. Adjust max-idle, max-revalidator, mac-aging-time, mac-table-size, and the upcall flow limit (a verification sketch follows the upcall/show output).
ovs-vsctl set Open_vSwitch . other_config:max-idle=3600000
ovs-vsctl set Open_vSwitch . other_config:max-revalidator=3600000
ovs-vsctl set bridge ovsbr0 other-config:mac-aging-time=3600
ovs-vsctl set bridge ovsbr0 other-config:mac-table-size=48192
ovs-vsctl set bridge ovsbr1 other-config:mac-aging-time=3600
ovs-vsctl set bridge ovsbr1 other-config:mac-table-size=48192
ovs-appctl upcall/set-flow-limit 88192
# ovs-appctl upcall/show
system@ovs-system:
  flows         : (current 4) (avg 12808) (max 39241) (limit 88192)
  offloaded flows : 4
  dump duration : 3ms
  ufid enabled : true

  4: (keys 7)
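
A quick way to confirm the overrides above actually landed is to read them back (a sketch using standard ovs-vsctl/ovs-appctl queries):

# Dump the datapath-wide and per-bridge other_config overrides.
ovs-vsctl get Open_vSwitch . other_config
ovs-vsctl get Bridge ovsbr0 other_config
ovs-vsctl get Bridge ovsbr1 other_config
# The value set with upcall/set-flow-limit shows up in the "(limit ...)" field.
ovs-appctl upcall/show | grep flows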

3. Send warmup traffic from the peer side (back-to-back connection):
./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=1 --max-loss-pct=0.002 --search-runtime=10 --validation-runtime=10 --rate-tolerance=10 --runtime-tolerance=10 --rate=25 --rate-unit=% --duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 --rate=0.1 --rate-unit=mpps --one-shot=1 --src-macs=00:00:00:00:00:01,00:00:00:00:00:02 --dst-macs=00:00:00:00:00:02,00:00:00:00:00:01 --use-device-stats --vlan-ids=11,11

4. Check that the OVS bridges have already learned the MAC addresses (see also the fdb/show sketch after the dump-flows output):
# ovs-appctl dpctl/dump-flows
recirc_id(0),in_port(3),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:01),eth_type(0x8100),vlan(vid=11,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:1994698, bytes:17944303208, used:493.080s, actions:2
recirc_id(0),in_port(2),eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:02),eth_type(0x8100),vlan(vid=11,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:1999999, bytes:17991987314, used:493.080s, actions:3
recirc_id(0),in_port(6),eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:02),eth_type(0x8100),vlan(vid=11,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:1999205, bytes:17984783856, used:493.080s, actions:5
recirc_id(0),in_port(5),eth(src=00:00:00:00:00:02,dst=00:00:00:00:00:01),eth_type(0x8100),vlan(vid=11,pcp=0),encap(eth_type(0x0800),ipv4(frag=no)), packets:1998705, bytes:17980287656, used:493.080s, actions:6
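
The learned MAC entries themselves can also be listed per bridge (a sketch; fdb/show is the standard ovs-appctl command for the MAC learning table):

# List port/VLAN/MAC/age entries for each bridge; the entry count should stay
# well below the configured mac-table-size of 48192.
ovs-appctl fdb/show ovsbr0
ovs-appctl fdb/show ovsbr1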

5. Send test traffic with an initial rate of 6%:
./binary-search.py --traffic-generator=trex-txrx --frame-size=9000 --num-flows=10000 --max-loss-pct=0.002 --search-runtime=10 --validation-runtime=10 --rate-tolerance=10 --runtime-tolerance=10 --duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 --rate=6 --rate-unit=% --one-shot=0 --src-macs=10:00:00:00:00:00,20:00:00:00:00:00 --dst-macs=00:00:00:00:00:02,00:00:00:00:00:01 --use-src-ip-flows=0 --use-dst-ip-flows=0 --use-src-mac-flows=1 --use-dst-mac-flows=0 --warmup-trial --warmup-trial-runtime=30 --sniff-runtime=10 --warmup-trial --warmup-trial-runtime=10 --send-teaching-measurement --send-teaching-warmup --teaching-warmup-packet-type=generic --max-retries=10 --teaching-warmup-packet-rate=10000 --teaching-measurement-packet-type=generic --use-device-stats

6. Observe OVS dumping flows on its own (a monitoring sketch follows the upcall/show output):
# ovs-appctl upcall/show
system@ovs-system:
  flows         : (current 4972) (avg 9647) (max 68854) (limit 4000)
  offloaded flows : 4972
  dump duration : 33ms
  ufid enabled : true

  4: (keys 4972)
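
To catch the moment the dynamic limit drops, the upcall statistics and the upcall_flow_limit_hit coverage counter can be polled while the traffic runs (a sketch, assuming watch(1) is available; any polling loop would do):

# Refresh every second and highlight changes in flow count, dump duration and limit.
watch -d -n1 'ovs-appctl upcall/show; ovs-appctl coverage/read-counter upcall_flow_limit_hit'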


Actual results:
1. The test failed because the flow limit was reset to 4000.

Expected results:
1. The test should pass.

Additional info:
ovs-vswitchd.log is full of the messages below:
2023-06-27T09:25:35.622Z|00029|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:52.638Z|00030|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:53.624Z|00031|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:54.147Z|00032|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:54.524Z|00033|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:55.034Z|00034|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:55.215Z|00035|poll_loop(handler3)|INFO|Dropped 60 log messages in last 1598 seconds (most recently, 1598 seconds ago) due to excessive rate
2023-06-27T09:45:55.216Z|00036|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.220Z|00037|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.224Z|00038|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.230Z|00039|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.234Z|00040|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.240Z|00041|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.244Z|00042|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.248Z|00043|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.265Z|00044|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.297Z|00045|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (86% CPU usage)
2023-06-27T09:45:55.662Z|00046|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:57.538Z|00047|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:45:59.394Z|00001|ovs_rcu(urcu5)|WARN|blocked 1001 ms waiting for revalidator4 to quiesce
2023-06-27T09:46:00.394Z|00002|ovs_rcu(urcu5)|WARN|blocked 2001 ms waiting for revalidator4 to quiesce
2023-06-27T09:46:01.669Z|00116|ovs_rcu|WARN|blocked 1001 ms waiting for handler3 to quiesce
2023-06-27T09:46:02.394Z|00003|ovs_rcu(urcu5)|WARN|blocked 4002 ms waiting for revalidator4 to quiesce
2023-06-27T09:46:02.669Z|00117|ovs_rcu|WARN|blocked 2001 ms waiting for handler3 to quiesce
2023-06-27T09:46:03.647Z|00048|timeval(handler3)|WARN|Unreasonably long 5250ms poll interval (0ms user, 3ms system)
2023-06-27T09:46:03.647Z|00049|timeval(handler3)|WARN|faults: 14 minor, 0 major
2023-06-27T09:46:03.647Z|00050|timeval(handler3)|WARN|context switches: 16 voluntary, 54 involuntary
2023-06-27T09:46:03.647Z|00051|coverage(handler3)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=3b0e3a51:
2023-06-27T09:46:03.647Z|00052|coverage(handler3)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0058/sec   total: 21
2023-06-27T09:46:03.647Z|00053|coverage(handler3)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:03.647Z|00054|coverage(handler3)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:03.647Z|00055|coverage(handler3)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:03.647Z|00056|coverage(handler3)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:03.647Z|00057|coverage(handler3)|INFO|rev_mac_learning           0.0/sec     0.000/sec        0.0342/sec   total: 125
2023-06-27T09:46:03.647Z|00058|coverage(handler3)|INFO|dumped_duplicate_flow      0.0/sec     0.000/sec        0.0019/sec   total: 7
2023-06-27T09:46:03.647Z|00059|coverage(handler3)|INFO|handler_duplicate_upcall 1075.0/sec   103.833/sec        3.1158/sec   total: 11667
2023-06-27T09:46:03.647Z|00060|coverage(handler3)|INFO|revalidate_missed_dp_flow   0.0/sec     0.000/sec        0.1072/sec   total: 386
2023-06-27T09:46:03.647Z|00061|coverage(handler3)|INFO|upcall_flow_limit_hit      0.0/sec     0.000/sec      158.8639/sec   total: 571910
2023-06-27T09:46:03.647Z|00062|coverage(handler3)|INFO|upcall_ukey_replace        1.0/sec     0.267/sec        0.2253/sec   total: 811
2023-06-27T09:46:03.647Z|00063|coverage(handler3)|INFO|xlate_actions            8274.6/sec   958.033/sec      249.6086/sec   total: 899615
2023-06-27T09:46:03.647Z|00064|coverage(handler3)|INFO|cmap_expand              180.8/sec    42.667/sec        1.4558/sec   total: 5241
2023-06-27T09:46:03.647Z|00065|coverage(handler3)|INFO|cmap_shrink                0.0/sec     0.000/sec        0.7383/sec   total: 2658
2023-06-27T09:46:03.647Z|00066|coverage(handler3)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:03.647Z|00067|coverage(handler3)|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0828/sec   total: 298
2023-06-27T09:46:03.647Z|00068|coverage(handler3)|INFO|dpif_flow_put            7199.6/sec   854.200/sec       35.8689/sec   total: 129702
2023-06-27T09:46:03.647Z|00069|coverage(handler3)|INFO|dpif_flow_del              0.0/sec     0.000/sec       11.8478/sec   total: 42652
2023-06-27T09:46:03.647Z|00070|coverage(handler3)|INFO|dpif_execute             8274.6/sec   958.033/sec      187.6797/sec   total: 676671
2023-06-27T09:46:03.647Z|00071|coverage(handler3)|INFO|flow_extract             8274.6/sec   958.033/sec      187.6792/sec   total: 676669
2023-06-27T09:46:03.647Z|00072|coverage(handler3)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.1056/sec   total: 380
2023-06-27T09:46:03.647Z|00073|coverage(handler3)|INFO|hmap_pathological          0.4/sec     0.033/sec        0.0167/sec   total: 60
2023-06-27T09:46:03.647Z|00074|coverage(handler3)|INFO|hmap_expand                4.2/sec     1.450/sec        2.3581/sec   total: 8541
2023-06-27T09:46:03.647Z|00075|coverage(handler3)|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0011/sec   total: 4
2023-06-27T09:46:03.647Z|00076|coverage(handler3)|INFO|mac_learning_learned     7195.2/sec   853.550/sec       34.9944/sec   total: 126554
2023-06-27T09:46:03.647Z|00077|coverage(handler3)|INFO|netdev_get_stats           1.2/sec     1.100/sec        0.7017/sec   total: 2532
2023-06-27T09:46:03.647Z|00078|coverage(handler3)|INFO|txn_unchanged              0.0/sec     0.167/sec        0.1192/sec   total: 429
2023-06-27T09:46:03.647Z|00079|coverage(handler3)|INFO|txn_incomplete             0.2/sec     0.017/sec        0.0125/sec   total: 46
2023-06-27T09:46:03.647Z|00080|coverage(handler3)|INFO|txn_success                0.0/sec     0.000/sec        0.0103/sec   total: 38
2023-06-27T09:46:03.647Z|00081|coverage(handler3)|INFO|poll_create_node         502.4/sec    61.717/sec       21.2772/sec   total: 76702
2023-06-27T09:46:03.647Z|00082|coverage(handler3)|INFO|poll_zero_timeout        129.8/sec    15.183/sec        3.1792/sec   total: 11462
2023-06-27T09:46:03.647Z|00083|coverage(handler3)|INFO|seq_change               308.2/sec    38.433/sec       26.0342/sec   total: 93746
2023-06-27T09:46:03.647Z|00084|coverage(handler3)|INFO|pstream_open               0.0/sec     0.000/sec        0.0014/sec   total: 5
2023-06-27T09:46:03.647Z|00085|coverage(handler3)|INFO|stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:03.647Z|00086|coverage(handler3)|INFO|unixctl_received           0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:03.647Z|00087|coverage(handler3)|INFO|unixctl_replied            0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:03.647Z|00088|coverage(handler3)|INFO|util_xalloc              62474.2/sec  7427.300/sec     1018.3594/sec   total: 3673805
2023-06-27T09:46:03.647Z|00089|coverage(handler3)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0350/sec   total: 126
2023-06-27T09:46:03.647Z|00090|coverage(handler3)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:03.647Z|00091|coverage(handler3)|INFO|netlink_overflow           1.2/sec     0.133/sec        0.0053/sec   total: 19
2023-06-27T09:46:03.647Z|00092|coverage(handler3)|INFO|netlink_received         15500.2/sec  1816.600/sec      279.7611/sec   total: 1008748
2023-06-27T09:46:03.647Z|00093|coverage(handler3)|INFO|netlink_recv_jumbo         0.8/sec    15.267/sec        1.6794/sec   total: 6050
2023-06-27T09:46:03.647Z|00094|coverage(handler3)|INFO|netlink_sent             15492.2/sec  1815.750/sec      278.7436/sec   total: 1005085
2023-06-27T09:46:03.647Z|00095|coverage(handler3)|INFO|nln_changed                0.0/sec     0.000/sec        0.0083/sec   total: 30
2023-06-27T09:46:03.647Z|00096|coverage(handler3)|INFO|109 events never hit
2023-06-27T09:46:05.275Z|00097|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:06.330Z|00098|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:07.830Z|00335|ofproto_dpif_upcall(revalidator4)|INFO|Spent an unreasonably long 12236ms dumping flows
2023-06-27T09:46:07.830Z|00336|timeval(revalidator4)|WARN|Unreasonably long 12236ms poll interval (1119ms user, 1670ms system)
2023-06-27T09:46:07.830Z|00337|timeval(revalidator4)|WARN|faults: 981 minor, 0 major
2023-06-27T09:46:07.830Z|00338|timeval(revalidator4)|WARN|context switches: 1140 voluntary, 23446 involuntary
2023-06-27T09:46:07.830Z|00339|coverage(revalidator4)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=d57648a6:
2023-06-27T09:46:07.830Z|00340|coverage(revalidator4)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0058/sec   total: 21
2023-06-27T09:46:07.830Z|00341|coverage(revalidator4)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:07.830Z|00342|coverage(revalidator4)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:07.830Z|00343|coverage(revalidator4)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:07.831Z|00344|coverage(revalidator4)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:07.831Z|00345|coverage(revalidator4)|INFO|rev_mac_learning           0.4/sec     0.033/sec        0.0347/sec   total: 125
2023-06-27T09:46:07.831Z|00346|coverage(revalidator4)|INFO|dumped_duplicate_flow      0.0/sec     0.000/sec        0.0019/sec   total: 155
2023-06-27T09:46:07.831Z|00347|coverage(revalidator4)|INFO|handler_duplicate_upcall  90.0/sec   111.333/sec        3.2408/sec   total: 13202
2023-06-27T09:46:07.831Z|00348|coverage(revalidator4)|INFO|revalidate_missed_dp_flow   0.0/sec     0.000/sec        0.1072/sec   total: 2067
2023-06-27T09:46:07.831Z|00349|coverage(revalidator4)|INFO|upcall_flow_limit_hit      0.0/sec     0.000/sec      158.8639/sec   total: 571910
2023-06-27T09:46:07.831Z|00350|coverage(revalidator4)|INFO|upcall_ukey_replace        0.0/sec     0.267/sec        0.2253/sec   total: 811
2023-06-27T09:46:07.831Z|00351|coverage(revalidator4)|INFO|xlate_actions            204.8/sec   975.100/sec      249.8931/sec   total: 978953
2023-06-27T09:46:07.831Z|00352|coverage(revalidator4)|INFO|cmap_expand                0.0/sec    42.667/sec        1.4558/sec   total: 5383
2023-06-27T09:46:07.831Z|00353|coverage(revalidator4)|INFO|cmap_shrink                0.0/sec     0.000/sec        0.7383/sec   total: 2658
2023-06-27T09:46:07.831Z|00354|coverage(revalidator4)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:07.831Z|00355|coverage(revalidator4)|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0828/sec   total: 628
2023-06-27T09:46:07.831Z|00356|coverage(revalidator4)|INFO|dpif_flow_put            114.8/sec   863.767/sec       36.0283/sec   total: 167237
2023-06-27T09:46:07.831Z|00357|coverage(revalidator4)|INFO|dpif_flow_del              0.0/sec     0.000/sec       11.8478/sec   total: 42676
2023-06-27T09:46:07.831Z|00358|coverage(revalidator4)|INFO|dpif_execute             204.8/sec   975.100/sec      187.9642/sec   total: 692348
2023-06-27T09:46:07.831Z|00359|coverage(revalidator4)|INFO|flow_extract             204.8/sec   975.100/sec      187.9636/sec   total: 692346
2023-06-27T09:46:07.831Z|00360|coverage(revalidator4)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.1056/sec   total: 380
2023-06-27T09:46:07.831Z|00361|coverage(revalidator4)|INFO|hmap_pathological          0.0/sec     0.033/sec        0.0167/sec   total: 62
2023-06-27T09:46:07.831Z|00362|coverage(revalidator4)|INFO|hmap_expand               10.4/sec     2.217/sec        2.3725/sec   total: 8552
2023-06-27T09:46:07.831Z|00363|coverage(revalidator4)|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0011/sec   total: 4
2023-06-27T09:46:07.831Z|00364|coverage(revalidator4)|INFO|mac_learning_learned     114.8/sec   863.117/sec       35.1539/sec   total: 140696
2023-06-27T09:46:07.831Z|00365|coverage(revalidator4)|INFO|mac_learning_evicted       0.0/sec     0.000/sec        0.0000/sec   total: 4305
2023-06-27T09:46:07.831Z|00366|coverage(revalidator4)|INFO|netdev_get_stats           1.2/sec     1.100/sec        0.7033/sec   total: 2532
2023-06-27T09:46:07.831Z|00367|coverage(revalidator4)|INFO|txn_unchanged              0.0/sec     0.150/sec        0.1192/sec   total: 429
2023-06-27T09:46:07.831Z|00368|coverage(revalidator4)|INFO|txn_incomplete             0.2/sec     0.033/sec        0.0128/sec   total: 46
2023-06-27T09:46:07.832Z|00369|coverage(revalidator4)|INFO|txn_success                0.2/sec     0.017/sec        0.0106/sec   total: 39
2023-06-27T09:46:07.832Z|00370|coverage(revalidator4)|INFO|poll_create_node          20.8/sec    62.850/sec       21.3061/sec   total: 78112
2023-06-27T09:46:07.832Z|00371|coverage(revalidator4)|INFO|poll_zero_timeout          3.4/sec    15.467/sec        3.1839/sec   total: 11714
2023-06-27T09:46:07.832Z|00372|coverage(revalidator4)|INFO|seq_change                 4.6/sec    38.800/sec       26.0406/sec   total: 97597
2023-06-27T09:46:07.832Z|00373|coverage(revalidator4)|INFO|pstream_open               0.0/sec     0.000/sec        0.0014/sec   total: 5
2023-06-27T09:46:07.832Z|00374|coverage(revalidator4)|INFO|stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:07.832Z|00375|coverage(revalidator4)|INFO|unixctl_received           0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:07.832Z|00376|coverage(revalidator4)|INFO|unixctl_replied            0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:07.832Z|00377|coverage(revalidator4)|INFO|util_xalloc              1542.2/sec  7551.050/sec     1020.5014/sec   total: 4142565
2023-06-27T09:46:07.832Z|00378|coverage(revalidator4)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0350/sec   total: 126
2023-06-27T09:46:07.832Z|00379|coverage(revalidator4)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:07.833Z|00380|coverage(revalidator4)|INFO|netlink_overflow           0.0/sec     0.133/sec        0.0053/sec   total: 21
2023-06-27T09:46:07.834Z|00381|coverage(revalidator4)|INFO|netlink_received         321.6/sec  1843.233/sec      280.2078/sec   total: 1133154
2023-06-27T09:46:07.834Z|00382|coverage(revalidator4)|INFO|netlink_recv_jumbo         0.8/sec    15.267/sec        1.6806/sec   total: 6055
2023-06-27T09:46:07.834Z|00383|coverage(revalidator4)|INFO|netlink_sent             321.6/sec  1842.383/sec      279.1903/sec   total: 1128952
2023-06-27T09:46:07.834Z|00384|coverage(revalidator4)|INFO|nln_changed                0.0/sec     0.000/sec        0.0083/sec   total: 30
2023-06-27T09:46:07.834Z|00385|coverage(revalidator4)|INFO|108 events never hit
2023-06-27T09:46:07.878Z|00386|ofproto_dpif_upcall(revalidator4)|WARN|Number of datapath flows (68854) twice as high as current dynamic flow limit (14698).  Starting to delete flows unconditionally as an emergency measure.
2023-06-27T09:46:08.010Z|00099|ofproto_dpif_upcall(handler3)|WARN|Dropped 570375 log messages in last 1624 seconds (most recently, 1611 seconds ago) due to excessive rate
2023-06-27T09:46:08.010Z|00100|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 7349 flows.
2023-06-27T09:46:08.011Z|00101|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 7349 flows.
2023-06-27T09:46:08.012Z|00102|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 7349 flows.
2023-06-27T09:46:08.013Z|00103|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 7349 flows.
2023-06-27T09:46:08.014Z|00104|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 7349 flows.
2023-06-27T09:46:08.203Z|00105|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:09.953Z|00106|timeval(handler3)|WARN|Unreasonably long 1731ms poll interval (0ms user, 0ms system)
2023-06-27T09:46:09.953Z|00107|timeval(handler3)|WARN|context switches: 0 voluntary, 18 involuntary
2023-06-27T09:46:09.954Z|00108|coverage(handler3)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=2bdec87d:
2023-06-27T09:46:09.954Z|00109|coverage(handler3)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0058/sec   total: 21
2023-06-27T09:46:09.954Z|00110|coverage(handler3)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:09.954Z|00111|coverage(handler3)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:09.954Z|00112|coverage(handler3)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:09.954Z|00113|coverage(handler3)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:09.954Z|00114|coverage(handler3)|INFO|rev_mac_learning           0.0/sec     0.033/sec        0.0347/sec   total: 125
2023-06-27T09:46:09.954Z|00115|coverage(handler3)|INFO|dumped_duplicate_flow     29.6/sec     2.467/sec        0.0431/sec   total: 155
2023-06-27T09:46:09.954Z|00116|coverage(handler3)|INFO|handler_duplicate_upcall 672.8/sec   167.400/sec        4.1753/sec   total: 15031
2023-06-27T09:46:09.954Z|00117|coverage(handler3)|INFO|revalidate_missed_dp_flow 336.2/sec    28.017/sec        0.5742/sec   total: 2067
2023-06-27T09:46:09.954Z|00118|coverage(handler3)|INFO|upcall_flow_limit_hit    492.4/sec    41.033/sec      159.5478/sec   total: 574372
2023-06-27T09:46:09.954Z|00119|coverage(handler3)|INFO|upcall_ukey_replace        0.0/sec     0.267/sec        0.2253/sec   total: 811
2023-06-27T09:46:09.954Z|00120|coverage(handler3)|INFO|xlate_actions            17310.0/sec  2417.600/sec      273.9347/sec   total: 986165
2023-06-27T09:46:09.954Z|00121|coverage(handler3)|INFO|cmap_expand               48.4/sec    46.700/sec        1.5231/sec   total: 5483
2023-06-27T09:46:09.954Z|00122|coverage(handler3)|INFO|cmap_shrink                0.0/sec     0.000/sec        0.7383/sec   total: 2658
2023-06-27T09:46:09.954Z|00123|coverage(handler3)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:09.954Z|00124|coverage(handler3)|INFO|dpif_flow_get             66.0/sec     5.500/sec        0.1744/sec   total: 628
2023-06-27T09:46:09.954Z|00125|coverage(handler3)|INFO|dpif_flow_put            8091.2/sec  1538.033/sec       47.2661/sec   total: 170158
2023-06-27T09:46:09.954Z|00126|coverage(handler3)|INFO|dpif_flow_del              4.8/sec     0.400/sec       11.8544/sec   total: 42676
2023-06-27T09:46:09.954Z|00127|coverage(handler3)|INFO|dpif_execute             4577.8/sec  1356.583/sec      194.3222/sec   total: 699560
2023-06-27T09:46:09.954Z|00128|coverage(handler3)|INFO|flow_extract             4577.8/sec  1356.583/sec      194.3217/sec   total: 699558
2023-06-27T09:46:09.954Z|00129|coverage(handler3)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.1056/sec   total: 380
2023-06-27T09:46:09.954Z|00130|coverage(handler3)|INFO|hmap_pathological          0.4/sec     0.067/sec        0.0172/sec   total: 62
2023-06-27T09:46:09.954Z|00131|coverage(handler3)|INFO|hmap_expand                2.2/sec     2.300/sec        2.3756/sec   total: 8552
2023-06-27T09:46:09.954Z|00132|coverage(handler3)|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0011/sec   total: 4
2023-06-27T09:46:09.954Z|00133|coverage(handler3)|INFO|mac_learning_learned     3412.6/sec  1147.500/sec       39.8936/sec   total: 143617
2023-06-27T09:46:09.954Z|00134|coverage(handler3)|INFO|mac_learning_evicted     1445.2/sec   120.433/sec        2.0072/sec   total: 7226
2023-06-27T09:46:09.954Z|00135|coverage(handler3)|INFO|netdev_get_stats           0.0/sec     1.000/sec        0.7033/sec   total: 2532
2023-06-27T09:46:09.954Z|00136|coverage(handler3)|INFO|txn_unchanged              0.0/sec     0.133/sec        0.1192/sec   total: 429
2023-06-27T09:46:09.954Z|00137|coverage(handler3)|INFO|txn_incomplete             0.0/sec     0.033/sec        0.0128/sec   total: 46
2023-06-27T09:46:09.954Z|00138|coverage(handler3)|INFO|txn_success                0.2/sec     0.033/sec        0.0108/sec   total: 39
2023-06-27T09:46:09.954Z|00139|coverage(handler3)|INFO|poll_create_node         342.0/sec    90.750/sec       21.7811/sec   total: 78412
2023-06-27T09:46:09.954Z|00140|coverage(handler3)|INFO|poll_zero_timeout         82.8/sec    22.367/sec        3.2989/sec   total: 11876
2023-06-27T09:46:09.954Z|00141|coverage(handler3)|INFO|seq_change               820.0/sec   107.117/sec       27.1794/sec   total: 97846
2023-06-27T09:46:09.954Z|00142|coverage(handler3)|INFO|pstream_open               0.0/sec     0.000/sec        0.0014/sec   total: 5
2023-06-27T09:46:09.954Z|00143|coverage(handler3)|INFO|stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:09.954Z|00144|coverage(handler3)|INFO|unixctl_received           0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:09.954Z|00145|coverage(handler3)|INFO|unixctl_replied            0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:09.954Z|00146|coverage(handler3)|INFO|util_xalloc              101159.6/sec 15976.250/sec     1161.0008/sec   total: 4179603
2023-06-27T09:46:09.954Z|00147|coverage(handler3)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0350/sec   total: 126
2023-06-27T09:46:09.954Z|00148|coverage(handler3)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:09.954Z|00149|coverage(handler3)|INFO|netlink_overflow           0.6/sec     0.183/sec        0.0061/sec   total: 22
2023-06-27T09:46:09.954Z|00150|coverage(handler3)|INFO|netlink_received         26912.0/sec  4085.733/sec      317.5856/sec   total: 1143308
2023-06-27T09:46:09.954Z|00151|coverage(handler3)|INFO|netlink_recv_jumbo         1.0/sec    15.283/sec        1.6819/sec   total: 6055
2023-06-27T09:46:09.954Z|00152|coverage(handler3)|INFO|netlink_sent             26802.8/sec  4075.783/sec      316.4164/sec   total: 1139099
2023-06-27T09:46:09.954Z|00153|coverage(handler3)|INFO|nln_changed                0.0/sec     0.000/sec        0.0083/sec   total: 30
2023-06-27T09:46:09.954Z|00154|coverage(handler3)|INFO|108 events never hit
2023-06-27T09:46:14.654Z|00118|ovs_rcu|WARN|blocked 1000 ms waiting for revalidator4 to quiesce
2023-06-27T09:46:15.654Z|00119|ovs_rcu|WARN|blocked 2000 ms waiting for revalidator4 to quiesce
2023-06-27T09:46:15.954Z|00155|poll_loop(handler3)|INFO|Dropped 207 log messages in last 20 seconds (most recently, 17 seconds ago) due to excessive rate
2023-06-27T09:46:15.955Z|00156|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:46:15.955Z|00157|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:46:15.956Z|00158|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:46:16.057Z|00387|ofproto_dpif_upcall(revalidator4)|INFO|Spent an unreasonably long 8181ms dumping flows
2023-06-27T09:46:16.057Z|00388|timeval(revalidator4)|WARN|Unreasonably long 8223ms poll interval (915ms user, 1513ms system)
2023-06-27T09:46:16.057Z|00389|timeval(revalidator4)|WARN|faults: 241 minor, 0 major
2023-06-27T09:46:16.057Z|00390|timeval(revalidator4)|WARN|context switches: 426 voluntary, 59285 involuntary
2023-06-27T09:46:16.058Z|00391|coverage(revalidator4)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=e4598588:
2023-06-27T09:46:16.058Z|00392|coverage(revalidator4)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0058/sec   total: 21
2023-06-27T09:46:16.058Z|00393|coverage(revalidator4)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:16.058Z|00394|coverage(revalidator4)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:16.058Z|00395|coverage(revalidator4)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-06-27T09:46:16.058Z|00396|coverage(revalidator4)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:16.058Z|00397|coverage(revalidator4)|INFO|rev_mac_learning           0.4/sec     0.067/sec        0.0353/sec   total: 127
2023-06-27T09:46:16.058Z|00398|coverage(revalidator4)|INFO|dumped_duplicate_flow      0.0/sec     2.467/sec        0.0431/sec   total: 155
2023-06-27T09:46:16.058Z|00399|coverage(revalidator4)|INFO|handler_duplicate_upcall   0.0/sec   167.400/sec        4.1753/sec   total: 15031
2023-06-27T09:46:16.058Z|00400|coverage(revalidator4)|INFO|revalidate_missed_dp_flow   0.0/sec    28.017/sec        0.5742/sec   total: 2067
2023-06-27T09:46:16.058Z|00401|coverage(revalidator4)|INFO|upcall_flow_limit_hit    33709.2/sec  2850.133/sec      206.3661/sec   total: 904421
2023-06-27T09:46:16.058Z|00402|coverage(revalidator4)|INFO|upcall_flow_limit_kill     0.0/sec     0.000/sec        0.0000/sec   total: 1091
2023-06-27T09:46:16.058Z|00403|coverage(revalidator4)|INFO|upcall_ukey_replace        0.0/sec     0.267/sec        0.2253/sec   total: 811
2023-06-27T09:46:16.058Z|00404|coverage(revalidator4)|INFO|xlate_actions            33709.2/sec  5226.700/sec      320.7531/sec   total: 1335513
2023-06-27T09:46:16.058Z|00405|coverage(revalidator4)|INFO|cmap_expand                0.0/sec    46.700/sec        1.5231/sec   total: 5483
2023-06-27T09:46:16.058Z|00406|coverage(revalidator4)|INFO|cmap_shrink                0.0/sec     0.000/sec        0.7383/sec   total: 3355
2023-06-27T09:46:16.058Z|00407|coverage(revalidator4)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:16.058Z|00408|coverage(revalidator4)|INFO|dpif_flow_get              0.0/sec     5.500/sec        0.1744/sec   total: 628
2023-06-27T09:46:16.058Z|00409|coverage(revalidator4)|INFO|dpif_flow_put              0.0/sec  1538.033/sec       47.2661/sec   total: 170158
2023-06-27T09:46:16.058Z|00410|coverage(revalidator4)|INFO|dpif_flow_del              0.0/sec     0.400/sec       11.8544/sec   total: 98972
2023-06-27T09:46:16.058Z|00411|coverage(revalidator4)|INFO|dpif_execute             33709.2/sec  4165.683/sec      241.1406/sec   total: 1028231
2023-06-27T09:46:16.058Z|00412|coverage(revalidator4)|INFO|flow_extract             33709.2/sec  4165.683/sec      241.1400/sec   total: 1028229
2023-06-27T09:46:16.059Z|00413|coverage(revalidator4)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.1056/sec   total: 380
2023-06-27T09:46:16.059Z|00414|coverage(revalidator4)|INFO|hmap_pathological          0.0/sec     0.067/sec        0.0172/sec   total: 62
2023-06-27T09:46:16.059Z|00415|coverage(revalidator4)|INFO|hmap_expand               15.0/sec     3.450/sec        2.3964/sec   total: 8627
2023-06-27T09:46:16.059Z|00416|coverage(revalidator4)|INFO|hmap_shrink                0.0/sec     0.000/sec        0.0011/sec   total: 4
2023-06-27T09:46:16.059Z|00417|coverage(revalidator4)|INFO|mac_learning_learned      60.6/sec  1152.550/sec       39.9778/sec   total: 144023
2023-06-27T09:46:16.059Z|00418|coverage(revalidator4)|INFO|mac_learning_evicted      60.6/sec   125.483/sec        2.0914/sec   total: 7632
2023-06-27T09:46:16.059Z|00419|coverage(revalidator4)|INFO|netdev_get_stats           2.4/sec     1.100/sec        0.7067/sec   total: 2544
2023-06-27T09:46:16.059Z|00420|coverage(revalidator4)|INFO|txn_unchanged              0.0/sec     0.117/sec        0.1192/sec   total: 429
2023-06-27T09:46:16.059Z|00421|coverage(revalidator4)|INFO|txn_incomplete             0.4/sec     0.067/sec        0.0133/sec   total: 48
2023-06-27T09:46:16.059Z|00422|coverage(revalidator4)|INFO|txn_success                0.2/sec     0.050/sec        0.0111/sec   total: 40
2023-06-27T09:46:16.059Z|00423|coverage(revalidator4)|INFO|poll_create_node         1057.4/sec   178.267/sec       23.2497/sec   total: 88995
2023-06-27T09:46:16.059Z|00424|coverage(revalidator4)|INFO|poll_zero_timeout        527.6/sec    66.333/sec        4.0317/sec   total: 17018
2023-06-27T09:46:16.059Z|00425|coverage(revalidator4)|INFO|seq_change               529.0/sec   151.183/sec       27.9142/sec   total: 105443
2023-06-27T09:46:16.059Z|00426|coverage(revalidator4)|INFO|pstream_open               0.0/sec     0.000/sec        0.0014/sec   total: 5
2023-06-27T09:46:16.059Z|00427|coverage(revalidator4)|INFO|stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-06-27T09:46:16.059Z|00428|coverage(revalidator4)|INFO|unixctl_received           0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:16.059Z|00429|coverage(revalidator4)|INFO|unixctl_replied            0.0/sec     0.000/sec        0.0103/sec   total: 37
2023-06-27T09:46:16.059Z|00430|coverage(revalidator4)|INFO|util_xalloc              103784.6/sec 24620.200/sec     1305.1461/sec   total: 5418626
2023-06-27T09:46:16.059Z|00431|coverage(revalidator4)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0350/sec   total: 126
2023-06-27T09:46:16.059Z|00432|coverage(revalidator4)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-06-27T09:46:16.059Z|00433|coverage(revalidator4)|INFO|netlink_overflow           4.4/sec     0.550/sec        0.0122/sec   total: 63
2023-06-27T09:46:16.059Z|00434|coverage(revalidator4)|INFO|netlink_received         33731.8/sec  6896.550/sec      364.4353/sec   total: 1585212
2023-06-27T09:46:16.059Z|00435|coverage(revalidator4)|INFO|netlink_recv_jumbo         1.6/sec    15.350/sec        1.6842/sec   total: 6064
2023-06-27T09:46:16.059Z|00436|coverage(revalidator4)|INFO|netlink_sent             33725.6/sec  6886.083/sec      363.2575/sec   total: 1580548
2023-06-27T09:46:16.059Z|00437|coverage(revalidator4)|INFO|nln_changed                0.0/sec     0.000/sec        0.0083/sec   total: 30
2023-06-27T09:46:16.060Z|00438|coverage(revalidator4)|INFO|107 events never hit
2023-06-27T09:46:19.127Z|00159|dpif_netlink(handler3)|WARN|Dropped 46 log messages in last 9 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:46:19.127Z|00160|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:21.955Z|00161|poll_loop(handler3)|INFO|Dropped 5876 log messages in last 6 seconds (most recently, 3 seconds ago) due to excessive rate
2023-06-27T09:46:21.955Z|00162|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:46:25.216Z|00163|poll_loop(handler3)|INFO|Dropped 4365 log messages in last 4 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:46:25.216Z|00164|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (76% CPU usage)
2023-06-27T09:46:31.071Z|00165|dpif_netlink(handler3)|WARN|Dropped 98 log messages in last 12 seconds (most recently, 1 seconds ago) due to excessive rate
2023-06-27T09:46:31.071Z|00166|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:31.215Z|00167|poll_loop(handler3)|INFO|Dropped 20068 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:46:31.215Z|00168|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (53% CPU usage)
2023-06-27T09:46:37.216Z|00169|poll_loop(handler3)|INFO|Dropped 8052 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:46:37.216Z|00170|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (76% CPU usage)
2023-06-27T09:46:43.684Z|00171|dpif_netlink(handler3)|WARN|Dropped 99 log messages in last 12 seconds (most recently, 2 seconds ago) due to excessive rate
2023-06-27T09:46:43.684Z|00172|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:46.153Z|00173|poll_loop(handler3)|INFO|Dropped 20353 log messages in last 9 seconds (most recently, 4 seconds ago) due to excessive rate
2023-06-27T09:46:46.154Z|00174|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (61% CPU usage)
2023-06-27T09:46:49.216Z|00175|poll_loop(handler3)|INFO|Dropped 3997 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:46:49.219Z|00176|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (74% CPU usage)
2023-06-27T09:46:55.558Z|00177|dpif_netlink(handler3)|WARN|Dropped 85 log messages in last 12 seconds (most recently, 2 seconds ago) due to excessive rate
2023-06-27T09:46:55.558Z|00178|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:46:58.524Z|00179|poll_loop(handler3)|INFO|Dropped 2181 log messages in last 9 seconds (most recently, 6 seconds ago) due to excessive rate
2023-06-27T09:46:58.525Z|00180|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:47:01.219Z|00181|poll_loop(handler3)|INFO|Dropped 3472 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:01.219Z|00182|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:47:07.216Z|00183|poll_loop(handler3)|INFO|Dropped 13128 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:07.216Z|00184|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (74% CPU usage)
2023-06-27T09:47:07.935Z|00185|dpif_netlink(handler3)|WARN|Dropped 98 log messages in last 12 seconds (most recently, 2 seconds ago) due to excessive rate
2023-06-27T09:47:07.935Z|00186|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:47:08.011Z|00187|ofproto_dpif_upcall(handler3)|WARN|Dropped 3925670 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:08.011Z|00188|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.
2023-06-27T09:47:13.216Z|00189|poll_loop(handler3)|INFO|Dropped 9026 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:13.216Z|00190|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (65% CPU usage)
2023-06-27T09:47:19.215Z|00191|poll_loop(handler3)|INFO|Dropped 9501 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:19.216Z|00192|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (75% CPU usage)
2023-06-27T09:47:20.360Z|00193|dpif_netlink(handler3)|WARN|Dropped 73 log messages in last 12 seconds (most recently, 3 seconds ago) due to excessive rate
2023-06-27T09:47:20.360Z|00194|dpif_netlink(handler3)|WARN|system@ovs-system: lost packet on handler 0
2023-06-27T09:47:25.216Z|00195|poll_loop(handler3)|INFO|Dropped 10557 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:25.216Z|00196|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (92% CPU usage)
2023-06-27T09:47:31.239Z|00197|poll_loop(handler3)|INFO|Dropped 8927 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
2023-06-27T09:47:31.239Z|00198|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (99% CPU usage)
2023-06-27T09:47:37.216Z|00199|poll_loop(handler3)|INFO|Dropped 9746 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:37.216Z|00200|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:47:43.604Z|00201|poll_loop(handler3)|INFO|Dropped 6880 log messages in last 6 seconds (most recently, 1 seconds ago) due to excessive rate
2023-06-27T09:47:43.604Z|00202|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (71% CPU usage)
2023-06-27T09:47:49.216Z|00203|poll_loop(handler3)|INFO|Dropped 11062 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:47:49.216Z|00204|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (70% CPU usage)
2023-06-27T09:47:55.962Z|00205|poll_loop(handler3)|INFO|Dropped 7519 log messages in last 7 seconds (most recently, 1 seconds ago) due to excessive rate
2023-06-27T09:47:55.962Z|00206|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (72% CPU usage)
2023-06-27T09:48:01.216Z|00207|poll_loop(handler3)|INFO|Dropped 11401 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:01.217Z|00208|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:48:07.216Z|00209|poll_loop(handler3)|INFO|Dropped 8484 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:07.216Z|00210|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:48:08.315Z|00211|ofproto_dpif_upcall(handler3)|WARN|Dropped 4300776 log messages in last 60 seconds (most recently, 1 seconds ago) due to excessive rate
2023-06-27T09:48:08.315Z|00212|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.
2023-06-27T09:48:13.216Z|00213|poll_loop(handler3)|INFO|Dropped 10109 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:13.216Z|00214|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:48:19.216Z|00215|poll_loop(handler3)|INFO|Dropped 8278 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:19.216Z|00216|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:48:25.216Z|00217|poll_loop(handler3)|INFO|Dropped 8677 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:25.216Z|00218|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:48:31.216Z|00219|poll_loop(handler3)|INFO|Dropped 8537 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:31.216Z|00220|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (72% CPU usage)
2023-06-27T09:48:37.363Z|00221|poll_loop(handler3)|INFO|Dropped 10234 log messages in last 6 seconds (most recently, 3 seconds ago) due to excessive rate
2023-06-27T09:48:37.363Z|00222|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:48:43.216Z|00223|poll_loop(handler3)|INFO|Dropped 7198 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:43.218Z|00224|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (71% CPU usage)
2023-06-27T09:48:52.362Z|00225|poll_loop(handler3)|INFO|Dropped 10608 log messages in last 9 seconds (most recently, 6 seconds ago) due to excessive rate
2023-06-27T09:48:52.363Z|00226|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:48:55.215Z|00227|poll_loop(handler3)|INFO|Dropped 29364 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:48:55.216Z|00228|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (54% CPU usage)
2023-06-27T09:49:04.362Z|00229|poll_loop(handler3)|INFO|Dropped 22998 log messages in last 9 seconds (most recently, 6 seconds ago) due to excessive rate
2023-06-27T09:49:04.363Z|00230|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:07.215Z|00231|poll_loop(handler3)|INFO|Dropped 29388 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:07.216Z|00232|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:08.009Z|00233|ofproto_dpif_upcall(handler3)|WARN|Dropped 3202483 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:08.009Z|00234|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.
2023-06-27T09:49:16.362Z|00235|poll_loop(handler3)|INFO|Dropped 22186 log messages in last 9 seconds (most recently, 6 seconds ago) due to excessive rate
2023-06-27T09:49:16.363Z|00236|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:19.215Z|00237|poll_loop(handler3)|INFO|Dropped 28729 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:19.216Z|00238|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:28.503Z|00239|poll_loop(handler3)|INFO|Dropped 23634 log messages in last 9 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:49:28.503Z|00240|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:31.215Z|00241|poll_loop(handler3)|INFO|Dropped 30119 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:31.215Z|00242|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (54% CPU usage)
2023-06-27T09:49:40.869Z|00243|poll_loop(handler3)|INFO|Dropped 25783 log messages in last 9 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:49:40.869Z|00244|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:43.215Z|00245|poll_loop(handler3)|INFO|Dropped 23382 log messages in last 3 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:43.216Z|00246|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:49:53.210Z|00247|poll_loop(handler3)|INFO|Dropped 32457 log messages in last 10 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:49:53.211Z|00248|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:49:55.215Z|00249|poll_loop(handler3)|INFO|Dropped 20361 log messages in last 2 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:49:55.215Z|00250|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:05.558Z|00251|poll_loop(handler3)|INFO|Dropped 35900 log messages in last 10 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:50:05.558Z|00252|poll_loop(handler3)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (55% CPU usage)
2023-06-27T09:50:07.215Z|00253|poll_loop(handler3)|INFO|Dropped 17728 log messages in last 2 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:07.215Z|00254|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:08.009Z|00255|ofproto_dpif_upcall(handler3)|WARN|Dropped 2167622 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:08.009Z|00256|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.
2023-06-27T09:50:17.914Z|00257|poll_loop(handler3)|INFO|Dropped 39355 log messages in last 10 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:50:17.914Z|00258|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:19.216Z|00259|poll_loop(handler3)|INFO|Dropped 13703 log messages in last 2 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:19.216Z|00260|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:30.325Z|00261|poll_loop(handler3)|INFO|Dropped 32619 log messages in last 11 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:50:30.326Z|00262|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (53% CPU usage)
2023-06-27T09:50:31.215Z|00263|poll_loop(handler3)|INFO|Dropped 9403 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:31.215Z|00264|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (53% CPU usage)
2023-06-27T09:50:42.692Z|00265|poll_loop(handler3)|INFO|Dropped 46965 log messages in last 11 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:50:42.693Z|00266|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:43.215Z|00267|poll_loop(handler3)|INFO|Dropped 6036 log messages in last 1 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:43.215Z|00268|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (55% CPU usage)
2023-06-27T09:50:55.028Z|00269|poll_loop(handler3)|INFO|Dropped 50795 log messages in last 12 seconds (most recently, 7 seconds ago) due to excessive rate
2023-06-27T09:50:55.028Z|00270|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:50:55.215Z|00271|poll_loop(handler3)|INFO|Dropped 2143 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate
2023-06-27T09:50:55.215Z|00272|poll_loop(handler3)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (54% CPU usage)
2023-06-27T09:56:44.990Z|00273|ofproto_dpif_upcall(handler3)|WARN|Dropped 1913123 log messages in last 397 seconds (most recently, 345 seconds ago) due to excessive rate
2023-06-27T09:56:44.990Z|00274|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.
2023-06-27T09:56:44.990Z|00275|ofproto_dpif_upcall(handler3)|WARN|upcall: datapath reached the dynamic limit of 4000 flows.

Comment 3 Eelco Chaudron 2023-07-07 15:03:59 UTC
So when looking at the system behavior during the times OVS reports long poll times, it looks like the kernel is oversubscribed. There is more work than the CPU threads can handle.

The system only has two CPUs (one on each NUMA node) available to the kernel and the threads running on it. All the other CPUs are isolated with tuned's realtime-virtual-host profile.

I'm not sure why it is configured this way, so for testing I gave the kernel more CPUs: I removed CPUs 48 through 55 from the isolated list and assigned them back to the kernel.
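
As an aside, a rough sketch of that reconfiguration (illustrative only; it assumes CPUs 2-55 were originally isolated, and the profile has to be re-applied plus the host rebooted for the isolcpus/nohz_full kernel parameters to change):

# vi /etc/tuned/realtime-virtual-host-variables.conf    # drop CPUs 48-55 from the isolated set
isolated_cores=2-47
# tuned-adm profile realtime-virtual-host
# reboot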

After a reboot I no longer see any problems with the test cases; all revalidator runs are within 500ms. However, with default parameters this results in 3 revalidator threads. To make the test more equal I also limited it to 1 revalidator, and it still passed. This is done with 'ovs-vsctl set Open_vSwitch . other-config:n-revalidator-threads=1' after the script reconfigures OVS.

So the conclusion is that we need to increase the number of threads available for the kernel and OVS. I think even MicroShift has 4 threads for the kernel and OVS.

Here is some of the debug output showing OVS fighting for resources. In this sample a revalidator run takes about 1 second, and the OVS threads spend more than 1.4 seconds in total waiting to be put on a CPU within a 1-second window (see THREAD READY STATISTICS below). There may be other (kernel) threads also competing for CPU time:

# Start sampling (trigger@76250485999291) @2023-07-07T09:01:29.411088 (13:01:29 UTC)
# Stop sampling (trigger@76251507372624) @2023-07-07T09:01:30.432529 (13:01:30 UTC)
# Triggered sample dump, stop-start delta 1,021,373,333 ns @2023-07-07T09:01:30.432619 (13:01:30 UTC)
TID        THREAD           <RESOURCE SPECIFIC>
---------- ---------------- ----------------------------------------------------------------------------
   4160013 <unknown:4157952/4160013> [SYSCALL STATISTICS]
                            NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                            madvise                  28           1             3,937             3,937
                            futex                   202           1             1,382             1,382
                            TOTAL( - poll):                       2             5,319

                            [THREAD RUN STATISTICS]
                            SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                     2            56,270            24,714            31,556

                            [THREAD READY STATISTICS]
                            SCHED_CNT           TOTAL ns            MAX ns
                                     2           118,885           116,490

   4157954 handler1         [SYSCALL STATISTICS]
                            NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                            sendmsg                  46        2955       643,299,950       134,692,681
                            futex                   202         515       292,299,879       157,601,468
                            recvmsg                  47        5771        14,259,408         3,503,784
                            mprotect                 10        1338         5,278,404            50,088
                            write                     1          60         1,157,917            60,486
                            poll                      7          88            55,768             1,282
                            getrusage                98          44            31,256             1,197
                            TOTAL( - poll):                   10683       956,326,814

                            [THREAD RUN STATISTICS]
                            SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                  3419       289,917,455             2,791         9,979,606

                            [THREAD READY STATISTICS]
                            SCHED_CNT           TOTAL ns            MAX ns
                                  3418       668,495,620       157,596,608

   4157952 ovs-vswitchd     [SYSCALL STATISTICS]
                            NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                            poll                      7         416       712,171,369       239,808,423
                            futex                   202         805       102,034,906        16,712,043
                            sendto                   44           3        13,529,652        13,024,428
                            getdents64              217          50         4,794,491         4,187,097
                            read                      0         417         3,869,498         3,486,836
                            accept                   43         126         1,403,232           819,416
                            getrusage                98         417           321,120            32,248
                            write                     1          26           303,700            22,986
                            sendmsg                  46          10           248,124            44,550
                            openat                  257          25           214,149            12,859
                            recvmsg                  47         212           204,377             4,867
                            recvfrom                 45          29            40,492             4,703
                            fstat                     5          25            31,554             1,495
                            close                     3          26            24,193             1,635
                            fcntl                    72           2             2,177             1,536
                            TOTAL( - poll):                    2173       127,021,665

                            [THREAD RUN STATISTICS]
                            SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                  1276       137,654,258             2,793        10,514,708

                            [THREAD READY STATISTICS]
                            SCHED_CNT           TOTAL ns            MAX ns
                                  1276       106,112,824        16,695,005

   4157957 revalidator5     [SYSCALL STATISTICS]
                            NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                            write                     1        1072       257,549,046        11,645,365
                            recvmsg                  47         240       147,566,491        11,370,868
                            futex                   202        1261       108,237,237        13,309,978
                            sendto                   44           9         3,500,185         1,789,136
                            mprotect                 10         128           567,083            23,619
                            sendmsg                  46           9            86,170            11,205
                            TOTAL( - poll):                    2719       517,506,212

                            [THREAD RUN STATISTICS]
                            SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                  3376       290,156,705             3,088         5,042,013

                            [THREAD READY STATISTICS]
                            SCHED_CNT           TOTAL ns            MAX ns
                                  3377       730,563,978       166,652,188

   4157958 urcu4            [SYSCALL STATISTICS]
                            NAME                 NUMBER       COUNT          TOTAL ns            MAX ns
                            poll                      7         807       943,998,370       179,496,065
                            futex                   202        1512        70,932,980         7,966,602
                            read                      0         768           911,434             2,954
                            getrusage                98         768           641,360            30,354
                            mprotect                 10           6            47,966            10,180
                            write                     1           3             7,226             2,543
                            TOTAL( - poll):                    3057        72,540,966

                            [THREAD RUN STATISTICS]
                            SCHED_CNT           TOTAL ns            MIN ns            MAX ns
                                  2284        12,374,419             2,208            19,872

                            [THREAD READY STATISTICS]
                            SCHED_CNT           TOTAL ns            MAX ns
                                  2284         4,317,868           188,755

Comment 4 mhou 2023-07-09 10:29:54 UTC
Hello Eelco

Many thanks for your debugging. I need to highlight why the CPU isolation is configured as you saw it.

> The system only has two CPUs (one on each NUMA node) available to the kernel and the threads running on it. All the other CPUs are isolated with tuned's realtime-virtual-host profile.
In the realtime-virtual-host profile, the default isolation strategy is to reserve 1 core per socket for housekeeping and isolate the rest.

# cat /etc/tuned/realtime-virtual-host-variables.conf

#
# Variable settings below override the definitions from the
# /etc/tuned/realtime-variables.conf file.
#
# Examples:
# isolated_cores=2,4-7
# isolated_cores=2-23
#
# Reserve 1 core per socket for housekeeping, isolate the rest.
isolated_cores=${f:calc_isolated_cores:1}

> So the conclusion is that we need to increase the number of threads available for the kernel and OVS. I think even MicroShift has 4 threads for the kernel and OVS.

If I'm misunderstanding anything, please let me know. The current Open vSwitch hardware offload needs more CPU resources to handle the 10k-flow (or larger) test scenario, because OVS has to offload 10k or more flows to hardware, and the handler and revalidator threads responsible for that offload currently run on the housekeeping CPUs.

For kernel-rt, I wouldn't recommend mixing the OVS threads (handler + revalidator) with kernel threads. In the kernel-rt CPU isolation strategy, the housekeeping CPUs are only meant to provide the minimal CPU capacity the kernel needs to run.

Is there a configuration parameter for the Open vSwitch handler or revalidator threads so that they can run on one or more isolated CPUs?

I know that Open vSwitch can adjust the number of threads through n-revalidator-threads and n-handler-threads.
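
For reference, these knobs are set via other_config, e.g. (the values here are only illustrative):

# ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=4
# ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=1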

If there are other reasons for the OVS threads to be placed on the housekeeping CPUs, I will change the default isolation strategy.

Comment 5 Eelco Chaudron 2023-07-10 08:28:19 UTC
(In reply to mhou from comment #4)

> > The system only has two CPUs (one on each NUMA node) available to the kernel and the threads running on it. All the other CPUs are isolated with tuned's realtime-virtual-host profile.
> In the realtime-virtual-host profile, the default isolation strategy is to
> reserve 1 core per socket for housekeeping and isolate the rest.

I guess OVS needs more CPU during flow setup because of the 40K upcalls that happen in quick succession. This results in the revalidators scaling down the datapath flows (the dynamic flow limit), which in turn causes more traffic to hit the upcall path.

> > So the conclusion is that we need to increase the number of threads available for the kernel and OVS. I think even MicroShift has 4 threads for the kernel and OVS.
> 
> If I'm misunderstanding anything, please let me know. The current Open vSwitch
> hardware offload needs more CPU resources to handle the 10k-flow (or larger)
> test scenario, because OVS has to offload 10k or more flows to hardware, and
> the handler and revalidator threads responsible for that offload currently run
> on the housekeeping CPUs.

OVS does not run on the isolated CPUs, only on the CPUs available for general-purpose processing. Because this test quickly sends upcalls for 40K flows, it overloads the system.

> For kernel-rt, I wouldn't recommend mixing the OVS threads (handler + revalidator)
> with kernel threads. In the kernel-rt CPU isolation strategy, the housekeeping CPUs
> are only meant to provide the minimal CPU capacity the kernel needs to run.
> 
> Is there a configuration parameter for the Open vSwitch handler or revalidator
> threads so that they can run on one or more isolated CPUs?

There is no manual placement configuration for these threads; the kernel scheduler is responsible for distributing them.
 
> I know that Open vSwitch can adjust the number of threads through
> n-revalidator-threads and n-handler-threads.
> 
> If there are other reasons for the OVS threads to be placed on the housekeeping
> CPUs, I will change the default isolation strategy.

OVS depends on the kernel scheduler to place its threads on a CPU, so when CPUs are isolated the kernel will not schedule these threads on them. I guess you could override this with taskset.
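
Purely as an illustration of such an override (not a recommendation), something along these lines would move every existing ovs-vswitchd thread onto an explicit CPU list; -a applies the affinity to all threads of the process and -c takes a CPU list. As comment 8 below shows, OVS may recreate its handler threads afterwards:

# taskset -a -c -p <isolated-cpu-list> $(pidof ovs-vswitchd)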

Comment 6 mhou 2023-07-11 02:54:16 UTC
Tested on openvswitch2.17-2.17.0-105.el8fdp; the analysis is below.

If I keep only the NUMA 0 CPUs (except core 0) isolated, and use all of the NUMA 1 CPUs plus core 0 of NUMA 0 as housekeeping, there are 29 housekeeping CPUs.

# cat /proc/cmdline 
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-4.18.0-499.rt7.288.el8.x86_64 root=/dev/mapper/rhel_dell--per750--03-root ro ksdevice=bootif pci=realloc resume=/dev/mapper/rhel_dell--per750--03-swap rd.lvm.lv=rhel_dell-per750-03/root rd.lvm.lv=rhel_dell-per750-03/swap console=ttyS0,115200n81 skew_tick=1 isolcpus=managed_irq,domain,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54 intel_pstate=disable nosoftlockup tsc=reliable nohz=on nohz_full=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54 rcu_nocbs=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54 irqaffinity=0,1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55 crashkernel=1G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G rcutree.kthread_prio=51 default_hugepagesz=1G hugepagesz=1G hugepages=48 intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable

# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54
node 0 size: 31257 MB
node 0 free: 951 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55
node 1 size: 31178 MB
node 1 free: 2283 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

OVS then starts 39 handler and revalidator threads in total (31 handlers and 8 revalidators).

The 8 revalidators match the documented rule (the default is the number of CPU cores divided by four, plus one: 29/4 + 1 = 8 with integer division). For handlers, the default is the number of online CPU cores minus the number of revalidators (if "online" includes the isolated CPUs the total is 56, so that would be 56 - 8 = 48; if only the housekeeping CPUs count, there are 29). But why does OVS end up with 31 handlers?
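
As a side note, the upcall state OVS actually ends up with can be inspected at runtime with the command below (a sketch; the exact fields vary between OVS versions, but it reports, among other things, the current datapath flow count and the dynamic flow limit the revalidators enforce):

# ovs-appctl upcall/show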

But this allocation causes the handler threads and the revalidator threads to occupy the same CPUs. It also triggers the self flow dump, as below:

2023-07-10T15:11:46.699Z|00025|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-10T15:11:50.420Z|00064|ovs_rcu|WARN|blocked 1000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:50.460Z|00001|ovs_rcu(urcu5)|WARN|blocked 1000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:51.420Z|00065|ovs_rcu|WARN|blocked 2000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:51.460Z|00002|ovs_rcu(urcu5)|WARN|blocked 2000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:53.419Z|00066|ovs_rcu|WARN|blocked 4000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:53.460Z|00003|ovs_rcu(urcu5)|WARN|blocked 4000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:57.420Z|00067|ovs_rcu|WARN|blocked 8000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:57.459Z|00004|ovs_rcu(urcu5)|WARN|blocked 8000 ms waiting for revalidator40 to quiesce
2023-07-10T15:11:58.172Z|00004|timeval(revalidator36)|WARN|Unreasonably long 8872ms poll interval (4ms user, 46ms system)
2023-07-10T15:11:58.172Z|00005|timeval(revalidator36)|WARN|context switches: 2052 voluntary, 13 involuntary
2023-07-10T15:11:58.173Z|00050|timeval(revalidator39)|WARN|Unreasonably long 8873ms poll interval (7ms user, 42ms system)
2023-07-10T15:11:58.173Z|00051|timeval(revalidator39)|WARN|context switches: 2056 voluntary, 5 involuntary
2023-07-10T15:11:58.181Z|00004|timeval(revalidator40)|WARN|Unreasonably long 8881ms poll interval (17ms user, 41ms system)
2023-07-10T15:11:58.181Z|00005|timeval(revalidator40)|WARN|context switches: 2078 voluntary, 10 involuntary
2023-07-10T15:11:58.182Z|00050|ofproto_dpif_upcall(revalidator33)|INFO|Spent an unreasonably long 8883ms dumping flows


If I still keep 29 CPUs as housekeeping and configure n-revalidator-threads=6 and other_config:n-handler-threads=24, OVS still sets the handler count to 31:
2023-07-11T02:07:13.277Z|00049|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 31, setting n-revalidator-threads to 6
2023-07-11T02:07:13.277Z|00050|ofproto_dpif_upcall|INFO|Starting 37 threads
...
2023-07-11T02:13:10.278Z|00066|ovs_rcu|WARN|blocked 1000 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:10.280Z|00002|ovs_rcu(urcu5)|WARN|blocked 1000 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:11.279Z|00067|ovs_rcu|WARN|blocked 2001 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:11.282Z|00003|ovs_rcu(urcu5)|WARN|blocked 2001 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:13.278Z|00068|ovs_rcu|WARN|blocked 4000 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:13.281Z|00004|ovs_rcu(urcu5)|WARN|blocked 4000 ms waiting for revalidator33 to quiesce
2023-07-11T02:13:16.805Z|00005|timeval(revalidator34)|WARN|Unreasonably long 7577ms poll interval (18ms user, 37ms system)
2023-07-11T02:13:16.805Z|00006|timeval(revalidator34)|WARN|context switches: 392 voluntary, 7 involuntary
2023-07-11T02:13:16.805Z|00050|timeval(revalidator38)|WARN|Unreasonably long 7577ms poll interval (33ms user, 22ms system)
2023-07-11T02:13:16.805Z|00051|timeval(revalidator38)|WARN|context switches: 370 voluntary, 9 involuntary
2023-07-11T02:13:16.805Z|00004|timeval(revalidator35)|WARN|Unreasonably long 7577ms poll interval (21ms user, 34ms system)
2023-07-11T02:13:16.805Z|00005|timeval(revalidator35)|WARN|context switches: 399 voluntary, 3 involuntary
2023-07-11T02:13:16.805Z|00049|timeval(revalidator33)|WARN|Unreasonably long 7579ms poll interval (27ms user, 27ms system)
2023-07-11T02:13:16.805Z|00050|timeval(revalidator33)|WARN|context switches: 407 voluntary, 7 involuntary
2023-07-11T02:13:16.805Z|00004|timeval(revalidator36)|WARN|Unreasonably long 7577ms poll interval (31ms user, 24ms system)
2023-07-11T02:13:16.805Z|00005|timeval(revalidator36)|WARN|context switches: 392 voluntary, 8 involuntary
2023-07-11T02:13:16.810Z|00004|timeval(revalidator37)|WARN|Unreasonably long 7582ms poll interval (37ms user, 35ms system)
2023-07-11T02:13:16.810Z|00005|timeval(revalidator37)|WARN|context switches: 425 voluntary, 5 involuntary
2023-07-11T02:13:16.810Z|00051|ofproto_dpif_upcall(revalidator33)|INFO|Spent an unreasonably long 7584ms dumping flows
2023-07-11T02:13:16.811Z|00020|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 11870 flows.
2023-07-11T02:13:16.811Z|00021|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 11870 flows.
2023-07-11T02:13:16.811Z|00022|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 11870 flows.
2023-07-11T02:13:16.811Z|00023|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 11870 flows.
2023-07-11T02:13:16.811Z|00024|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 11870 flows.
2023-07-11T02:13:16.813Z|00052|ofproto_dpif_upcall(revalidator33)|WARN|Number of datapath flows (36560) twice as high as current dynamic flow limit (23740).  Starting to delete flows unconditionally as an emergency measure.
2023-07-11T02:13:16.938Z|00025|dpif_netlink(handler1)|WARN|Dropped 4 log messages in last 9 seconds (most recently, 1 seconds ago) due to excessive rate
2023-07-11T02:13:16.938Z|00026|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0


Then I reduced the housekeeping set to 4 CPUs (cpu0,1,2,3) and tried to assign n-revalidator-threads=1 and n-handler-threads=4, but still no luck.

# pidstat -t -p $(pidof ovs-vswitchd)
Linux 4.18.0-499.rt7.288.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	07/10/2023 	_x86_64_	(56 CPU)

10:49:41 PM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
10:49:41 PM   994      3939         -    6.54    9.84    0.00    0.17   16.38     3  ovs-vswitchd
10:49:41 PM   994         -      3939    0.05    0.07    0.00    0.17    0.12     3  |__ovs-vswitchd
10:49:41 PM   994         -      3944    0.08    0.08    0.00    0.41    0.16     1  |__urcu4
10:49:41 PM   994         -      4766    3.30    4.64    0.00    1.11    7.94     2  |__handler13
10:49:41 PM   994         -      4767    0.00    0.00    0.00    0.00    0.00     2  |__handler14
10:49:41 PM   994         -      4768    0.00    0.00    0.00    0.00    0.00     3  |__handler18
10:49:41 PM   994         -      4769    0.00    0.00    0.00    0.00    0.00     2  |__handler15
10:49:41 PM   994         -      4770    0.00    0.00    0.00    0.00    0.00     0  |__handler16
10:49:41 PM   994         -      4771    3.10    5.03    0.00    0.49    8.13     3  |__revalidator17

2023-07-11T02:43:16.832Z|00064|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 5, setting n-revalidator-threads to 1
2023-07-11T02:43:16.833Z|00065|ofproto_dpif_upcall|INFO|Starting 6 threads
.....
2023-07-11T02:49:15.036Z|00256|ofproto_dpif_upcall(revalidator17)|INFO|Spent an unreasonably long 3473ms dumping flows
2023-07-11T02:49:15.036Z|00257|timeval(revalidator17)|WARN|Unreasonably long 3473ms poll interval (765ms user, 2333ms system)
2023-07-11T02:49:15.036Z|00258|timeval(revalidator17)|WARN|context switches: 844 voluntary, 2481 involuntary
2023-07-11T02:49:15.036Z|00259|coverage(revalidator17)|INFO|Dropped 12 log messages in last 58 seconds (most recently, 4 seconds ago) due to excessive rate
2023-07-11T02:49:15.037Z|00260|coverage(revalidator17)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=ff3c5a2e:
2023-07-11T02:49:15.037Z|00261|coverage(revalidator17)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0064/sec   total: 23
2023-07-11T02:49:15.037Z|00262|coverage(revalidator17)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-07-11T02:49:15.037Z|00263|coverage(revalidator17)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-07-11T02:49:15.037Z|00264|coverage(revalidator17)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0006/sec   total: 2
2023-07-11T02:49:15.037Z|00265|coverage(revalidator17)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-07-11T02:49:15.037Z|00266|coverage(revalidator17)|INFO|rev_mac_learning           6.0/sec    24.450/sec        0.6047/sec   total: 2178
2023-07-11T02:49:15.037Z|00267|coverage(revalidator17)|INFO|dumped_duplicate_flow     29.4/sec     7.950/sec        0.3242/sec   total: 1169
2023-07-11T02:49:15.037Z|00268|coverage(revalidator17)|INFO|handler_duplicate_upcall   0.0/sec   246.567/sec       14.6642/sec   total: 52791
2023-07-11T02:49:15.037Z|00269|coverage(revalidator17)|INFO|revalidate_missed_dp_flow  28.4/sec    99.050/sec        4.8175/sec   total: 17345
2023-07-11T02:49:15.037Z|00270|coverage(revalidator17)|INFO|upcall_flow_limit_hit    17643.2/sec 36686.800/sec     1366.5842/sec   total: 4975609
2023-07-11T02:49:15.037Z|00271|coverage(revalidator17)|INFO|upcall_flow_limit_kill     0.0/sec    23.733/sec        0.4039/sec   total: 1454
2023-07-11T02:49:15.037Z|00272|coverage(revalidator17)|INFO|upcall_ukey_replace      1139.2/sec   194.733/sec        4.0975/sec   total: 14751
2023-07-11T02:49:15.037Z|00273|coverage(revalidator17)|INFO|xlate_actions            77030.4/sec 87645.817/sec     2787.3842/sec   total: 10123329
2023-07-11T02:49:15.037Z|00274|coverage(revalidator17)|INFO|cmap_expand               68.4/sec    39.833/sec        1.5558/sec   total: 5601
2023-07-11T02:49:15.037Z|00275|coverage(revalidator17)|INFO|cmap_shrink                0.0/sec    42.383/sec        0.8811/sec   total: 3172
2023-07-11T02:49:15.037Z|00276|coverage(revalidator17)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-07-11T02:49:15.037Z|00277|coverage(revalidator17)|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0067/sec   total: 24
2023-07-11T02:49:15.037Z|00278|coverage(revalidator17)|INFO|dpif_flow_put            7192.6/sec  6503.000/sec      176.2228/sec   total: 661069
2023-07-11T02:49:15.037Z|00279|coverage(revalidator17)|INFO|dpif_flow_del            3754.6/sec  4278.800/sec      107.4194/sec   total: 387803
2023-07-11T02:49:15.037Z|00280|coverage(revalidator17)|INFO|dpif_execute             23175.0/sec 40768.233/sec     1491.0572/sec   total: 5423024
2023-07-11T02:49:15.037Z|00281|coverage(revalidator17)|INFO|flow_extract             23175.0/sec 40768.233/sec     1491.0567/sec   total: 5423022
2023-07-11T02:49:15.037Z|00282|coverage(revalidator17)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0908/sec   total: 327
2023-07-11T02:49:15.037Z|00283|coverage(revalidator17)|INFO|hmap_pathological          0.0/sec     0.000/sec        0.0094/sec   total: 34
2023-07-11T02:49:15.037Z|00284|coverage(revalidator17)|INFO|hmap_expand               90.4/sec   313.167/sec        9.6281/sec   total: 34692
2023-07-11T02:49:15.037Z|00285|coverage(revalidator17)|INFO|mac_learning_learned     2488.8/sec  6419.517/sec      224.7356/sec   total: 873384
2023-07-11T02:49:15.037Z|00286|coverage(revalidator17)|INFO|mac_learning_expired       0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-07-11T02:49:15.037Z|00287|coverage(revalidator17)|INFO|mac_learning_evicted     2488.8/sec  6419.517/sec      209.0731/sec   total: 816999
2023-07-11T02:49:15.037Z|00288|coverage(revalidator17)|INFO|netdev_get_stats           1.2/sec     1.200/sec        0.1683/sec   total: 612
2023-07-11T02:49:15.037Z|00289|coverage(revalidator17)|INFO|txn_unchanged              0.0/sec     0.000/sec        0.0297/sec   total: 107
2023-07-11T02:49:15.037Z|00290|coverage(revalidator17)|INFO|txn_incomplete             0.2/sec     0.233/sec        0.0153/sec   total: 56
2023-07-11T02:49:15.037Z|00291|coverage(revalidator17)|INFO|txn_success                0.2/sec     0.200/sec        0.0111/sec   total: 40
2023-07-11T02:49:15.037Z|00292|coverage(revalidator17)|INFO|poll_create_node         5854.0/sec  6309.383/sec      173.3708/sec   total: 666896
2023-07-11T02:49:15.037Z|00293|coverage(revalidator17)|INFO|poll_zero_timeout        906.0/sec  1270.933/sec       39.9678/sec   total: 155473
2023-07-11T02:49:15.037Z|00294|coverage(revalidator17)|INFO|seq_change               4398.6/sec  4638.800/sec      227.4747/sec   total: 838586
2023-07-11T02:49:15.037Z|00295|coverage(revalidator17)|INFO|pstream_open               0.0/sec     0.000/sec        0.0014/sec   total: 5
2023-07-11T02:49:15.037Z|00296|coverage(revalidator17)|INFO|stream_open                0.0/sec     0.000/sec        0.0003/sec   total: 1
2023-07-11T02:49:15.037Z|00297|coverage(revalidator17)|INFO|unixctl_received           0.0/sec     0.050/sec        0.0047/sec   total: 17
2023-07-11T02:49:15.037Z|00298|coverage(revalidator17)|INFO|unixctl_replied            0.0/sec     0.050/sec        0.0047/sec   total: 17
2023-07-11T02:49:15.037Z|00299|coverage(revalidator17)|INFO|util_xalloc              147822.0/sec 208055.067/sec     6811.8719/sec   total: 25118870
2023-07-11T02:49:15.037Z|00300|coverage(revalidator17)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0383/sec   total: 138
2023-07-11T02:49:15.037Z|00301|coverage(revalidator17)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0028/sec   total: 10
2023-07-11T02:49:15.037Z|00302|coverage(revalidator17)|INFO|netlink_overflow           1.0/sec     2.167/sec        0.1000/sec   total: 360
2023-07-11T02:49:15.037Z|00303|coverage(revalidator17)|INFO|netlink_received         42786.2/sec 63483.933/sec     2074.7633/sec   total: 7633708
2023-07-11T02:49:15.037Z|00304|coverage(revalidator17)|INFO|netlink_recv_jumbo         3.0/sec     3.950/sec        0.2350/sec   total: 851
2023-07-11T02:49:15.037Z|00305|coverage(revalidator17)|INFO|netlink_sent             42308.2/sec 63052.800/sec     2062.8081/sec   total: 7590265
2023-07-11T02:49:15.037Z|00306|coverage(revalidator17)|INFO|nln_changed                0.0/sec     0.000/sec        0.0083/sec   total: 30
2023-07-11T02:49:15.037Z|00307|coverage(revalidator17)|INFO|107 events never hit
2023-07-11T02:49:16.439Z|00308|timeval(revalidator17)|WARN|Unreasonably long 1403ms poll interval (270ms user, 967ms system)
2023-07-11T02:49:16.439Z|00309|timeval(revalidator17)|WARN|context switches: 846 voluntary, 939 involuntary
2023-07-11T02:49:17.253Z|00081|dpif_netlink(handler13)|WARN|Dropped 7 log messages in last 11 seconds (most recently, 6 seconds ago) due to excessive rate
2023-07-11T02:49:17.253Z|00082|dpif_netlink(handler13)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T02:49:18.814Z|00310|poll_loop(revalidator17)|INFO|Dropped 7848 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-07-11T02:49:18.814Z|00311|poll_loop(revalidator17)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:1044 (91% CPU usage)
2023-07-11T02:49:20.673Z|00312|timeval(revalidator17)|WARN|Unreasonably long 1117ms poll interval (268ms user, 750ms system)
2023-07-11T02:49:20.673Z|00313|timeval(revalidator17)|WARN|context switches: 297 voluntary, 757 involuntary
2023-07-11T02:49:21.706Z|00314|timeval(revalidator17)|WARN|Unreasonably long 1033ms poll interval (252ms user, 652ms system)
2023-07-11T02:49:21.706Z|00315|timeval(revalidator17)|WARN|context switches: 276 voluntary, 741 involuntary
2023-07-11T02:49:24.315Z|00316|poll_loop(revalidator17)|INFO|Dropped 16686 log messages in last 5 seconds (most recently, 0 seconds ago) due to excessive rate
2023-07-11T02:49:24.315Z|00317|poll_loop(revalidator17)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:1044 (91% CPU usage)
2023-07-11T02:49:28.496Z|00318|ofproto_dpif_upcall(revalidator17)|INFO|Spent an unreasonably long 3323ms dumping flows
2023-07-11T02:49:28.496Z|00319|timeval(revalidator17)|WARN|Unreasonably long 3323ms poll interval (671ms user, 2281ms system)
2023-07-11T02:49:28.496Z|00320|timeval(revalidator17)|WARN|context switches: 770 voluntary, 2059 involuntary
2023-07-11T02:49:30.202Z|00321|timeval(revalidator17)|WARN|Unreasonably long 1706ms poll interval (365ms user, 1093ms system)
2023-07-11T02:49:30.202Z|00322|timeval(revalidator17)|WARN|context switches: 539 voluntary, 1178 involuntary
2023-07-11T02:49:30.296Z|00083|poll_loop(handler13)|INFO|Dropped 33895 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-07-11T02:49:30.296Z|00084|poll_loop(handler13)|INFO|wakeup due to [POLLIN] on fd 27 (NETLINK_GENERIC<->NETLINK_GENERIC) at ../lib/dpif-netlink.c:3195 (56% CPU usage)
2023-07-11T02:49:31.728Z|00085|dpif_netlink(handler13)|WARN|Dropped 4 log messages in last 13 seconds (most recently, 6 seconds ago) due to excessive rate
2023-07-11T02:49:31.728Z|00086|dpif_netlink(handler13)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T02:49:36.297Z|00087|poll_loop(handler13)|INFO|Dropped 13725 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2023-07-11T02:49:36.297Z|00088|poll_loop(handler13)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:816 (87% CPU usage)

Comment 7 mhou 2023-07-11 06:53:15 UTC
With openvswitch3.1-3.1.0-39.el8fdp, I can also see multiple handlers on the housekeeping CPUs.

CPUs 0,1,2,3 are kept as the housekeeping CPUs (everything else isolated, as in comment 6), and I set n-revalidator-threads to 1. Here is the OVS other_config:

# ovs-vsctl list Open_vSwitch .
_uuid               : 3136b9a9-8ba9-4e41-9100-fa4357d1f619
bridges             : [736b9ace-b662-4ed1-92b3-db3e9ec534c8, 884bc551-07b3-4505-bf53-16c5c2b6ae1a]
cur_cfg             : 40
datapath_types      : [netdev, system]
datapaths           : {}
db_version          : "8.3.1"
dpdk_initialized    : false
dpdk_version        : "DPDK 22.11.1"
external_ids        : {hostname=dell-per750-03.rhts.eng.pek2.redhat.com, rundir="/var/run/openvswitch", system-id="fd5a0088-c85a-4746-9940-5c29a820020b"}
iface_types         : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options     : []
next_cfg            : 40
other_config        : {hw-offload="true", max-idle="3600000", max-revalidator="3600000", n-revalidator-threads="1", tc-policy=none}
ovs_version         : "3.1.2"
ssl                 : []
statistics          : {}
system_type         : rhel
system_version      : "8.9"


I can see OVS starts 5 handlers; most of the handlers are hosted on CPU 1.
2023-07-11T06:48:15.005Z|00049|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 5, setting n-revalidator-threads to 1
2023-07-11T06:48:15.005Z|00050|ofproto_dpif_upcall|INFO|Starting 6 threads

# pidstat -t -p $(pidof ovs-vswitchd)
Linux 4.18.0-499.rt7.288.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	07/11/2023 	_x86_64_	(56 CPU)

02:48:31 AM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
02:48:31 AM   994     10030         -    0.00    0.00    0.00    0.00    0.00     2  ovs-vswitchd
02:48:31 AM   994         -     10030    0.00    0.00    0.00    0.00    0.00     2  |__ovs-vswitchd
02:48:31 AM   994         -     10032    0.00    0.00    0.00    0.00    0.00     1  |__handler1
02:48:31 AM   994         -     10033    0.00    0.00    0.00    0.00    0.00     1  |__handler2
02:48:31 AM   994         -     10034    0.00    0.00    0.00    0.00    0.00     1  |__handler3
02:48:31 AM   994         -     10035    0.00    0.00    0.00    0.00    0.00     2  |__handler4
02:48:31 AM   994         -     10036    0.00    0.00    0.00    0.00    0.00     3  |__urcu5
02:48:31 AM   994         -     10037    0.00    0.00    0.00    0.00    0.00     3  |__handler6
02:48:31 AM   994         -     10038    0.00    0.00    0.00    0.00    0.00     3  |__revalidator7


2023-07-11T03:31:02.749Z|00063|memory|INFO|handlers:5 idl-cells-Open_vSwitch:403 ports:6 revalidators:1 rules:10 udpif keys:2
2023-07-11T06:39:35.532Z|00001|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T06:39:36.478Z|00002|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T06:39:36.990Z|00003|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T06:39:37.448Z|00004|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T06:39:37.965Z|00005|dpif_netlink(handler1)|WARN|system@ovs-system: lost packet on handler 0
2023-07-11T06:39:38.078Z|00006|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.082Z|00007|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.089Z|00008|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.100Z|00009|poll_loop(handler1)|INFO|wakeup due to 1-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.117Z|00010|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.122Z|00011|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.128Z|00012|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.135Z|00013|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.140Z|00014|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:38.144Z|00015|poll_loop(handler1)|INFO|wakeup due to 0-ms timeout at ../ofproto/ofproto-dpif-upcall.c:817 (87% CPU usage)
2023-07-11T06:39:39.790Z|00016|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 48192 flows.
2023-07-11T06:39:39.790Z|00017|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 48192 flows.
2023-07-11T06:39:39.790Z|00018|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 48192 flows.
2023-07-11T06:39:39.790Z|00019|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 48192 flows.
2023-07-11T06:39:39.790Z|00020|ofproto_dpif_upcall(handler1)|WARN|upcall: datapath reached the dynamic limit of 48192 flows.
2023-07-11T06:39:40.692Z|00001|ofproto_dpif_upcall(revalidator6)|INFO|Spent an unreasonably long 2228ms dumping flows
2023-07-11T06:39:40.692Z|00002|timeval(revalidator6)|WARN|Unreasonably long 2228ms poll interval (498ms user, 1287ms system)
2023-07-11T06:39:40.692Z|00003|timeval(revalidator6)|WARN|faults: 2578 minor, 0 major
2023-07-11T06:39:40.692Z|00004|timeval(revalidator6)|WARN|context switches: 529 voluntary, 2149 involuntary
2023-07-11T06:39:40.692Z|00005|coverage(revalidator6)|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=ee7e0b89:
2023-07-11T06:39:40.692Z|00006|coverage(revalidator6)|INFO|bridge_reconfigure         0.0/sec     0.000/sec        0.0000/sec   total: 17
2023-07-11T06:39:40.692Z|00007|coverage(revalidator6)|INFO|ofproto_flush              0.0/sec     0.000/sec        0.0000/sec   total: 2
2023-07-11T06:39:40.692Z|00008|coverage(revalidator6)|INFO|ofproto_update_port        0.0/sec     0.000/sec        0.0000/sec   total: 11
2023-07-11T06:39:40.692Z|00009|coverage(revalidator6)|INFO|rev_port_toggled           0.0/sec     0.000/sec        0.0000/sec   total: 2
2023-07-11T06:39:40.692Z|00010|coverage(revalidator6)|INFO|rev_flow_table             0.0/sec     0.000/sec        0.0000/sec   total: 1
2023-07-11T06:39:40.692Z|00011|coverage(revalidator6)|INFO|rev_mac_learning           0.0/sec     0.017/sec        0.0003/sec   total: 6
2023-07-11T06:39:40.692Z|00012|coverage(revalidator6)|INFO|dumped_duplicate_flow      0.0/sec     0.000/sec        0.0000/sec   total: 230
2023-07-11T06:39:40.692Z|00013|coverage(revalidator6)|INFO|handler_duplicate_upcall 1396.2/sec   118.417/sec        1.9736/sec   total: 9429
2023-07-11T06:39:40.692Z|00014|coverage(revalidator6)|INFO|upcall_flow_limit_hit      0.0/sec     0.000/sec        0.0000/sec   total: 9830
2023-07-11T06:39:40.692Z|00015|coverage(revalidator6)|INFO|upcall_ukey_replace        6.2/sec     0.517/sec        0.0086/sec   total: 31
2023-07-11T06:39:40.692Z|00016|coverage(revalidator6)|INFO|xlate_actions            8633.6/sec   721.717/sec       12.0286/sec   total: 111399
2023-07-11T06:39:40.692Z|00017|coverage(revalidator6)|INFO|cmap_expand              473.2/sec    39.550/sec        0.6592/sec   total: 2742
2023-07-11T06:39:40.692Z|00018|coverage(revalidator6)|INFO|cmap_shrink                0.0/sec     0.050/sec        0.0008/sec   total: 159
2023-07-11T06:39:40.692Z|00019|coverage(revalidator6)|INFO|dpif_flow_flush            0.0/sec     0.000/sec        0.0000/sec   total: 1
2023-07-11T06:39:40.692Z|00020|coverage(revalidator6)|INFO|dpif_flow_get              0.0/sec     0.000/sec        0.0000/sec   total: 24
2023-07-11T06:39:40.692Z|00021|coverage(revalidator6)|INFO|dpif_flow_put            7237.4/sec   603.267/sec       10.0544/sec   total: 58612
2023-07-11T06:39:40.692Z|00022|coverage(revalidator6)|INFO|dpif_flow_del              0.0/sec     0.000/sec        0.0000/sec   total: 4698
2023-07-11T06:39:40.692Z|00023|coverage(revalidator6)|INFO|dpif_execute             8633.6/sec   721.650/sec       12.0275/sec   total: 69206
2023-07-11T06:39:40.692Z|00024|coverage(revalidator6)|INFO|flow_extract             8633.6/sec   721.650/sec       12.0275/sec   total: 69204
2023-07-11T06:39:40.692Z|00025|coverage(revalidator6)|INFO|miniflow_malloc            0.0/sec     0.000/sec        0.0000/sec   total: 361
2023-07-11T06:39:40.692Z|00026|coverage(revalidator6)|INFO|hmap_pathological          6.8/sec     0.567/sec        0.0094/sec   total: 36
2023-07-11T06:39:40.692Z|00027|coverage(revalidator6)|INFO|hmap_expand               13.4/sec     5.467/sec        1.3828/sec   total: 17347
2023-07-11T06:39:40.692Z|00028|coverage(revalidator6)|INFO|mac_learning_learned     7229.0/sec   602.483/sec       10.0414/sec   total: 54242
2023-07-11T06:39:40.692Z|00029|coverage(revalidator6)|INFO|mac_learning_expired       0.0/sec     0.000/sec        0.0000/sec   total: 3
2023-07-11T06:39:40.692Z|00030|coverage(revalidator6)|INFO|netdev_get_stats           1.2/sec     1.200/sec        1.1917/sec   total: 13590
2023-07-11T06:39:40.692Z|00031|coverage(revalidator6)|INFO|txn_unchanged              0.0/sec     0.100/sec        0.1969/sec   total: 2290
2023-07-11T06:39:40.692Z|00032|coverage(revalidator6)|INFO|txn_incomplete             0.2/sec     0.117/sec        0.0019/sec   total: 22
2023-07-11T06:39:40.692Z|00033|coverage(revalidator6)|INFO|txn_success                0.0/sec     0.083/sec        0.0014/sec   total: 18
2023-07-11T06:39:40.692Z|00034|coverage(revalidator6)|INFO|poll_create_node         427.8/sec    52.533/sec        8.6267/sec   total: 100040
2023-07-11T06:39:40.692Z|00035|coverage(revalidator6)|INFO|poll_zero_timeout        137.0/sec    11.533/sec        0.1922/sec   total: 1130
2023-07-11T06:39:40.692Z|00036|coverage(revalidator6)|INFO|seq_change               352.6/sec    38.850/sec        1.0064/sec   total: 14204
2023-07-11T06:39:40.692Z|00037|coverage(revalidator6)|INFO|pstream_open               0.0/sec     0.000/sec        0.0000/sec   total: 5
2023-07-11T06:39:40.692Z|00038|coverage(revalidator6)|INFO|stream_open                0.0/sec     0.000/sec        0.0000/sec   total: 1
2023-07-11T06:39:40.692Z|00039|coverage(revalidator6)|INFO|unixctl_received           0.0/sec     0.067/sec        0.0011/sec   total: 5
2023-07-11T06:39:40.692Z|00040|coverage(revalidator6)|INFO|unixctl_replied            0.0/sec     0.067/sec        0.0011/sec   total: 5
2023-07-11T06:39:40.692Z|00041|coverage(revalidator6)|INFO|util_xalloc              63928.6/sec  5466.950/sec      147.7958/sec   total: 1368392
2023-07-11T06:39:40.692Z|00042|coverage(revalidator6)|INFO|netdev_set_policing        0.0/sec     0.000/sec        0.0000/sec   total: 102
2023-07-11T06:39:40.692Z|00043|coverage(revalidator6)|INFO|netdev_get_ethtool         0.0/sec     0.000/sec        0.0000/sec   total: 11
2023-07-11T06:39:40.692Z|00044|coverage(revalidator6)|INFO|netlink_overflow           1.0/sec     0.083/sec        0.0014/sec   total: 8
2023-07-11T06:39:40.692Z|00045|coverage(revalidator6)|INFO|netlink_received         15889.8/sec  1329.050/sec       24.1461/sec   total: 188091
2023-07-11T06:39:40.692Z|00046|coverage(revalidator6)|INFO|netlink_recv_jumbo         0.8/sec     0.800/sec        0.7944/sec   total: 9207
2023-07-11T06:39:40.692Z|00047|coverage(revalidator6)|INFO|netlink_sent             15884.2/sec  1328.350/sec       24.1144/sec   total: 187219
2023-07-11T06:39:40.692Z|00048|coverage(revalidator6)|INFO|nln_changed                0.0/sec     0.000/sec        0.0000/sec   total: 31
2023-07-11T06:39:40.692Z|00049|coverage(revalidator6)|INFO|108 events never hit
2023-07-11T06:39:40.696Z|00050|ofproto_dpif_upcall(revalidator6)|WARN|Number of datapath flows (48608) twice as high as current dynamic flow limit (48192).  Starting to delete flows unconditionally as an emergency measure.
2023-07-11T06:39:41.739Z|00051|timeval(revalidator6)|WARN|Unreasonably long 1046ms poll interval (336ms user, 619ms system)
2023-07-11T06:39:41.739Z|00052|timeval(revalidator6)|WARN|faults: 158 minor, 0 major
2023-07-11T06:39:41.739Z|00053|timeval(revalidator6)|WARN|context switches: 234 voluntary, 559 involuntary

Comment 8 mhou 2023-07-11 07:19:56 UTC
Based on comment 6, I changed n-revalidator-threads and n-handler-threads via ovs-vsctl, but only n-revalidator-threads took effect.

> If I still keep 29 CPUs as housekeeping and configure n-revalidator-threads=6 and other_config:n-handler-threads=24, OVS still sets the handler count to 31:
> 2023-07-11T02:07:13.277Z|00049|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 31, setting n-revalidator-threads to 6
> 2023-07-11T02:07:13.277Z|00050|ofproto_dpif_upcall|INFO|Starting 37 threads

Based on comment 7, if I only change n-revalidator-threads to 1, the handler threads still do not keep a 1:1 ratio with the CPUs. I can see OVS starts 5 handlers, and most of the handlers are hosted on CPU 1.
> 2023-07-11T06:48:15.005Z|00049|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 5, setting n-revalidator-threads to 1
> 2023-07-11T06:48:15.005Z|00050|ofproto_dpif_upcall|INFO|Starting 6 threads

> # pidstat -t -p $(pidof ovs-vswitchd)
> Linux 4.18.0-499.rt7.288.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	07/11/2023 	_x86_64_	(56 CPU)

> 02:48:31 AM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
> 02:48:31 AM   994     10030         -    0.00    0.00    0.00    0.00    0.00     2  ovs-vswitchd
> 02:48:31 AM   994         -     10030    0.00    0.00    0.00    0.00    0.00     2  |__ovs-vswitchd
> 02:48:31 AM   994         -     10032    0.00    0.00    0.00    0.00    0.00     1  |__handler1
> 02:48:31 AM   994         -     10033    0.00    0.00    0.00    0.00    0.00     1  |__handler2
> 02:48:31 AM   994         -     10034    0.00    0.00    0.00    0.00    0.00     1  |__handler3
> 02:48:31 AM   994         -     10035    0.00    0.00    0.00    0.00    0.00     2  |__handler4
> 02:48:31 AM   994         -     10036    0.00    0.00    0.00    0.00    0.00     3  |__urcu5
> 02:48:31 AM   994         -     10037    0.00    0.00    0.00    0.00    0.00     3  |__handler6
> 02:48:31 AM   994         -     10038    0.00    0.00    0.00    0.00    0.00     3  |__revalidator7

If I use taskset to pin the OVS threads to isolated CPUs, OVS generates new handler threads.
# pidstat -t -p $(pidof ovs-vswitchd)
Linux 4.18.0-499.rt7.288.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	07/11/2023 	_x86_64_	(56 CPU)

03:03:55 AM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:03:55 AM   994     10805         -    0.00    0.00    0.00    0.00    0.00     1  ovs-vswitchd
03:03:55 AM   994         -     10805    0.00    0.00    0.00    0.00    0.00     1  |__ovs-vswitchd
03:03:55 AM   994         -     10807    0.00    0.00    0.00    0.00    0.00     2  |__handler4
03:03:55 AM   994         -     10808    0.00    0.00    0.00    0.00    0.00     2  |__handler2
03:03:55 AM   994         -     10809    0.00    0.00    0.00    0.00    0.00     1  |__handler1
03:03:55 AM   994         -     10810    0.00    0.00    0.00    0.00    0.00     1  |__handler3
03:03:55 AM   994         -     10811    0.00    0.00    0.00    0.00    0.00     2  |__handler5
03:03:55 AM   994         -     10812    0.00    0.00    0.00    0.00    0.00     3  |__revalidator6
03:03:55 AM   994         -     10813    0.00    0.00    0.00    0.00    0.00     3  |__urcu7

# taskset -a -p 3,5,7,9,11,13 10805

The number of handler threads then increases to 17:
2023-07-11T07:03:48.979Z|00061|bridge|INFO|ovs-vswitchd (Open vSwitch) 3.1.2
2023-07-11T07:03:58.989Z|00062|memory|INFO|174208 kB peak resident set size after 10.3 seconds
2023-07-11T07:03:58.989Z|00063|memory|INFO|handlers:5 idl-cells-Open_vSwitch:403 ports:6 revalidators:1 rules:10
2023-07-11T07:04:49.034Z|00064|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 17, setting n-revalidator-threads to 1

# pidstat -t -p $(pidof ovs-vswitchd)
Linux 4.18.0-499.rt7.288.el8.x86_64 (dell-per750-03.rhts.eng.pek2.redhat.com) 	07/11/2023 	_x86_64_	(56 CPU)

03:04:59 AM   UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
03:04:59 AM   994     10805         -    0.00    0.00    0.00    0.00    0.00     1  ovs-vswitchd
03:04:59 AM   994         -     10805    0.00    0.00    0.00    0.00    0.00     1  |__ovs-vswitchd
03:04:59 AM   994         -     10813    0.00    0.00    0.00    0.00    0.00     3  |__urcu7
03:04:59 AM   994         -     10847    0.00    0.00    0.00    0.00    0.00     0  |__handler8
03:04:59 AM   994         -     10848    0.00    0.00    0.00    0.00    0.00     0  |__handler9
03:04:59 AM   994         -     10849    0.00    0.00    0.00    0.00    0.00     0  |__handler10
03:04:59 AM   994         -     10850    0.00    0.00    0.00    0.00    0.00     0  |__handler11
03:04:59 AM   994         -     10851    0.00    0.00    0.00    0.00    0.00     0  |__handler12
03:04:59 AM   994         -     10852    0.00    0.00    0.00    0.00    0.00     0  |__handler13
03:04:59 AM   994         -     10853    0.00    0.00    0.00    0.00    0.00     0  |__handler14
03:04:59 AM   994         -     10854    0.00    0.00    0.00    0.00    0.00     1  |__handler15
03:04:59 AM   994         -     10855    0.00    0.00    0.00    0.00    0.00     1  |__handler16
03:04:59 AM   994         -     10856    0.00    0.00    0.00    0.00    0.00     0  |__handler17
03:04:59 AM   994         -     10857    0.00    0.00    0.00    0.00    0.00     1  |__handler18
03:04:59 AM   994         -     10858    0.00    0.00    0.00    0.00    0.00     1  |__handler22
03:04:59 AM   994         -     10859    0.00    0.00    0.00    0.00    0.00     1  |__handler19
03:04:59 AM   994         -     10860    0.00    0.00    0.00    0.00    0.00     1  |__handler21
03:04:59 AM   994         -     10861    0.00    0.00    0.00    0.00    0.00     0  |__handler20
03:04:59 AM   994         -     10862    0.00    0.00    0.00    0.00    0.00     0  |__handler23
03:04:59 AM   994         -     10863    0.00    0.00    0.00    0.00    0.00     0  |__handler24
03:04:59 AM   994         -     10864    0.00    0.00    0.00    0.00    0.00     1  |__revalidator25

Comment 9 Eelco Chaudron 2023-07-12 14:34:22 UTC
So this problem happens when something in the testbed is not configured correctly and traffic is initially flooded. When the test is failing, I see entries like these:

recirc_id(0),in_port(2),eth(src=00:00:00:00:19:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:25:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:20:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:27:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:23:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:21:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:1f:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:28:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
recirc_id(0),in_port(2),eth(src=00:00:00:00:26:01,dst=20:00:00:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:0, bytes:0, used:never, actions:1,3 
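
The "actions:1,3" in these entries means each packet is sent to more than one port, i.e. the destination MAC 20:00:00:00:00:00 was never learned and the bridge floods it. A quick way to confirm that from the DUT (a sketch, using the bridge names from this report) is to look at the FDB and count the flooding datapath flows:

# ovs-appctl fdb/show ovsbr0
# ovs-appctl fdb/show ovsbr1
# ovs-appctl dpctl/dump-flows | grep -c 'actions:1,3'

If the generator's destination MACs do not appear in fdb/show, the traffic is hitting the flood path, which is what produces the entries above.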

The problem does not happen with the following QA tool settings:

./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=1 --max-loss-pct=0.002 --search-runtime=10 \
--validation-runtime=60 --rate-tolerance=10 --runtime-tolerance=10 --rate=25 --rate-unit=% \
--duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 \
--rate=0.1 --rate-unit=mpps --one-shot=1 --src-macs=00:00:00:00:00:01,00:00:00:00:00:02 \
--dst-macs=10:00:00:00:00:00,20:00:00:00:00:00 --use-device-stats

./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=10000 --max-loss-pct=0.002 \
--search-runtime=10 --validation-runtime=60 --rate-tolerance=10 --runtime-tolerance=10 \
--duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 \
--rate=25 --rate-unit=% --one-shot=0 --src-macs=00:00:00:00:00:01,00:00:00:00:00:02 \
--dst-macs=00:00:00:00:00:02,00:00:00:00:00:01 --use-src-ip-flows=0 --use-dst-ip-flows=0 --use-src-mac-flows=1 \
--use-dst-mac-flows=0 --warmup-trial --warmup-trial-runtime=30 --max-retries=10 --sniff-runtime=10 --warmup-trial \
--warmup-trial-runtime=10 --send-teaching-measurement --send-teaching-warmup --teaching-warmup-packet-type=generic \
--teaching-warmup-packet-rate=10000 --teaching-measurement-packet-type=generic --use-device-stats


But it does happen with the following:

./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=1 --max-loss-pct=0.002 --search-runtime=10 \
--validation-runtime=10 --rate-tolerance=10 --runtime-tolerance=10 --duplicate-packet-failure=retry-to-fail \
--negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 --rate=0.1 --rate-unit=mpps --one-shot=1 \
--src-macs=00:00:00:00:00:01,00:00:00:00:00:02 --dst-macs=00:00:00:00:00:02,00:00:00:00:00:01 --use-device-stats

./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=10000 --max-loss-pct=0.002 \
--search-runtime=10 --validation-runtime=10 --rate-tolerance=10 --runtime-tolerance=10 \
--duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 \
--rate=25 --rate-unit=% --one-shot=0 --src-macs=10:00:00:00:00:00,20:00:00:00:00:00 \
--dst-macs=00:00:00:00:00:02,00:00:00:00:00:01 --use-src-ip-flows=0 --use-dst-ip-flows=0 --use-src-mac-flows=1 \
--use-dst-mac-flows=0 --warmup-trial --warmup-trial-runtime=30 --sniff-runtime=10 --warmup-trial \
--warmup-trial-runtime=10 --send-teaching-measurement --send-teaching-warmup --teaching-warmup-packet-type=generic \
--max-retries=10 --teaching-warmup-packet-rate=10000 --teaching-measurement-packet-type=generic --use-device-stats
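
While the failing variant is running, the effect on the datapath can be watched directly; a small sketch using standard OVS commands (what counts as "too many" flows depends on the configured set-flow-limit):

# ovs-appctl upcall/show
# ovs-appctl dpctl/dump-flows | wc -l

upcall/show reports the current datapath flow count against the flow limit, and the dump-flows count gives a quick cross-check, so the burst of flooded flows is visible as soon as the measurement traffic starts.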



I will let QA figure out how to ensure the proper FDB entries exist so that no traffic gets flooded.
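
For reference, one thing worth double-checking on the OVS side (a sketch only; as noted above, the root cause is the test configuration) is that the bridges can hold and keep all of the generator MACs between the teaching warmup and the measurement, e.g.:

# ovs-vsctl set bridge ovsbr0 other_config:mac-table-size=50000 other_config:mac-aging-time=600
# ovs-vsctl set bridge ovsbr1 other_config:mac-table-size=50000 other_config:mac-aging-time=600

If the MAC table is too small or entries age out before the measurement starts, the destination MACs become unknown again and the traffic is flooded exactly as shown in the dump above.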

Comment 10 mhou 2023-07-13 10:30:10 UTC
Following Eelco's suggestions, the performance test did not hit this issue.

