Description of problem:
We are currently troubleshooting a situation in which the customer's controller nodes were fenced. After initial investigation, it seems the following happened:

1. The MAC addresses of two VLAN interfaces (vlan104 and vlan105) on the controller changed on Jul 16 around 16:31, for a reason still to be discovered.
2. The change in MAC addresses caused the switch(es) to block traffic (the customer confirmed this was visible on the switch).
3. Because the traffic was blocked/dropped, rather than the interface being reset, we don't see "Link DOWN" on the node.
4. Also because of the blocked/dropped traffic, the pcs monitors timed out, causing the fencing of the node (as described earlier by Ondrej).

Before the event, we see warnings like these in the logs:

2021-07-22T15:30:18.432Z|06178|ofproto_dpif_upcall(handler163)|WARN|Dropped 46143 log messages in last 59 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:30:18.432Z|06179|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached
2021-07-22T15:31:18.514Z|02070|ofproto_dpif_upcall(handler165)|WARN|Dropped 31887 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:31:18.517Z|02071|ofproto_dpif_upcall(handler165)|WARN|upcall: datapath flow limit reached
2021-07-22T15:32:18.433Z|02072|ofproto_dpif_upcall(handler165)|WARN|Dropped 31465 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:32:18.433Z|02073|ofproto_dpif_upcall(handler165)|WARN|upcall: datapath flow limit reached
2021-07-22T15:33:18.433Z|06180|ofproto_dpif_upcall(handler163)|WARN|Dropped 69083 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:33:18.433Z|06181|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached
2021-07-22T15:34:19.176Z|06182|ofproto_dpif_upcall(handler163)|WARN|Dropped 29325 log messages in last 61 seconds (most recently, 1 seconds ago) due to excessive rate
2021-07-22T15:34:19.176Z|06183|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached

Version-Release number of selected component (if applicable):
openstack-neutron-openvswitch-9.4.1-53.el7ost.noarch        Sat Mar  6 11:15:31 2021
openvswitch-2.9.0-114.el7fdp.x86_64                         Sat Feb  1 11:51:13 2020
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch        Sat Feb  1 11:50:48 2020
python-openvswitch-2.9.0-114.el7fdp.x86_64                  Sat Feb  1 11:53:34 2020

How reproducible:
Not reproduced; looking for the root cause.

Actual results:
The MAC addresses of the VLAN interfaces changed; the reason has not yet been found.

Expected results:
The MAC addresses of the VLAN interfaces don't change on the controller node.
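
To gauge how heavily the handler threads were rate-limiting their warnings, the "Dropped N log messages" lines quoted in the description can be tallied with a short script. This is only a minimal sketch: the regular expression is derived from the log lines quoted in this report, and the function name `summarize_dropped` and the `SAMPLE` list are hypothetical, introduced for illustration.

```python
import re

# Pattern derived from the ovs-vswitchd warnings quoted in this report.
DROPPED_RE = re.compile(
    r"^(?P<ts>\S+)\|\d+\|(?P<module>[^|]+)\|WARN\|"
    r"Dropped (?P<count>\d+) log messages in last (?P<secs>\d+) seconds"
)


def summarize_dropped(lines):
    """Return (timestamp, module, dropped_count, seconds) per rate-limit notice."""
    rows = []
    for line in lines:
        match = DROPPED_RE.match(line.strip())
        if match:
            rows.append(
                (
                    match.group("ts"),
                    match.group("module"),
                    int(match.group("count")),
                    int(match.group("secs")),
                )
            )
    return rows


# Three lines copied from the description; the second is not a "Dropped" notice.
SAMPLE = [
    "2021-07-22T15:30:18.432Z|06178|ofproto_dpif_upcall(handler163)|WARN|"
    "Dropped 46143 log messages in last 59 seconds "
    "(most recently, 0 seconds ago) due to excessive rate",
    "2021-07-22T15:30:18.432Z|06179|ofproto_dpif_upcall(handler163)|WARN|"
    "upcall: datapath flow limit reached",
    "2021-07-22T15:31:18.514Z|02070|ofproto_dpif_upcall(handler165)|WARN|"
    "Dropped 31887 log messages in last 60 seconds "
    "(most recently, 0 seconds ago) due to excessive rate",
]

rows = summarize_dropped(SAMPLE)
total_dropped = sum(row[2] for row in rows)

for ts, module, count, secs in rows:
    print(f"{ts} {module}: {count} messages suppressed in {secs}s")
print(f"total suppressed: {total_dropped}")
```

In practice the lines would be read from the ovs-vswitchd log file on the affected controller instead of from `SAMPLE`.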
*** Bug 1992835 has been marked as a duplicate of this bug. ***
I created an upstream RFE for storing the MAC address of a VLAN interface to ensure that it remains static across reboots and restarts. https://bugs.launchpad.net/os-net-config/+bug/1941002
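
Until such a feature exists, a MAC change like the one in this report could be detected by comparing the current addresses against a saved snapshot. The following is a rough sketch only and is not how os-net-config works or will work: it assumes a Linux host where `/sys/class/net/<iface>/address` holds the interface MAC, and the helper names `read_macs` and `diff_macs` and the example addresses are hypothetical.

```python
import os


def read_macs(ifaces, sysfs_root="/sys/class/net"):
    """Read the current MAC of each interface from sysfs (None if absent)."""
    macs = {}
    for iface in ifaces:
        path = os.path.join(sysfs_root, iface, "address")
        try:
            with open(path) as handle:
                macs[iface] = handle.read().strip().lower()
        except OSError:
            macs[iface] = None
    return macs


def diff_macs(saved, current):
    """Return {iface: (saved_mac, current_mac)} for interfaces whose MAC differs."""
    changed = {}
    for iface, old_mac in saved.items():
        new_mac = current.get(iface)
        if new_mac != old_mac:
            changed[iface] = (old_mac, new_mac)
    return changed


if __name__ == "__main__":
    # Hypothetical snapshot; real values would be recorded at deployment time.
    saved = {"vlan104": "52:54:00:aa:bb:01", "vlan105": "52:54:00:aa:bb:02"}
    current = read_macs(saved.keys())
    for iface, (old_mac, new_mac) in diff_macs(saved, current).items():
        print(f"{iface}: MAC changed from {old_mac} to {new_mac}")
```

Running a check like this periodically would at least record when a change happens, which is the detail still missing from the investigation above.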