Bug 1989057

Summary: [RFE] Implement method to avoid change of MAC address of vlans on OpenStack controller node
Product: Red Hat OpenStack
Reporter: Eric Nothen <enothen>
Component: os-net-config
Assignee: OSP Team <rhos-maint>
Status: NEW
QA Contact: Nobody <nobody>
Severity: low
Docs Contact:
Priority: unspecified
Version: 17.0 (Wallaby)
CC: apavlovs, bfournie, ccamposr, chrisw, dsneddon, ekuris, enothen, ggrimaux, hbrock, jbeaudoi, jkreger, jslagle, mburns, pweeks, sbaker, scohen, sshnaidm
Target Milestone: beta
Keywords: FutureFeature, Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug

Description Eric Nothen 2021-08-02 10:12:29 UTC
Description of problem:
We are currently troubleshooting a situation in which the customer's controller nodes were fenced. After initial investigation, it seems the following happened:

1. The MAC addresses of two VLANs (vlan104 and vlan105) on the controller changed on Jul 16 around 16:31, for a reason still to be determined
2. The change in MAC addresses caused the switch(es) to block traffic (confirmed by the customer as visible on the switch)
3. Because the traffic was blocked/dropped, rather than the interface being reset, we don't see "Link DOWN" on the node
4. Also because of the blocked/dropped traffic, pcs monitors timed out, causing the fencing of the node (as described earlier by Ondrej)

Before the event, the logs show warnings like these:

2021-07-22T15:30:18.432Z|06178|ofproto_dpif_upcall(handler163)|WARN|Dropped 46143 log messages in last 59 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:30:18.432Z|06179|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached
2021-07-22T15:31:18.514Z|02070|ofproto_dpif_upcall(handler165)|WARN|Dropped 31887 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:31:18.517Z|02071|ofproto_dpif_upcall(handler165)|WARN|upcall: datapath flow limit reached
2021-07-22T15:32:18.433Z|02072|ofproto_dpif_upcall(handler165)|WARN|Dropped 31465 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:32:18.433Z|02073|ofproto_dpif_upcall(handler165)|WARN|upcall: datapath flow limit reached
2021-07-22T15:33:18.433Z|06180|ofproto_dpif_upcall(handler163)|WARN|Dropped 69083 log messages in last 60 seconds (most recently, 0 seconds ago) due to excessive rate
2021-07-22T15:33:18.433Z|06181|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached
2021-07-22T15:34:19.176Z|06182|ofproto_dpif_upcall(handler163)|WARN|Dropped 29325 log messages in last 61 seconds (most recently, 1 seconds ago) due to excessive rate
2021-07-22T15:34:19.176Z|06183|ofproto_dpif_upcall(handler163)|WARN|upcall: datapath flow limit reached
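
To gauge how much logging was suppressed per handler thread in an excerpt like the one above, the rate-limit notices can be totalled with a short script. This is only a rough sketch: it assumes the pipe-separated layout visible in these lines (timestamp, sequence, module(thread), level, message), and `summarize_dropped` is a hypothetical helper name, not part of Open vSwitch.

```python
import re
from collections import defaultdict

# Matches the rate-limit notice text exactly as it appears in the excerpt above.
DROPPED_RE = re.compile(r"Dropped (\d+) log messages in last (\d+) seconds")


def summarize_dropped(lines):
    """Return {module(thread): (total_dropped, total_seconds)} for rate-limit notices."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        # Assumed layout: timestamp|sequence|module(thread)|level|message
        parts = line.strip().split("|", 4)
        if len(parts) != 5:
            continue  # skip anything that does not look like a log record
        _timestamp, _sequence, source, _level, message = parts
        match = DROPPED_RE.search(message)
        if match:
            totals[source][0] += int(match.group(1))
            totals[source][1] += int(match.group(2))
    return {source: tuple(values) for source, values in totals.items()}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_dropped.py < ovs-vswitchd.log
    for source, (dropped, seconds) in sorted(summarize_dropped(sys.stdin).items()):
        print(f"{source}: {dropped} messages suppressed over {seconds} s")
```

The "datapath flow limit reached" lines are deliberately ignored here; the script only counts how many messages the rate limiter hid, which gives a sense of how much context is missing from the log.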

Version-Release number of selected component (if applicable):
openstack-neutron-openvswitch-9.4.1-53.el7ost.noarch        Sat Mar  6 11:15:31 2021
openvswitch-2.9.0-114.el7fdp.x86_64                         Sat Feb  1 11:51:13 2020
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch        Sat Feb  1 11:50:48 2020
python-openvswitch-2.9.0-114.el7fdp.x86_64                  Sat Feb  1 11:53:34 2020


How reproducible:
Not reproduced, looking for root cause.


Actual results:
MAC addresses of vlans changed, reason not yet found.

Expected results:
MAC addresses of vlans don't change on the controller node.

Comment 8 Steve Baker 2021-08-17 19:59:29 UTC
*** Bug 1992835 has been marked as a duplicate of this bug. ***

Comment 11 Dan Sneddon 2021-08-24 19:32:44 UTC
I created an upstream RFE for storing the MAC address of a VLAN interface to ensure that it remains static across reboots and restarts.

https://bugs.launchpad.net/os-net-config/+bug/1941002
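
Until such a feature exists, the general idea of recording a VLAN interface's MAC address and checking it later can be sketched as follows. This is an illustration only and not how os-net-config implements or plans to implement it: the function names are hypothetical, the script assumes the standard Linux sysfs file `/sys/class/net/<interface>/address`, and the restore step only builds `ip link set dev <interface> address <mac>` command lines without running them.

```python
from pathlib import Path


def read_macs(interfaces, sysfs_root="/sys/class/net"):
    """Read the current MAC address of each named interface from sysfs."""
    macs = {}
    for name in interfaces:
        address_file = Path(sysfs_root) / name / "address"
        if address_file.is_file():
            macs[name] = address_file.read_text().strip().lower()
    return macs


def changed_macs(stored, current):
    """Return {interface: (stored_mac, current_mac)} for interfaces whose MAC differs."""
    return {
        name: (stored[name], current[name])
        for name in stored
        if name in current and stored[name] != current[name]
    }


def restore_commands(changes):
    """Build (but do not run) iproute2 commands that would set the stored MAC again."""
    return [
        ["ip", "link", "set", "dev", name, "address", stored_mac]
        for name, (stored_mac, _current_mac) in sorted(changes.items())
    ]
```

A monitoring job could call `read_macs(["vlan104", "vlan105"])` once to record a baseline, then compare later readings with `changed_macs` and alert before the switch starts dropping traffic.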