Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2323873

Summary: During live migration of controller between nodes, MAC flaps between source and destination worker resulting in port shutdown [17.1]
Product: Red Hat OpenStack Reporter: David Hill <dhill>
Component: osp-director-operator-container Assignee: OSP Team <rhos-maint>
Status: CLOSED MIGRATED QA Contact:
Severity: medium Docs Contact: Irina <igallagh>
Priority: medium    
Version: 17.1 (Wallaby) CC: chjones, grosenbe, jschluet, lmadsen, mschuppe, owalsh
Target Milestone: async Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-12-10 19:19:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description David Hill 2024-11-05 15:01:31 UTC
Description of problem:
When live migrating a controller from one worker node to another, its MAC address starts flapping between the source and destination workers, which gets the switch port shut down. The customer involved the network hardware vendor, which provided a patch that solved that issue, but there are still between 40 and 80 MAC flaps when live migrating. There is a fix for that in OCP 4.16, which needs to be configured in the OSP operator as described here:
~~~
From OCP Virt, to solve the MAC flaps during the migration, we have to pass the parameter "disableContainerInterface: true" to the NAD. However, we cannot edit it directly since the NADs are managed by the director operator. Also, there are OSP pods using some of the NADs. However, it looks like the NADs used by the VMs and other OSP pods are different:


VMs using NAD without "static" suffix:


# oc get vm vm-ctl-0 -o yaml |yq '.spec.template.spec.networks'
- name: default
  pod: {}
- multus:
    networkName: ctlplane
  name: ctlplane
- multus:
    networkName: external
  name: external
- multus:
    networkName: internalapi
  name: internalapi
- multus:
    networkName: storage
  name: storage
- multus:
    networkName: tenant
  name: tenant

openstackclient pod is using NAD with "static" suffix:


# oc get pod openstackclient -o yaml |yq '.metadata.annotations["k8s.v1.cni.cncf.io/networks"]'
[{"name": "ctlplane-static", "namespace": "openstack", "ips": ["10.10.104.10/22"]}, {"name": "internalapi-static", "namespace": "openstack", "ips": ["10.10.103.10/23"]}, {"name": "external-static", "namespace": "openstack", "ips": ["10.10.102.10/28"]}]

So it may be possible to pass "disableContainerInterface" only on the NADs that are used only by VMs?


As of now, the customer has increased the MAC flap threshold so it won't get blocklisted, but they are still looking for a solution from us.
~~~
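
To illustrate the question raised in the quoted text, here is a minimal Python sketch that separates the NADs referenced by the VM from the "-static" NADs referenced by the openstackclient pod, and shows what adding "disableContainerInterface": true to a NAD's CNI config could look like. The network names are taken from the output above. The CNI config string (type, bridge name, cniVersion) is a hypothetical placeholder and not the config the director operator actually generates, and the helper names are made up for illustration. Since the NADs are managed by the director operator, a manual edit like this would presumably be reverted, so this only shows the intended end state.

```python
import json

# Network names from the VM spec and the openstackclient pod annotation above.
vm_networks = ["ctlplane", "external", "internalapi", "storage", "tenant"]
pod_networks = ["ctlplane-static", "internalapi-static", "external-static"]


def vm_only_nads(vm_nads, pod_nads):
    """Return the NADs referenced by the VM that no listed pod references."""
    used_by_pods = set(pod_nads)
    return [name for name in vm_nads if name not in used_by_pods]


def with_container_interface_disabled(config_json):
    """Return a copy of a CNI config with disableContainerInterface enabled."""
    config = json.loads(config_json)
    config["disableContainerInterface"] = True
    return json.dumps(config, indent=2)


# Hypothetical NAD spec.config; the real one is generated by the operator.
example_config = json.dumps(
    {"cniVersion": "0.3.1", "name": "ctlplane", "type": "bridge", "bridge": "br-ctlplane"}
)

candidates = vm_only_nads(vm_networks, pod_networks)
patched = with_container_interface_disabled(example_config)

if __name__ == "__main__":
    print("NADs used only by the VM:", candidates)
    print(patched)
```

With the names above, none of the VM networks overlap with the "-static" ones, which supports the idea of setting the parameter only on the NADs without the "static" suffix.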


Comment 7 Leif Madsen 2024-12-10 19:19:14 UTC
This issue has been migrated to https://issues.redhat.com/browse/OSPRH-12355