Bug 2323873
| Summary: | During live migration of controller between nodes, MAC flaps between source and destination worker resulting in port shutdown [17.1] | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | David Hill <dhill> |
| Component: | osp-director-operator-container | Assignee: | OSP Team <rhos-maint> |
| Status: | CLOSED MIGRATED | QA Contact: | |
| Severity: | medium | Docs Contact: | Irina <igallagh> |
| Priority: | medium | ||
| Version: | 17.1 (Wallaby) | CC: | chjones, grosenbe, jschluet, lmadsen, mschuppe, owalsh |
| Target Milestone: | async | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2024-12-10 19:19:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
This issue has been migrated to https://issues.redhat.com/browse/OSPRH-12355
Description of problem:

When live migrating a controller from one worker node to the other, the MAC address starts flapping between the source and destination workers, which gets the switch port shut down. The customer involved the network hardware vendor, which provided a patch, and this solved the issue, but there are still between 40 and 80 MAC flaps when live migrating. There is a fix for that in OCP 4.16, which needs to be configured in the OSP operator as per this:

~~~
From OCP Virt, to solve the MAC flaps during the migration, we have to pass the parameter "disableContainerInterface: true" to the NAD. However, we cannot edit it directly since the NADs are managed by the director operator. Also, there are OSP pods using some of the NADs.

However, it looks like the NADs used by the VMs and by the other OSP pods are different.

The VMs use NADs without the "static" suffix:

# oc get vm vm-ctl-0 -o yaml |yq '.spec.template.spec.networks'
- name: default
  pod: {}
- multus:
    networkName: ctlplane
  name: ctlplane
- multus:
    networkName: external
  name: external
- multus:
    networkName: internalapi
  name: internalapi
- multus:
    networkName: storage
  name: storage
- multus:
    networkName: tenant
  name: tenant

The openstackclient pod uses NADs with the "static" suffix:

# oc get pod openstackclient -o yaml |yq '.metadata.annotations["k8s.v1.cni.cncf.io/networks"]'
[{"name": "ctlplane-static", "namespace": "openstack", "ips": ["10.10.104.10/22"]}, {"name": "internalapi-static", "namespace": "openstack", "ips": ["10.10.103.10/23"]}, {"name": "external-static", "namespace": "openstack", "ips": ["10.10.102.10/28"]}]

So it may be possible to pass "disableContainerInterface" only on the NADs used only by VMs?

As of now, the customer has increased the MAC flap threshold so it won't get blocklisted, but they are still looking for a solution from us.
~~~

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
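The question raised in the quoted note (can "disableContainerInterface" be set only on the NADs that VMs use?) can be checked offline with a small script. The following is a minimal sketch, not the director operator's behavior: it uses the `oc`/`yq` output quoted above as input data, and the function names, the sample NAD config, and the bridge name `br-ctlplane` are hypothetical, chosen only for illustration.

```python
import json

# Networks of vm-ctl-0, copied from the `.spec.template.spec.networks` output above.
vm_networks = [
    {"name": "default", "pod": {}},
    {"name": "ctlplane", "multus": {"networkName": "ctlplane"}},
    {"name": "external", "multus": {"networkName": "external"}},
    {"name": "internalapi", "multus": {"networkName": "internalapi"}},
    {"name": "storage", "multus": {"networkName": "storage"}},
    {"name": "tenant", "multus": {"networkName": "tenant"}},
]

# k8s.v1.cni.cncf.io/networks annotation of the openstackclient pod, copied from above.
pod_annotation = (
    '[{"name": "ctlplane-static", "namespace": "openstack", "ips": ["10.10.104.10/22"]}, '
    '{"name": "internalapi-static", "namespace": "openstack", "ips": ["10.10.103.10/23"]}, '
    '{"name": "external-static", "namespace": "openstack", "ips": ["10.10.102.10/28"]}]'
)


def vm_only_nads(vm_networks, pod_annotations):
    """Return NAD names referenced by VMs but by none of the given pod annotations."""
    # networkName may be written as "namespace/name"; keep only the name part.
    vm_nads = {
        net["multus"]["networkName"].split("/")[-1]
        for net in vm_networks
        if "multus" in net
    }
    pod_nads = set()
    for annotation in pod_annotations:
        pod_nads |= {entry["name"] for entry in json.loads(annotation)}
    return sorted(vm_nads - pod_nads)


def with_disabled_container_interface(nad_config):
    """Return a copy of a NAD's JSON config string with the flag set to true."""
    config = json.loads(nad_config)
    config["disableContainerInterface"] = True
    return json.dumps(config)


# Hypothetical NAD config; the real one is generated by the director operator.
sample_config = '{"cniVersion": "0.3.1", "name": "ctlplane", "type": "bridge", "bridge": "br-ctlplane"}'

candidates = vm_only_nads(vm_networks, [pod_annotation])
patched = with_disabled_container_interface(sample_config)
print(candidates)
print(patched)
```

With only the openstackclient annotation as input, every multus network of the VM is reported as a candidate, since the pod references only the "-static" NADs. A real check would need the annotations of all pods in the namespace, and the flag itself would still have to be applied by the operator, because manual edits to operator-managed NADs are not expected to persist.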