Bug 2180956

Summary: OSP Tracker - Multichassis ports on localnet-attached logical switches should receive ICMP Path Discovery hints if their effective MTU is lower than localnet MTU
Product: Red Hat OpenStack Reporter: Ihar Hrachyshka <ihrachys>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED CURRENTRELEASE QA Contact: Fiorella Yanac <fyanac>
Severity: high Docs Contact:
Priority: urgent    
Version: 17.1 (Wallaby) CC: bcafarel, chrisw, fyanac, gthiemon, jschluet, mlavalle, pgrist, scohen, twilson
Target Milestone: ga Keywords: TestOnly, Tracking, Triaged
Target Release: 17.1   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovn22.12-22.12.0-91.el8fdp ovn22.12-22.12.0-91.el9fdp Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2024-03-22 16:04:46 UTC Type: ---
Bug Depends On: 2180955    
Bug Blocks:    

Description Ihar Hrachyshka 2023-03-22 18:02:02 UTC
This bug was initially created as a copy of Bug #2180955

I am copying this bug because the underlying OVN bug affects the live migration scenario in OSP, where we use multichassis ports to reduce network downtime while libvirt switches the VM over to a new node. (This feature has been in use since 17.1.)

I expect this bug to become a blocker for 17.1.

This bug tracks the fix in OVN. We should test the fix in OSP for the live migration scenario.

===

Description of problem:

When a port is multichassis (requested-chassis is a comma-separated list) and the logical switch (LS) that the port belongs to has a localnet port, traffic to and from the multichassis port is tunneled anyway. (This is done to guarantee delivery of packets destined to the port MAC address to all its locations.)

This enforced tunneling becomes a problem when the effective MTU of the port differs from the theoretical MTU of the physical network that underlies the LS (defined by the MTU of the localnet port in the same switch), because tunnel encapsulation reduces the usable payload size. In this case, the port should not communicate with the outside world using the maximum MTU.
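To illustrate the MTU mismatch described above, here is a minimal sketch of the arithmetic. The per-header byte counts are assumptions for a Geneve tunnel over IPv4 with a single 8-byte option; they are not values taken from this bug, and the function names are hypothetical.

```python
# Assumed per-packet overhead of a Geneve tunnel over IPv4.
# These byte counts are illustrative assumptions, not values from this bug.
OUTER_IPV4 = 20
OUTER_UDP = 8
GENEVE_BASE = 8
GENEVE_OPTION = 8      # assumed: one option TLV (4-byte header + 4 bytes of data)
INNER_ETHERNET = 14


def tunnel_overhead() -> int:
    """Total bytes added in front of the inner IP packet (hypothetical helper)."""
    return OUTER_IPV4 + OUTER_UDP + GENEVE_BASE + GENEVE_OPTION + INNER_ETHERNET


def effective_mtu(underlay_mtu: int) -> int:
    """Largest inner IP packet that still fits in one tunneled packet."""
    return underlay_mtu - tunnel_overhead()


def needs_pmtu_hint(localnet_mtu: int, underlay_mtu: int) -> bool:
    """True when tunneling leaves the port with less room than the localnet MTU."""
    return effective_mtu(underlay_mtu) < localnet_mtu


if __name__ == "__main__":
    print(effective_mtu(1500))
    print(needs_pmtu_hint(localnet_mtu=1500, underlay_mtu=1500))
```

Under these assumptions, a port on a 1500-byte physical network that is tunneled over a 1500-byte underlay can no longer send full-size packets, which is the situation the bug describes.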

The proposal here is for the OVN controller to set up ICMP Path MTU Discovery replies to oversized packets received from a multichassis port, so that the port owner is aware of the change in circumstances and can adjust its effective MTU accordingly.
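For readers unfamiliar with the mechanism, the IPv4 form of such a reply is an ICMP Destination Unreachable message (type 3) with code 4, "fragmentation needed", carrying the next-hop MTU (RFC 1191). The sketch below builds that message in plain Python. It is only an illustration of the packet format, not how OVN implements the fix; the function names are hypothetical, and it ignores the DF bit and IPv6 for brevity.

```python
import struct
from typing import Optional

ICMP_DEST_UNREACH = 3   # ICMP type: Destination Unreachable
ICMP_FRAG_NEEDED = 4    # ICMP code: fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frag_needed(next_hop_mtu: int, original_packet: bytes) -> bytes:
    """Build the ICMP message body for an oversized IPv4 packet."""
    ihl = (original_packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    quoted = original_packet[:ihl + 8]             # IP header + first 8 payload bytes
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         0, 0, next_hop_mtu)       # checksum field zeroed for now
    checksum = internet_checksum(header + quoted)
    header = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                         checksum, 0, next_hop_mtu)
    return header + quoted


def pmtu_reply(packet: bytes, effective_mtu: int) -> Optional[bytes]:
    """Return an ICMP hint only when the packet exceeds the effective MTU."""
    if len(packet) <= effective_mtu:
        return None
    return build_frag_needed(effective_mtu, packet)
```

On receiving such a message, the sender's IP stack lowers its path MTU estimate for that destination, which is the "hint" the summary refers to.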

The problem was originally discussed in ovs-dev ML: https://www.mail-archive.com/ovs-dev@openvswitch.org/msg68204.html

Implementation was proposed at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-November/398981.html

This bug is to take the patch over and get it tested / merged in OVN.

The bug affects the OSP live migration scenario for VMs attached to physical networks (= switches with a localnet port).

I expect this bug to become a blocker for OSP 17.1 because of its effect on the live migration scenario.

Comment 2 Ihar Hrachyshka 2023-05-03 01:17:56 UTC
A patch series that fixes the issue has been posted: https://patchwork.ozlabs.org/project/ovn/list/?series=353226

Cover letter: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404172.html

Comment 3 Ihar Hrachyshka 2023-05-10 12:33:05 UTC
Status:

I'm still waiting for the OVN core team to look into it upstream. I'm asking for reviews at every opportunity (in upstream meetings and elsewhere). I've explicitly asked them to review the series as part of their next release process (soft freeze was just announced). I hope we'll get some attention in the next few days...

Comment 4 Ihar Hrachyshka 2023-05-24 12:33:02 UTC
UPD: this is at the finish line upstream; I expect it to land in the next few days.

Comment 6 Ihar Hrachyshka 2023-06-29 14:03:08 UTC
There's no automation on the OSP side for this scenario.

I assume that verification of the issue on the OSP side would involve manual checks:
- start a VM on a VLAN network
- establish a TCP session with iperf
- start live migration
- make sure that the session doesn’t get degraded / dropped during the process
(- confirm the migration is complete)
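
As a rough aid for the "degraded / dropped" check in the steps above, the sketch below flags measurement intervals whose throughput falls well below the session's median. It assumes per-interval throughput samples (in bits per second) have already been collected from the iperf session by some means; the function name and the 50% threshold are hypothetical choices, not part of any agreed test plan.

```python
from statistics import median
from typing import List, Sequence


def degraded_intervals(samples_bps: Sequence[float],
                       min_fraction: float = 0.5) -> List[int]:
    """Return indexes of intervals below min_fraction of the median throughput.

    samples_bps: per-interval throughput in bits per second (assumed input).
    min_fraction: hypothetical threshold; tune it for the environment under test.
    """
    if not samples_bps:
        return []
    baseline = median(samples_bps)
    return [i for i, bps in enumerate(samples_bps)
            if bps < baseline * min_fraction]


if __name__ == "__main__":
    # Illustrative numbers only: a steady session with a two-interval dip.
    session = [940e6] * 5 + [0.0, 120e6] + [940e6] * 5
    print(degraded_intervals(session))
```

An empty result would suggest the session stayed near its usual rate throughout the migration; any flagged interval is worth correlating with the migration timeline.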