Bug 2187651

Summary: too long connectivity downtime during VM live-migration with BGP
Product: Red Hat OpenStack
Component: ovn-bgp-agent
Version: 17.1 (Wallaby)
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Milestone: ga
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Reporter: Eduardo Olivares <eolivare>
Assignee: Luis Tomas Bolivar <ltomasbo>
QA Contact: Eduardo Olivares <eolivare>
CC: dalvarez, lmartins, ltomasbo
Keywords: Triaged
Fixed In Version: ovn-bgp-agent-0.4.1-1.20230512001004.e697e35.el9ost
Doc Type: No Doc Update
Last Closed: 2023-08-16 01:14:48 UTC
Type: Bug

Description Eduardo Olivares 2023-04-18 10:30:45 UTC
Description of problem:
This issue can be reproduced with the upstream tempest test test_server_connectivity_live_migration, but the test first needs to be updated with this change (otherwise it wrongly passes): https://review.opendev.org/c/openstack/tempest/+/880719
It only fails on BGP setups, which is why the component is initially set to ovn-bgp-agent, although the fix may end up in neutron or somewhere else.
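A hedged sketch of running that scenario test locally (the regex is just the test name mentioned here; the exact module path and tempest configuration depend on the deployment):

# assumes an already configured tempest workspace; the regex is an assumption based on the test name above
tempest run --regex test_server_connectivity_live_migration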

The manual reproduction is simple (a command sketch follows the steps):
- create a VM connected to a provider network with external connectivity
- start a ping from the VM to an external IP (8.8.8.8) - by default one ping is sent per second
- run the following command: openstack server migrate --live-migration vm0
- stop the ping command and check how many pings went unanswered
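A hedged sketch of those steps as shell commands (vm0 and the flavor, image and provider network names are placeholders, not taken from this report):

# create a VM on a provider network with external connectivity (names are placeholders)
openstack server create --flavor m1.small --image cirros --network provider-net vm0
# inside the VM: ping an external IP; by default one echo request is sent per second
ping 8.8.8.8
# from a host with admin credentials: live-migrate the VM
openstack server migrate --live-migration vm0
# stop the ping (Ctrl+C) and count the unanswered requests in the ping summary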

In a non-BGP setup, only one ping is lost (~1 second of connectivity downtime). In a BGP setup, the downtime lasts between 15 and 20 seconds.
The reason is that the default GW's MAC address changes when the VM is migrated to a different compute, because in BGP setups it corresponds to the compute's br-ex interface. The VM's ARP table is not updated immediately; the entry is only refreshed when the VM sends an ARP request for the default GW's MAC.
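A hedged sketch of how the stale entry can be observed; the IPs and MACs are the ones from the captures below, and the commands assume iproute2 in the guest and on the computes:

# inside the VM: the neighbor entry for the default GW keeps the old MAC until it is refreshed
ip route show default        # default via 172.24.100.1 dev eth0
ip neigh show 172.24.100.1   # still lladdr a6:e1:df:19:b3:45 (comp-0 br-ex) right after the migration
# on each compute: the MAC answering for the GW is the local br-ex MAC, so it differs per host
ip link show br-ex           # comp-0: a6:e1:df:19:b3:45, comp-1: 46:fd:fb:5d:e1:41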


Packets captured at the VM's eth0 interface before the migration from comp-0 to comp-1 (the MAC a6:e1:df:19:b3:45 corresponds to the comp-0 br-ex interface) show that the pings are successfully answered:
09:41:20.683835 fa:16:3e:27:ba:5b > a6:e1:df:19:b3:45, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 2493, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 22, length 64
09:41:20.756554 a6:e1:df:19:b3:45 > fa:16:3e:27:ba:5b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 51, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    8.8.8.8 > 172.24.100.16: ICMP echo reply, id 1, seq 22, length 64
09:41:21.685383 fa:16:3e:27:ba:5b > a6:e1:df:19:b3:45, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 3418, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 23, length 64
09:41:21.757542 a6:e1:df:19:b3:45 > fa:16:3e:27:ba:5b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 51, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    8.8.8.8 > 172.24.100.16: ICMP echo reply, id 1, seq 23, length 64


When the VM is migrated to comp-1, the following ARP is captured (46:fd:fb:5d:e1:41 is the comp-1 br-ex MAC):
09:41:22.794658 fa:16:3e:27:ba:5b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 172.24.100.16 tell 172.24.100.16, length 28
09:41:23.028863 46:fd:fb:5d:e1:41 > fa:16:3e:27:ba:5b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 172.24.100.16 is-at 46:fd:fb:5d:e1:41, length 28

After that, pings go unanswered for ~17 seconds because they are sent to the wrong destination MAC (a6:e1:df:19:b3:45 belongs to comp-0 br-ex, but the VM is now running on comp-1):
09:41:23.734689 fa:16:3e:27:ba:5b > a6:e1:df:19:b3:45, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 4027, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 25, length 64
09:41:24.758616 fa:16:3e:27:ba:5b > a6:e1:df:19:b3:45, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 4042, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 26, length 64
...
09:41:40.118865 fa:16:3e:27:ba:5b > a6:e1:df:19:b3:45, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 12990, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 41, length 64


Then the following ARP exchange fixes the destination MAC (46:fd:fb:5d:e1:41 belongs to comp-1 br-ex):
09:41:41.143189 fa:16:3e:27:ba:5b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Request who-has 172.24.100.1 tell 172.24.100.16, length 28
09:41:41.917087 46:fd:fb:5d:e1:41 > fa:16:3e:27:ba:5b, ethertype ARP (0x0806), length 42: Ethernet (len 6), IPv4 (len 4), Reply 172.24.100.1 is-at 46:fd:fb:5d:e1:41, length 28
09:41:41.917114 fa:16:3e:27:ba:5b > 46:fd:fb:5d:e1:41, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 13055, offset 0, flags [DF], proto ICMP (1), length 84)
    172.24.100.16 > 8.8.8.8: ICMP echo request, id 1, seq 42, length 64
09:41:41.990308 46:fd:fb:5d:e1:41 > fa:16:3e:27:ba:5b, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 51, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    8.8.8.8 > 172.24.100.16: ICMP echo reply, id 1, seq 42, length 64




The tempest test test_server_connectivity_live_migration covers the scenario of a VM with a port on a tenant network and a FIP; it fails too.

I will add a comment when I test the scenario with a tenant network and no FIP.



Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230404.n.1

How reproducible:
100%


Actual results:
Connectivity downtime of 15+ seconds

Expected results:
Lower connectivity downtime during/after live-migration

Comment 1 Eduardo Olivares 2023-04-18 11:17:42 UTC
The downtime without a FIP is 3 seconds or less. Even if the VM is migrated to a compute in a different rack (connected to different leaf switches), the downtime is low because the destination MAC corresponds to the router gateway (typically the X.X.X.1 IP), which doesn't change during the migration.

Comment 4 Eduardo Olivares 2023-04-18 14:35:23 UTC
This bug only occurs when no other VM is running on the destination compute.
If another VM was already running on that compute before the VM under test is migrated, the flows from [1] already exist and the measured downtime is 2 seconds or less.

If no VM was previously running on that compute, these flows do not exist until they are created by the ovn-bgp-agent sync process.


[1] 
[root@cmp-1-0 ~]# ovs-ofctl dump-flows br-ex
 cookie=0x3e7, duration=75.948s, table=0, n_packets=1, n_bytes=90, priority=900,ip,in_port="patch-provnet-4" actions=mod_dl_dst:46:fd:fb:5d:e1:41,NORMAL
 cookie=0x3e7, duration=75.938s, table=0, n_packets=0, n_bytes=0, priority=900,ipv6,in_port="patch-provnet-4" actions=mod_dl_dst:46:fd:fb:5d:e1:41,NORMAL
 cookie=0x0, duration=592819.333s, table=0, n_packets=6151, n_bytes=823797, priority=0 actions=NORMAL
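For illustration only, a hedged sketch of how the IPv4 flow above could be pre-seeded by hand on the destination compute (the sync process normally installs it; the cookie, patch port name and MAC are taken from the dump above and are environment-specific):

# rewrite the destination MAC of IPv4 traffic from the provider patch port to the local br-ex MAC
ovs-ofctl add-flow br-ex "cookie=0x3e7,priority=900,ip,in_port=patch-provnet-4,actions=mod_dl_dst:46:fd:fb:5d:e1:41,NORMAL"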

Comment 17 errata-xmlrpc 2023-08-16 01:14:48 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577