Bug 2216778 - Long VM network downtime (~20 min) when running ovn migration revert playbook
Summary: Long VM network downtime (~20 min) when running ovn migration revert playbook
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: documentation
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: James Smith
QA Contact: RHOS Documentation Team
URL:
Whiteboard:
Depends On:
Blocks: 1823324
Reported: 2023-06-22 14:16 UTC by Roman Safronov
Modified: 2023-09-08 03:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-09-08 03:10:21 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-25990 0 None None None 2023-06-22 14:16:51 UTC

Description Roman Safronov 2023-06-22 14:16:25 UTC
Description of problem:
When the OVN migration revert is started (after the controller nodes are already back up), the VMs still lose network connectivity for a long time before they become responsive again.

Version-Release number of selected component (if applicable):
RHOS-17.1-RHEL-9-20230621.n.1
openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200969.el9ost.noarch
python3-neutron-18.6.1-1.20230518200969.el9ost.noarch
ovn22.12-22.12.0-46.el9fdp.x86_64
openvswitch3.1-3.1.0-14.el9fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Deploy an HA environment with the OVS neutron backend.
2. Create networks, subnets, and security groups, and spawn VMs connected to the created networks. Make sure each VM is either connected to the external network or has a FIP so that it is reachable from the external network.
3. Create a backup of the controller nodes so that the migration can be reverted afterwards.
4. Migrate the neutron backend to OVN.
5. Make sure the VMs respond to ping requests from the external network.
6. Restore the controller nodes from the backup.
7. Make sure the controller nodes are restored and the VMs are still pingable.
8. Start an infinite ping process to each VM and run the revert playbook (/usr/share/ansible/neutron-ovn-migration/playbooks/revert.yml).
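
For anyone reproducing step 8, here is a minimal sketch of one way the downtime could be measured. It is not the tooling used for this report. It assumes a Linux host with iputils ping; the address 192.0.2.10, the 30-minute window, and the function names are placeholders.

```python
import subprocess
import time
from typing import Iterable, List, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest run of failed probes, in seconds.

    An outage is measured from the first failed probe to the next
    successful one, or to the last sample if connectivity never returns.
    """
    longest = 0.0
    outage_start = None
    last_ts = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and outage_start is None:
            outage_start = ts  # connectivity just dropped
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # connectivity is back
    if outage_start is not None and last_ts is not None:
        # Still down when sampling stopped: count up to the last sample.
        longest = max(longest, last_ts - outage_start)
    return longest


def probe(host: str) -> bool:
    """Send one ICMP echo request (iputils ping) with a 1 s reply timeout."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host: str, duration_s: float, interval_s: float = 1.0) -> List[Tuple[float, bool]]:
    """Probe the host repeatedly and record (monotonic timestamp, reachable)."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(host)))
        time.sleep(interval_s)
    return samples


if __name__ == "__main__":
    # Placeholder FIP and window: start this, then run the revert playbook.
    samples = monitor("192.0.2.10", duration_s=1800)
    print(f"longest outage: {longest_outage(samples):.0f} s")
```

With roughly one probe per second, the reported value is accurate to about a second or two, which is enough to tell a sub-minute outage from a 20-minute one.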

Actual results:
VMs do not respond to pings for about 20 minutes.

Expected results:
Network downtime is short: several seconds, and in any case less than a minute.

Additional info:
If this long downtime is considered reasonable, it needs to be documented. Customers would expect a much shorter network downtime.
