Today, if the migration process fails at any point, manual intervention is needed because the tool/process is not resilient. For example, if an issue occurs in the activate-ovn phase [0], or in the middle of the process such as when changing the network type to Geneve [1], the cloud is left in an intermediate state that is neither ML2/OVS nor ML2/OVN, and engineering intervention is required to complete the process.

A daemon/agent running on all the nodes could help here: it would track the current state of the migration, retry on errors, and possibly allow reverting the changes if everything else fails.

[0] https://github.com/openstack/neutron/blob/master/tools/ovn_migration/tripleo_environment/playbooks/roles/migration/templates/activate-ovn.sh.j2
[1] https://github.com/openstack/neutron/blob/master/tools/ovn_migration/tripleo_environment/playbooks/ovn-migration.yml#L39
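To make the proposed agent behaviour concrete, here is a minimal single-node sketch of a phase runner that persists progress, retries each phase, and rolls back on persistent failure. It rests on several assumptions. The `MigrationTracker` class, the JSON state file, and the `(name, apply, revert)` phase tuples are hypothetical and not part of the existing ovn_migration tooling. Every apply/revert callable is assumed to be idempotent. Coordination across "all the nodes" is not covered.

```python
import json
import logging
import os
from typing import Callable, List, Tuple

log = logging.getLogger("ovn-migration-sketch")

# A phase is (name, apply, revert). Phase names are hypothetical placeholders,
# not the real steps of the ovn_migration playbooks.
Phase = Tuple[str, Callable[[], None], Callable[[], None]]


class MigrationTracker:
    """Run phases in order and persist progress so a rerun resumes.

    Assumes each apply/revert callable is idempotent, so retrying a phase
    or reverting a partially applied phase is safe.
    """

    def __init__(self, state_path: str, phases: List[Phase], max_retries: int = 3):
        self.state_path = state_path
        self.phases = phases
        self.max_retries = max_retries

    def _load(self) -> List[str]:
        if not os.path.exists(self.state_path):
            return []
        with open(self.state_path) as f:
            return json.load(f)["completed"]

    def _save(self, completed: List[str]) -> None:
        # Write to a temp file and rename, so a crash never leaves a torn file.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"completed": completed}, f)
        os.replace(tmp, self.state_path)

    def run(self) -> bool:
        completed = self._load()
        for name, apply_fn, revert_fn in self.phases:
            if name in completed:
                continue  # finished in an earlier run; skip on resume
            for attempt in range(1, self.max_retries + 1):
                try:
                    apply_fn()
                    break
                except Exception as exc:
                    log.warning("phase %s attempt %d failed: %s", name, attempt, exc)
            else:
                # Retries exhausted: undo the failed (possibly partial) phase,
                # then roll back the completed phases in reverse order.
                revert_fn()
                self.revert(completed)
                return False
            completed.append(name)
            self._save(completed)
        return True

    def revert(self, completed: List[str]) -> None:
        for name, _, revert_fn in reversed(self.phases):
            if name in completed:
                revert_fn()
                completed.remove(name)
                self._save(completed)
```

Recording progress after each phase is what would let a rerun resume instead of leaving the cloud "neither on ML2/OVS nor ML2/OVN". The reverse-order rollback mirrors the "revert if all goes wrong" idea.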
If the migration fails, we should be able to revert to OVS using the revert mechanism tracked in bug 1823324.

*** This bug has been marked as a duplicate of bug 1823324 ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days