Created attachment 1514063 [details]
logs

Description of problem:
During the ml2/ovs to ml2/ovn migration we expect some downtime for the live instances, but the observed downtime is around 5 to 6 minutes.

https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/networking-ovn/job/DFG-network-networking-ovn-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-ml2ovs-to-ovn-migration/228/artifact/.workspaces/workspace_2018-12-12_11-44-32/ovn_migration/ovn_migration/

Version-Release number of selected component (if applicable):
core_puddle: 2018-12-12.4

How reproducible:
100%

Steps to Reproduce:
1. Run the migration job and check the ping logs.

Actual results:
Instance downtime during the migration is around 5 to 6 minutes.

Expected results:
Only a brief downtime during the migration.

Additional info:
I found the culprit. It's the workaround introduced here:

https://github.com/openstack/networking-ovn/blob/47983cc61194888750f1c4cb08ff350a13914903/migration/tripleo_environment/playbooks/roles/migration/templates/clone-br-int.sh.j2#L79

The moment we delete the controller on br-int, all the OpenFlow rules are removed, which means downtime. And then we wait 5 minutes after this script is run :-/

The workaround was introduced because of:
https://bugzilla.redhat.com/show_bug.cgi?id=1640045

We need to figure out whether that workaround can now be removed, or otherwise, in that script:

1) Save the flows
2) Apply the workaround
3) Restore the flows
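For reference, the save/apply/restore sequence could look roughly like the sketch below. This is a hypothetical illustration, not the actual contents of clone-br-int.sh.j2; the clean_flows helper and the /tmp/br-int.flows path are made-up names, and the sed/grep alternation syntax assumes GNU tools.

```shell
#!/bin/sh
# Hypothetical sketch of "save flows / apply workaround / restore flows".
# clean_flows and /tmp/br-int.flows are illustrative names only.

# `ovs-ofctl dump-flows` output carries a header line plus per-flow
# statistics (duration=, n_packets=, n_bytes=, idle_age=, hard_age=)
# that `ovs-ofctl add-flows` cannot parse, so strip them first.
clean_flows() {
    grep -v 'NXST_FLOW\|OFPST_FLOW' |
        sed -e 's/\(duration\|n_packets\|n_bytes\|idle_age\|hard_age\)=[^,]*, //g'
}

# 1) Save the flows currently installed on br-int:
#      ovs-ofctl dump-flows br-int | clean_flows > /tmp/br-int.flows
# 2) Apply the workaround (deleting the controller wipes the flows):
#      ovs-vsctl del-controller br-int
# 3) Restore the saved flows:
#      ovs-ofctl add-flows br-int /tmp/br-int.flows
```

Even with a restore step there would still be a small window without flows between steps 2 and 3, so dropping the workaround entirely (if the underlying ovs bug is fixed) is the cleaner option.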
I discovered the culprit because I was recording the screen during the migration: https://www.youtube.com/watch?v=sA7xfTpPMJc
(In reply to Miguel Angel Ajo from comment #2)
> We need to figure out if that workaround can now be removed, or otherwise,
> in that script:
>
> 1) Save the flows
> 2) Apply the workaround
> 3) Restore the flows

I think the fix is available in ovs version 2.10.0-21 and later, so I don't think we need the workaround anymore.
Correct, I verified we don't need the workaround anymore; it works without it.
According to our records, this should be resolved by python-networking-ovn-5.0.2-0.20190430191338.e673daf.el7ost. This build is available now.
Can't verify - depends on https://bugzilla.redhat.com/show_bug.cgi?id=1694572
Closing as bug 1694572 won't make it to OSP14 before EOL.