Bug 1659066
Summary: | [OVN migration] long downtime during migration from ml2/ovs to ml2/ovn | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eran Kuris <ekuris> | ||||
Component: | python-networking-ovn | Assignee: | Lucas Alvares Gomes <lmartins> | ||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Eran Kuris <ekuris> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 14.0 (Rocky) | CC: | apevec, dalvarez, jlibosva, lhh, lmartins, majopela, nusiddiq, twilson | ||||
Target Milestone: | z4 | Keywords: | TestOnly, Triaged, ZStream | ||||
Target Release: | 14.0 (Rocky) | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | python-networking-ovn-5.0.2-0.20190430191338.e673daf.el7ost | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-11-26 13:38:21 UTC | Type: | Bug | ||||
Bug Depends On: | 1694572 | ||||||
Description
Eran Kuris
2018-12-13 13:55:42 UTC
I found the culprit. It's the workaround introduced here:
https://github.com/openstack/networking-ovn/blob/47983cc61194888750f1c4cb08ff350a13914903/migration/tripleo_environment/playbooks/roles/migration/templates/clone-br-int.sh.j2#L79

The moment we delete the controller on br-int, all the OpenFlow rules are removed, which means downtime. And then we wait 5 minutes after this script is run :-/

The workaround was introduced because of:
https://bugzilla.redhat.com/show_bug.cgi?id=1640045

We need to figure out if that workaround can now be removed, or otherwise, in that script:

1) Save the flows
2) Apply the workaround
3) Restore the flows

I discovered the culprit because I was recording the screen during the migration:
https://www.youtube.com/watch?v=sA7xfTpPMJc

(In reply to Miguel Angel Ajo from comment #2)
I think the fix is available in OVS version 2.10.0-21 and later, so I don't think we need the workaround anymore.

Correct, I verified we don't need the workaround anymore; it works without it.

According to our records, this should be resolved by python-networking-ovn-5.0.2-0.20190430191338.e673daf.el7ost. This build is available now.

Can't verify: this depends on https://bugzilla.redhat.com/show_bug.cgi?id=1694572

Closing, as bug 1694572 won't make it to OSP14 before EOL.
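
For reference, the save/apply/restore alternative proposed in comment #2 could be sketched roughly as below. This is only an illustration, not what the migration script does (the fix that was verified here was to drop the workaround altogether). The helper names (`run_command`, `flows_only`, `apply_preserving_flows`) are hypothetical, the workaround command is passed in by the caller rather than copied from the script, and the sketch assumes an `ovs-ofctl` recent enough to support `dump-flows --no-stats`. Depending on the OVS version, the dump may need more filtering before `add-flows` accepts it.

```python
import subprocess
import tempfile
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_command(cmd: Sequence[str]) -> str:
    """Run a command and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
    return result.stdout


def flows_only(dump: str) -> List[str]:
    """Keep flow lines; drop the '..._FLOW reply' header and blank lines."""
    return [
        line.strip()
        for line in dump.splitlines()
        if line.strip() and "_FLOW reply" not in line
    ]


def apply_preserving_flows(
    bridge: str,
    workaround: Sequence[str],
    runner: Runner = run_command,
) -> str:
    """Save the bridge's flows, run the workaround, then restore the flows.

    Returns the path of the file the flows were saved to.
    """
    # 1) Save the flows (assumes ovs-ofctl supports --no-stats).
    dump = runner(["ovs-ofctl", "dump-flows", "--no-stats", bridge])
    with tempfile.NamedTemporaryFile("w", suffix=".flows", delete=False) as f:
        f.write("\n".join(flows_only(dump)) + "\n")
        path = f.name

    # 2) Apply the workaround (command supplied by the caller).
    runner(list(workaround))

    # 3) Restore the flows from the saved file.
    runner(["ovs-ofctl", "add-flows", bridge, path])
    return path
```

A call might look like `apply_preserving_flows("br-int", ["ovs-vsctl", "del-controller", "br-int"])`, where the `del-controller` command is an assumption about what the workaround runs. Traffic would still be disrupted between steps 2 and 3, so this only shortens the outage rather than removing it.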