OVN must support migration of a port from one host to another. This is a common scenario when a VM is migrated to another host. To migrate a port in the current implementation, the external-ids:iface-id property must be removed on the source host and added on the destination host. This change then triggers the underlying flows to be updated.

There are a few problems in this scenario:
- synchronization - how do we know that the source port is unplugged, so that we can plug in the destination port, and how do we make sure that the changes propagate to the flows in the correct order?
- timing - what happens to packets destined for this port during the period when the port is not plugged in anywhere?
- duration - how long will the propagation to the flows take? The VM would be left without networking during this time.

Some of the possible approaches:
- involve libvirt and let it decide when to do the switching
- allow two ports to be active for some time (have two sets of flows active) - this would however bring some risks, for example delivering some packets twice

The convenient way for us for now (a stopgap) would be to update the source and destination external-ids:iface-id in the northbound DB in one transaction, and leave it up to OVN to handle it synchronously (can this even be done, considering that it has to happen on multiple hosts?).

The conversation which preceded this bug:

Edward Haas:
OVN currently detects the association between an OVS port and an OVN logical switch port using the "iface-id" external-id. To make migration work as you expect, you would need to remove this ID from the OVS port on the source and add it to the OVS port at the destination when you're ready for OVN to change the flows throughout the environment to reflect the new location.
Who is setting the iface-id? libvirt? It seems to me like this mechanism is very slow: it will take too much time for the change to propagate. It would be good to have both flow rules in place until one is removed; is that possible?

Russell Bryant:
libvirt can probably set iface-id; you can also set it using the "ovs-vsctl" command. I'm not sure how slow is "too slow". Changes should propagate through the environment in less than a second. Maybe we should do some experimentation here?
It's not possible for a port to live on two hypervisors at the same time right now. I'm not sure what the desired behavior would be. Where would packets destined for that VM be sent?
I'm definitely open to working on changes to make this work better. I'm just trying to explain how it would work with the current state.

Dan Kenigsberg:
Russell, even if it is only a second, we still need to know that the change has taken place before we set iface-id on the destination and let the VM start there. Do you know what the OpenStack VIF driver is doing in this regard? It seems that libvirt must be involved, since only it knows when the VM state has migrated and the VM is ready to be started on the destination.

Russell Bryant:
This conversation has convinced me that we haven't sorted out live migration properly for OpenStack, either. We need to open a bug to track this one.
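For reference, the current (pre-fix) procedure amounts to moving the iface-id between hosts by hand. A minimal sketch, assuming an OVS interface named vnet0 attached to br-int and a logical switch port named lsp0 (both names are illustrative, not taken from this bug):

  # On the source host: detach the logical port; ovn-controller releases
  # the binding and the flows pointing at this chassis are removed
  ovs-vsctl remove Interface vnet0 external-ids iface-id

  # On the destination host: attach the logical port; ovn-controller
  # claims the binding and flows are reprogrammed toward this chassis
  ovs-vsctl set Interface vnet0 external-ids:iface-id=lsp0

The gap between these two steps is exactly the window described above, during which packets destined for the VM have nowhere to go.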
Solution is under discussion upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329148.html
Upstream discussion has not progressed, there is no solution at this time.
Any updates?
Upstream discussion has stalled out, will discuss how to revive it with Russell.
Current status: high-level proposal exists, implementation does not. At this point it seems this will be included in upstream 2.9.
New scheme (discussed in last RHV/OVN monthly meeting) has been posted upstream: https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337648.html It has since been committed to the master and 2.8 branches.
The patch for this issue is in the upstream master and 2.8 branches, and is contained in released version 2.8.0. Note that a follow-up enhancement patch from Russell is needed for Neutron integration (not yet committed): https://patchwork.ozlabs.org/patch/809039/

Author: Lance Richardson <lrichard>
Date:   Sat Aug 19 16:23:34 2017 -0400

    ovn: support requested-chassis option for logical switch ports

    This patch adds support for a "requested-chassis" option for logical
    switch ports. If set, the only chassis that will claim this port is
    the chassis identified by this option; if the port is already bound
    by another chassis, it will be released.

    The primary benefit of this enhancement is allowing a CMS to prevent
    "thrashing" in the southbound database during live migration by
    keeping the original chassis from attempting to re-bind a port that
    is in the process of migrating. This would also allow (with some
    additional work) RBAC to be applied to the Port_Binding table for
    additional security.

    Signed-off-by: Lance Richardson <lrichard>
    Signed-off-by: Russell Bryant <russell>
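A sketch of how a CMS could use this option when driving a migration (lsp0, hv-src, and hv-dst are illustrative names; the commands themselves are the standard ovn-nbctl/ovn-sbctl usage shown in the verification below):

  # Before starting the migration: pin the port to the destination
  # chassis, so hv-src releases the binding and will not re-claim it
  # while the VM is in flight
  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv-dst

  # After the migration completes, confirm which chassis holds the binding
  ovn-sbctl --bare --columns chassis find port_binding logical_port=lsp0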
Hi,

This is part of branch-2.8:

commit f37dc273243cdc32e74e20a0b97f15c0acebc11e
Author: Lance Richardson <lrichard>
Date:   Sat Aug 19 16:23:34 2017 -0400

    ovn: support requested-chassis option for logical switch ports

I am closing this bug as it's done upstream and will eventually be part of our package when it gets rebased to 2.8 or newer. If you need it earlier, please re-open stating when and why it is needed.

Thanks,
fbl
Flavio, why wouldn't we set a proper target version and wait for QA to properly test it before closing? I prefer doing so in order to have an indication of when downstream RHV can consume the feature.
Reopening to review the request in comment 10.
(In reply to Dan Kenigsberg from comment #10)
> Flavio, why wouldn't we set a proper target version and wait for QA to
> properly test it before closing?

I was told that this was tracking the upstream effort and there was no target release for this to be backported.

> I prefer doing so in order to have an indication of when downstream RHV
> can consume the feature.

2.8 is in fdBeta, so you can already try it.
(In reply to Flavio Leitner from comment #12)
> (In reply to Dan Kenigsberg from comment #10)
> > Flavio, why wouldn't we set a proper target version and wait for QA to
> > properly test it before closing?
>
> I was told that this was tracking the upstream effort and there was no
> target release for this to be backported.
>
> > I prefer doing so in order to have an indication of when downstream RHV
> > can consume the feature.
>
> 2.8 is in fdBeta, so you can already try it.

RHV has a requirement for this. Please add it to the QE test plan, so we can get this feature tested and won't hit any integration roadblocks.
Hi Flavio,

Is it necessary to configure the "requested-chassis" option before VM migration in an OVN environment, and will the option help make VM migration faster? The migration time is sometimes 4s when I don't use the option; I hope it will help make this faster. Do I need a special version of libvirt?

Please help to explain, thanks a lot!
The requested-chassis option is designed to prevent the situation where multiple ovn-controller instances are trying to claim a specific logical port at the same time, resulting in "thrashing". The idea is that the active ovn-controller instance should be the one in the requested-chassis setting. This way, the standby ovn-controller instance will not attempt to claim the port for itself. Can you explain the procedure you are using for VM migration? It may be possible that migration can happen faster, but my instinct is that this will not greatly speed up the process.
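To observe this behavior, you can query which chassis currently claims a port and compare it against the option. A minimal check (lsp0 is an illustrative port name; the command mirrors the one used in the verification below):

  # The chassis column of Port_Binding shows which ovn-controller
  # currently owns the port; with requested-chassis set, only the
  # named chassis should ever appear here
  ovn-sbctl --bare --columns chassis find port_binding logical_port=lsp0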
Hi Mark,

Thanks for the explanation. Maybe there is some problem with my environment; I will test VM migration further.
This bug is verified on the latest version:

[root@dell-per730-19 ovn]# ip link add name hv1-if0 type veth peer name hv1-if1
[root@dell-per730-19 ovn]# ovn-nbctl ls-add ls0
[root@dell-per730-19 ovn]# ovn-nbctl lsp-add ls0 lsp0
[root@dell-per730-19 ovn]# ovs-vsctl -- add-port br-int hv1-if0
[root@dell-per730-19 ovn]# ovs-vsctl set interface hv1-if0 external-ids:iface-id=lsp0
[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv1
[root@dell-per730-19 ovn]# ovn-sbctl list port_binding
_uuid               : b1da9bcc-5624-45c9-b3c7-118b7e145758
chassis             : 096ece9a-b99f-4064-b0b4-494c9816a8d0
datapath            : 81310ca0-e6d9-4f6c-bdb5-770eeddb3ef0
external_ids        : {}
gateway_chassis     : []
logical_port        : "lsp0"
mac                 : []
nat_addresses       : []
options             : {requested-chassis="hv1"}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""

[root@dell-per730-19 ovn]# ovn-sbctl list chassis
_uuid               : 772c8676-3132-40c8-a629-d02394725aa2
encaps              : [d01a3931-dac5-4d75-a6ec-6ed64f76be43]
external_ids        : {datapath-type="", iface-types="geneve,gre,internal,lisp,patch,stt,system,tap,vxlan", ovn-bridge-mappings=""}
hostname            : "dell-per730-49.rhts.eng.pek2.redhat.com"
name                : "hv0"
nb_cfg              : 0
vtep_logical_switches: []

_uuid               : 096ece9a-b99f-4064-b0b4-494c9816a8d0
encaps              : [c5a5171f-8062-49c7-9100-51a9ed1ebfc7]
external_ids        : {datapath-type="", iface-types="geneve,gre,internal,lisp,patch,stt,system,tap,vxlan", ovn-bridge-mappings=""}
hostname            : "dell-per730-19.rhts.eng.pek2.redhat.com"
name                : "hv1"
nb_cfg              : 0

[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv0
[root@dell-per730-19 ovn]# ovn-sbctl list port_binding
_uuid               : b1da9bcc-5624-45c9-b3c7-118b7e145758
chassis             : 772c8676-3132-40c8-a629-d02394725aa2
datapath            : 81310ca0-e6d9-4f6c-bdb5-770eeddb3ef0
external_ids        : {}
gateway_chassis     : []
logical_port        : "lsp0"
mac                 : []
nat_addresses       : []
options             : {requested-chassis="hv0"}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""

[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv1
[root@dell-per730-19 ovn]# ovn-sbctl --bare --columns chassis find port_binding logical_port=366f925b-36d6-42e8-b6d2-64e251fe17c9
096ece9a-b99f-4064-b0b4-494c9816a8d0
[root@dell-per730-19 ovn]#
[root@dell-per730-19 ovn]# ovs-vsctl show
bde05c29-7f7a-4508-b670-f4260ea41772
    Bridge br-int
        fail_mode: secure
        Port "hv1-if0"
            Interface "hv1-if0"
        Port "hv1_vm00_vnet1"
            Interface "hv1_vm00_vnet1"
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-hv0-0"
            Interface "ovn-hv0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.0.26"}
    ovs_version: "2.9.0"
[root@dell-per730-19 ovn]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0550