Bug 2192726 - OVN migration tool doesn't send network events to nova on wiring changes
Summary: OVN migration tool doesn't send network events to nova on wiring changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ga
Target Release: 17.1
Assignee: Jakub Libosvar
QA Contact: Roman Safronov
URL:
Whiteboard:
Depends On: 2212768
Blocks: 2052341
Reported: 2023-05-02 22:06 UTC by Jakub Libosvar
Modified: 2023-08-16 01:15 UTC (History)

Fixed In Version: openstack-neutron-18.6.1-1.20230518200961.el9ost
Doc Type: Known Issue
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-16 01:14:53 UTC
Target Upstream Version:
Embargoed:


Attachments
nova-compute.log (7.95 MB, text/plain)
2023-05-02 22:06 UTC, Jakub Libosvar


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-24709 0 None None None 2023-05-02 22:08:01 UTC
Red Hat Product Errata RHEA-2023:4577 0 None None None 2023-08-16 01:15:31 UTC

Description Jakub Libosvar 2023-05-02 22:06:17 UTC
Created attachment 1961824
nova-compute.log

Description of problem:
After a compute node that hosts a trunk VM has been switched from ml2/ovs to ml2/ovn, the trunk bridge is preserved on the compute node and traffic keeps flowing through the tbr- OVS bridge. The vif_details, however, no longer define the tbr- bridge. If the node is taken down and booted up again, neutron-cleanup removes the trunk bridge, but the nova-compute service re-creates it even though it is no longer defined anywhere. The re-created trunk bridge is not connected to br-int, so no traffic flows.

Version-Release number of selected component (if applicable):
openstack-nova-compute-23.2.3-1.20230321110952.d6a296e.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a trunk VM on an ml2/ovs compute node
2. Migrate the compute node with ovs-agent to OVN
3. Shut down the compute node and start it again

Actual results:
tbr- bridge exists

Expected results:
Bridges that are no longer defined should not be created

Additional info:
Here is a log snippet that shows the creation of the bridge. Full logs are attached.

2023-05-02 21:02:50.785 2 DEBUG nova.virt.libvirt.vif [req-82cb3f99-ef0e-4a87-a55d-87cc53f102d6 - - - - -] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=False,availability_zone='green',cell_name=None,cleaned=False,config_drive='',created_at=2023-04-20T17:24:44Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description=None,display_name='g-trunk-vm',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(8),hidden=False,host='compute-0.redhat.local',hostname='g-trunk-vm',id=2,image_ref='aef875ea-49cc-4b27-936e-00718f4270c0',info_cache=InstanceInfoCache,instance_type_id=8,kernel_id='',key_data='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCSRvTzzpecOCvfEyRBBIez4wwHkMWCof4b/N4o7IVGcLJYX98yc0f4fGsci0gjY/+U8OL1rHliyA4XU3WeOn5x0UHQY8TqbgBgR2S1vaVeqSmDus81orNl49DE9sBnXLyT35p75W48m7kgjJ9uUtqp/kCNFqwfwjVu3f44bfHMXw3sVX3Ov/2NG59Vq9khemz4K52uZf3AntLDFRpVMqgYVLHmXsiffPaAFWEtb59LcKoC3XURE3rKsr5z8/bZmV9a55HUzcXVD7E6pHcGOQnBhM/Fftl+bFhUXjaRfXBPd9zlloatyYQog8S1AqSTyFh4ShfSFAs8GmXQTGyn9wF 
Generated-by-Nova',key_name='demo-key',keypairs=<?>,launch_index=0,launched_at=2023-04-20T17:25:05Z,launched_on='compute-0.redhat.local',locked=False,locked_by=None,memory_mb=512,metadata={},migration_context=<?>,new_flavor=None,node='compute-0.redhat.local',numa_topology=None,old_flavor=None,os_type=None,pci_devices=<?>,pci_requests=<?>,power_state=1,progress=0,project_id='96d86b8b491243378d2009c77ca4dcda',ramdisk_id='',reservation_id='r-l4csbcjj',resources=<?>,root_device_name='/dev/vda',root_gb=10,security_groups=<?>,services=<?>,shutdown_terminate=False,system_metadata={boot_roles='member,reader,admin',image_base_image_ref='aef875ea-49cc-4b27-936e-00718f4270c0',image_container_format='bare',image_disk_format='qcow2',image_hw_cdrom_bus='sata',image_hw_disk_bus='virtio',image_hw_input_bus='usb',image_hw_machine_type='pc-q35-rhel9.0.0',image_hw_pointer_model='usbtablet',image_hw_video_model='virtio',image_hw_vif_model='virtio',image_min_disk='10',image_min_ram='0',image_owner_specified.openstack.md5='',image_owner_specified.openstack.object='images/centos7',image_owner_specified.openstack.sha256='',owner_project_name='admin',owner_user_name='admin'},tags=<?>,task_state=None,terminated_at=None,trusted_certs=<?>,updated_at=2023-05-02T20:54:49Z,user_data=None,user_id='63c3a1c9125f4e62bc7dda59253a48fe',uuid=d843c420-9ad1-473a-8d67-d775112f7516,vcpu_model=<?>,vcpus=1,vm_mode=None,vm_state='active') vif={"id": "f8e11605-4a18-4c88-a553-378a085173c4", "address": "fa:16:3e:db:26:a3", "network": {"id": "33116c2c-2c5c-4e7a-81ee-b4cdc2dff9a4", "bridge": "tbr-411ed91c-9", "label": "public", "subnets": [{"cidr": "2620:52:0:13b8::/64", "dns": [], "gateway": {"address": "2620:52:0:13b8::fe", "type": "gateway", "version": 6, "meta": {}}, "ips": [{"address": "2620:52:0:13b8:f816:3eff:fedb:26a3", "type": "fixed", "version": 6, "meta": {}, "floating_ips": []}], "routes": [], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "2620:52:0:13b8:f816:3eff:fe96:627"}}, 
{"cidr": "10.0.0.0/24", "dns": [], "gateway": {"address": "10.0.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.0.0.185", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {"dhcp_server": "10.0.0.152"}}], "meta": {"injected": false, "tenant_id": "96d86b8b491243378d2009c77ca4dcda", "mtu": 1500, "physical_network": "datacentre", "tunneled": false}}, "type": "ovs", "details": {"connectivity": "l2", "datapath_type": "system", "ovs_hybrid_plug": true, "port_filter": true, "bridge_name": "tbr-411ed91c-9"}, "devname": "tapf8e11605-4a", "ovs_interfaceid": "f8e11605-4a18-4c88-a553-378a085173c4", "qbh_params": null, "qbg_params": null, "active": true, "vnic_type": "normal", "profile": {}, "preserve_on_delete": true, "meta": {}} plug /usr/lib/python3.9/site-packages/nova/virt/libvirt/vif.py:707
2023-05-02 21:02:50.786 2 DEBUG nova.network.os_vif_util [req-82cb3f99-ef0e-4a87-a55d-87cc53f102d6 - - - - -] Converting VIF {"id": "f8e11605-4a18-4c88-a553-378a085173c4", "address": "fa:16:3e:db:26:a3", "network": {"id": "33116c2c-2c5c-4e7a-81ee-b4cdc2dff9a4", "bridge": "tbr-411ed91c-9", "label": "public", "subnets": [{"cidr": "2620:52:0:13b8::/64", "dns": [], "gateway": {"address": "2620:52:0:13b8::fe", "type": "gateway", "version": 6, "meta": {}}, "ips": [{"address": "2620:52:0:13b8:f816:3eff:fedb:26a3", "type": "fixed", "version": 6, "meta": {}, "floating_ips": []}], "routes": [], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "2620:52:0:13b8:f816:3eff:fe96:627"}}, {"cidr": "10.0.0.0/24", "dns": [], "gateway": {"address": "10.0.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.0.0.185", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {"dhcp_server": "10.0.0.152"}}], "meta": {"injected": false, "tenant_id": "96d86b8b491243378d2009c77ca4dcda", "mtu": 1500, "physical_network": "datacentre", "tunneled": false}}, "type": "ovs", "details": {"connectivity": "l2", "datapath_type": "system", "ovs_hybrid_plug": true, "port_filter": true, "bridge_name": "tbr-411ed91c-9"}, "devname": "tapf8e11605-4a", "ovs_interfaceid": "f8e11605-4a18-4c88-a553-378a085173c4", "qbh_params": null, "qbg_params": null, "active": true, "vnic_type": "normal", "profile": {}, "preserve_on_delete": true, "meta": {}} nova_to_osvif_vif /usr/lib/python3.9/site-packages/nova/network/os_vif_util.py:501
2023-05-02 21:02:52.440 2 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddBridgeCommand(name=tbr-411ed91c-9, may_exist=True, datapath_type=system) do_commit /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
2023-05-02 21:02:52.453 2 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=tbr-411ed91c-9, port=qvof8e11605-4a, may_exist=True) do_commit /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
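
The re-creation can be spotted in a larger log by scanning for AddBridgeCommand entries that name a tbr- bridge. The following is only a minimal sketch based on the line format shown above; the function name and the regular expression are illustrative and are not part of nova or ovsdbapp.

```python
import re

# Matches the ovsdbapp debug line shown in the snippet above, e.g.
# "AddBridgeCommand(name=tbr-411ed91c-9, may_exist=True, ...)".
# The pattern is an assumption derived from that single sample line.
_ADD_BRIDGE = re.compile(r"AddBridgeCommand\(name=(tbr-[^,)\s]+)")


def recreated_trunk_bridges(log_lines):
    """Return trunk bridge names created in the log, in first-seen order."""
    seen = []
    for line in log_lines:
        match = _ADD_BRIDGE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    # Hypothetical local copy of the attached log file.
    with open("nova-compute.log", encoding="utf-8", errors="replace") as log:
        for bridge in recreated_trunk_bridges(log):
            print(bridge)
```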

Comment 1 smooney 2023-05-03 12:59:01 UTC
From a quick look, this is not a nova bug but a bug in the migration procedure/tooling.

If the network info cache is not updated before the reboot, then it's expected that we would continue to use the old bridges.

By design, nova does not call neutron when hard rebooting or powering on a VM; it only uses the info from the network info cache.

For context, nova does not support changing the vif type/ml2 driver on a port that is bound to a VM, so the procedure we have in tripleo is not supported by nova upstream.
My guess would be that the current migration tooling/procedure is missing a forced cache update, done by sending a network-vif-changed external event for each port that is migrated.

If that was done, it would update the relevant network info cache entry for the port, and the port would be recreated with the OVN config after the host reboot.

We have a periodic task that will heal this over time, but I don't think this is a valid nova bug; it's a bug in the migration tooling/procedure.

Looking at https://github.com/openstack/neutron/blob/master/tools/ovn_migration/migrate-to-ovn.yml I don't see anything that would curl the nova API and send those events, so unless that is done by neutron itself, that is why this is happening.
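
A minimal sketch of the cache refresh suggested in this comment, under stated assumptions: it builds one external event per VM-bound port and prepares a POST to nova's os-server-external-events API. The comment calls the event network-vif-changed; the sketch uses `network-changed`, which is the refresh event that API lists. The function names, the endpoint, the token handling, and the port dictionaries (shaped like neutron port records with `id`, `device_id` and `device_owner`) are illustrative and are not the tool that was eventually written for this bug.

```python
import json
import urllib.request


def build_network_changed_events(ports):
    """Build the request body asking nova to refresh its network info cache."""
    events = []
    for port in ports:
        # Assumption: VM-bound ports carry a "compute:<az>" device_owner and
        # the server UUID in device_id, as neutron port records usually do.
        owner = port.get("device_owner") or ""
        if not owner.startswith("compute:") or not port.get("device_id"):
            continue
        events.append({
            "name": "network-changed",
            "server_uuid": port["device_id"],
            "tag": port["id"],  # port the event refers to
        })
    return {"events": events}


def build_request(nova_endpoint, token, ports):
    """Prepare (but do not send) the POST to os-server-external-events."""
    body = json.dumps(build_network_changed_events(ports)).encode("utf-8")
    return urllib.request.Request(
        url=nova_endpoint.rstrip("/") + "/os-server-external-events",
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

Sending the prepared request with `urllib.request.urlopen` needs a token that is allowed to post external events, which is normally restricted to admin or service users.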

Comment 2 Jakub Libosvar 2023-05-03 19:37:59 UTC
Thanks a lot, Sean! I wrote a tool to notify nova and it really fixes the problem.

Comment 25 Roman Safronov 2023-06-22 10:37:04 UTC
Verified on RHOS-17.1-RHEL-9-20230621.n.1 with python3-neutron-18.6.1-1.20230518200969.el9ost.noarch and openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200969.el9ost.noarch
Verified that when a compute node hosting VMs with trunk ports is rebooted after migrating to OVN, the tbr- bridges no longer exist and the VMs are accessible once they are restarted.
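
A check along these lines can be scripted on the compute node. This is only a sketch: it assumes `ovs-vsctl list-br` is available on the host and prints one bridge name per line, and the helper names are illustrative rather than part of the QA procedure used here.

```python
import subprocess


def find_trunk_bridges(list_br_output):
    """Return the tbr- bridge names found in `ovs-vsctl list-br` output."""
    return [name for name in list_br_output.split() if name.startswith("tbr-")]


def trunk_bridges_on_host():
    """Run `ovs-vsctl list-br` locally and return any leftover trunk bridges."""
    result = subprocess.run(
        ["ovs-vsctl", "list-br"],
        check=True,
        capture_output=True,
        text=True,
    )
    return find_trunk_bridges(result.stdout)


if __name__ == "__main__":
    leftovers = trunk_bridges_on_host()
    print("leftover trunk bridges:", leftovers or "none")
```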

Comment 33 errata-xmlrpc 2023-08-16 01:14:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2023:4577

