Created attachment 1961824 [details]
nova-compute.log

Description of problem:
After a compute node has been switched from ml2/ovs to ml2/ovn and the compute node hosts a trunk VM, the trunk bridge is preserved on the compute node and traffic flows through the tbr- OVS bridge. The vif_details, however, no longer contain the tbr- bridge definition. If the node is taken down and booted up again, neutron-cleanup removes the trunk bridge, but the nova-compute service re-creates it even though it is no longer defined anywhere. The trunk bridge remains disconnected from br-int and no traffic flows.

Version-Release number of selected component (if applicable):
openstack-nova-compute-23.2.3-1.20230321110952.d6a296e.el9ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Create a trunk VM on an ml2/ovs compute node
2. Migrate the compute node from the OVS agent to OVN
3. Shut down the compute node and start it again

Actual results:
The tbr- bridge exists

Expected results:
Bridges that are no longer defined should not be created

Additional info:
Here is a log snippet that shows creation of the bridge. Full logs are attached.

2023-05-02 21:02:50.785 2 DEBUG nova.virt.libvirt.vif [req-82cb3f99-ef0e-4a87-a55d-87cc53f102d6 - - - - -] vif_type=ovs instance=Instance(access_ip_v4=None,access_ip_v6=None,architecture=None,auto_disk_config=False,availability_zone='green',cell_name=None,cleaned=False,config_drive='',created_at=2023-04-20T17:24:44Z,default_ephemeral_device=None,default_swap_device=None,deleted=False,deleted_at=None,device_metadata=<?>,disable_terminate=False,display_description=None,display_name='g-trunk-vm',ec2_ids=<?>,ephemeral_gb=0,ephemeral_key_uuid=None,fault=<?>,flavor=Flavor(8),hidden=False,host='compute-0.redhat.local',hostname='g-trunk-vm',id=2,image_ref='aef875ea-49cc-4b27-936e-00718f4270c0',info_cache=InstanceInfoCache,instance_type_id=8,kernel_id='',key_data='ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDCSRvTzzpecOCvfEyRBBIez4wwHkMWCof4b/N4o7IVGcLJYX98yc0f4fGsci0gjY/+U8OL1rHliyA4XU3WeOn5x0UHQY8TqbgBgR2S1vaVeqSmDus81orNl49DE9sBnXLyT35p75W48m7kgjJ9uUtqp/kCNFqwfwjVu3f44bfHMXw3sVX3Ov/2NG59Vq9khemz4K52uZf3AntLDFRpVMqgYVLHmXsiffPaAFWEtb59LcKoC3XURE3rKsr5z8/bZmV9a55HUzcXVD7E6pHcGOQnBhM/Fftl+bFhUXjaRfXBPd9zlloatyYQog8S1AqSTyFh4ShfSFAs8GmXQTGyn9wF 
Generated-by-Nova',key_name='demo-key',keypairs=<?>,launch_index=0,launched_at=2023-04-20T17:25:05Z,launched_on='compute-0.redhat.local',locked=False,locked_by=None,memory_mb=512,metadata={},migration_context=<?>,new_flavor=None,node='compute-0.redhat.local',numa_topology=None,old_flavor=None,os_type=None,pci_devices=<?>,pci_requests=<?>,power_state=1,progress=0,project_id='96d86b8b491243378d2009c77ca4dcda',ramdisk_id='',reservation_id='r-l4csbcjj',resources=<?>,root_device_name='/dev/vda',root_gb=10,security_groups=<?>,services=<?>,shutdown_terminate=False,system_metadata={boot_roles='member,reader,admin',image_base_image_ref='aef875ea-49cc-4b27-936e-00718f4270c0',image_container_format='bare',image_disk_format='qcow2',image_hw_cdrom_bus='sata',image_hw_disk_bus='virtio',image_hw_input_bus='usb',image_hw_machine_type='pc-q35-rhel9.0.0',image_hw_pointer_model='usbtablet',image_hw_video_model='virtio',image_hw_vif_model='virtio',image_min_disk='10',image_min_ram='0',image_owner_specified.openstack.md5='',image_owner_specified.openstack.object='images/centos7',image_owner_specified.openstack.sha256='',owner_project_name='admin',owner_user_name='admin'},tags=<?>,task_state=None,terminated_at=None,trusted_certs=<?>,updated_at=2023-05-02T20:54:49Z,user_data=None,user_id='63c3a1c9125f4e62bc7dda59253a48fe',uuid=d843c420-9ad1-473a-8d67-d775112f7516,vcpu_model=<?>,vcpus=1,vm_mode=None,vm_state='active') vif={"id": "f8e11605-4a18-4c88-a553-378a085173c4", "address": "fa:16:3e:db:26:a3", "network": {"id": "33116c2c-2c5c-4e7a-81ee-b4cdc2dff9a4", "bridge": "tbr-411ed91c-9", "label": "public", "subnets": [{"cidr": "2620:52:0:13b8::/64", "dns": [], "gateway": {"address": "2620:52:0:13b8::fe", "type": "gateway", "version": 6, "meta": {}}, "ips": [{"address": "2620:52:0:13b8:f816:3eff:fedb:26a3", "type": "fixed", "version": 6, "meta": {}, "floating_ips": []}], "routes": [], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": "2620:52:0:13b8:f816:3eff:fe96:627"}}, {"cidr": "10.0.0.0/24", "dns": [], "gateway": {"address": "10.0.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.0.0.185", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {"dhcp_server": "10.0.0.152"}}], "meta": {"injected": false, "tenant_id": "96d86b8b491243378d2009c77ca4dcda", "mtu": 1500, "physical_network": "datacentre", "tunneled": false}}, "type": "ovs", "details": {"connectivity": "l2", "datapath_type": "system", "ovs_hybrid_plug": true, "port_filter": true, "bridge_name": "tbr-411ed91c-9"}, "devname": "tapf8e11605-4a", "ovs_interfaceid": "f8e11605-4a18-4c88-a553-378a085173c4", "qbh_params": null, "qbg_params": null, "active": true, "vnic_type": "normal", "profile": {}, "preserve_on_delete": true, "meta": {}} plug /usr/lib/python3.9/site-packages/nova/virt/libvirt/vif.py:707 2023-05-02 21:02:50.786 2 DEBUG nova.network.os_vif_util [req-82cb3f99-ef0e-4a87-a55d-87cc53f102d6 - - - - -] Converting VIF {"id": "f8e11605-4a18-4c88-a553-378a085173c4", "address": "fa:16:3e:db:26:a3", "network": {"id": "33116c2c-2c5c-4e7a-81ee-b4cdc2dff9a4", "bridge": "tbr-411ed91c-9", "label": "public", "subnets": [{"cidr": "2620:52:0:13b8::/64", "dns": [], "gateway": {"address": "2620:52:0:13b8::fe", "type": "gateway", "version": 6, "meta": {}}, "ips": [{"address": "2620:52:0:13b8:f816:3eff:fedb:26a3", "type": "fixed", "version": 6, "meta": {}, "floating_ips": []}], "routes": [], "version": 6, "meta": {"ipv6_address_mode": "slaac", "dhcp_server": 
"2620:52:0:13b8:f816:3eff:fe96:627"}}, {"cidr": "10.0.0.0/24", "dns": [], "gateway": {"address": "10.0.0.1", "type": "gateway", "version": 4, "meta": {}}, "ips": [{"address": "10.0.0.185", "type": "fixed", "version": 4, "meta": {}, "floating_ips": []}], "routes": [], "version": 4, "meta": {"dhcp_server": "10.0.0.152"}}], "meta": {"injected": false, "tenant_id": "96d86b8b491243378d2009c77ca4dcda", "mtu": 1500, "physical_network": "datacentre", "tunneled": false}}, "type": "ovs", "details": {"connectivity": "l2", "datapath_type": "system", "ovs_hybrid_plug": true, "port_filter": true, "bridge_name": "tbr-411ed91c-9"}, "devname": "tapf8e11605-4a", "ovs_interfaceid": "f8e11605-4a18-4c88-a553-378a085173c4", "qbh_params": null, "qbg_params": null, "active": true, "vnic_type": "normal", "profile": {}, "preserve_on_delete": true, "meta": {}} nova_to_osvif_vif /usr/lib/python3.9/site-packages/nova/network/os_vif_util.py:501 2023-05-02 21:02:52.440 2 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddBridgeCommand(name=tbr-411ed91c-9, may_exist=True, datapath_type=system) do_commit /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89 2023-05-02 21:02:52.453 2 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): AddPortCommand(bridge=tbr-411ed91c-9, port=qvof8e11605-4a, may_exist=True) do_commit /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:89
Quickly looking at this, it looks like this is not a nova bug but a bug in the migration procedure/tooling. If the network info cache is not updated before the reboot, then it's expected that we would continue to use the old bridges. By design, nova does not call neutron when hard rebooting or powering on a VM; it only uses the info from the network info cache. For context, nova does not support changing the vif type/ml2 driver on a port that is bound to a VM, so the procedure we have in TripleO is not supported by nova upstream.

My guess would be that the current migration tooling/procedure is missing a forced cache update, i.e. sending a "network-changed" external event for each port that is migrated. If that was done, it would update the relevant network info cache entry for the port, and the port would be recreated with the OVN config after the host reboot. We have a periodic task that will heal this over time, but I don't think this is a valid nova bug; it's a bug in the migration tooling/procedure.

Looking at https://github.com/openstack/neutron/blob/master/tools/ovn_migration/migrate-to-ovn.yml, I don't see anything that would call the nova API and send those events, so unless that is done by neutron itself, that is why this is happening.
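For illustration only (this is not the actual fix shipped in the migration tool; the auth options, endpoint and port filter below are assumptions), a cache refresh like the one described above can be driven through nova's os-server-external-events API, the same API neutron uses for its nova notifications:

# Rough sketch, not the shipped migration tooling: send a "network-changed"
# external event to nova for every bound compute port so that nova-compute
# refreshes its network info cache with the new (OVN) vif details.
# The auth options and the port selection below are assumptions.
from keystoneauth1 import loading, session
from neutronclient.v2_0 import client as neutron_client
from novaclient import client as nova_client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='http://keystone.example.com:5000/v3',  # assumed endpoint
    username='admin', password='secret', project_name='admin',
    user_domain_name='Default', project_domain_name='Default')
sess = session.Session(auth=auth)

neutron = neutron_client.Client(session=sess)
nova = nova_client.Client('2.1', session=sess)

events = []
for port in neutron.list_ports()['ports']:
    # Only instance ports need the refresh; trunk parent ports are regular
    # compute ports, so this filter covers them as well.
    if port.get('device_owner', '').startswith('compute:') and port.get('device_id'):
        events.append({'server_uuid': port['device_id'],
                       'name': 'network-changed',
                       'tag': port['id']})

if events:
    nova.server_external_events.create(events)

Roughly the same thing can be done with a plain POST to the compute API's /os-server-external-events endpoint with an {"events": [...]} body. Once the cache entry is refreshed, a subsequent hard reboot plugs the port with the OVN wiring instead of recreating the stale tbr- bridge.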
Thanks a lot, Sean! I wrote a tool to notify nova and it indeed fixes the problem.
Verified on RHOS-17.1-RHEL-9-20230621.n.1 with python3-neutron-18.6.1-1.20230518200969.el9ost.noarch and openstack-neutron-ovn-migration-tool-18.6.1-1.20230518200969.el9ost.noarch.

Verified that when a compute node hosting VMs with trunk ports is rebooted after migrating to OVN, the tbr- bridges no longer exist and the VMs are accessible once they are restarted.
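For reference, that verification amounts to confirming that no orphaned tbr- bridge is left on the compute node (and that any remaining one is at least patched into br-int). The snippet below is only an illustrative sketch of such a check, not part of the shipped tooling; it assumes ovs-vsctl is available on the node, and the vsctl helper is made up for this example.

# Rough diagnostic sketch: list leftover trunk bridges and report whether
# each one still has a patch port, i.e. whether it can still be wired to
# br-int. Assumes ovs-vsctl is on PATH on the compute node.
import subprocess

def vsctl(*args):
    """Run ovs-vsctl and return its stdout tokens (hypothetical helper)."""
    out = subprocess.run(('ovs-vsctl',) + args, check=True,
                         capture_output=True, text=True).stdout
    return out.split()

for bridge in vsctl('list-br'):
    if not bridge.startswith('tbr-'):
        continue
    ports = vsctl('list-ports', bridge)
    has_patch = any(vsctl('get', 'Interface', port, 'type') == ['patch']
                    for port in ports)
    state = 'has a patch port' if has_patch else 'has NO patch port (orphaned)'
    print(f'{bridge}: {state}')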
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 17.1 (Wallaby)), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2023:4577