Description of problem:
After a restart of the ovs-vswitchd process on a node running the L3 agent, the gateway ports (qg-XXX) and (probably) also the qr-XXX ports are gone from the qrouter-XXX and snat-XXX namespaces and are not recreated. Restarting neutron-l3-agent fixes this.

Version-Release number of selected component (if applicable):
Tested on the latest OSP-14 with DVR (and also with a non-distributed router in the same deployment), but I think it affects older releases as well.

How reproducible:
100%

Steps to Reproduce:
1. systemctl restart ovs-vswitchd (on a node with the L3 agent running)

Actual results:
The qg-XXX and qr-XXX ports still exist in the Open vSwitch br-int bridge, but not in the namespaces.

Expected results:
The ports should be moved into the namespaces and configured properly.

Additional info:
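A quick way to confirm the symptom (a sketch; the router UUID is a placeholder, assuming the usual DVR port naming):

  # ports are still defined on br-int...
  ovs-vsctl list-ports br-int | grep -E '^(qr|qg)-'
  # ...but no longer show up inside the router namespaces
  ip netns exec qrouter-<router-uuid> ip -o link | grep -E ': (qr|qg)-'
  ip netns exec snat-<router-uuid> ip -o link | grep qg-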
From an HA perspective, if ovs-vswitchd crashes, OVS should respawn it. At that point, since the L3/DHCP agents use OVS internal ports (the qr and qg devices), and those are in the OVSDB, OVS should recreate them.

1) Is this issue also happening on OSP 10 and 13? Can we determine when this started happening, or did it always happen?
2) Why is OVS itself not recreating the router/DHCP ports?

We need to answer (1) to determine whether this should block OSP 14 or not. If this is a regression in OVS 2.10, then this might be a blocker.
@Assaf: OVS recreates those ports, but they aren't moved into the qrouter/qdhcp namespaces; that part is handled by the L3/DHCP agents. I wasn't able to test it on OSP-13 or earlier yet, but I suppose it's the same in every release.
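For context, plugging one of these ports involves steps on both sides; roughly (a simplified sketch with placeholder names, not the agent's exact code path):

  # OVS side: the internal port and its OVSDB record (this part comes back after a vswitchd restart)
  ovs-vsctl -- --may-exist add-port br-int qg-XXXXXXXX-XX -- set Interface qg-XXXXXXXX-XX type=internal
  # agent side: move the device into the router namespace and configure it
  # (this is the part OVS knows nothing about and cannot redo by itself)
  ip link set qg-XXXXXXXX-XX netns qrouter-<router-uuid>
  ip netns exec qrouter-<router-uuid> ip addr add <gateway-ip>/24 dev qg-XXXXXXXX-XX
  ip netns exec qrouter-<router-uuid> ip link set qg-XXXXXXXX-XX up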
The problem with the FIP connectivity is in OSP 14 but not in OSP 13. The reason is that in OSP 14 the fg-xxx interface of the fip namespace is deleted when the OVS process is started:

[root@compute-0 heat-admin]# systemctl stop ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
       valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb7:9c5/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# systemctl start ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]#
The change that triggers the deletion of the fg-xxx interface came in between ovs_version "2.9.0" and ovs_version "2.10.0", because we have tested OSP 13 with OVS 2.10.0 and the problem is reproduced there:

[root@compute-0 ovs]# yum remove openvswitch
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package openvswitch.x86_64 0:2.9.0-56.el7fdp will be erased
--> Processing Dependency: openvswitch for package: openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch >= 2.8.0 for package: python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch for package: 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64
--> Running transaction check
---> Package openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost will be erased
---> Package openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost will be erased
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                               Arch    Version            Repository         Size
================================================================================
Removing:
 openvswitch                           x86_64  2.9.0-56.el7fdp    @rhos-13.0-signed  22 M
Removing for dependencies:
 openstack-neutron-openvswitch         noarch  1:12.0.4-2.el7ost  @rhos-13.0-signed  24 k
 openvswitch-ovn-central               x86_64  2.9.0-56.el7fdp    @rhos-13.0-signed  2.4 M
 openvswitch-ovn-common                x86_64  2.9.0-56.el7fdp    @rhos-13.0-signed  6.7 M
 openvswitch-ovn-host                  x86_64  2.9.0-56.el7fdp    @rhos-13.0-signed  2.6 M
 python-networking-ovn-metadata-agent  noarch  4.0.3-1.el7ost     @rhos-13.0-signed  14 k

Transaction Summary
================================================================================
Remove  1 Package (+5 Dependent packages)

Installed size: 34 M
Is this ok [y/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch       1/6
  Erasing    : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch   2/6
  Erasing    : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64                  3/6
  Erasing    : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64               4/6
  Erasing    : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64                5/6
  Erasing    : openvswitch-2.9.0-56.el7fdp.x86_64                           6/6
warning: /etc/sysconfig/openvswitch saved as /etc/sysconfig/openvswitch.rpmsave
  Verifying  : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch   1/6
  Verifying  : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64                2/6
  Verifying  : openvswitch-2.9.0-56.el7fdp.x86_64                           3/6
  Verifying  : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64               4/6
  Verifying  : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64                  5/6
  Verifying  : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch       6/6

Removed:
  openvswitch.x86_64 0:2.9.0-56.el7fdp

Dependency Removed:
  openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost
  openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp
  openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp
  openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp
  python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost

Complete!
[root@compute-0 ovs]# yum install *
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Examining openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: python-openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                      Arch    Version           Repository                                            Size
================================================================================
Installing:
 openvswitch2.10              x86_64  2.10.0-28.el7fdp  /openvswitch2.10-2.10.0-28.el7fdp.x86_64              31 M
 openvswitch2.10-debuginfo    x86_64  2.10.0-28.el7fdp  /openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64    203 M
 openvswitch2.10-devel        x86_64  2.10.0-28.el7fdp  /openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64        659 k
 openvswitch2.10-ovn-central  x86_64  2.10.0-28.el7fdp  /openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64  2.9 M
 openvswitch2.10-ovn-common   x86_64  2.10.0-28.el7fdp  /openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64   8.1 M
 openvswitch2.10-ovn-host     x86_64  2.10.0-28.el7fdp  /openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64     2.8 M
 openvswitch2.10-ovn-vtep     x86_64  2.10.0-28.el7fdp  /openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64     2.7 M
 python-openvswitch2.10       x86_64  2.10.0-28.el7fdp  /python-openvswitch2.10-2.10.0-28.el7fdp.x86_64       1.2 M

Transaction Summary
================================================================================
Install  8 Packages

Total size: 252 M
Installed size: 252 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : openvswitch2.10-2.10.0-28.el7fdp.x86_64                1/8
  Installing : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64     2/8
  Installing : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64       3/8
  Installing : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64       4/8
  Installing : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64    5/8
  Installing : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64         6/8
  Installing : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64          7/8
  Installing : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64      8/8
  Verifying  : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64      1/8
  Verifying  : openvswitch2.10-2.10.0-28.el7fdp.x86_64                2/8
  Verifying  : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64       3/8
  Verifying  : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64       4/8
  Verifying  : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64         5/8
  Verifying  : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64     6/8
  Verifying  : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64    7/8
  Verifying  : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64          8/8

Installed:
  openvswitch2.10.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp
  openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp
  python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp

Complete!
[root@compute-0 ovs]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]# ovs-vsctl show
dc15b921-ef3d-4045-b3e3-991c78828fec
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port "qr-c8eec4e7-da"
            tag: 1
            Interface "qr-c8eec4e7-da"
                type: internal
        Port "fg-3272cea0-c4"
            tag: 2
            Interface "fg-3272cea0-c4"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qvo9ee2dc62-95"
            tag: 1
            Interface "qvo9ee2dc62-95"
        Port "qvodd39c848-c9"
            tag: 1
            Interface "qvodd39c848-c9"
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port int-br-isolated
            Interface int-br-isolated
                type: patch
                options: {peer=phy-br-isolated}
    Bridge br-isolated
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port "vlan20"
            tag: 20
            Interface "vlan20"
                type: internal
        Port phy-br-isolated
            Interface phy-br-isolated
                type: patch
                options: {peer=int-br-isolated}
        Port br-isolated
            Interface br-isolated
                type: internal
        Port "vlan30"
            tag: 30
            Interface "vlan30"
                type: internal
        Port "vlan40"
            tag: 40
            Interface "vlan40"
                type: internal
        Port "vlan50"
            tag: 50
            Interface "vlan50"
                type: internal
        Port "eth1"
            Interface "eth1"
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac110220"
            Interface "vxlan-ac110220"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.18", out_key=flow, remote_ip="172.17.2.32"}
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth2"
            Interface "eth2"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
    ovs_version: "2.10.0"
[root@compute-0 ovs]# docker ps | grep neutron
1d9a0b5ab241  192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3           "ip netns exec qro..."  About an hour ago  Up About an hour      neutron-haproxy-qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e
77ed6b9bf731  192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2018-11-05.3  "kolla_start"           2 hours ago        Up 2 hours (healthy)  neutron_ovs_agent
9991f51a8960  192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3           "kolla_start"           2 hours ago        Up 2 hours (healthy)  neutron_l3_agent
a41a88a83c69  192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent:2018-11-05.3     "kolla_start"           2 hours ago        Up 2 hours (healthy)  neutron_metadata_agent
[root@compute-0 ovs]# docker restart neutron_l3_agent
neutron_l3_agent
[root@compute-0 ovs]# ip netns
fip-83153130-bdfb-49d0-8402-61a3db534ef6 (id: 1)
qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e (id: 0)
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link
       valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4e:78a0/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# systemctl stop openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link
       valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4e:78a0/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]#
Hi,

OVS updates within the same stream do not restart the service today, so the daemons in memory are still from the removed package. That's why you notice no issues. The same happens with updates within the 2.10 stream: no restart, no ports are gone. However, rebasing from 2.9 to 2.10 forces a stop/start. Then the internal ports are recreated, but OVS has no control over the network namespaces.

Please confirm whether you see the behavior change between 2.9 releases, between 2.10 releases, or only when you rebase from 2.9 to 2.10.

Thanks,
fbl
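One way to see which situation you are in is to compare the package on disk with the version the running daemon has written into OVSDB (a sketch; adjust the package name to the stream in use):

  rpm -q openvswitch2.10                    # what is installed on disk
  ovs-vsctl get Open_vSwitch . ovs_version  # what the currently running vswitchd reports

If the two disagree, the daemons in memory are still from the old package and no restart (and therefore no port recreation) has happened yet.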
Thanks for looking, Flavio. Let me try to re-phrase what Candido was seeing; the failure wasn't actually related to updating OVS on the systems.

On OSP 13 with OVS 2.9, a restart of ovs-vswitchd did not affect the running neutron-l3-agent process (in a container): all of its OVS interfaces still existed afterwards inside their respective namespaces.

On OSP 13/14 with OVS 2.10, a restart of ovs-vswitchd removed all the OVS interfaces, affecting the l3-agent and forcing a restart of it to recreate all the devices.

That difference was unexpected, and since the l3-agent has no monitor to notify it that the interfaces are gone, it is seen as a loss of connectivity to instances. I realize it can be argued the agent *should* monitor for such things and re-synchronize itself; that might be something we need to look into going forward.
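Until the agent grows such a monitor, an operator-side stopgap could look something like the following (a hypothetical watchdog sketch, not something shipped with the product; the container name matches the deployment shown earlier in this bug):

  for ns in $(ip netns list | awk '/^qrouter-/ {print $1}'); do
      if ! ip netns exec "$ns" ip -o link | grep -qE ': (qr|qg)-'; then
          echo "$ns lost its OVS internal ports; restarting the L3 agent"
          docker restart neutron_l3_agent
          break
      fi
  done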
Thanks Brian, that was helpful.

OVS 2.9
=======
[root@localhost ~]# systemctl restart openvswitch
[root@localhost ~]# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "int1"
            Interface "int1"
                type: internal
        Port "int0"
            Interface "int0"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.9.0"

# ip netns exec ns0 ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: int0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether ba:a1:9e:a1:bf:66 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.254/24 scope global int0
       valid_lft forever preferred_lft forever

OVS 2.10
========
[root@localhost ~]# systemctl restart openvswitch
# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "int1"
            Interface "int1"
                type: internal
        Port "int0"
            Interface "int0"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.10.0"

# ip netns exec ns0 ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

Problem confirmed on my testbed. Going to debug this further.

Thanks
fbl
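For anyone who wants to reproduce this outside of an OSP deployment, the testbed above can be rebuilt with something like the following (a sketch; ns0/int0/ovsbr0 match the names in the output above, the address is arbitrary):

  ip netns add ns0
  ovs-vsctl add-br ovsbr0
  ovs-vsctl add-port ovsbr0 int0 -- set Interface int0 type=internal
  ip link set int0 netns ns0
  ip netns exec ns0 ip addr add 192.168.1.254/24 dev int0
  ip netns exec ns0 ip link set int0 up
  systemctl restart openvswitch
  ip netns exec ns0 ip a show    # with 2.9 int0 is still here, with 2.10 it is gone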
The problem happens upstream as well.
This is the commit introducing the issue: https://github.com/openvswitch/ovs/commit/7521e0cf9e88a62f2feff4e7253654557f94877e
Reported upstream to the patch's author: https://mail.openvswitch.org/pipermail/ovs-dev/2018-December/354301.html

---8<---
Hi Ben,

This patch introduced a regression in OSP environments using internal ports in other netns. Their networking configuration is lost when the service is restarted, because the ports are recreated now.

Before the patch, it checked via netlink whether a port with a specific "name" was already there. I believe that's the check you referred to as expensive below. In any case, the check is a lookup over all ports attached to the DP, regardless of the port's netns.

After the patch, it relies on the kernel to identify that situation. Unfortunately the only protection there is register_netdevice(), which fails only if a port with that name exists in the current netns. If the port is in another netns, it will get a new dp_port, and because of that userspace will delete the old port. At this point the original port is gone from the other netns and there is a fresh port in the current netns.

I think the optimization is a good idea, so I came up with this kernel patch to make sure we are not adding another vport with the same name. It resolved the issue in my small env (want to do more tests though).

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 252adfb6fc0b..291b4a71a910 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2022,6 +2022,11 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		return -ENOMEM;
 
 	ovs_lock();
+	vport = lookup_vport(sock_net(skb->sk), ovs_header, a);
+	err = -EEXIST;
+	if (!IS_ERR(vport) && vport)
+		goto exit_unlock_free;
+
 restart:
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;

However, OSP users running an unpatched kernel with OVS 2.10 might still trigger the bug, so I wonder if we should revert the patch in 2.10 and work on an improved fix for 2.11. Perhaps we can detect whether the kernel fix is there (or not) by trying once to add the same port twice and using that as a hint. Perhaps there is something cheaper in dpif to verify whether the vport is there that is not vulnerable to races.

Thanks,
fbl
---8<----
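For reference, the datapath-level view that the pre-patch netlink lookup walked (all vports attached to the datapath, regardless of which netns they currently live in) can be inspected with dpctl, e.g.:

  ovs-dpctl show
  # or via the running daemon:
  ovs-appctl dpctl/show

Internal ports that have been moved into another namespace still show up in this list, which is why the old lookup caught the duplicate while register_netdevice() in the current netns does not.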
The fix was verified:

11:06:51 . /tmp/ir-venv-awm3Iep/bin/activate
11:06:51 infrared tripleo-undercloud -o undercloud_settings.yml --mirror tlv --version 14 --build 2018-12-13.3 --ssl false --tls-everywhere false

[root@compute-2 heat-admin]# ps -ef | grep ovs
42435    30879 30860  0 13:12 ?      00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root     64679 30753  0 13:36 ?      00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root     75578 72162  0 13:59 pts/2  00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# ps -ef | grep ovs
42435    30879 30860  0 13:12 ?      00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root     64679 30753  0 13:36 ?      00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root     75590 72162  0 13:59 pts/2  00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# systemctl start ovs-vswitchd
[root@compute-2 heat-admin]# ip netns exec fip-d3b6e13f-5062-4bf2-bc11-6f77748453fe ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-e55a92b8-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:0f:8c:32:f6:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-e55a92b8-0
       valid_lft forever preferred_lft forever
    inet6 fe80::f40f:8cff:fe32:f68e/64 scope link
       valid_lft forever preferred_lft forever
23: fg-d51b8944-77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:77:06:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.225/24 brd 10.0.0.255 scope global fg-d51b8944-77
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe77:6c0/64 scope link
       valid_lft forever preferred_lft forever
[root@compute-2 heat-admin]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045