Bug 1654371
| Summary: | OVS restart kills router and DHCP ports |
|---|---|
| Product: | Red Hat OpenStack |
| Component: | openvswitch |
| Version: | 14.0 (Rocky) |
| Status: | CLOSED ERRATA |
| Severity: | urgent |
| Priority: | urgent |
| Reporter: | Slawek Kaplonski <skaplons> |
| Assignee: | Flavio Leitner <fleitner> |
| QA Contact: | Roee Agiman <ragiman> |
| CC: | amuller, apevec, atragler, bcafarel, bhaley, ccamposr, chrisw, fbaudin, fleitner, lmarsh, nyechiel, rhos-maint, skaplons, tredaelli |
| Keywords: | Triaged |
| Target Milestone: | rc |
| Target Release: | 14.0 (Rocky) |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | openvswitch2.10-2.10.0-28.el7fdp.1 |
| Doc Type: | Bug Fix |
|  | 1657946 (view as bug list) |
| Last Closed: | 2019-01-11 11:55:06 UTC |
| Type: | Bug |

Doc Text:
Previously, restarting the service caused internal ports that had been moved to another network namespace to be recreated. The recreated ports lost their networking configuration and ended up in the wrong network namespace. With this release, the service no longer recreates these ports on restart, so they keep their networking configuration.
Description
Slawek Kaplonski
2018-11-28 15:53:25 UTC
From an HA perspective, if ovs-vswitchd crashes, OVS should respawn it. At that point, since the L3/DHCP agents use OVS internal ports (qr, qg devices), and those are in the OVSDB, OVS should recreate them.

1) Is this issue also happening on OSP 10 and 13? Can we determine when this started happening, or has it always been the case?
2) Why is OVS itself not recreating the router/DHCP ports?

We need to answer (1) to determine whether this should block OSP 14. If this is a regression in OVS 2.10, it might be a blocker.

@Assaf: OVS recreates those ports, but they are not moved back to the qrouter/qdhcp namespace; that is handled by the L3/DHCP agent. I have not been able to test on OSP 13 or earlier yet, but I suspect it is the same on every release.

The FIP connectivity problem occurs on OSP 14 but not on OSP 13. The reason is that on OSP 14 the fg-xxx interface of the FIP namespace is deleted when the OVS process is restarted:
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]# systemctl stop ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-d10c6892-c
valid_lft forever preferred_lft forever
inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:feb7:9c5/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-d10c6892-c
valid_lft forever preferred_lft forever
inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:feb7:9c5/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-d10c6892-c
valid_lft forever preferred_lft forever
inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:feb7:9c5/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]# systemctl start ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-d10c6892-c
valid_lft forever preferred_lft forever
inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]#
[root@compute-0 heat-admin]#
The change that triggers the deletion of the fg-xxx interface landed between:
ovs_version: "2.9.0"
and
ovs_version: "2.10.0"
because we tested OSP 13 with OVS 2.10.0 and the problem is reproduced there as well:
[root@compute-0 ovs]#
[root@compute-0 ovs]# yum remove openvswitch
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package openvswitch.x86_64 0:2.9.0-56.el7fdp will be erased
--> Processing Dependency: openvswitch for package: openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch >= 2.8.0 for package: python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch for package: 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64
--> Running transaction check
---> Package openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost will be erased
---> Package openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost will be erased
--> Finished Dependency Resolution
Dependencies Resolved
===================================================================================================================================================================================================================
Package Arch Version Repository Size
===================================================================================================================================================================================================================
Removing:
openvswitch x86_64 2.9.0-56.el7fdp @rhos-13.0-signed 22 M
Removing for dependencies:
openstack-neutron-openvswitch noarch 1:12.0.4-2.el7ost @rhos-13.0-signed 24 k
openvswitch-ovn-central x86_64 2.9.0-56.el7fdp @rhos-13.0-signed 2.4 M
openvswitch-ovn-common x86_64 2.9.0-56.el7fdp @rhos-13.0-signed 6.7 M
openvswitch-ovn-host x86_64 2.9.0-56.el7fdp @rhos-13.0-signed 2.6 M
python-networking-ovn-metadata-agent noarch 4.0.3-1.el7ost @rhos-13.0-signed 14 k
Transaction Summary
===================================================================================================================================================================================================================
Remove 1 Package (+5 Dependent packages)
Installed size: 34 M
Is this ok [y/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Erasing : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch 1/6
Erasing : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch 2/6
Erasing : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64 3/6
Erasing : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64 4/6
Erasing : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64 5/6
Erasing : openvswitch-2.9.0-56.el7fdp.x86_64 6/6
warning: /etc/sysconfig/openvswitch saved as /etc/sysconfig/openvswitch.rpmsave
Verifying : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch 1/6
Verifying : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64 2/6
Verifying : openvswitch-2.9.0-56.el7fdp.x86_64 3/6
Verifying : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64 4/6
Verifying : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64 5/6
Verifying : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch 6/6
Removed:
openvswitch.x86_64 0:2.9.0-56.el7fdp
Dependency Removed:
openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp
python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost
Complete!
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]# yum install *
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Examining openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: python-openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===================================================================================================================================================================================================================
Package Arch Version Repository Size
===================================================================================================================================================================================================================
Installing:
openvswitch2.10 x86_64 2.10.0-28.el7fdp /openvswitch2.10-2.10.0-28.el7fdp.x86_64 31 M
openvswitch2.10-debuginfo x86_64 2.10.0-28.el7fdp /openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64 203 M
openvswitch2.10-devel x86_64 2.10.0-28.el7fdp /openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64 659 k
openvswitch2.10-ovn-central x86_64 2.10.0-28.el7fdp /openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64 2.9 M
openvswitch2.10-ovn-common x86_64 2.10.0-28.el7fdp /openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64 8.1 M
openvswitch2.10-ovn-host x86_64 2.10.0-28.el7fdp /openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64 2.8 M
openvswitch2.10-ovn-vtep x86_64 2.10.0-28.el7fdp /openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64 2.7 M
python-openvswitch2.10 x86_64 2.10.0-28.el7fdp /python-openvswitch2.10-2.10.0-28.el7fdp.x86_64 1.2 M
Transaction Summary
===================================================================================================================================================================================================================
Install 8 Packages
Total size: 252 M
Installed size: 252 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : openvswitch2.10-2.10.0-28.el7fdp.x86_64 1/8
Installing : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64 2/8
Installing : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64 3/8
Installing : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64 4/8
Installing : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64 5/8
Installing : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64 6/8
Installing : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64 7/8
Installing : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64 8/8
Verifying : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64 1/8
Verifying : openvswitch2.10-2.10.0-28.el7fdp.x86_64 2/8
Verifying : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64 3/8
Verifying : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64 4/8
Verifying : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64 5/8
Verifying : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64 6/8
Verifying : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64 7/8
Verifying : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64 8/8
Installed:
openvswitch2.10.x86_64 0:2.10.0-28.el7fdp openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp
openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp
openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp
Complete!
[root@compute-0 ovs]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]# ovs-vsctl show
dc15b921-ef3d-4045-b3e3-991c78828fec
Manager "ptcp:6640:127.0.0.1"
is_connected: true
Bridge br-int
Controller "tcp:127.0.0.1:6633"
fail_mode: secure
Port "qr-c8eec4e7-da"
tag: 1
Interface "qr-c8eec4e7-da"
type: internal
Port "fg-3272cea0-c4"
tag: 2
Interface "fg-3272cea0-c4"
type: internal
Port br-int
Interface br-int
type: internal
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "qvo9ee2dc62-95"
tag: 1
Interface "qvo9ee2dc62-95"
Port "qvodd39c848-c9"
tag: 1
Interface "qvodd39c848-c9"
Port int-br-ex
Interface int-br-ex
type: patch
options: {peer=phy-br-ex}
Port int-br-isolated
Interface int-br-isolated
type: patch
options: {peer=phy-br-isolated}
Bridge br-isolated
Controller "tcp:127.0.0.1:6633"
fail_mode: secure
Port "vlan20"
tag: 20
Interface "vlan20"
type: internal
Port phy-br-isolated
Interface phy-br-isolated
type: patch
options: {peer=int-br-isolated}
Port br-isolated
Interface br-isolated
type: internal
Port "vlan30"
tag: 30
Interface "vlan30"
type: internal
Port "vlan40"
tag: 40
Interface "vlan40"
type: internal
Port "vlan50"
tag: 50
Interface "vlan50"
type: internal
Port "eth1"
Interface "eth1"
Bridge br-tun
Controller "tcp:127.0.0.1:6633"
fail_mode: secure
Port br-tun
Interface br-tun
type: internal
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port "vxlan-ac110220"
Interface "vxlan-ac110220"
type: vxlan
options: {df_default="true", in_key=flow, local_ip="172.17.2.18", out_key=flow, remote_ip="172.17.2.32"}
Bridge br-ex
Controller "tcp:127.0.0.1:6633"
fail_mode: secure
Port br-ex
Interface br-ex
type: internal
Port "eth2"
Interface "eth2"
Port phy-br-ex
Interface phy-br-ex
type: patch
options: {peer=int-br-ex}
ovs_version: "2.10.0"
[root@compute-0 ovs]#
[root@compute-0 ovs]# docker ps | grep neutron
1d9a0b5ab241 192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3 "ip netns exec qro..." About an hour ago Up About an hour neutron-haproxy-qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e
77ed6b9bf731 192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2018-11-05.3 "kolla_start" 2 hours ago Up 2 hours (healthy) neutron_ovs_agent
9991f51a8960 192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3 "kolla_start" 2 hours ago Up 2 hours (healthy) neutron_l3_agent
a41a88a83c69 192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent:2018-11-05.3 "kolla_start" 2 hours ago Up 2 hours (healthy) neutron_metadata_agent
[root@compute-0 ovs]# docker restart neutron_l3_agent
neutron_l3_agent
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]# ip netns
fip-83153130-bdfb-49d0-8402-61a3db534ef6 (id: 1)
qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e (id: 0)
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-a70e436a-f
valid_lft forever preferred_lft forever
inet6 fe80::6863:91ff:fe48:f1de/64 scope link
valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe4e:78a0/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 ovs]# systemctl stop openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-a70e436a-f
valid_lft forever preferred_lft forever
inet6 fe80::6863:91ff:fe48:f1de/64 scope link
valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe4e:78a0/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-a70e436a-f
valid_lft forever preferred_lft forever
inet6 fe80::6863:91ff:fe48:f1de/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 ovs]#
[root@compute-0 ovs]#
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-a70e436a-f
valid_lft forever preferred_lft forever
inet6 fe80::6863:91ff:fe48:f1de/64 scope link
valid_lft forever preferred_lft forever
[root@compute-0 ovs]#
[root@compute-0 ovs]#
Hi, OVS updates within the same stream do not restart the service today, so the daemons in memory are still from the removed package. That is why you notice no issues. The same happens with updates within the 2.10 stream: no restart, so no ports are gone. However, rebasing from 2.9 to 2.10 forces a stop/start, and then the internal ports are recreated, but OVS has no control over the network namespaces. Please confirm whether you see the behavior change between 2.9 releases, between 2.10 releases, or only when you rebase from 2.9 to 2.10.
Thanks,
fbl

Thanks for looking, Flavio. Let me try to re-phrase what Candido was seeing: the failure wasn't actually related to updating OVS on the systems. On OSP 13 with OVS 2.9, a restart of ovs-vswitchd did not affect the running neutron l3-agent process (in a container); all of its OVS interfaces still existed afterwards inside their respective namespaces. On OSP 13/14 with OVS 2.10, a restart of ovs-vswitchd removed all the OVS interfaces, affecting the l3-agent and forcing a restart to recreate all the devices. That difference was unexpected, and since the l3-agent has no monitor to notify it that the interfaces are gone, it's seen as a loss of connectivity to instances. I realize it can be argued the agent *should* monitor for such things and re-synchronize itself; that might be something we need to look into going forward.

Thanks Brian, that was helpful.
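The resync idea raised above can be sketched as follows (a hypothetical toy model, not the neutron agent code): the agent would periodically compare the devices it expects against what actually exists in the namespace and re-plug anything missing.

```python
# Hedged sketch of an agent-side resync loop (hypothetical; neutron's
# l3-agent does not do this today, which is the gap described above).

def resync(expected, list_devices, plug):
    """Re-plug every expected device that is missing from the namespace."""
    missing = [d for d in expected if d not in list_devices()]
    for dev in missing:
        plug(dev)
    return missing


# Toy backend standing in for `ip netns exec <ns> ip link show`.
present = {"fpr-a70e436a-f"}
replugged = resync(
    expected=["fpr-a70e436a-f", "fg-3272cea0-c4"],
    list_devices=lambda: present,
    plug=present.add,
)
print(replugged)  # ['fg-3272cea0-c4']
```

In a real agent, `list_devices` and `plug` would shell out to `ip` and OVS tooling; the point is only that a periodic existence check would let the agent recover from a vswitchd restart instead of silently losing connectivity.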
OVS 2.9
=======
[root@localhost ~]# systemctl restart openvswitch
[root@localhost ~]# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
Bridge "ovsbr0"
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "int1"
Interface "int1"
type: internal
Port "int0"
Interface "int0"
type: internal
Port "eth0"
Interface "eth0"
ovs_version: "2.9.0"
# ip netns exec ns0 ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: int0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether ba:a1:9e:a1:bf:66 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.254/24 scope global int0
valid_lft forever preferred_lft forever
OVS 2.10
========
[root@localhost ~]# systemctl restart openvswitch
# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
Bridge "ovsbr0"
Port "ovsbr0"
Interface "ovsbr0"
type: internal
Port "int1"
Interface "int1"
type: internal
Port "int0"
Interface "int0"
type: internal
Port "eth0"
Interface "eth0"
ovs_version: "2.10.0"
# ip netns exec ns0 ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Problem confirmed on my testbed.
Going to debug this further.
Thanks
fbl
The problem happens upstream as well. This is the commit introducing the issue: https://github.com/openvswitch/ovs/commit/7521e0cf9e88a62f2feff4e7253654557f94877e

Reported upstream to the patch's author: https://mail.openvswitch.org/pipermail/ovs-dev/2018-December/354301.html

---8<---
Hi Ben,

This patch introduced a regression in OSP environments using internal ports in other netns. Their networking configuration is lost when the service is restarted because the ports are now recreated.

Before the patch, the code checked via netlink whether a port with a specific "name" was already there. I believe that's the check you referred to as expensive below. In any case, that check is a lookup across all ports attached to the DP, regardless of the port's netns.

After the patch, it relies on the kernel to identify that situation. Unfortunately, the only protection there is register_netdevice(), which fails only if a port with that name exists in the current netns. If the port is in another netns, it will get a new dp_port, and because of that userspace will delete the old port. At that point the original port is gone from the other netns and there is a fresh port in the current netns.

I think the optimization is a good idea, so I came up with this kernel patch to make sure we are not adding another vport with the same name. It resolved the issue in my small env (I want to do more tests, though).
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 252adfb6fc0b..291b4a71a910 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2022,6 +2022,11 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		return -ENOMEM;
 
 	ovs_lock();
+	vport = lookup_vport(sock_net(skb->sk), ovs_header, a);
+	err = -EEXIST;
+	if (!IS_ERR(vport) && vport)
+		goto exit_unlock_free;
+
 restart:
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;

However, OSP users running an unpatched kernel with OVS 2.10 might still trigger the bug, so I wonder if we should revert the patch in 2.10 and work on an improved fix for 2.11. Perhaps we can detect whether the kernel fix is there by trying to add the same port twice and using that as a hint. Perhaps there is something cheaper in dpif to verify whether the vport is there that is not vulnerable to races.

Thanks,
fbl
---8<----

The fix was verified:
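The regression mechanism described in the mail can be modeled with a small sketch (hypothetical, not kernel code): the old userspace check looked the name up across all datapath ports, while the regressed path effectively guards only against duplicates in the current netns.

```python
# Toy model of the duplicate-name check (hypothetical, not kernel code).
# Keyed by (netns, name): the same name can exist in two namespaces.
datapath = {("fip-ns", "fg-1"): "old port"}


def add_vport_old(netns, name):
    # Pre-regression: reject if the name exists in ANY namespace,
    # like the netlink lookup across all DP ports.
    if any(n == name for (_, n) in datapath):
        return "EEXIST"
    datapath[(netns, name)] = "new port"
    return "OK"


def add_vport_new(netns, name):
    # Regressed behavior: register_netdevice() only fails if the name
    # exists in the CURRENT netns, so a cross-netns duplicate slips in
    # and the old port is subsequently deleted by userspace.
    if (netns, name) in datapath:
        return "EEXIST"
    datapath[(netns, name)] = "new port"
    return "OK"


assert add_vport_old("root", "fg-1") == "EEXIST"  # old: port preserved
assert add_vport_new("root", "fg-1") == "OK"      # new: duplicate created
```

The posted kernel patch restores the old semantics inside `ovs_vport_cmd_new()` by doing an explicit `lookup_vport()` before creating the new vport.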
11:06:51 . /tmp/ir-venv-awm3Iep/bin/activate
11:06:51 infrared tripleo-undercloud -o undercloud_settings.yml --mirror tlv --version 14 --build 2018-12-13.3 --ssl false --tls-everywhere false
11:06:51
[root@compute-2 heat-admin]#
[root@compute-2 heat-admin]#
[root@compute-2 heat-admin]# ps -ef | grep ovs
42435 30879 30860 0 13:12 ? 00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root 64679 30753 0 13:36 ? 00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root 75578 72162 0 13:59 pts/2 00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# ps -ef | grep ovs
42435 30879 30860 0 13:12 ? 00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root 64679 30753 0 13:36 ? 00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root 75590 72162 0 13:59 pts/2 00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# systemctl start ovs-vswitchd
[root@compute-2 heat-admin]# ip netns exec fip-d3b6e13f-5062-4bf2-bc11-6f77748453fe ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: fpr-e55a92b8-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether f6:0f:8c:32:f6:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 169.254.106.115/31 scope global fpr-e55a92b8-0
valid_lft forever preferred_lft forever
inet6 fe80::f40f:8cff:fe32:f68e/64 scope link
valid_lft forever preferred_lft forever
23: fg-d51b8944-77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether fa:16:3e:77:06:c0 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.225/24 brd 10.0.0.255 scope global fg-d51b8944-77
valid_lft forever preferred_lft forever
inet6 fe80::f816:3eff:fe77:6c0/64 scope link
valid_lft forever preferred_lft forever
[root@compute-2 heat-admin]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045