Bug 1654371

Summary: OVS restart kills router and DHCP ports
Product: Red Hat OpenStack
Component: openvswitch
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Version: 14.0 (Rocky)
Target Milestone: rc
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openvswitch2.10-2.10.0-28.el7fdp.1
Reporter: Slawek Kaplonski <skaplons>
Assignee: Flavio Leitner <fleitner>
QA Contact: Roee Agiman <ragiman>
CC: amuller, apevec, atragler, bcafarel, bhaley, ccamposr, chrisw, fbaudin, fleitner, lmarsh, nyechiel, rhos-maint, skaplons, tredaelli
Keywords: Triaged
Doc Type: Bug Fix
Doc Text:
Previously, restarting the Open vSwitch service caused internal ports that had been moved to another network namespace to be recreated. When this happened, the ports lost their networking configuration and were recreated in the wrong network namespace. With this release, the ports are not recreated when the service is restarted, so they keep their networking configuration.
Clone Of:
: 1657946
Last Closed: 2019-01-11 11:55:06 UTC
Type: Bug

Description Slawek Kaplonski 2018-11-28 15:53:25 UTC
Description of problem:

After a restart of the ovs-vswitchd process on a node running the L3 agent, the gateway ports (qg-XXX) and (probably) also the qr-XXX ports disappear from the qrouter-XXX and snat-XXX namespaces and aren't recreated there.
Restarting neutron-l3-agent fixes this.


Version-Release number of selected component (if applicable):
Tested on the latest OSP-14 with DVR (and also with a non-distributed router in the same deployment), but I think it also affects older releases.

How reproducible:
100% of the time.

Steps to Reproduce:
1. systemctl restart ovs-vswitchd (on a node with the L3 agent running)

Actual results:
The qg-XXX and qr-XXX ports exist in the Open vSwitch br-int bridge but not in the namespaces.


Expected results:
Ports should be moved to the namespaces and configured properly.


Additional info:

Comment 1 Assaf Muller 2018-11-28 16:51:03 UTC
From an HA perspective, if ovs-vswitchd crashes, OVS should respawn it. At that point, since the L3/DHCP agents use OVS internal ports (qr, qg devices), and those are in the OVSDB, OVS should recreate them.

1) Is this issue also happening on OSP 10 and 13? Can we determine when this started happening, or has it always happened?
2) Why is OVS itself not recreating the router/dhcp ports?

We need to answer (1) to determine if this should block OSP 14 or not. If this is a regression in OVS 2.10 then this might be a blocker.

Comment 2 Slawek Kaplonski 2018-11-28 20:30:24 UTC
@Assaf: OVS recreates those ports, but they aren't moved to the qrouter/qdhcp namespaces, and that should be handled by the L3/DHCP agents.

I wasn't able to test it on OSP-13 or earlier yet but I suppose that it's the same on each release.

Comment 3 Candido Campos 2018-11-29 14:47:05 UTC
The problem with FIP connectivity occurs in OSP 14 but not in OSP 13.

The reason is that in OSP 14 the fg-xxx interface in the fip namespace is deleted when the OVS process is started:

[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# systemctl stop ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link 
       valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb7:9c5/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link 
       valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb7:9c5/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link 
       valid_lft forever preferred_lft forever
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:b7:09:c5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.215/24 brd 10.0.0.255 scope global fg-9a52b3d5-44
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb7:9c5/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# systemctl start ovs-vswitchd
[root@compute-0 heat-admin]# ip netns exec fip-c1215d97-2ef3-4d22-a0dc-5e84c4bfd06c ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:90:c6:85:7c:d0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-d10c6892-c
       valid_lft forever preferred_lft forever
    inet6 fe80::2090:c6ff:fe85:7cd0/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]# 
[root@compute-0 heat-admin]#
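
The before/after comparison in the transcript above can be automated. The following is a minimal sketch, not part of the original report: it assumes the "ip a" output has already been captured as text, and the helper names interface_names and missing_interfaces are hypothetical.

```python
import re

# Matches the first line of each device in `ip a` output, e.g.
# "24: fg-9a52b3d5-44: <BROADCAST,...>"; a "@peer" suffix (veth) is dropped.
_DEVICE_LINE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:", re.MULTILINE)


def interface_names(ip_a_output):
    """Return device names in the order they appear in `ip a` output."""
    return _DEVICE_LINE.findall(ip_a_output)


def missing_interfaces(before, after):
    """Return devices present in the `before` capture but not in `after`."""
    remaining = set(interface_names(after))
    return [name for name in interface_names(before) if name not in remaining]


# Shortened captures modelled on the transcript above (assumed sample data).
BEFORE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
24: fg-9a52b3d5-44: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
"""
AFTER = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    inet 127.0.0.1/8 scope host lo
2: fpr-d10c6892-c: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
"""

print(missing_interfaces(BEFORE, AFTER))
```

With these two captures the only device reported as missing is the fg- interface, which matches what the transcript shows after ovs-vswitchd is started again.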

Comment 4 Candido Campos 2018-11-29 15:26:09 UTC
The change that triggers the deletion of the fg-xxx interface was introduced between:

 ovs_version: "2.9.0"

and 

 ovs_version: "2.10.0"


We tested OSP 13 with OVS 2.10.0, and the problem was reproduced:

[root@compute-0 ovs]# 
[root@compute-0 ovs]# yum remove openvswitch 
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package openvswitch.x86_64 0:2.9.0-56.el7fdp will be erased
--> Processing Dependency: openvswitch for package: openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch >= 2.8.0 for package: python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64
--> Processing Dependency: openvswitch for package: 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch
--> Processing Dependency: openvswitch for package: openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64
--> Running transaction check
---> Package openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost will be erased
---> Package openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp will be erased
---> Package python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost will be erased
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                                             Arch                                  Version                                          Repository                                        Size
===================================================================================================================================================================================================================
Removing:
 openvswitch                                                         x86_64                                2.9.0-56.el7fdp                                  @rhos-13.0-signed                                 22 M
Removing for dependencies:
 openstack-neutron-openvswitch                                       noarch                                1:12.0.4-2.el7ost                                @rhos-13.0-signed                                 24 k
 openvswitch-ovn-central                                             x86_64                                2.9.0-56.el7fdp                                  @rhos-13.0-signed                                2.4 M
 openvswitch-ovn-common                                              x86_64                                2.9.0-56.el7fdp                                  @rhos-13.0-signed                                6.7 M
 openvswitch-ovn-host                                                x86_64                                2.9.0-56.el7fdp                                  @rhos-13.0-signed                                2.6 M
 python-networking-ovn-metadata-agent                                noarch                                4.0.3-1.el7ost                                   @rhos-13.0-signed                                 14 k

Transaction Summary
===================================================================================================================================================================================================================
Remove  1 Package (+5 Dependent packages)

Installed size: 34 M
Is this ok [y/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Erasing    : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch                                                                                                                                          1/6 
  Erasing    : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch                                                                                                                                      2/6 
  Erasing    : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64                                                                                                                                                     3/6 
  Erasing    : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64                                                                                                                                                  4/6 
  Erasing    : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64                                                                                                                                                   5/6 
  Erasing    : openvswitch-2.9.0-56.el7fdp.x86_64                                                                                                                                                              6/6 
warning: /etc/sysconfig/openvswitch saved as /etc/sysconfig/openvswitch.rpmsave
  Verifying  : python-networking-ovn-metadata-agent-4.0.3-1.el7ost.noarch                                                                                                                                      1/6 
  Verifying  : openvswitch-ovn-common-2.9.0-56.el7fdp.x86_64                                                                                                                                                   2/6 
  Verifying  : openvswitch-2.9.0-56.el7fdp.x86_64                                                                                                                                                              3/6 
  Verifying  : openvswitch-ovn-central-2.9.0-56.el7fdp.x86_64                                                                                                                                                  4/6 
  Verifying  : openvswitch-ovn-host-2.9.0-56.el7fdp.x86_64                                                                                                                                                     5/6 
  Verifying  : 1:openstack-neutron-openvswitch-12.0.4-2.el7ost.noarch                                                                                                                                          6/6 

Removed:
  openvswitch.x86_64 0:2.9.0-56.el7fdp                                                                                                                                                                             

Dependency Removed:
  openstack-neutron-openvswitch.noarch 1:12.0.4-2.el7ost        openvswitch-ovn-central.x86_64 0:2.9.0-56.el7fdp  openvswitch-ovn-common.x86_64 0:2.9.0-56.el7fdp  openvswitch-ovn-host.x86_64 0:2.9.0-56.el7fdp 
  python-networking-ovn-metadata-agent.noarch 0:4.0.3-1.el7ost 

Complete!
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# yum install *
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Examining openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm: openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64
Marking openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64.rpm to be installed
Examining python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm: python-openvswitch2.10-2.10.0-28.el7fdp.x86_64
Marking python-openvswitch2.10-2.10.0-28.el7fdp.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp will be installed
---> Package python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===================================================================================================================================================================================================================
 Package                                             Arch                           Version                                     Repository                                                                    Size
===================================================================================================================================================================================================================
Installing:
 openvswitch2.10                                     x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-2.10.0-28.el7fdp.x86_64                                      31 M
 openvswitch2.10-debuginfo                           x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64                           203 M
 openvswitch2.10-devel                               x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64                               659 k
 openvswitch2.10-ovn-central                         x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64                         2.9 M
 openvswitch2.10-ovn-common                          x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64                          8.1 M
 openvswitch2.10-ovn-host                            x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64                            2.8 M
 openvswitch2.10-ovn-vtep                            x86_64                         2.10.0-28.el7fdp                            /openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64                            2.7 M
 python-openvswitch2.10                              x86_64                         2.10.0-28.el7fdp                            /python-openvswitch2.10-2.10.0-28.el7fdp.x86_64                              1.2 M

Transaction Summary
===================================================================================================================================================================================================================
Install  8 Packages

Total size: 252 M
Installed size: 252 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : openvswitch2.10-2.10.0-28.el7fdp.x86_64                                                                                                                                                         1/8 
  Installing : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64                                                                                                                                              2/8 
  Installing : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64                                                                                                                                                3/8 
  Installing : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64                                                                                                                                                4/8 
  Installing : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64                                                                                                                                             5/8 
  Installing : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64                                                                                                                                                  6/8 
  Installing : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64                                                                                                                                                   7/8 
  Installing : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64                                                                                                                                               8/8 
  Verifying  : openvswitch2.10-debuginfo-2.10.0-28.el7fdp.x86_64                                                                                                                                               1/8 
  Verifying  : openvswitch2.10-2.10.0-28.el7fdp.x86_64                                                                                                                                                         2/8 
  Verifying  : openvswitch2.10-ovn-host-2.10.0-28.el7fdp.x86_64                                                                                                                                                3/8 
  Verifying  : openvswitch2.10-ovn-vtep-2.10.0-28.el7fdp.x86_64                                                                                                                                                4/8 
  Verifying  : python-openvswitch2.10-2.10.0-28.el7fdp.x86_64                                                                                                                                                  5/8 
  Verifying  : openvswitch2.10-ovn-common-2.10.0-28.el7fdp.x86_64                                                                                                                                              6/8 
  Verifying  : openvswitch2.10-ovn-central-2.10.0-28.el7fdp.x86_64                                                                                                                                             7/8 
  Verifying  : openvswitch2.10-devel-2.10.0-28.el7fdp.x86_64                                                                                                                                                   8/8 

Installed:
  openvswitch2.10.x86_64 0:2.10.0-28.el7fdp                              openvswitch2.10-debuginfo.x86_64 0:2.10.0-28.el7fdp                   openvswitch2.10-devel.x86_64 0:2.10.0-28.el7fdp                    
  openvswitch2.10-ovn-central.x86_64 0:2.10.0-28.el7fdp                  openvswitch2.10-ovn-common.x86_64 0:2.10.0-28.el7fdp                  openvswitch2.10-ovn-host.x86_64 0:2.10.0-28.el7fdp                 
  openvswitch2.10-ovn-vtep.x86_64 0:2.10.0-28.el7fdp                     python-openvswitch2.10.x86_64 0:2.10.0-28.el7fdp                     

Complete!
[root@compute-0 ovs]# ovs-vsctl show
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (No such file or directory)
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# ovs-vsctl show
dc15b921-ef3d-4045-b3e3-991c78828fec
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port "qr-c8eec4e7-da"
            tag: 1
            Interface "qr-c8eec4e7-da"
                type: internal
        Port "fg-3272cea0-c4"
            tag: 2
            Interface "fg-3272cea0-c4"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "qvo9ee2dc62-95"
            tag: 1
            Interface "qvo9ee2dc62-95"
        Port "qvodd39c848-c9"
            tag: 1
            Interface "qvodd39c848-c9"
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
        Port int-br-isolated
            Interface int-br-isolated
                type: patch
                options: {peer=phy-br-isolated}
    Bridge br-isolated
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port "vlan20"
            tag: 20
            Interface "vlan20"
                type: internal
        Port phy-br-isolated
            Interface phy-br-isolated
                type: patch
                options: {peer=int-br-isolated}
        Port br-isolated
            Interface br-isolated
                type: internal
        Port "vlan30"
            tag: 30
            Interface "vlan30"
                type: internal
        Port "vlan40"
            tag: 40
            Interface "vlan40"
                type: internal
        Port "vlan50"
            tag: 50
            Interface "vlan50"
                type: internal
        Port "eth1"
            Interface "eth1"
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port "vxlan-ac110220"
            Interface "vxlan-ac110220"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="172.17.2.18", out_key=flow, remote_ip="172.17.2.32"}
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-ex
            Interface br-ex
                type: internal
        Port "eth2"
            Interface "eth2"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
    ovs_version: "2.10.0"
[root@compute-0 ovs]# 
[root@compute-0 ovs]# docker ps | grep neutron 
1d9a0b5ab241        192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3            "ip netns exec qro..."   About an hour ago   Up About an hour                           neutron-haproxy-qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e
77ed6b9bf731        192.168.24.1:8787/rhosp13/openstack-neutron-openvswitch-agent:2018-11-05.3   "kolla_start"            2 hours ago         Up 2 hours (healthy)                       neutron_ovs_agent
9991f51a8960        192.168.24.1:8787/rhosp13/openstack-neutron-l3-agent:2018-11-05.3            "kolla_start"            2 hours ago         Up 2 hours (healthy)                       neutron_l3_agent
a41a88a83c69        192.168.24.1:8787/rhosp13/openstack-neutron-metadata-agent:2018-11-05.3      "kolla_start"            2 hours ago         Up 2 hours (healthy)                       neutron_metadata_agent
[root@compute-0 ovs]# docker restart neutron_l3_agent
neutron_l3_agent
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# ip netns 
fip-83153130-bdfb-49d0-8402-61a3db534ef6 (id: 1)
qrouter-a70e436a-f4cf-4076-b16e-4bf2168c948e (id: 0)
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link 
       valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4e:78a0/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# systemctl stop openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link 
       valid_lft forever preferred_lft forever
40: fg-3272cea0-c4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:4e:78:a0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.213/24 brd 10.0.0.255 scope global fg-3272cea0-c4
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe4e:78a0/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# systemctl start openvswitch
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# 
[root@compute-0 ovs]# 
[root@compute-0 ovs]# ip netns exec fip-83153130-bdfb-49d0-8402-61a3db534ef6  ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-a70e436a-f
       valid_lft forever preferred_lft forever
    inet6 fe80::6863:91ff:fe48:f1de/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-0 ovs]# 
[root@compute-0 ovs]#

Comment 6 Flavio Leitner 2018-11-30 20:51:15 UTC
Hi,

OVS updates within the same stream do not restart the service today, so the daemons in memory are still the ones from the removed package. That's why you notice no issues.

The same happens with updates within the 2.10 stream: no restart, so no ports are gone.

However, rebasing from 2.9 to 2.10 forces a stop/start. Internal ports are then recreated, but OVS has no control over the network namespaces.

Please confirm whether you see the behavior change with updates within 2.9 or within 2.10, or only when you rebase from 2.9 to 2.10.

Thanks,
fbl

Comment 7 Brian Haley 2018-11-30 21:27:34 UTC
Thanks for looking, Flavio. Let me try to rephrase what Candido was seeing: the failure wasn't actually related to updating OVS on the systems.

On OSP 13 with OVS 2.9, a restart of ovs-vswitchd did not affect the running neutron l3-agent process (in a container) - all of its OVS interfaces still existed afterwards inside their respective namespaces.

On OSP 13/14, with OVS 2.10, a restart of ovs-vswitchd removed all the OVS interfaces, affecting the l3-agent, forcing a restart to recreate all the devices.

That difference was unexpected, and since the l3-agent has no monitor to notify it that the interfaces are gone, it's seen as a loss of connectivity to instances.  I realize it can be argued the agent *should* monitor for such things and re-synchronize itself; that might be something we need to look into going forward.
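
For illustration only, the kind of check such a monitor could run might look like the sketch below. It is not how neutron's l3-agent works today; the function names (`interfaces_in_output`, `missing_devices`) are hypothetical, and it only assumes the `ip a` output format shown in the transcripts in this bug. The sample text is trimmed from the fip namespace listing captured after the restart.

```python
import re

# Matches the first line of each interface record in `ip a` output,
# e.g. "40: fg-3272cea0-c4: <BROADCAST,...>". A "@peer" suffix is dropped.
_IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)", re.MULTILINE)


def interfaces_in_output(ip_output):
    """Return the set of interface names found in `ip a` style output."""
    return set(_IFACE_RE.findall(ip_output))


def missing_devices(expected, ip_output):
    """Return the expected device names absent from the output, sorted."""
    return sorted(set(expected) - interfaces_in_output(ip_output))


# Trimmed sample: the namespace contents after the OVS restart.
AFTER_RESTART = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: fpr-a70e436a-f: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 6a:63:91:48:f1:de brd ff:ff:ff:ff:ff:ff link-netnsid 0
"""

if __name__ == "__main__":
    # Devices the agent believes it plugged into this namespace.
    print(missing_devices(["fpr-a70e436a-f", "fg-3272cea0-c4"], AFTER_RESTART))
```

A real monitor would feed this from `ip netns exec <namespace> ip a` and trigger a resync when the returned list is non-empty; how that resync would be wired into the agent is left open here.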

Comment 8 Flavio Leitner 2018-12-01 15:29:14 UTC
Thanks Brian, that was helpful.

OVS 2.9
=======

[root@localhost ~]# systemctl restart openvswitch 
[root@localhost ~]# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "int1"
            Interface "int1"
                type: internal
        Port "int0"
            Interface "int0"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.9.0"
# ip netns exec ns0  ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
7: int0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether ba:a1:9e:a1:bf:66 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.254/24 scope global int0
       valid_lft forever preferred_lft forever


OVS 2.10
========
[root@localhost ~]# systemctl restart openvswitch 
# ovs-vsctl show
8b1361d5-c7a1-44bd-9cc1-550210771aa1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "int1"
            Interface "int1"
                type: internal
        Port "int0"
            Interface "int0"
                type: internal
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.10.0"

# ip netns exec ns0  ip a show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00


Problem confirmed on my testbed.
Going to debug this further.
Thanks
fbl

Comment 12 Flavio Leitner 2018-12-04 13:37:33 UTC
The problem happens in upstream as well.

Comment 13 Flavio Leitner 2018-12-04 13:41:32 UTC
This is the commit introducing the issue:
https://github.com/openvswitch/ovs/commit/7521e0cf9e88a62f2feff4e7253654557f94877e

Comment 19 Flavio Leitner 2018-12-05 17:14:04 UTC
Reported upstream to the patch's author:
https://mail.openvswitch.org/pipermail/ovs-dev/2018-December/354301.html

---8<---
Hi Ben,

This patch introduced a regression in OSP environments using internal
ports in other netns. Their networking configuration is lost when
the service is restarted because the ports are recreated now.

Before the patch it checked using netlink if the port with a specific
"name" was already there. I believe that's the check you referred to as
expensive below. Anyways, the check is a lookup in all ports attached
to the DP regardless of the port's netns.

After the patch it relies on the kernel to identify that situation.
Unfortunately the only protection there is register_netdevice() which
fails only if the port with that name exists in the current netns.

If the port is in another netns, it will get a new dp_port and because
of that userspace will delete the old port. At this point the original
port is gone from the other netns and there is a fresh port in the
current netns.

I think the optimization is a good idea, so I came up with this kernel
patch to make sure we are not adding another vport with the same name.
It resolved the issue in my small env (want to do more tests though).

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 252adfb6fc0b..291b4a71a910 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -2022,6 +2022,11 @@ static int ovs_vport_cmd_new(struct sk_buff *skb, struct genl_info *info)
 		return -ENOMEM;
 
 	ovs_lock();
+	vport = lookup_vport(sock_net(skb->sk), ovs_header, a);
+	err = -EEXIST;
+	if (!IS_ERR(vport) && vport)
+		goto exit_unlock_free;
+
 restart:
 	dp = get_dp(sock_net(skb->sk), ovs_header->dp_ifindex);
 	err = -ENODEV;


However, OSP users using an unpatched kernel with OVS 2.10 might trigger
the bug, so I wonder if we should revert the patch in 2.10 and work
on an improved fix for 2.11. Perhaps we can detect if the kernel fix
is in there (or not) by trying to add the same port twice and
using that as a hint. Perhaps there is something cheaper in dpif to
verify if the vport is there that is not vulnerable to races.

Thanks,
fbl
---8<----
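
The sequence described in the mail above can be walked through with a small toy model. This is a sketch under stated assumptions, not OVS or kernel code: `ToyDatapath`, `add_port_after_restart`, `scenario` and the netns label "root" are all hypothetical names, and the model keeps only the two rules the mail mentions (a by-name lookup across all ports of the datapath, and a register_netdevice()-like check limited to the caller's netns).

```python
class ToyDatapath:
    """Toy stand-in for a datapath; each vport is a (name, netns) pair."""

    def __init__(self, kernel_checks_dp_names=False):
        self.vports = {}  # port number -> (name, netns)
        self._next_no = 1
        # True models a kernel carrying the proposed lookup_vport() check.
        self.kernel_checks_dp_names = kernel_checks_dp_names

    def lookup(self, name):
        """Find a vport by name regardless of netns (the old netlink check)."""
        for port_no, (pname, _netns) in self.vports.items():
            if pname == name:
                return port_no
        return None

    def vport_new(self, name, netns="root"):
        """Create an internal port in `netns`; raise on a name clash."""
        if self.kernel_checks_dp_names and self.lookup(name) is not None:
            raise FileExistsError(name)  # -EEXIST from the proposed check
        # register_netdevice()-like rule: only the caller's netns is checked.
        if (name, netns) in self.vports.values():
            raise FileExistsError(name)
        port_no = self._next_no
        self._next_no += 1
        self.vports[port_no] = (name, netns)
        return port_no

    def move(self, port_no, netns):
        """Move an existing port to another netns, keeping its port number."""
        name, _old = self.vports[port_no]
        self.vports[port_no] = (name, netns)

    def netns_of(self, name):
        """Return the sorted netns labels holding a port called `name`."""
        return sorted(ns for pname, ns in self.vports.values() if pname == name)


def add_port_after_restart(dp, name, lookup_first):
    """Re-add `name` as userspace would after a restart; return its port number."""
    if lookup_first:  # behaviour before the optimisation
        existing = dp.lookup(name)
        if existing is not None:
            return existing
    try:
        new_no = dp.vport_new(name)
    except FileExistsError:
        return dp.lookup(name)
    # A new port number was handed out, so older ports with that name are stale.
    for port_no in [p for p, (n, _) in dp.vports.items() if n == name and p != new_no]:
        del dp.vports[port_no]
    return new_no


def scenario(lookup_first, kernel_checks_dp_names):
    """Create int0, move it to ns0, simulate a restart, report where int0 is."""
    dp = ToyDatapath(kernel_checks_dp_names)
    dp.move(dp.vport_new("int0"), "ns0")
    add_port_after_restart(dp, "int0", lookup_first)
    return dp.netns_of("int0")


if __name__ == "__main__":
    print(scenario(lookup_first=True, kernel_checks_dp_names=False))
    print(scenario(lookup_first=False, kernel_checks_dp_names=False))
    print(scenario(lookup_first=False, kernel_checks_dp_names=True))
```

In this model only the second scenario (no userspace lookup, kernel without the extra check) leaves int0 in the "root" netns instead of ns0, which matches the regression described in the mail.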

Comment 39 Candido Campos 2018-12-14 14:04:57 UTC
The fix was verified:


11:06:51 . /tmp/ir-venv-awm3Iep/bin/activate
11:06:51 infrared tripleo-undercloud -o undercloud_settings.yml --mirror tlv --version 14 --build 2018-12-13.3 --ssl false --tls-everywhere false
11:06:51 

[root@compute-2 heat-admin]# 
[root@compute-2 heat-admin]# 
[root@compute-2 heat-admin]# ps -ef | grep ovs
42435      30879   30860  0 13:12 ?        00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root       64679   30753  0 13:36 ?        00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root       75578   72162  0 13:59 pts/2    00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# ps -ef | grep ovs
42435      30879   30860  0 13:12 ?        00:00:00 /bin/bash /neutron_ovs_agent_launcher.sh
root       64679   30753  0 13:36 ?        00:00:00 /usr/bin/python2 /bin/privsep-helper --config-file /usr/share/nova/nova-dist.conf --config-file /etc/nova/nova.conf --privsep_context vif_plug_ovs.privsep.vif_plug --privsep_sock_path /tmp/tmpmjg90A/privsep.sock
root       75590   72162  0 13:59 pts/2    00:00:00 grep --color=auto ovs
[root@compute-2 heat-admin]# systemctl start ovs-vswitchd
[root@compute-2 heat-admin]# ip netns exec fip-d3b6e13f-5062-4bf2-bc11-6f77748453fe ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: fpr-e55a92b8-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f6:0f:8c:32:f6:8e brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.106.115/31 scope global fpr-e55a92b8-0
       valid_lft forever preferred_lft forever
    inet6 fe80::f40f:8cff:fe32:f68e/64 scope link 
       valid_lft forever preferred_lft forever
23: fg-d51b8944-77: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:77:06:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.225/24 brd 10.0.0.255 scope global fg-d51b8944-77
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe77:6c0/64 scope link 
       valid_lft forever preferred_lft forever
[root@compute-2 heat-admin]#

Comment 44 errata-xmlrpc 2019-01-11 11:55:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045