Bug 1572698 - Need to restart neutron-openvswitch-agent after reconfiguration of network with os-net-config
Keywords:
Status: CLOSED DUPLICATE of bug 1576256
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Assignee: Assaf Muller
QA Contact: Gurenko Alex
Blocks: 1572958 1572959
Reported: 2018-04-27 15:34 UTC by Andreas Karis
Modified: 2022-08-16 09:48 UTC (History)
Last Closed: 2018-06-01 18:48:57 UTC
Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-5057 0 None None None 2022-08-16 09:48:02 UTC

Description Andreas Karis 2018-04-27 15:34:59 UTC
Description of problem:
Hi,

We need to restart neutron-openvswitch-agent after os-net-config reconfigures the network.

Additional info:

We just hit a major outage in a customer environment running OSP 8, caused by the interaction between TripleO and neutron-openvswitch-agent.

I just reproduced part of this issue in a lab in both OSP 8 and OSP 10:

a) Modify br-ex:
~~~
    cat /etc/sysconfig/network-scripts/ifcfg-br-ex
    # This file is autogenerated by os-net-config
    DEVICE=br-ex
    MTU=2000
    ONBOOT=yes
    HOTPLUG=no
    NM_CONTROLLED=no
    PEERDNS=no
    DEVICETYPE=ovs
    TYPE=OVSBridge
    OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
~~~

b) Start a stack update with `openstack overcloud deploy (...)`

c) Verify flows on br-ex after the stack update:
~~~
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=57042.432s, table=0, n_packets=797291, n_bytes=102211740, idle_age=1, priority=0 actions=NORMAL
~~~

We obviously do not support manipulating ifcfg files outside the scope of Director. However, we should at least handle this case properly:
When ifcfg-<interface> files are manipulated outside the scope of Director, os-net-config detects the change and reconfigures the network. As part of that, it restarts br-ex, which deletes the flows that neutron-openvswitch-agent created.
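To illustrate the detection step described above, here is a minimal sketch. It assumes the check amounts to comparing the content that would be rendered against the file currently on disk. The helper `ifcfg_changed` and the sample content are hypothetical and are not taken from os-net-config itself.

```python
import os
import tempfile


def ifcfg_changed(path: str, rendered: str) -> bool:
    """Return True when the file at `path` differs from `rendered`.

    A missing file counts as a change, because it would have to be written.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read() != rendered
    except FileNotFoundError:
        return True


# Hypothetical content that would be rendered for br-ex (no MTU line).
RENDERED = "DEVICE=br-ex\nONBOOT=yes\nTYPE=OVSBridge\n"

with tempfile.TemporaryDirectory() as tmp:
    ifcfg = os.path.join(tmp, "ifcfg-br-ex")
    with open(ifcfg, "w", encoding="utf-8") as handle:
        handle.write(RENDERED)
    # File matches the rendered content: no restart would be needed.
    unchanged = ifcfg_changed(ifcfg, RENDERED)

    # Simulate a manual edit outside the scope of Director.
    with open(ifcfg, "a", encoding="utf-8") as handle:
        handle.write("MTU=2000\n")
    changed = ifcfg_changed(ifcfg, RENDERED)
```

Under this assumption, a single extra line such as `MTU=2000` is enough to mark the bridge as changed and therefore to trigger the ifdown/ifup cycle.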

We can reproduce this without Director as well:

Note that os-net-config will normally *not* bring up/down the network, unless a change to files in /etc/sysconfig/network-scripts/ was detected!
~~~
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json 
[2018/04/26 11:18:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:18:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:18:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:18:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:18:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:18:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:18:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:18:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:18:11 PM] [INFO] applying network configs...
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:18:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]# 
~~~

Even if we manually manipulate interfaces live, os-net-config does not trigger a restart:
~~~
[root@overcloud-compute-0 ~]# ip link set dev br-ex mtu 2000
[root@overcloud-compute-0 ~]# ip a a dev br-ex 192.168.123.5/24
[root@overcloud-compute-0 ~]# ip link set dev br-ex up
[root@overcloud-compute-0 ~]# ip link set dev vlan902 down
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json 
[2018/04/26 11:19:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:19:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:19:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:19:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:19:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:19:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:19:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:19:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:19:11 PM] [INFO] applying network configs...
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:19:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]# 
~~~

Now change a file in /etc/sysconfig/network-scripts, e.g. as follows, and os-net-config restarts the network:
~~~
    [root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
    NXST_FLOW reply (xid=0x4):
     cookie=0xb37330a54aeb3705, duration=110643.987s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=4,in_port=5,dl_vlan=2 actions=mod_vlan_vid:906,NORMAL
     cookie=0xb37330a54aeb3705, duration=110651.569s, table=0, n_packets=3420, n_bytes=403312, idle_age=11, hard_age=65534, priority=2,in_port=5 actions=drop
     cookie=0xb37330a54aeb3705, duration=110651.578s, table=0, n_packets=2139296, n_bytes=349532217, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

    # note, I'm pushing a custom MTU line in ifcfg-br-ex to trigger the network restart

    [root@overcloud-compute-0 ~]# cat !$
    cat /etc/sysconfig/network-scripts/ifcfg-br-ex
    # This file is autogenerated by os-net-config
    DEVICE=br-ex
    MTU=2000
    ONBOOT=yes
    HOTPLUG=no
    NM_CONTROLLED=no
    PEERDNS=no
    DEVICETYPE=ovs
    TYPE=OVSBridge
    OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
    [root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
    [2018/04/26 11:15:17 PM] [INFO] Using config file at: /etc/os-net-config/config.json
    [2018/04/26 11:15:17 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
    [2018/04/26 11:15:17 PM] [INFO] Ifcfg net config provider created.
    [2018/04/26 11:15:17 PM] [INFO] nic5 mapped to: eth4
    [2018/04/26 11:15:17 PM] [INFO] nic4 mapped to: eth3
    [2018/04/26 11:15:17 PM] [INFO] nic3 mapped to: eth2
    [2018/04/26 11:15:17 PM] [INFO] nic2 mapped to: eth1
    [2018/04/26 11:15:17 PM] [INFO] nic1 mapped to: eth0
    [2018/04/26 11:15:17 PM] [INFO] adding interface: eth0
    [2018/04/26 11:15:17 PM] [INFO] adding custom route for interface: eth0
    [2018/04/26 11:15:17 PM] [INFO] adding bridge: br-ex
    [2018/04/26 11:15:17 PM] [INFO] adding interface: eth1
    [2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan901
    [2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan903
    [2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan902
    [2018/04/26 11:15:17 PM] [INFO] adding interface: eth2
    [2018/04/26 11:15:17 PM] [INFO] adding interface: eth3
    [2018/04/26 11:15:17 PM] [INFO] applying network configs...
    [2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth3
    [2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth2
    [2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth1
    [2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth0
    [2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan903
    [2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan902
    [2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan901
    [2018/04/26 11:15:17 PM] [INFO] running ifdown on interface: vlan903
    [2018/04/26 11:15:18 PM] [INFO] running ifdown on interface: vlan902
    [2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: vlan901
    [2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: eth1
    [2018/04/26 11:15:20 PM] [INFO] running ifdown on bridge: br-ex
    [2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route6-br-ex
    [2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex
    [2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route-br-ex
    [2018/04/26 11:15:20 PM] [INFO] running ifup on bridge: br-ex
    [2018/04/26 11:15:22 PM] [INFO] running ifup on interface: vlan903
    [2018/04/26 11:15:27 PM] [INFO] running ifup on interface: vlan902
    [2018/04/26 11:15:32 PM] [INFO] running ifup on interface: vlan901
    [2018/04/26 11:15:37 PM] [INFO] running ifup on interface: eth1
    [root@overcloud-compute-0 ~]#
    [root@overcloud-compute-0 ~]#
    [root@overcloud-compute-0 ~]#
    [root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=21.750s, table=0, n_packets=610, n_bytes=71190, idle_age=0, priority=0 actions=NORMAL
    [root@overcloud-compute-0 ~]#
~~~
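To tell from an os-net-config log whether a run actually restarted anything, one can scan it for the "running ifdown on" lines shown in the transcript above. This is a minimal sketch; the helper name `restarted_devices` is hypothetical, and the regular expression only assumes the log wording seen in this report.

```python
import re
from typing import List

# Matches lines such as "[...] [INFO] running ifdown on bridge: br-ex".
IFDOWN_RE = re.compile(r"running ifdown on (?:interface|bridge): (\S+)")


def restarted_devices(log_text: str) -> List[str]:
    """Return the devices that were taken down, in log order."""
    return IFDOWN_RE.findall(log_text)


# Excerpt of the log lines from this report.
SAMPLE_LOG = """\
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:15:17 PM] [INFO] running ifdown on interface: vlan903
[2018/04/26 11:15:20 PM] [INFO] running ifdown on bridge: br-ex
[2018/04/26 11:15:20 PM] [INFO] running ifup on bridge: br-ex
"""

devices = restarted_devices(SAMPLE_LOG)
bridge_was_restarted = "br-ex" in devices
```

If `br-ex` shows up in the result, the flows on that bridge should be treated as lost until neutron-openvswitch-agent has reprogrammed them.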

OK, so what are the consequences of this?

a) VLAN connections are broken because the internal patch cable to br-int is lost:
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl show
3b8f43d7-01f5-4bb7-8155-2fde36264c5d
    Bridge br-ex
        Port "vlan902"
            tag: 902
            Interface "vlan902"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan903"
            tag: 903
            Interface "vlan903"
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "vlan901"
            tag: 901
            Interface "vlan901"
                type: internal
    Bridge br-int
~~~

b) Flows are switched from the neutron-openvswitch-agent flows to the default "NORMAL" flow with cookie=0x0.

This has the potential to be a time bomb. Under certain circumstances, neutron-openvswitch-agent will clean up stale flows (flows that have the wrong cookie). This cleanup can happen months (!) later, see BZ https://bugzilla.redhat.com/1571647
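A quick way to spot a bridge in this state is to look at the cookies in the `ovs-ofctl dump-flows` output. The sketch below assumes, as in the dumps in this report, that the agent stamps its flows with a non-zero cookie and that a freshly recreated bridge only carries `cookie=0x0`. The function names are hypothetical, and a bridge that legitimately has no agent flows would be reported as well.

```python
import re
from typing import List

COOKIE_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+)")


def flow_cookies(dump: str) -> List[int]:
    """Extract every flow cookie from dump-flows output as an integer."""
    return [int(value, 16) for value in COOKIE_RE.findall(dump)]


def agent_flows_missing(dump: str) -> bool:
    """True when the dump contains no flow with a non-zero cookie."""
    return not any(flow_cookies(dump))


# Shortened versions of the dumps from this report.
BEFORE = """\
NXST_FLOW reply (xid=0x4):
 cookie=0xb37330a54aeb3705, table=0, priority=2,in_port=5 actions=drop
 cookie=0xb37330a54aeb3705, table=0, priority=0 actions=NORMAL
"""
AFTER = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=21.750s, table=0, priority=0 actions=NORMAL
"""

healthy_before = not agent_flows_missing(BEFORE)
broken_after = agent_flows_missing(AFTER)
```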

Comment 2 Andreas Karis 2018-05-03 20:50:11 UTC
Hi,

At another customer, I ran into a similar issue, this time triggered by a change to the DnsServers configuration.

Between the two versions of the templates, the number of DnsServers changed.
~~~
# before 
DnsServers: ["10.236.255.2"]
~~~

~~~
# after
DnsServers: ["10.236.255.2","10.236.255.6"]
~~~

This change triggers a run of os-net-config which propagates it into ifcfg-br-ex:
~~~
# before
[akaris@collab-shell network-scripts]$ cat ifcfg-br-ex
# This file is autogenerated by os-net-config
(...)
DNS1=10.236.255.2
~~~

~~~
# after
[akaris@collab-shell network-scripts]$ cat ifcfg-br-ex
# This file is autogenerated by os-net-config
(...)
DNS1=10.236.255.2
DNS2=10.236.255.6
~~~
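The mapping from the `DnsServers` list to the `DNS1`/`DNS2` lines can be pictured as follows. This is only an illustrative sketch of the before/after difference shown above; `render_dns_lines` is a hypothetical helper, not os-net-config code.

```python
from typing import List


def render_dns_lines(servers: List[str]) -> List[str]:
    """Number the servers from 1, giving DNS1=..., DNS2=..., and so on."""
    return [f"DNS{index}={address}" for index, address in enumerate(servers, start=1)]


before = render_dns_lines(["10.236.255.2"])
after = render_dns_lines(["10.236.255.2", "10.236.255.6"])

# Any difference in the rendered lines means ifcfg-br-ex gets rewritten.
file_would_change = before != after
```

Adding one DNS server therefore changes the rendered ifcfg-br-ex, even though nothing about the bridge itself changed.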

os-net-config then deletes and recreates br-ex, which deletes all flows and the virtual patch cord between br-ex and br-int:
~~~
(...)
May 03 19:22:31  os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] running ifdown on bridge: br-ex
May 03 19:22:31  ovs-vsctl[465206]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br br-ex
May 03 19:22:31  os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route6-br-ex
May 03 19:22:31  os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex
May 03 19:22:31  os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route-br-ex
May 03 19:22:31  os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] running ifup on bridge: br-ex
(...)
~~~

Flows and the virtual patch cord are only recreated once neutron-openvswitch-agent is restarted. This causes an immediate outage on VLAN networks.
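Whether the patch cord is present can be checked from `ovs-vsctl show` output by looking for interfaces of `type: patch`. The sketch below is a simple text parser under that assumption. The function name is hypothetical, and the "healthy" sample with `phy-br-ex`/`int-br-ex` is illustrative rather than captured from the affected systems.

```python
from typing import Dict, List, Optional


def patch_ports_by_bridge(show_output: str) -> Dict[str, List[str]]:
    """Map each bridge name to its interfaces that have type: patch."""
    result: Dict[str, List[str]] = {}
    bridge: Optional[str] = None
    interface: Optional[str] = None
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            bridge = line.split(None, 1)[1].strip('"')
            result[bridge] = []
            interface = None
        elif line.startswith("Interface "):
            interface = line.split(None, 1)[1].strip('"')
        elif line == "type: patch" and bridge is not None and interface is not None:
            result[bridge].append(interface)
    return result


# Trimmed from the output in this report: no patch ports on either bridge.
BROKEN = """\
    Bridge br-ex
        Port "eth1"
            Interface "eth1"
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
"""

# Illustrative healthy layout (assumed, not taken from the affected hosts).
HEALTHY = """\
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    Bridge br-int
        Port int-br-ex
            Interface int-br-ex
                type: patch
                options: {peer=phy-br-ex}
"""

broken_ports = patch_ports_by_bridge(BROKEN)
healthy_ports = patch_ports_by_bridge(HEALTHY)
```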

- Andreas

Comment 3 Assaf Muller 2018-06-01 18:48:57 UTC
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1571647 (and its dependent bugs). Slawek fixed this upstream and backported the fix to all OSP branches. The OVS agent now monitors the bridges it manages, and if it notices that a bridge is missing flows (because it was recreated, e.g. by restarting the network service or issuing an ifdown/ifup), it reprograms the bridge. This is done continuously, so there is no need to restart the agent to trigger it anymore. I'll close this bug and its dependents as dupes.
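Conceptually, the continuous check described above can be pictured as one reconciliation pass that is repeated on every agent loop. This is only a sketch of the idea, not the actual neutron implementation. `reconcile_once`, the callbacks and the simulated state are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def reconcile_once(
    bridges: Iterable[str],
    has_agent_flows: Callable[[str], bool],
    reprogram: Callable[[str], None],
) -> List[str]:
    """Reprogram every bridge that lost its agent flows; return their names."""
    repaired = []
    for bridge in bridges:
        if not has_agent_flows(bridge):
            reprogram(bridge)
            repaired.append(bridge)
    return repaired


# Simulated state: br-ex was recreated and lost its flows, br-int is intact.
state: Dict[str, bool] = {"br-ex": False, "br-int": True}


def fake_has_agent_flows(bridge: str) -> bool:
    return state[bridge]


def fake_reprogram(bridge: str) -> None:
    # Stand-in for reinstalling flows and the patch ports on the bridge.
    state[bridge] = True


first_pass = reconcile_once(list(state), fake_has_agent_flows, fake_reprogram)
second_pass = reconcile_once(list(state), fake_has_agent_flows, fake_reprogram)
```

The first pass repairs `br-ex`; the second pass finds nothing to do, which is why no agent restart is needed once such a loop runs continuously.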

*** This bug has been marked as a duplicate of bug 1576256 ***

