+++ This bug was initially created as a clone of Bug #1572698 +++
Description of problem:
Hi,
We need to restart neutron-openvswitch-agent after os-net-config reconfigures the network.
Additional info:
We just hit a major outage in a customer environment with OSP 8, caused by the interaction between TripleO and neutron-openvswitch-agent.
I reproduced part of this issue in a lab on both OSP 8 and OSP 10:
a) Modify ifcfg-br-ex (here, by adding a custom MTU line):
~~~
cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
DEVICE=br-ex
MTU=2000
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
~~~
b) Start a stack update with `openstack overcloud deploy (...)`
c) Verify flows on br-ex after the stack update:
~~~
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=57042.432s, table=0, n_packets=797291, n_bytes=102211740, idle_age=1, priority=0 actions=NORMAL
~~~
We obviously do not support manipulating ifcfg files outside the scope of Director. However, we should at least handle this case properly:
When ifcfg-<interface> files are changed outside the scope of Director, os-net-config detects this and reconfigures the network. As part of that, it restarts br-ex, which deletes the flows that neutron-openvswitch-agent created.
We can reproduce this without Director as well:
Note that os-net-config will normally *not* bring up/down the network, unless a change to files in /etc/sysconfig/network-scripts/ was detected!
~~~
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:18:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:18:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:18:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:18:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:18:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:18:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:18:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:18:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:18:11 PM] [INFO] applying network configs...
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:18:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]#
~~~
Even if we change interfaces manually at runtime, os-net-config does not trigger a restart:
~~~
[root@overcloud-compute-0 ~]# ip link set dev br-ex mtu 2000
[root@overcloud-compute-0 ~]# ip a a dev br-ex 192.168.123.5/24
[root@overcloud-compute-0 ~]# ip link set dev br-ex up
[root@overcloud-compute-0 ~]# ip link set dev vlan902 down
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:19:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:19:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:19:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:19:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:19:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:19:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:19:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:19:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:19:11 PM] [INFO] applying network configs...
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:19:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]#
~~~
Now change a file in /etc/sysconfig/network-scripts, e.g. as follows, and os-net-config restarts the affected interfaces:
~~~
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
cookie=0xb37330a54aeb3705, duration=110643.987s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=4,in_port=5,dl_vlan=2 actions=mod_vlan_vid:906,NORMAL
cookie=0xb37330a54aeb3705, duration=110651.569s, table=0, n_packets=3420, n_bytes=403312, idle_age=11, hard_age=65534, priority=2,in_port=5 actions=drop
cookie=0xb37330a54aeb3705, duration=110651.578s, table=0, n_packets=2139296, n_bytes=349532217, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
# note, I'm pushing a custom MTU line in ifcfg-br-ex to trigger the network restart
[root@overcloud-compute-0 ~]# cat !$
cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
DEVICE=br-ex
MTU=2000
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:15:17 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:15:17 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:15:17 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:15:17 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:15:17 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:15:17 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:15:17 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:15:17 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth0
[2018/04/26 11:15:17 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:15:17 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth1
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth2
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth3
[2018/04/26 11:15:17 PM] [INFO] applying network configs...
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:15:17 PM] [INFO] running ifdown on interface: vlan903
[2018/04/26 11:15:18 PM] [INFO] running ifdown on interface: vlan902
[2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: vlan901
[2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: eth1
[2018/04/26 11:15:20 PM] [INFO] running ifdown on bridge: br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route6-br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route-br-ex
[2018/04/26 11:15:20 PM] [INFO] running ifup on bridge: br-ex
[2018/04/26 11:15:22 PM] [INFO] running ifup on interface: vlan903
[2018/04/26 11:15:27 PM] [INFO] running ifup on interface: vlan902
[2018/04/26 11:15:32 PM] [INFO] running ifup on interface: vlan901
[2018/04/26 11:15:37 PM] [INFO] running ifup on interface: eth1
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=21.750s, table=0, n_packets=610, n_bytes=71190, idle_age=0, priority=0 actions=NORMAL
[root@overcloud-compute-0 ~]#
~~~
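The log above shows that a restart can be recognized from the "running ifdown on ..." lines. A minimal sketch of detecting this from a saved os-net-config log follows. The function names and the log path are hypothetical, and restarting the agent afterwards is only the remedy this report asks for, not something os-net-config does today:

```shell
#!/bin/bash
# Print every device that os-net-config took down, one per line, given its
# verbose log on stdin. Relies on the "running ifdown on ..." lines seen above.
bounced_devices() {
    sed -nE 's/.*running ifdown on (interface|bridge): //p'
}

# Succeed if the given bridge (default: br-ex) is among the bounced devices.
bridge_was_bounced() {
    local bridge="${1:-br-ex}"
    bounced_devices | grep -qx -- "$bridge"
}

# Hypothetical usage on an overcloud node (/tmp/onc.log is an arbitrary path):
#   os-net-config -v -c /etc/os-net-config/config.json 2>&1 | tee /tmp/onc.log
#   if bridge_was_bounced br-ex < /tmp/onc.log; then
#       systemctl restart neutron-openvswitch-agent
#   fi
```

Parsing the log keeps the check independent of os-net-config internals, but it would break if the log wording changes between versions.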
So what are the consequences of this?
a) VLAN connectivity breaks because br-ex loses its internal patch port to br-int:
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl show
3b8f43d7-01f5-4bb7-8155-2fde36264c5d
    Bridge br-ex
        Port "vlan902"
            tag: 902
            Interface "vlan902"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan903"
            tag: 903
            Interface "vlan903"
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "vlan901"
            tag: 901
            Interface "vlan901"
                type: internal
    Bridge br-int
~~~
b) Flows are switched from the neutron-openvswitch-agent flows to the default "NORMAL" flow with cookie=0x0.
This has the potential to be a time bomb. Under certain circumstances, neutron-openvswitch-agent will clean up stale flows (flows that have the wrong cookie). This cleanup can happen months (!) later, see BZ https://bugzilla.redhat.com/1571647
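Both symptoms can be checked from the command line. The sketch below rests on two assumptions: that the agent's flows carry a non-zero cookie, as in the dump-flows output earlier in this report, and that the agent's patch port on br-ex is named with a "phy-" prefix (that name does not appear in this report). The function names are hypothetical.

```shell
#!/bin/bash
# Count flows with a non-zero cookie in `ovs-ofctl dump-flows` output (stdin).
# Assumption: agent-installed flows have a non-zero cookie; 0 means none are left.
count_agent_flows() {
    awk '/cookie=0x/ && !/cookie=0x0,/ { n++ } END { print n + 0 }'
}

# Succeed if `ovs-vsctl list-ports <bridge>` output (stdin) contains a port
# whose name starts with "phy-" (assumed naming of the agent's patch port).
has_phy_patch_port() {
    grep -q '^phy-'
}

# Hypothetical usage on a node (needs root and a running Open vSwitch):
#   flows=$(ovs-ofctl dump-flows br-ex | count_agent_flows)
#   if [ "$flows" -eq 0 ] || ! ovs-vsctl list-ports br-ex | has_phy_patch_port; then
#       systemctl restart neutron-openvswitch-agent
#   fi
```

A check like this could run after any os-net-config invocation. It only detects the broken state; restarting the agent is the fix requested in this report.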