Description of problem:

Hi,

We need to restart neutron-openvswitch-agent after reconfiguration of the network with os-net-config.

Additional info:

We just hit a major outage in a customer environment with OSP 8 due to some interesting behavior between tripleo and neutron-openvswitch-agent.

I just reproduced part of this issue in a lab in both OSP 8 and OSP 10:

a) Modify br-ex:
~~~
cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
DEVICE=br-ex
MTU=2000
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
~~~

b) Start a stack update with `openstack overcloud deploy (...)`

c) Verify flows on br-ex after the stack update:
~~~
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=57042.432s, table=0, n_packets=797291, n_bytes=102211740, idle_age=1, priority=0 actions=NORMAL
~~~

We obviously do not support the manipulation of ifcfg files outside the scope of Director. However, we should at least deal with this properly:

When ifcfg-<interface> files are manipulated outside the scope of Director, os-net-config detects this and reconfigures the network. As part of that, it restarts br-ex, which deletes the flows that were created by neutron-openvswitch-agent.

We can reproduce this without Director as well:

Note that os-net-config will normally *not* bring up/down the network, unless a change to files in /etc/sysconfig/network-scripts/ was detected!
~~~
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:18:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:18:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:18:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:18:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:18:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:18:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:18:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:18:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:18:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:18:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:18:11 PM] [INFO] applying network configs...
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:18:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:18:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:18:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]#
~~~

Even if we manipulate interfaces live manually, os-net-config would not cause a restart:
~~~
[root@overcloud-compute-0 ~]# ip link set dev br-ex mtu 2000
[root@overcloud-compute-0 ~]# ip a a dev br-ex 192.168.123.5/24
[root@overcloud-compute-0 ~]# ip link set dev br-ex up
[root@overcloud-compute-0 ~]# ip link set dev vlan902 down
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:19:11 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:19:11 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:19:11 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:19:11 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:19:11 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:19:11 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:19:11 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:19:11 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth1
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:19:11 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth2
[2018/04/26 11:19:11 PM] [INFO] adding interface: eth3
[2018/04/26 11:19:11 PM] [INFO] applying network configs...
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:19:11 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:19:11 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:19:11 PM] [INFO] No changes required for bridge: br-ex
[root@overcloud-compute-0 ~]#
~~~

Now changing a file in /etc/sysconfig/network-scripts, e.g. as follows:
~~~
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0xb37330a54aeb3705, duration=110643.987s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=4,in_port=5,dl_vlan=2 actions=mod_vlan_vid:906,NORMAL
 cookie=0xb37330a54aeb3705, duration=110651.569s, table=0, n_packets=3420, n_bytes=403312, idle_age=11, hard_age=65534, priority=2,in_port=5 actions=drop
 cookie=0xb37330a54aeb3705, duration=110651.578s, table=0, n_packets=2139296, n_bytes=349532217, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
# note, I'm pushing a custom MTU line in ifcfg-br-ex to trigger the network restart
[root@overcloud-compute-0 ~]# cat !$
cat /etc/sysconfig/network-scripts/ifcfg-br-ex
# This file is autogenerated by os-net-config
DEVICE=br-ex
MTU=2000
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
OVS_EXTRA="set bridge br-ex other-config:hwaddr=52:54:00:94:27:2f -- set bridge br-ex fail_mode=standalone"
[root@overcloud-compute-0 ~]# os-net-config -v -c /etc/os-net-config/config.json
[2018/04/26 11:15:17 PM] [INFO] Using config file at: /etc/os-net-config/config.json
[2018/04/26 11:15:17 PM] [INFO] Using mapping file at: /etc/os-net-config/mapping.yaml
[2018/04/26 11:15:17 PM] [INFO] Ifcfg net config provider created.
[2018/04/26 11:15:17 PM] [INFO] nic5 mapped to: eth4
[2018/04/26 11:15:17 PM] [INFO] nic4 mapped to: eth3
[2018/04/26 11:15:17 PM] [INFO] nic3 mapped to: eth2
[2018/04/26 11:15:17 PM] [INFO] nic2 mapped to: eth1
[2018/04/26 11:15:17 PM] [INFO] nic1 mapped to: eth0
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth0
[2018/04/26 11:15:17 PM] [INFO] adding custom route for interface: eth0
[2018/04/26 11:15:17 PM] [INFO] adding bridge: br-ex
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth1
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan901
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan903
[2018/04/26 11:15:17 PM] [INFO] adding vlan: vlan902
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth2
[2018/04/26 11:15:17 PM] [INFO] adding interface: eth3
[2018/04/26 11:15:17 PM] [INFO] applying network configs...
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth3
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth2
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth1
[2018/04/26 11:15:17 PM] [INFO] No changes required for interface: eth0
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan903
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan902
[2018/04/26 11:15:17 PM] [INFO] No changes required for vlan interface: vlan901
[2018/04/26 11:15:17 PM] [INFO] running ifdown on interface: vlan903
[2018/04/26 11:15:18 PM] [INFO] running ifdown on interface: vlan902
[2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: vlan901
[2018/04/26 11:15:19 PM] [INFO] running ifdown on interface: eth1
[2018/04/26 11:15:20 PM] [INFO] running ifdown on bridge: br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route6-br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex
[2018/04/26 11:15:20 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route-br-ex
[2018/04/26 11:15:20 PM] [INFO] running ifup on bridge: br-ex
[2018/04/26 11:15:22 PM] [INFO] running ifup on interface: vlan903
[2018/04/26 11:15:27 PM] [INFO] running ifup on interface: vlan902
[2018/04/26 11:15:32 PM] [INFO] running ifup on interface: vlan901
[2018/04/26 11:15:37 PM] [INFO] running ifup on interface: eth1
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]#
[root@overcloud-compute-0 ~]# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=21.750s, table=0, n_packets=610, n_bytes=71190, idle_age=0, priority=0 actions=NORMAL
[root@overcloud-compute-0 ~]#
~~~

O.k., so what's the consequence of this?

a) VLAN connections will be broken due to loss of internal patch cable to br-int:
~~~
[root@overcloud-compute-0 ~]# ovs-vsctl show
3b8f43d7-01f5-4bb7-8155-2fde36264c5d
    Bridge br-ex
        Port "vlan902"
            tag: 902
            Interface "vlan902"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
        Port "vlan903"
            tag: 903
            Interface "vlan903"
                type: internal
        Port "eth1"
            Interface "eth1"
        Port "vlan901"
            tag: 901
            Interface "vlan901"
                type: internal
    Bridge br-int
~~~

b) flows are switched from neutron-openvswitch-agent flows to default "NORMAL" with cookie=0x0

This has the potential to be a time bomb. Under certain circumstances, neutron-openvswitch-agent will clean up stale flows (flows that have the wrong cookie). This cleanup can happen months (!) later, see BZ https://bugzilla.redhat.com/1571647
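
The before/after flow dumps above differ in one easily checked way: the agent's flows carry a non-zero cookie, while the recreated bridge only has the default `cookie=0x0` flow. A minimal sketch of that check follows. It assumes the text of `ovs-ofctl dump-flows br-ex` is passed in as a string, and the helper names `flow_cookies` and `agent_flows_missing` are hypothetical, not part of any OpenStack or Open vSwitch tool.

```python
import re
from typing import List

# Matches the cookie field of a flow line in `ovs-ofctl dump-flows` output.
COOKIE_RE = re.compile(r"cookie=(0x[0-9a-fA-F]+)")


def flow_cookies(dump_flows_output: str) -> List[int]:
    """Return the cookie of every flow found in the dump-flows text."""
    return [int(m.group(1), 16) for m in COOKIE_RE.finditer(dump_flows_output)]


def agent_flows_missing(dump_flows_output: str) -> bool:
    """Return True if no flow carries a non-zero cookie.

    Assumption (taken from the transcripts in this report): flows programmed
    by neutron-openvswitch-agent have a non-zero cookie, and a bridge that was
    recreated by ifdown/ifup only has the default cookie=0x0 NORMAL flow.
    An empty flow table is also reported as missing.
    """
    return all(cookie == 0 for cookie in flow_cookies(dump_flows_output))
```

Fed the output captured after the os-net-config run above, `agent_flows_missing` returns `True`. Fed the output captured before it, it returns `False`. This only spots the symptom; it does not restore the flows or the patch port.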
Hi,

At another customer, I ran into a similar issue. This was triggered by a change to the configuration of DnsServers.

Between the templates, there's a change to the number of DnsServers.
~~~
# before
DnsServers: ["10.236.255.2"]
~~~

~~~
# after
DnsServers: ["10.236.255.2","10.236.255.6"]
~~~

This change triggers a run of os-net-config which propagates it into ifcfg-br-ex:
~~~
# before
[akaris@collab-shell network-scripts]$ cat ifcfg-br-ex
# This file is autogenerated by os-net-config
(...)
DNS1=10.236.255.2
~~~

~~~
# after
[akaris@collab-shell network-scripts]$ cat ifcfg-br-ex
# This file is autogenerated by os-net-config
(...)
DNS1=10.236.255.2
DNS2=10.236.255.6
~~~

os-net-config then deletes and recreates br-ex, which removes all flows and the virtual patch cord between br-int and br-ex:
~~~
(...)
May 03 19:22:31 os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] running ifdown on bridge: br-ex
May 03 19:22:31 ovs-vsctl[465206]: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --if-exists del-br br-ex
May 03 19:22:31 os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route6-br-ex
May 03 19:22:31 os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/ifcfg-br-ex
May 03 19:22:31 os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] Writing config /etc/sysconfig/network-scripts/route-br-ex
May 03 19:22:31 os-collect-config[3712]: [2018/05/03 07:22:31 PM] [INFO] running ifup on bridge: br-ex
(...)
~~~

Flows and the virtual patch cord are only recreated once neutron-openvswitch-agent is restarted. Until then, VLAN networks suffer an immediate outage.

- Andreas
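
On an affected release, the os-net-config log itself shows when a restart of the agent is needed. The sketch below scans log text for the `running ifdown on bridge:` message quoted above and returns the bridges that were taken down. The function name `restarted_bridges` is hypothetical, and the approach assumes the message wording stays as it appears in these logs.

```python
import re
from typing import List

# Message logged by os-net-config when it takes a bridge down,
# as quoted in the logs in this report.
IFDOWN_BRIDGE_RE = re.compile(r"running ifdown on bridge: (\S+)")


def restarted_bridges(log_text: str) -> List[str]:
    """Return the bridges os-net-config took down, in log order, without duplicates."""
    seen: List[str] = []
    for match in IFDOWN_BRIDGE_RE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

If the returned list is non-empty, the flows and the patch port on those bridges are gone, and a workaround on an unfixed agent would be to restart it, for example with `systemctl restart neutron-openvswitch-agent`. A run that logs only `No changes required for bridge: br-ex` yields an empty list.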
This is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1571647 (and its dependent bugs). Slawek fixed this upstream and backported the fix to all OSP branches.

The OVS agent will now monitor the bridges it manages, and if it notices that a bridge is missing flows (because it was recreated, e.g. by restarting the network service or issuing an ifdown/ifup), it will reprogram the bridge. This is done continuously, so there's no longer any need to restart the agent to trigger it.

I'll close this bug and its dependents as dupes.

*** This bug has been marked as a duplicate of bug 1576256 ***