Description of problem:
The latest async update for OSP 10 broke the undercloud's MAC address assignment to br-ctlplane. neutron-openvswitch-agent resets the MAC address to a random value.

Version-Release number of selected component (if applicable):
Red Hat release OSP 10
Red Hat Enterprise Linux Server release 7.5 (Maipo)

rpm -qa | grep openvswitch
python-openvswitch-2.9.0-56.el7fdp.noarch
openstack-neutron-openvswitch-9.4.1-28.el7ost.noarch
openvswitch-selinux-extra-policy-1.0-3.el7fdp.noarch
openvswitch-2.9.0-56.el7fdp.x86_64

How reproducible:
Upgrade to latest undercloud:
~~~
su - stack
source stackrc
openstack undercloud upgrade
~~~

After the update and a reboot of the undercloud, the MAC addresses look like this:
~~~
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether ee:43:a6:3f:87:4e brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# ip link ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# systemctl list-units | grep neutron
neutron-dhcp-agent.service         loaded active running OpenStack Neutron DHCP Agent
neutron-openvswitch-agent.service  loaded active running OpenStack Neutron Open vSwitch Agent
neutron-ovs-cleanup.service        loaded active exited  OpenStack Neutron Open vSwitch Cleanup Utility
neutron-server.service             loaded active running OpenStack Neutron Server
[root@undercloud-7 ~]#
~~~

I can reproduce this issue at will:
~~~
[root@undercloud-7 ~]# ifdown br-ctlplane
[root@undercloud-7 ~]# ifup br-ctlplane
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
13: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# ip link ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# systemctl start neutron-openvswitch-agent
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
13: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# systemctl status neutron-openvswitch-agent
● neutron-openvswitch-agent.service - OpenStack Neutron Open vSwitch Agent
   Loaded: loaded (/usr/lib/systemd/system/neutron-openvswitch-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2018-10-01 23:14:52 EDT; 16s ago
  Process: 16442 ExecStartPre=/usr/bin/neutron-enable-bridge-firewall.sh (code=exited, status=0/SUCCESS)
 Main PID: 16450 (neutron-openvsw)
    Tasks: 11
   CGroup: /system.slice/neutron-openvswitch-agent.service
           ├─16450 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/p...
           ├─16568 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
           ├─16569 /usr/bin/python2 /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
           ├─16588 sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
           ├─16590 /usr/bin/python2 /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Interface name,ofport,external_ids --format=json
           ├─16600 /bin/ovsdb-client monitor Interface name,ofport,external_ids --format=json
           ├─16603 sudo neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Bridge name --format=json
           ├─16613 /usr/bin/python2 /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf ovsdb-client monitor Bridge name --format=json
           └─16620 /bin/ovsdb-client monitor Bridge name --format=json
^C
[root@undercloud-7 ~]# date
Mon Oct  1 23:15:12 EDT 2018
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
13: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 26:4a:aa:dd:13:4d brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]#
~~~

Expected results:
neutron-openvswitch-agent should not change the MAC address of br-ctlplane.
From the same lab before the upgrade:
~~~
[root@undercloud-7 ~]# ovs-vsctl show
f50a5f50-ff21-4c46-809f-323d246fdab2
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port "tap47ab69d8-d8"
            tag: 1
            Interface "tap47ab69d8-d8"
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
        Port br-int
            Interface br-int
                type: internal
    Bridge br-ctlplane
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port "eth0"
            Interface "eth0"
    ovs_version: "2.9.0"
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# ip link ls dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br-ctlplane
# This file is autogenerated by os-net-config
DEVICE=br-ctlplane
ONBOOT=yes
HOTPLUG=no
NM_CONTROLLED=no
PEERDNS=no
DEVICETYPE=ovs
TYPE=OVSBridge
MTU=1500
BOOTPROTO=static
IPADDR=192.0.2.1
NETMASK=255.255.255.0
OVS_EXTRA="set bridge br-ctlplane other-config:hwaddr=52:54:00:ec:14:c2 -- br-set-external-id br-ctlplane bridge-id br-ctlplane -- set bridge br-ctlplane fail_mode=standalone"
[root@undercloud-7 ~]#
~~~

Additional info:
Confirmed both in a lab and in a customer environment.
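The symptom shown in the `ip link` output (br-ctlplane no longer carrying the same MAC address as eth0) can be checked with a short script. This is only a sketch: it assumes the standard Linux sysfs layout (`/sys/class/net/<dev>/address`), and the function names `read_mac` and `bridge_mac_matches` are hypothetical helpers, not part of any OpenStack tool.

```python
import os


def read_mac(dev, sysfs_root="/sys/class/net"):
    """Return the MAC address the kernel reports for a network device."""
    with open(os.path.join(sysfs_root, dev, "address")) as f:
        return f.read().strip().lower()


def bridge_mac_matches(bridge="br-ctlplane", nic="eth0",
                       sysfs_root="/sys/class/net"):
    """Return True if the bridge and the NIC report the same MAC address.

    The device names default to the ones used in this report; other
    deployments may use different names.
    """
    return read_mac(bridge, sysfs_root) == read_mac(nic, sysfs_root)
```

On an affected undercloud, `bridge_mac_matches()` would be expected to return False once neutron-openvswitch-agent has been running for a few seconds, and True on a host where the hwaddr setting is still in place.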
In a lab, I downgraded neutron and rebooted my undercloud. That fixed the issue for me:
~~~
yum downgrade -y \
    openstack-neutron-openvswitch-1:9.4.1-19.el7ost.noarch \
    openstack-neutron-common-1:9.4.1-19.el7ost.noarch \
    python-neutron-1:9.4.1-19.el7ost.noarch \
    openstack-neutron-ml2-9.4.1-19.el7ost.noarch \
    openstack-neutron-9.4.1-19.el7ost.noarch
~~~
Changelog since -19:

2018-07-26 Assaf Muller <amuller> 1:9.4.1-28
- Disallow router interface out of subnet IP range (rhbz#1608087)

2018-07-18 Brian Haley <bhaley> 1:9.4.1-27
- DVR: Inter Tenant Traffic between networks not possible with shared net (rhbz#1600180)

2018-06-19 Slawek Kaplonski <skaplons> 1:9.4.1-26
- [OVS] Add mac-table-size to be set on each ovs bridge (rhbz#1589031)

2018-06-05 Brian Haley <bhaley> 1:9.4.1-25
- Disable IPv6 forwarding by default on HA routers (rhbz#1584845)

2018-05-30 Brian Haley <bhaley> 1:9.4.1-24
- Retry dhcp_release on failures (rhbz#1578414)

2018-05-22 Jakub Libosvar <libosvar> 1:9.4.1-23
- Override ovsdb_timeout default value in ovs_cleanup tool (rhbz#1532280)
- Improve DbListCommand operation from O(n^2) to O(n) (rhbz#1575696)
- Avoid agents adding ports as trunk by default. (rhbz#1575706)
- Don't delete flows on ports which were on dead vlan during plug (rhbz#1575706 rhbz#1579400)

2018-05-11 Sławek Kapłoński <slawek> 1:9.4.1-22
- Monitor phys_bridges to reconfigured it if created again (rhbz#1576256)
- Adding fix to failing scenario tests in CI (rhbz#1575356)

2018-05-04 Sławek Kapłoński <slawek> 1:9.4.1-21
- Fix parallel deletion of subnets. (rhbz#1511394)
Hi,

I identified the issue. The original bridge configuration is:
~~~
[root@undercloud-7 ~]# ovs-vsctl list-bridge br-ctlplane
ovs-vsctl: unknown command 'list-bridge'; use --help for help
[root@undercloud-7 ~]# ovs-vsctl list bridge br-ctlplane
_uuid               : d56235c5-4933-4334-b33b-be2134c99995
auto_attach         : []
controller          : []
datapath_id         : "0000525400ec14c2"
datapath_type       : ""
datapath_version    : "<unknown>"
external_ids        : {bridge-id=br-ctlplane}
fail_mode           : standalone
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mcast_snooping_enable: false
mirrors             : []
name                : br-ctlplane
netflow             : []
other_config        : {hwaddr="52:54:00:ec:14:c2"}
ports               : [054cde3c-0e02-497d-ac25-be8e6992f708, fcbfcff7-d6b8-4bce-824d-085a681663cf]
protocols           : []
rstp_enable         : false
rstp_status         : {}
sflow               : []
status              : {}
stp_enable          : false
~~~

The new version of neutron-openvswitch-agent sets this:
~~~
2018-10-02 12:31:43.032 3949 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=1): DbSetCommand(table=Bridge, col_values=(('other_config', {'mac-table-size': '50000'}),), record=br-ctlplane) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
~~~

Which removes the hwaddr:
~~~
[root@undercloud-7 ~]# ovs-vsctl list bridge br-ctlplane
_uuid               : 334f1314-e024-4c0e-ad6f-acddaa43bb40
auto_attach         : []
controller          : [505d73e7-4049-44b8-862c-e19e556bc051]
datapath_id         : "000016134f330e4c"
datapath_type       : system
datapath_version    : "<unknown>"
external_ids        : {bridge-id=br-ctlplane}
fail_mode           : secure
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mcast_snooping_enable: false
mirrors             : []
name                : br-ctlplane
netflow             : []
other_config        : {mac-table-size="50000"}
ports               : [18c205e9-c869-4c0b-a24a-18e249cf4f3e, 90ab6c75-f108-4716-a328-9c26ba7b1a75]
protocols           : ["OpenFlow10", "OpenFlow13"]
rstp_enable         : false
rstp_status         : {}
sflow               : []
status              : {}
stp_enable          : false
~~~

When it should run something similar to this manual command:
~~~
[root@undercloud-7 ~]# ovs-vsctl set bridge br-ctlplane other-config:mac-table-size=50000
[root@undercloud-7 ~]# ovs-vsctl list bridge br-ctlplane
_uuid               : d56235c5-4933-4334-b33b-be2134c99995
auto_attach         : []
controller          : []
datapath_id         : "0000525400ec14c2"
datapath_type       : ""
datapath_version    : "<unknown>"
external_ids        : {bridge-id=br-ctlplane}
fail_mode           : standalone
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mcast_snooping_enable: false
mirrors             : []
name                : br-ctlplane
netflow             : []
other_config        : {hwaddr="52:54:00:ec:14:c2", mac-table-size="50000"}
ports               : [054cde3c-0e02-497d-ac25-be8e6992f708, fcbfcff7-d6b8-4bce-824d-085a681663cf]
protocols           : []
rstp_enable         : false
rstp_status         : {}
sflow               : []
status              : {}
stp_enable          : false
[root@undercloud-7 ~]#
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
14: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:ec:14:c2 brd ff:ff:ff:ff:ff:ff
~~~

The neutron OVS agent issue can be reproduced manually:
~~~
[root@undercloud-7 ~]# ovs-vsctl set bridge br-ctlplane other-config='{mac-table-size=50000}'
[root@undercloud-7 ~]# ovs-vsctl list bridge br-ctlplane | grep other
other_config        : {mac-table-size="50000"}
[root@undercloud-7 ~]# ip link ls dev br-ctlplane
14: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether c6:35:62:d5:34:43 brd ff:ff:ff:ff:ff:ff
[root@undercloud-7 ~]#
~~~

In the meantime, a workaround is to downgrade neutron-openvswitch as indicated earlier:
~~~
yum downgrade -y \
    openstack-neutron-openvswitch-1:9.4.1-19.el7ost.noarch \
    openstack-neutron-common-1:9.4.1-19.el7ost.noarch \
    python-neutron-1:9.4.1-19.el7ost.noarch \
    openstack-neutron-ml2-9.4.1-19.el7ost.noarch \
    openstack-neutron-9.4.1-19.el7ost.noarch
~~~
The bug exists upstream as well: https://github.com/openstack/neutron/blob/master/neutron/agent/common/ovs_lib.py#L278
I checked OSP 13 and the bug seems to affect OSP 10, but not 13. Here's OSP 13 after the latest update:

[root@undercloud-1 ~]# ovs-vsctl show
236d2d20-1f4d-4719-b54b-df9fd0fe4ad8
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port int-br-ctlplane
            Interface int-br-ctlplane
                type: patch
                options: {peer=phy-br-ctlplane}
    Bridge br-ctlplane
        Controller "tcp:127.0.0.1:6633"
        fail_mode: secure
        Port "eth0"
            Interface "eth0"
        Port phy-br-ctlplane
            Interface phy-br-ctlplane
                type: patch
                options: {peer=int-br-ctlplane}
        Port br-ctlplane
            Interface br-ctlplane
                type: internal
    ovs_version: "2.9.0"
[root@undercloud-1 ~]# ovs-vsctl list bridge br-ctlplane
_uuid               : e8b50322-716e-4b0e-9c6e-ab4e46075804
auto_attach         : []
controller          : [ec643c44-33fa-406b-ae5c-f0798f639233]
datapath_id         : "00005254008b011b"
datapath_type       : system
datapath_version    : "<unknown>"
external_ids        : {bridge-id=br-ctlplane}
fail_mode           : secure
flood_vlans         : []
flow_tables         : {}
ipfix               : []
mcast_snooping_enable: false
mirrors             : []
name                : br-ctlplane
netflow             : []
other_config        : {hwaddr="52:54:00:8b:01:1b", mac-table-size="50000"}
ports               : [638b9ed4-8653-41f4-96ce-69de4d669af5, dcf35238-d0be-47ad-9a14-fa94641f86d2, eefafbb1-a0dc-4e38-8e7a-851833e38af3]
protocols           : ["OpenFlow10", "OpenFlow13"]
rstp_enable         : false
rstp_status         : {}
sflow               : []
status              : {}
stp_enable          : false
[root@undercloud-1 ~]# rpm -qa | grep neutron
puppet-neutron-12.4.1-1.3aa3109git.el7ost.noarch
openstack-neutron-openvswitch-12.0.3-5.el7ost.noarch
python2-neutronclient-6.7.0-1.el7ost.noarch
openstack-neutron-ml2-12.0.3-5.el7ost.noarch
python-neutron-12.0.3-5.el7ost.noarch
openstack-neutron-12.0.3-5.el7ost.noarch
openstack-neutron-common-12.0.3-5.el7ost.noarch
python2-ironic-neutron-agent-1.0.0-1.el7ost.noarch
python2-neutron-lib-1.13.0-1.el7ost.noarch
[root@undercloud-1 ~]#
[root@undercloud-1 ~]# ip link ls dev br-ctlplane
7: br-ctlplane: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:8b:01:1b brd ff:ff:ff:ff:ff:ff
[root@undercloud-1 ~]#
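To tell an affected bridge from a healthy one without reading the whole listing, the `other_config` line of `ovs-vsctl list bridge` output can be parsed and checked for the `hwaddr` key. The following is a rough sketch only: `parse_ovs_map` and `other_config_from_listing` are hypothetical helper names, and the parser assumes the simple `{key="value", key=value}` form seen in this report rather than the full OVSDB value grammar.

```python
import re


def parse_ovs_map(text):
    """Parse a map as printed by ovs-vsctl, e.g. {a="1", b=c}, into a dict."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("not an OVS map: %r" % text)
    # A value is either a double-quoted string or a bare token.
    pairs = re.findall(r'([\w.-]+)=("(?:[^"\\]|\\.)*"|[^,}\s]+)', body[1:-1])
    return {key: value.strip('"') for key, value in pairs}


def other_config_from_listing(listing):
    """Extract other_config from 'ovs-vsctl list bridge <name>' output."""
    for line in listing.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "other_config":
            return parse_ovs_map(value)
    return {}
```

With the listings from this report, the OSP 13 bridge yields both `hwaddr` and `mac-table-size`, while the broken OSP 10 bridge yields only `mac-table-size`, so a test such as `"hwaddr" in other_config_from_listing(output)` separates the two cases.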
My unit tests from patch set 4 revealed that the issue does not exist in master: https://review.openstack.org/#/c/607341/4 Given that the unit test passes with and without my patch, I figured that the patch may only be needed for older versions of OSP. I then tried with OSP 13 (what I should have done in the first place) and it's fixed in there. I'm going to dig a bit to see what the code differences are.
I dug into this further. The problem in OSP 10 is that the existing map is simply replaced when a map column needs to be set:

OSP 10: /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/native/commands.py
~~~
class DbSetCommand(BaseCommand):
    def __init__(self, api, table, record, *col_values):
        super(DbSetCommand, self).__init__(api)
        self.table = table
        self.record = record
        self.col_values = col_values

    def run_idl(self, txn):
        record = idlutils.row_by_record(self.api.idl, self.table, self.record)
        for col, val in self.col_values:
            # TODO(twilson) Ugh, the OVS library doesn't like OrderedDict
            # We're only using it to make a unit test work, so we should fix
            # this soon.
            if isinstance(val, collections.OrderedDict):
                val = dict(val)
            setattr(record, col, val)
~~~

This seems to be fixed here in OSP 13, if I read this correctly:

OSP 13: /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/command.py
~~~
class DbSetCommand(BaseCommand):
    def __init__(self, api, table, record, *col_values):
        super(DbSetCommand, self).__init__(api)
        self.table = table
        self.record = record
        self.col_values = col_values

    def run_idl(self, txn):
        record = idlutils.row_by_record(self.api.idl, self.table, self.record)
        for col, val in self.col_values:
            # TODO(twilson) Ugh, the OVS library doesn't like OrderedDict
            # We're only using it to make a unit test work, so we should fix
            # this soon.
            if isinstance(val, collections.OrderedDict):
                val = dict(val)
            if isinstance(val, dict):
                # NOTE(twilson) OVS 2.6's Python IDL has mutate methods that
                # would make this cleaner, but it's too early to rely on them.
                existing = getattr(record, col, {})
                existing.update(val)
                val = existing
            self.set_column(record, col, val)
~~~

I suppose it's that commit here: https://github.com/openstack/ovsdbapp/commit/558783eba2987a86a1b838a5a89d8957eb950f94 (but I'm not sure)

Unfortunately, it's not trivial to simply backport that change due to the significant API changes over time, so I cannot prove this:
~~~
class DbSetCommand(BaseCommand):
    def __init__(self, api, table, record, *col_values):
        super(DbSetCommand, self).__init__(api)
        self.table = table
        self.record = record
        self.col_values = col_values

#    def run_idl(self, txn):
#        record = idlutils.row_by_record(self.api.idl, self.table, self.record)
#        for col, val in self.col_values:
#            # TODO(twilson) Ugh, the OVS library doesn't like OrderedDict
#            # We're only using it to make a unit test work, so we should fix
#            # this soon.
#            if isinstance(val, collections.OrderedDict):
#                val = dict(val)
#            setattr(record, col, val)

    @classmethod
    def set_column(cls, row, col, val):
        setattr(row, col, idlutils.db_replace_record(val))

    @classmethod
    def set_columns(cls, row, **columns):
        for col, val in columns.items():
            cls.set_column(row, col, val)

    def run_idl(self, txn):
        record = idlutils.row_by_record(self.api.idl, self.table, self.record)
        for col, val in self.col_values:
            # TODO(twilson) Ugh, the OVS library doesn't like OrderedDict
            # We're only using it to make a unit test work, so we should fix
            # this soon.
            if isinstance(val, collections.OrderedDict):
                val = dict(val)
            if isinstance(val, dict):
                # NOTE(twilson) OVS 2.6's Python IDL has mutate methods that
                # would make this cleaner, but it's too early to rely on them.
                existing = getattr(record, col, {})
                existing.update(val)
                val = existing
            self.set_column(record, col, val)
#
~~~

With that change in place, the agent fails to start:
~~~
2018-10-05 16:15:21.567 24827 ERROR neutron Traceback (most recent call last):
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/bin/neutron-openvswitch-agent", line 10, in <module>
2018-10-05 16:15:21.567 24827 ERROR neutron     sys.exit(main())
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/cmd/eventlet/plugins/ovs_neutron_agent.py", line 20, in main
2018-10-05 16:15:21.567 24827 ERROR neutron     agent_main.main()
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/main.py", line 51, in main
2018-10-05 16:15:21.567 24827 ERROR neutron     mod.main()
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/main.py", line 35, in main
2018-10-05 16:15:21.567 24827 ERROR neutron     'neutron.plugins.ml2.drivers.openvswitch.agent.'
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/base/app_manager.py", line 375, in run_apps
2018-10-05 16:15:21.567 24827 ERROR neutron     hub.joinall(services)
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 97, in joinall
2018-10-05 16:15:21.567 24827 ERROR neutron     t.wait()
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2018-10-05 16:15:21.567 24827 ERROR neutron     return self._exit_event.wait()
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 125, in wait
2018-10-05 16:15:21.567 24827 ERROR neutron     current.throw(*self._exc)
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2018-10-05 16:15:21.567 24827 ERROR neutron     result = function(*args, **kwargs)
2018-10-05 16:15:21.567 24827 ERROR neutron   File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 59, in _launch
2018-10-05 16:15:21.567 24827 ERROR neutron     raise e
2018-10-05 16:15:21.567 24827 ERROR neutron AttributeError: 'module' object has no attribute 'db_replace_record'
2018-10-05 16:15:21.567 24827 ERROR neutron
~~~

Taking that into consideration, for OSP 10 it may be easier to go with the change I proposed upstream, which I am going to abandon there because the issue is already fixed in upstream master and in downstream OSP 13: https://review.openstack.org/#/c/607341/4/neutron/agent/common/ovs_lib.py
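For reference, the merge-instead-of-replace behaviour discussed in this comment can be illustrated without `idlutils.db_replace_record`, which the traceback shows is missing in OSP 10. This is a minimal sketch, not the actual patch: `FakeRow` and `set_columns_merging_maps` are hypothetical names, `FakeRow` only stands in for an OVS IDL row, and whether a plain `setattr` of the merged dict is sufficient on the real OSP 10 IDL has not been verified here.

```python
import collections


class FakeRow(object):
    """Hypothetical stand-in for an OVS IDL row; it only stores attributes."""

    def __init__(self, **columns):
        for col, val in columns.items():
            setattr(self, col, val)


def set_columns_merging_maps(record, col_values):
    """Set columns on record, merging dict values into the existing map.

    Mirrors the loop in DbSetCommand.run_idl, except that map columns are
    updated key by key instead of being overwritten.
    """
    for col, val in col_values:
        if isinstance(val, collections.OrderedDict):
            val = dict(val)
        if isinstance(val, dict):
            # Copy first so the row's current value is not mutated in place.
            merged = dict(getattr(record, col, {}))
            merged.update(val)
            val = merged
        setattr(record, col, val)
```

Applied to a row whose `other_config` is `{'hwaddr': '52:54:00:ec:14:c2'}`, setting `{'mac-table-size': '50000'}` through this helper leaves both keys in place, which matches the result of the manual `ovs-vsctl set bridge br-ctlplane other-config:mac-table-size=50000` command shown earlier.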
Alternatively, we could consider the MAC address table size change as introducing a regression and revert it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:4298