+++ This bug was initially created as a clone of Bug #1450223 +++

Description of problem:
A port failed to bind on a router's external gateway interface. Digging in, we found that the bind failed because the OVS agent was dead. The OVS debug logs suggest that a neutron rootwrap process is holding the agent's socket, so the agent cannot start correctly. The workaround is to kill the rootwrap process, restart neutron-openv, clear the router gateway, and reattach it. See the notes in the Additional info section below. We need to figure out why this keeps happening and fix it.

Version-Release number of selected component (if applicable):
openstack-neutron-9.2.0-2.el7ost.noarch
openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch

How reproducible:
Seems to keep happening.

Steps to Reproduce:
1. unknown
2.
3.

Actual results:
The port fails to bind, which prevents pinging floating IPs.

Expected results:
The openvswitch agent is up and there are no port binding failures.

Additional info:
/notes

###port binding failed due to dead ovs agent.
2017-05-08 16:02:19.921 580770 WARNING neutron.plugins.ml2.drivers.mech_agent [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Refusing to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 4, 14, 0, 9, 37), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'overcloud-controller-2.localdomain', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'SubPort': u'1.0', u'QosPolicy': u'1.3', u'Trunk': u'1.0'}, 'created_at': datetime.datetime(2016, 10, 6, 17, 56, 19), 'started_at': datetime.datetime(2017, 4, 11, 1, 29, 5), 'id': u'1afbdf75-05d0-4f10-bab1-380c3ce846bc', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'tunneling_ip': u'192.168.3.17', u'arp_responder_enabled': False, u'devices': 44, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'ipsec_gre', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'log_agent_heartbeats': False, u'l2_population': False, u'tunnel_types': [u'vxlan'], u'extensions': [u'qos'], u'enable_distributed_routing': False, u'bridge_mappings': {u'datacentre': u'br-ex'}}}
2017-05-08 16:02:19.921 580770 ERROR neutron.plugins.ml2.managers [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Failed to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a on host overcloud-controller-2.localdomain for vnic_type normal using segments [{'segmentation_id': 3291, 'physical_network': u'datacentre', 'id': u'867a59b9-d4d8-42c9-bda8-a1f54cccab88', 'network_type': u'vlan'}]

###agent-list
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain | | xxx | True | neutron-openvswitch-agent |

###debug ovs logs
2017-05-10 18:22:00.481 874615 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 97, in __call__
    self.ofp_ssl_listen_port)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 120, in server_loop
    datapath_connection_factory)
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 117, in __init__
    self.server = eventlet.listen(listen_info)
  File "/usr/lib/python2.7/site-packages/eventlet/convenience.py", line 43, in listen
    sock.bind(addr)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use
2017-05-10 18:22:00.482 874615 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:146
2017-05-10 18:22:01.123 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetFailModeCommand(bridge=br-int, mode=secure) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Bridge, col_values=(('protocols', ['OpenFlow10', 'OpenFlow13']),), record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.126 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetControllerCommand(bridge=br-int, targets=['tcp:127.0.0.1:6633']) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=controller, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.271 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Controller, col_values=(('connection_mode', 'out-of-band'),), record=aa338024-faea-45ef-b00e-2e5e02e85597) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.384 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=datapath_id, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.385 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.385 874615 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID 0000b2e9944d644f

### non working controller.
[root@overcloud-controller-2 neutron]# netstat -tulpn |grep 6633
tcp 51 0 127.0.0.1:6633 0.0.0.0:* LISTEN 170970/sudo
[root@overcloud-controller-2 neutron]#
[root@overcloud-controller-2 neutron]# ps aux |grep 170970
root 170970 0.0 0.0 193332 2792 ? S Apr11 0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

### working system..
[root@overcloud-controller-1 ~]# netstat -tulpn |grep 6633
tcp 0 0 127.0.0.1:6633 0.0.0.0:* LISTEN 275273/python2
[root@overcloud-controller-1 ~]#
[root@overcloud-controller-1 ~]# ps aux|grep 275273
neutron 275273 5.6 0.0 394056 99824 ? Ss Apr19 1770:07 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent --log-file /var/log/neutron/openvswitch-agent.log

#### kill rootwrap on controller2 to allow neutron-openv to start up correctly.
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id | agent_type | host | availability_zone | alive | admin_state_up | binary |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 0b42df33-b5dc-4ffe-b452-9d4eaf2f2701 | L3 agent | overcloud-controller-1.localdomain | nova | :-) | True | neutron-l3-agent |
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain | | :-) | True | neutron-openvswitch-agent |
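
The netstat comparison in the notes above (port 6633 owned by `sudo` on the broken controller and by `python2` on the working one) can be checked mechanically. The following is only a rough sketch, not part of any Neutron tooling: the helper names `find_listener` and `is_expected_owner` are hypothetical, and it assumes the `netstat -tulpn` column layout shown in this report.

```python
import re

# Matches a TCP listener line as printed by `netstat -tulpn`, for example:
#   tcp 51 0 127.0.0.1:6633 0.0.0.0:* LISTEN 170970/sudo
# (column layout assumed from the output quoted in this report)
LISTEN_RE = re.compile(
    r"^(?P<proto>tcp6?)\s+(?P<recvq>\d+)\s+(?P<sendq>\d+)\s+"
    r"(?P<local>\S+):(?P<port>\d+)\s+\S+\s+LISTEN\s+"
    r"(?P<pid>\d+)/(?P<prog>\S+)"
)


def find_listener(netstat_output, port):
    """Return pid, program and Recv-Q of the listener on `port`, or None."""
    for line in netstat_output.splitlines():
        match = LISTEN_RE.match(line.strip())
        if match and int(match.group("port")) == port:
            return {
                "pid": int(match.group("pid")),
                "program": match.group("prog"),
                "recv_q": int(match.group("recvq")),
            }
    return None


def is_expected_owner(listener, expected_prefix="python"):
    """True when the listener's program name looks like the agent itself.

    The "python" prefix is an assumption taken from the working controller,
    where the owner is reported as `python2`.
    """
    return listener is not None and listener["program"].startswith(expected_prefix)
```

Feeding it the two netstat lines from the notes would flag the broken controller (owner `sudo`, with a non-zero Recv-Q) and accept the working one; the pid it returns is the one to look up with `ps` before killing anything.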
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.
We will need to backport the following fixes:
https://review.openstack.org/448745
https://review.openstack.org/425607

The easiest approach is probably to release new libraries (https://review.openstack.org/#/c/464708/) and rebase RH-OSP.
Ok, the python-oslo-rootwrap-5.1.2-1.el7ost package is now available for tests. See the original issue for tests: https://bugzilla.redhat.com/show_bug.cgi?id=1450223
Verified on python-oslo-rootwrap-5.1.2-1.
Created and deleted routers, and set and cleared the gateway multiple times. Booted multiple instances. No errors were seen in the logs.
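
For anyone repeating this verification, a log check along the following lines may help. It is a minimal sketch that only counts the three error strings quoted in the original description; the names `SIGNATURES`, `scan_log` and `is_clean` are hypothetical, and reading the actual log files is left to the caller.

```python
# Error strings copied from the log excerpts in the original description.
SIGNATURES = (
    "Refusing to bind port",
    "Failed to bind port",
    "Address already in use",
)


def scan_log(text):
    """Count how many lines of `text` contain each known error signature."""
    hits = {signature: 0 for signature in SIGNATURES}
    for line in text.splitlines():
        for signature in SIGNATURES:
            if signature in line:
                hits[signature] += 1
    return hits


def is_clean(hits):
    """True when none of the signatures were found."""
    return all(count == 0 for count in hits.values())
```

A clean result only means these specific messages are absent; it does not replace checking that the agent shows as alive in the agent list.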
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:2653
*** Bug 1484470 has been marked as a duplicate of this bug. ***