Bug 1456476 - [OSP11] Rebase oslo.rootwrap onto release 5.4.1
Summary: [OSP11] Rebase oslo.rootwrap onto release 5.4.1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-rootwrap
Version: 11.0 (Ocata)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z1
Target Release: 11.0 (Ocata)
Assignee: Victor Stinner
QA Contact: Udi Shkalim
URL:
Whiteboard:
Depends On: 1451082
Blocks: 1450223
Reported: 2017-05-29 13:07 UTC by Victor Stinner
Modified: 2020-07-16 09:42 UTC (History)

Fixed In Version: python-oslo-rootwrap-5.4.1-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1451082
Environment:
Last Closed: 2017-07-19 17:11:41 UTC
Target Upstream Version:
Embargoed:


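Links
+------------------------+----------------+---------+----------+--------------+------------------------------------------------------------------+-------------------------+
| System                 | ID             | Private | Priority | Status       | Summary                                                          | Last Updated            |
+------------------------+----------------+---------+----------+--------------+------------------------------------------------------------------+-------------------------+
| OpenStack gerrit       | 464708         | 0       | None     | None         | None                                                             | 2017-05-29 13:07:39 UTC |
| RDO                    | 6648           | 0       | None     | None         | None                                                             | 2017-05-29 13:07:39 UTC |
| Red Hat Product Errata | RHBA-2017:1786 | 0       | normal   | SHIPPED_LIVE | Red Hat OpenStack Platform 11.0 Bug Fix and Enhancement Advisory | 2017-07-19 21:10:33 UTC |
+------------------------+----------------+---------+----------+--------------+------------------------------------------------------------------+-------------------------+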

Description Victor Stinner 2017-05-29 13:07:40 UTC
+++ This bug was initially created as a clone of Bug #1451082 +++

+++ This bug was initially created as a clone of Bug #1450223 +++

Description of problem:

A port failed to bind on a router's external gateway interface. Digging in, we found that it failed to bind because the OVS agent was dead. The OVS debug logs suggest that a neutron rootwrap process is holding the agent's socket, so the agent cannot start correctly. The workaround is to kill the rootwrap process, restart neutron-openv, clear the router gateway, and reattach it. See the notes in the Additional info section below. We need to figure out why this keeps happening and fix it.


Version-Release number of selected component (if applicable):
openstack-neutron-9.2.0-2.el7ost.noarch  
openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch   
How reproducible:
Seems to keep happening.

Steps to Reproduce:
1. unknown
2.
3.

Actual results:

Port binding fails, which prevents pinging floating IPs.
Expected results:
openvswitch agent up and no port binding failures.



Additional info:

/notes
###port binding failed due to dead ovs agent.
2017-05-08 16:02:19.921 580770 WARNING neutron.plugins.ml2.drivers.mech_agent [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Refusing to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 4, 14, 0, 9, 37), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'overcloud-controller-2.localdomain', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'SubPort': u'1.0', u'QosPolicy': u'1.3', u'Trunk': u'1.0'}, 'created_at': datetime.datetime(2016, 10, 6, 17, 56, 19), 'started_at': datetime.datetime(2017, 4, 11, 1, 29, 5), 'id': u'1afbdf75-05d0-4f10-bab1-380c3ce846bc', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'tunneling_ip': u'192.168.3.17', u'arp_responder_enabled': False, u'devices': 44, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'ipsec_gre', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'log_agent_heartbeats': False, u'l2_population': False, u'tunnel_types': [u'vxlan'], u'extensions': [u'qos'], u'enable_distributed_routing': False, u'bridge_mappings': {u'datacentre': u'br-ex'}}}
2017-05-08 16:02:19.921 580770 ERROR neutron.plugins.ml2.managers [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Failed to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a on host overcloud-controller-2.localdomain for vnic_type normal using segments [{'segmentation_id': 3291, 'physical_network': u'datacentre', 'id': u'867a59b9-d4d8-42c9-bda8-a1f54cccab88', 'network_type': u'vlan'}]
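
The warning above reports 'alive': False together with a heartbeat_timestamp of 2017-04-14, several weeks older than the log line itself. As a rough sketch of why the server would treat such an agent as dead, the check below compares the heartbeat age against a timeout. The function name is_agent_stale is hypothetical, the 75-second value is assumed to match Neutron's default agent_down_time, and the two timestamps are compared as if they shared a time zone, which the log does not confirm.

```python
from datetime import datetime, timedelta

# Assumption: mirrors Neutron's default agent_down_time (seconds); check neutron.conf.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_agent_stale(heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True when the last heartbeat is older than the allowed down time."""
    return (now - heartbeat) > down_time


# Values copied from the WARNING line above.
heartbeat = datetime(2017, 4, 14, 0, 9, 37)   # 'heartbeat_timestamp'
logged_at = datetime(2017, 5, 8, 16, 2, 19)   # timestamp of the log line

age = logged_at - heartbeat
print("heartbeat age:", age)
print("stale:", is_agent_stale(heartbeat, logged_at))
```

With a gap measured in weeks, the exact timeout and any time zone offset make no practical difference to the result.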


###agent-list
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |


###debug ovs logs
2017-05-10 18:22:00.481 874615 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 97, in __call__
    self.ofp_ssl_listen_port)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 120, in server_loop
    datapath_connection_factory)
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 117, in __init__
    self.server = eventlet.listen(listen_info)
  File "/usr/lib/python2.7/site-packages/eventlet/convenience.py", line 43, in listen
    sock.bind(addr)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

2017-05-10 18:22:00.482 874615 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:146
2017-05-10 18:22:01.123 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetFailModeCommand(bridge=br-int, mode=secure) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Bridge, col_values=(('protocols', ['OpenFlow10', 'OpenFlow13']),), record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125

2017-05-10 18:22:01.126 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetControllerCommand(bridge=br-int, targets=['tcp:127.0.0.1:6633']) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98

2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=controller, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.271 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Controller, col_values=(('connection_mode', 'out-of-band'),), record=aa338024-faea-45ef-b00e-2e5e02e85597) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.384 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=datapath_id, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.385 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.385 874615 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID 0000b2e9944d644f
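
The traceback above ends in "[Errno 98] Address already in use" when the agent's Ryu controller tries to bind its listening socket, presumably the same tcp:127.0.0.1:6633 address that SetControllerCommand points br-int at. As a minimal illustration of that error class (not the agent's actual code), the sketch below binds one listening socket and then tries to bind a second socket to the same address. The function name try_second_bind is hypothetical, and an ephemeral port is used instead of 6633 so the example does not disturb a real system.

```python
import errno
import socket


def try_second_bind():
    """Bind two sockets to one address; return the errno of the second bind, or None."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
        first.listen(1)
        addr = first.getsockname()     # the address the first socket now owns
        try:
            second.bind(addr)          # same address while the first is still listening
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


result = try_second_bind()
print("second bind failed with EADDRINUSE:", result == errno.EADDRINUSE)
```

The second bind keeps failing for as long as any process holds the first socket open, which is why restarting the agent alone does not help while another process still owns the port.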

### non working controller.
[root@overcloud-controller-2 neutron]# netstat -tulpn |grep 6633
tcp       51      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      170970/sudo
[root@overcloud-controller-2 neutron]#


[root@overcloud-controller-2 neutron]# ps aux |grep 170970
root      170970  0.0  0.0 193332  2792 ?        S    Apr11   0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf




### working system..
[root@overcloud-controller-1 ~]# netstat -tulpn |grep 6633
tcp        0      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      275273/python2
[root@overcloud-controller-1 ~]#

[root@overcloud-controller-1 ~]# ps aux|grep 275273
neutron   275273  5.6  0.0 394056 99824 ?        Ss   Apr19 1770:07 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent --log-file /var/log/neutron/openvswitch-agent.log
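
To make the comparison between the two controllers explicit, the sketch below splits the two netstat rows shown above into named fields. The helper parse_netstat_line is hypothetical and only handles rows shaped like these two. On Linux, Recv-Q on a LISTEN socket is generally understood to count connections waiting to be accepted; if that reading applies here, the value 51 on controller-2 would be consistent with a process that holds the socket but never accepts on it.

```python
def parse_netstat_line(line):
    """Split one `netstat -tulpn` TCP row into named fields."""
    proto, recv_q, send_q, local, foreign, state, owner = line.split()
    pid, _, program = owner.partition("/")
    return {
        "local": local,
        "recv_q": int(recv_q),
        "state": state,
        "pid": int(pid),
        "program": program,
    }


# Rows copied from the netstat output above.
broken = parse_netstat_line(
    "tcp       51      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      170970/sudo"
)
working = parse_netstat_line(
    "tcp        0      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      275273/python2"
)

for name, row in (("controller-2", broken), ("controller-1", working)):
    print(name, row["program"], "pid", row["pid"], "recv_q", row["recv_q"])
```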


#### kill rootwrap on controller2 to allow neutron-openv to start up correctly.

+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 0b42df33-b5dc-4ffe-b452-9d4eaf2f2701 | L3 agent           | overcloud-controller-1.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |

--- Additional comment from Red Hat Bugzilla Rules Engine on 2017-05-15 13:43:21 EDT ---

This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

--- Additional comment from Ihar Hrachyshka on 2017-05-15 13:45:55 EDT ---

We will need to backport the following fixes:

https://review.openstack.org/448745
https://review.openstack.org/425607

The easiest approach is probably to release new libraries (https://review.openstack.org/#/c/464708/) and rebase RH-OSP.
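
This report does not spell out how a "sudo neutron-rootwrap-daemon" process came to own the agent's listening socket. One plausible mechanism, offered here as an assumption and not as a description of the linked reviews, is that a child process inherited the parent's open file descriptors. The sketch below illustrates that general behaviour on a POSIX system with Python 3; set_inheritable(True) and close_fds=False are used to mimic the more permissive defaults of Python 2.7. The names CHILD_SNIPPET and child_sees_listener are hypothetical.

```python
import socket
import subprocess
import sys

# Code run in the child: report whether the given descriptor number is open there.
CHILD_SNIPPET = (
    "import os, sys\n"
    "try:\n"
    "    os.fstat(int(sys.argv[1]))\n"
    "    print('inherited')\n"
    "except OSError:\n"
    "    print('closed')\n"
)


def child_sees_listener(close_fds):
    """Spawn a child and report whether it received the parent's listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.set_inheritable(True)  # assumption: mimic Python 2 descriptor behaviour
        out = subprocess.check_output(
            [sys.executable, "-c", CHILD_SNIPPET, str(listener.fileno())],
            close_fds=close_fds,
        )
        return out.decode().strip()
    finally:
        listener.close()


print("close_fds=False:", child_sees_listener(False))
print("close_fds=True: ", child_sees_listener(True))
```

If this mechanism is what happened, a long-lived child that keeps the inherited descriptor would continue to occupy the port after the parent exits, which would match the netstat output in the original description.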

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-29 13:08:00 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Victor Stinner 2017-05-29 14:29:00 UTC
Ok, the python-oslo-rootwrap-5.4.1-1.el7ost package is now available for tests. 

See the original issue for tests: https://bugzilla.redhat.com/show_bug.cgi?id=1450223

Comment 4 Udi Shkalim 2017-07-16 13:12:57 UTC
Verified on: python-oslo-rootwrap-5.4.1-1.el7ost.noarch

Set and cleared the router gateway multiple times.
Rebooted the controllers.
No error messages.

Comment 6 errata-xmlrpc 2017-07-19 17:11:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1786

