Bug 1451082 - [OSP10] Rebase oslo.rootwrap onto release 5.1.2
Summary: [OSP10] Rebase oslo.rootwrap onto release 5.1.2
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-oslo-rootwrap
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 10.0 (Newton)
Assignee: Victor Stinner
QA Contact: Udi Shkalim
URL:
Whiteboard:
Duplicates: 1484470
Depends On:
Blocks: 1450223 1456476
Reported: 2017-05-15 17:43 UTC by Ihar Hrachyshka
Modified: 2022-08-09 14:05 UTC (History)

Fixed In Version: python-oslo-rootwrap-5.1.2-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1450223
Clones: 1456476
Environment:
Last Closed: 2017-09-06 17:06:29 UTC
Target Upstream Version:
Embargoed:


Links
System                 | ID             | Private | Priority | Status       | Summary                                                        | Last Updated
OpenStack gerrit       | 464708         | 0       | None     | MERGED       | Release new stable oslo.rootwrap releases                      | 2020-10-01 20:55:33 UTC
RDO                    | 6648           | 0       | None     | None         | None                                                           | 2017-05-15 17:43:05 UTC
Red Hat Issue Tracker  | OSP-8576       | 0       | None     | None         | None                                                           | 2022-08-09 14:05:03 UTC
Red Hat Product Errata | RHBA-2017:2653 | 0       | normal   | SHIPPED_LIVE | Red Hat OpenStack Platform 10 Bug Fix and Enhancement Advisory | 2017-09-06 20:54:38 UTC

Description Ihar Hrachyshka 2017-05-15 17:43:06 UTC
+++ This bug was initially created as a clone of Bug #1450223 +++

Description of problem:

We noticed a port failing to bind on a router's external gateway interface. Digging in, we found it failed to bind because the OVS agent was dead. The OVS agent debug logs suggest that a neutron rootwrap process is holding the agent's socket, so the agent cannot start correctly. The workaround is to kill the rootwrap process, restart neutron-openvswitch-agent, clear the router gateway, and reattach it. See the notes in the Additional info section below. We need to figure out why this keeps happening and fix it.
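
For reference, the workaround described above can be written down as an ordered command list. This is only a sketch: it prints the commands and executes nothing. The systemd unit name and the `neutron router-gateway-clear` / `router-gateway-set` subcommands are assumptions about a typical OSP 10 node, and `ROUTER` and `EXT_NET` are placeholders that do not come from this report.

```python
def workaround_commands(rootwrap_pid, router, external_network):
    """Return the workaround as an ordered list of argv lists (nothing is run)."""
    return [
        # 1. Kill the stale rootwrap daemon that holds the agent's port.
        ["kill", str(rootwrap_pid)],
        # 2. Restart the OVS agent (unit name is an assumption).
        ["systemctl", "restart", "neutron-openvswitch-agent"],
        # 3. Clear and re-set the router gateway so the port is bound again.
        ["neutron", "router-gateway-clear", router],
        ["neutron", "router-gateway-set", router, external_network],
    ]


# 170970 is the rootwrap PID quoted in the notes of this report;
# "ROUTER" and "EXT_NET" are hypothetical placeholders.
for argv in workaround_commands(170970, "ROUTER", "EXT_NET"):
    print(" ".join(argv))
```

Keeping the steps as data rather than running them directly makes it easy to review the exact commands before touching a controller.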


Version-Release number of selected component (if applicable):
openstack-neutron-9.2.0-2.el7ost.noarch  
openstack-neutron-openvswitch-9.2.0-2.el7ost.noarch

How reproducible:
Seems to keep happening.

Steps to Reproduce:
1. Unknown

Actual results:

The port fails to bind, which prevents pinging floating IPs.

Expected results:
The Open vSwitch agent is up and there are no port binding failures.



Additional info:

/notes
### port binding failed due to dead ovs agent
2017-05-08 16:02:19.921 580770 WARNING neutron.plugins.ml2.drivers.mech_agent [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Refusing to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a to dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2017, 4, 14, 0, 9, 37), 'availability_zone': None, 'alive': False, 'topic': u'N/A', 'host': u'overcloud-controller-2.localdomain', 'agent_type': u'Open vSwitch agent', 'resource_versions': {u'SubPort': u'1.0', u'QosPolicy': u'1.3', u'Trunk': u'1.0'}, 'created_at': datetime.datetime(2016, 10, 6, 17, 56, 19), 'started_at': datetime.datetime(2017, 4, 11, 1, 29, 5), 'id': u'1afbdf75-05d0-4f10-bab1-380c3ce846bc', 'configurations': {u'ovs_hybrid_plug': True, u'in_distributed_mode': False, u'datapath_type': u'system', u'vhostuser_socket_dir': u'/var/run/openvswitch', u'tunneling_ip': u'192.168.3.17', u'arp_responder_enabled': False, u'devices': 44, u'ovs_capabilities': {u'datapath_types': [u'netdev', u'system'], u'iface_types': [u'geneve', u'gre', u'internal', u'ipsec_gre', u'lisp', u'patch', u'stt', u'system', u'tap', u'vxlan']}, u'log_agent_heartbeats': False, u'l2_population': False, u'tunnel_types': [u'vxlan'], u'extensions': [u'qos'], u'enable_distributed_routing': False, u'bridge_mappings': {u'datacentre': u'br-ex'}}}
2017-05-08 16:02:19.921 580770 ERROR neutron.plugins.ml2.managers [req-533fe3ea-77c8-40fc-b58f-4ab050c81720 - - - - -] Failed to bind port 544d8ec2-d351-4e49-baf1-8b67b6fb482a on host overcloud-controller-2.localdomain for vnic_type normal using segments [{'segmentation_id': 3291, 'physical_network': u'datacentre', 'id': u'867a59b9-d4d8-42c9-bda8-a1f54cccab88', 'network_type': u'vlan'}]
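
The "dead agent" verdict in the WARNING above follows from how stale `heartbeat_timestamp` is compared with the time of the bind attempt. The sketch below redoes that arithmetic with the values from the log. It mirrors the idea rather than Neutron's actual code, and the 75-second threshold is an assumption (believed to be the default `agent_down_time`; a deployment may override it). It also assumes both timestamps are in the same timezone.

```python
from datetime import datetime, timedelta

# Assumption: 75 seconds, believed to be Neutron's default agent_down_time.
AGENT_DOWN_TIME = timedelta(seconds=75)


def is_agent_alive(heartbeat, now, down_time=AGENT_DOWN_TIME):
    """Return True if the last heartbeat is recent enough to count as alive."""
    return now - heartbeat <= down_time


# Values copied from the WARNING line above.
heartbeat = datetime(2017, 4, 14, 0, 9, 37)
log_time = datetime(2017, 5, 8, 16, 2, 19)

staleness = log_time - heartbeat
print("heartbeat age:", staleness)
print("alive:", is_agent_alive(heartbeat, log_time))
```

With a last heartbeat more than three weeks old, any plausible threshold marks the agent as dead, which is why the ML2 mechanism driver refuses the binding.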


### agent-list
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain |                   | xxx   | True           | neutron-openvswitch-agent |


### debug ovs logs
2017-05-10 18:22:00.481 874615 ERROR ryu.lib.hub [-] hub: uncaught exception: Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 97, in __call__
    self.ofp_ssl_listen_port)
  File "/usr/lib/python2.7/site-packages/ryu/controller/controller.py", line 120, in server_loop
    datapath_connection_factory)
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 117, in __init__
    self.server = eventlet.listen(listen_info)
  File "/usr/lib/python2.7/site-packages/eventlet/convenience.py", line 43, in listen
    sock.bind(addr)
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 98] Address already in use

2017-05-10 18:22:00.482 874615 DEBUG neutron.agent.linux.utils [-] Exit code: 0 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:146
2017-05-10 18:22:01.123 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): AddBridgeCommand(datapath_type=system, may_exist=True, name=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetFailModeCommand(bridge=br-int, mode=secure) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.124 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Bridge, col_values=(('protocols', ['OpenFlow10', 'OpenFlow13']),), record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.125 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125

2017-05-10 18:22:01.126 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): SetControllerCommand(bridge=br-int, targets=['tcp:127.0.0.1:6633']) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98

2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=controller, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.270 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.271 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbSetCommand(table=Controller, col_values=(('connection_mode', 'out-of-band'),), record=aa338024-faea-45ef-b00e-2e5e02e85597) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.384 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Running txn command(idx=0): DbGetCommand(column=datapath_id, table=Bridge, record=br-int) do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:98
2017-05-10 18:22:01.385 874615 DEBUG neutron.agent.ovsdb.impl_idl [-] Transaction caused no change do_commit /usr/lib/python2.7/site-packages/neutron/agent/ovsdb/impl_idl.py:125
2017-05-10 18:22:01.385 874615 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID 0000b2e9944d644f
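
The `[Errno 98] Address already in use` error in the traceback above is what `bind()` returns when another socket is already listening on the same address. A minimal stand-alone sketch of that failure, using a kernel-chosen port instead of 6633 so it does not interfere with a real agent (the helper name `try_listen` is made up for illustration):

```python
import errno
import socket


def try_listen(addr):
    """Bind and listen on addr; return (socket, None) or (None, errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(addr)
        sock.listen(5)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


# Port 0 lets the kernel pick a free port for the first listener.
first, err = try_listen(("127.0.0.1", 0))
addr = first.getsockname()

# A second listener on the same address fails the way the agent did.
second, err2 = try_listen(addr)
print(errno.errorcode.get(err2))

first.close()
```

The agent logs above show the same pattern: the listener thread dies on `bind()`, while the rest of the agent carries on configuring br-int.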

### non-working controller
[root@overcloud-controller-2 neutron]# netstat -tulpn |grep 6633
tcp       51      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      170970/sudo
[root@overcloud-controller-2 neutron]#


[root@overcloud-controller-2 neutron]# ps aux |grep 170970
root      170970  0.0  0.0 193332  2792 ?        S    Apr11   0:00 sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf




### working system
[root@overcloud-controller-1 ~]# netstat -tulpn |grep 6633
tcp        0      0 127.0.0.1:6633          0.0.0.0:*               LISTEN      275273/python2
[root@overcloud-controller-1 ~]#

[root@overcloud-controller-1 ~]# ps aux|grep 275273
neutron   275273  5.6  0.0 394056 99824 ?        Ss   Apr19 1770:07 /usr/bin/python2 /usr/bin/neutron-openvswitch-agent --config-file /usr/share/neutron/neutron-dist.conf --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini --config-dir /etc/neutron/conf.d/common --config-dir /etc/neutron/conf.d/neutron-openvswitch-agent --log-file /var/log/neutron/openvswitch-agent.log
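
The two `netstat -tulpn` rows above differ in exactly the fields that matter: the owning program (`sudo` versus `python2`) and Recv-Q (51 versus 0). On Linux, Recv-Q on a listening socket is generally the number of connections waiting to be accepted, so a non-zero value suggests nothing is accepting on controller-2. The sketch below parses the two rows; `looks_hijacked` is a hypothetical heuristic, not part of any Neutron tooling.

```python
def parse_listener(line):
    """Split one `netstat -tulpn` TCP row into the fields used here."""
    fields = line.split()
    pid, _, program = fields[6].partition("/")
    return {
        "recv_q": int(fields[1]),
        "local": fields[3],
        "state": fields[5],
        "pid": int(pid),
        "program": program,
    }


def looks_hijacked(row, expected_prefix="python"):
    """Flag a listener that is not owned by the expected program."""
    return not row["program"].startswith(expected_prefix)


# Rows copied from the two controllers above.
broken = parse_listener(
    "tcp       51      0 127.0.0.1:6633          0.0.0.0:*"
    "               LISTEN      170970/sudo"
)
working = parse_listener(
    "tcp        0      0 127.0.0.1:6633          0.0.0.0:*"
    "               LISTEN      275273/python2"
)

for name, row in (("controller-2", broken), ("controller-1", working)):
    print(name, row["pid"], row["program"], row["recv_q"], looks_hijacked(row))
```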


### kill rootwrap on controller-2 to allow neutron-openvswitch-agent to start up correctly

+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                               | availability_zone | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+------------------------------------+-------------------+-------+----------------+---------------------------+
| 0b42df33-b5dc-4ffe-b452-9d4eaf2f2701 | L3 agent           | overcloud-controller-1.localdomain | nova              | :-)   | True           | neutron-l3-agent          |
| 1afbdf75-05d0-4f10-bab1-380c3ce846bc | Open vSwitch agent | overcloud-controller-2.localdomain |                   | :-)   | True           | neutron-openvswitch-agent |
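
Killing the rootwrap process frees the port, which is consistent with that process having inherited the agent's listening socket. This report does not confirm that this is the exact root cause. The sketch below only illustrates the general mechanism on a POSIX system: a child spawned without closing inherited file descriptors keeps the port in use after the parent closes its own copy, while `close_fds=True` avoids it. The idle child and the helper names are made up for illustration.

```python
import errno
import socket
import subprocess
import sys

# Stand-in for a long-lived helper process; it never touches the socket.
IDLE_CHILD = [sys.executable, "-c", "import time; time.sleep(30)"]


def can_rebind(addr):
    """Return None if addr can be bound again, else the errno of the failure."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(addr)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        probe.close()


def spawn_then_close(close_fds):
    """Listen, spawn a child, close our socket, then check whether the port is free."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    # Python 3 sockets are non-inheritable by default; opt in to mimic the leak.
    listener.set_inheritable(True)
    addr = listener.getsockname()

    child = subprocess.Popen(IDLE_CHILD, close_fds=close_fds)
    listener.close()  # the "agent" goes away, the child keeps running
    try:
        while_child_runs = can_rebind(addr)
    finally:
        child.kill()
        child.wait()
    after_child_exit = can_rebind(addr)
    return while_child_runs, after_child_exit


leaked = spawn_then_close(close_fds=False)
clean = spawn_then_close(close_fds=True)
print("leaked fd:", leaked)
print("closed fds:", clean)
```

In the leaked case the port stays busy until the child is killed, which matches the observed effect of the workaround.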

Comment 1 Red Hat Bugzilla Rules Engine 2017-05-15 17:43:21 UTC
This bugzilla has been removed from the release and needs to be reviewed and Triaged for another Target Release.

Comment 2 Ihar Hrachyshka 2017-05-15 17:45:55 UTC
We will need to backport the following fixes:

https://review.openstack.org/448745
https://review.openstack.org/425607

The easiest approach is probably to release new libraries (https://review.openstack.org/#/c/464708/) and rebase RH-OSP onto them.

Comment 3 Victor Stinner 2017-05-29 14:43:25 UTC
OK, the python-oslo-rootwrap-5.1.2-1.el7ost package is now available for testing.

See the original issue for tests: https://bugzilla.redhat.com/show_bug.cgi?id=1450223

Comment 5 Udi Shkalim 2017-08-15 13:35:14 UTC
Verified on python-oslo-rootwrap-5.1.2-1.

Created and deleted routers, and set and cleared the gateway multiple times.
Booted multiple instances. No errors were seen in the logs.

Comment 7 errata-xmlrpc 2017-09-06 17:06:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2653

Comment 8 Pablo Caruana 2017-10-24 09:45:07 UTC
*** Bug 1484470 has been marked as a duplicate of this bug. ***

