1416307 – neutron-openvswitch-agent won't create ovs flows upon restart

Summary: neutron-openvswitch-agent won't create ovs flows upon restart

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	python-ryu
Sub Component:
Version:	10.0 (Newton)
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	z3
Target Release:	10.0 (Newton)
Assignee:	Miguel Angel Ajo
QA Contact:	Eran Kuris
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2017-01-25 08:51 UTC by Irina Petrova
Modified:	2020-06-11 13:14 UTC
CC List:	17 users
Fixed In Version:	python-ryu-4.9-2.1.el7ost,python-tinyrpc-0.5-3.el7ost
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-06-28 15:27:21 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Knowledge Base (Solution)	2916661	0	None	None	None	2017-02-09 07:53:23 UTC
Red Hat Product Errata	RHBA-2017:1587	0	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 Bug Fix and Enhancement Advisory	2017-06-28 19:11:42 UTC

Description Irina Petrova 2017-01-25 08:51:58 UTC

Description of problem:

A customer is mapping multiple flat provider networks onto a single bonded interface, with a separate bridge for each network.
According to our manual [1], this is not the recommended setup:
~~~
If there are multiple flat provider networks, each of them should have separate physical interface and bridge to connect them to external network.
~~~
However, the setup is expected to work.

Upon reboot of the Compute node (running RHOS 10, ovs 2.5), the flows are re-created on only __one bridge__ (out of many). The customer reports that the bridge is random: after a few reboots of the Compute node, a different bridge ends up __with__ flows.

[1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/networking-guide/#using_flat_provider_networks


Version-Release number of selected component (if applicable):

It happens in OSP10 (after an upgrade from a working OSP9, as well as on a fresh OSP10 installation).
We tried downgrading openvswitch to 2.4: it didn't help.
We also tried downgrading ovs to 2.4 and booting kernel 3.10.0-327.36.2.el7: no luck.
The customer has tried upgrading ovs to 2.6: no change.


How reproducible:
Set up a Compute node with multiple flat provider networks on a single bonded interface in OSP10, then reboot it.


Actual results:
There are no ovs flows on the bridges. There is no connectivity on the provider networks.


Expected results:
There are ovs flows on the bridges. Instances are able to ping on the provider network(s).
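
A quick way to see which bridges came back without flows after a reboot is to count the flow entries per bridge. The following is only a sketch: the function names and the `OVS_VSCTL` / `OVS_OFCTL` override variables are made up for illustration, and it assumes that every flow entry printed by `ovs-ofctl dump-flows` contains `actions=` while the reply header line does not.

```shell
#!/bin/sh
# Sketch: report the number of OpenFlow entries on each OVS bridge.
# OVS_VSCTL / OVS_OFCTL are hypothetical overrides for the real binaries.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}
OVS_OFCTL=${OVS_OFCTL:-ovs-ofctl}

count_flows() {
    # Assumption: each flow entry line contains "actions=".
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    "$OVS_OFCTL" dump-flows "$1" | grep -c 'actions=' || true
}

report_bridges() {
    for br in $("$OVS_VSCTL" list-br); do
        n=$(count_flows "$br")
        if [ "$n" -eq 0 ]; then
            echo "$br: NO FLOWS"
        else
            echo "$br: $n flows"
        fi
    done
}
```

After sourcing the file on the Compute node, run `report_bridges` as root; any bridge reported as `NO FLOWS` is one the agent skipped.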


Workaround that brings back the flows (but there's still __no__ ping on the provider network):

1) Delete all bridges (ovs-vsctl list-br  | xargs -I% ovs-vsctl del-br %)
2) Restart network (systemctl restart network &)
3) Restart neutron-openvswitch-agent (systemctl restart neutron-openvswitch-agent)
4) Look at your marvellous flows (ovs-vsctl list-br | xargs -I% ovs-ofctl dump-flows %)
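
The four steps above can be wrapped into a small script. This is a sketch only: `run`, `recreate_flows`, `DRY_RUN`, `OVS_VSCTL` and `OVS_OFCTL` are illustrative names, not part of any shipped tool. Unlike step 2 above, it restarts the network service in the foreground, so an SSH session over the affected interface may drop while it runs.

```shell
#!/bin/sh
# Sketch of the workaround steps; names below are hypothetical.
# Set DRY_RUN=1 to print the state-changing commands instead of running them.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}
OVS_OFCTL=${OVS_OFCTL:-ovs-ofctl}

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recreate_flows() {
    # 1) Delete all bridges (list-br is read-only, so it always runs for real).
    for br in $("$OVS_VSCTL" list-br); do
        run "$OVS_VSCTL" del-br "$br"
    done
    # 2) Restart the network service.
    run systemctl restart network
    # 3) Restart the agent so that it reprograms the bridges.
    run systemctl restart neutron-openvswitch-agent
    # 4) Dump the flows on every bridge that exists now.
    for br in $("$OVS_VSCTL" list-br); do
        echo "== $br =="
        run "$OVS_OFCTL" dump-flows "$br"
    done
}
```

Sourcing the file and calling `DRY_RUN=1 recreate_flows` first shows what would be executed; calling `recreate_flows` as root applies it.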


Additional info:

At some point during a troubleshooting session we managed to get the connectivity back. However, after a reboot it was lost once again. Since then, we haven't been able to restore it.

The use of multiple flat provider networks on a single interface has been discussed here: http://post-office.corp.redhat.com/archives/rhos-tech/2017-January/msg00130.html

Comment 1 David Hill 2017-01-26 00:45:38 UTC

The networks are mapped onto a bonded interface using VLANs, and each VLAN is mapped to a flat network within the openvswitch-agent. This works fine, but when the compute node is rebooted, for some reason neutron-openvswitch-agent recreates the flows only on the first bridge and skips the others. We have a workaround for that, which is to delete everything and restart everything; neutron-openvswitch-agent will then recreate the required flows for all bridges.

Comment 2 Assaf Muller 2017-01-26 11:40:46 UTC

We're going to need an SOS report from a controller and a compute, as well as the OSPd files that were used to deploy the setup.

Comment 4 Irina Petrova 2017-02-02 11:46:16 UTC

Hey Ajo,

Any additional info needed?

--Irina

Comment 24 Eran Kuris 2017-05-03 10:40:22 UTC

Fix verified:
 rpm -qa |grep python-tinyrpc
python-tinyrpc-0.5-3.el7ost.noarch
[root@controller-0 ~]# rpm -qa |grep python-ryu-
python-ryu-common-4.9-2.1.el7ost.noarch
python-ryu-4.9-2.1.el7ost.noarch

Comment 27 errata-xmlrpc 2017-06-28 15:27:21 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1587
