Bug 1416307 - neutron-openvswitch-agent won't create ovs flows upon restart
Summary: neutron-openvswitch-agent won't create ovs flows upon restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-ryu
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z3
Target Release: 10.0 (Newton)
Assignee: Miguel Angel Ajo
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-25 08:51 UTC by Irina Petrova
Modified: 2020-06-11 13:14 UTC (History)

Fixed In Version: python-ryu-4.9-2.1.el7ost,python-tinyrpc-0.5-3.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-28 15:27:21 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2916661 0 None None None 2017-02-09 07:53:23 UTC
Red Hat Product Errata RHBA-2017:1587 0 normal SHIPPED_LIVE Red Hat OpenStack Platform 10 Bug Fix and Enhancement Advisory 2017-06-28 19:11:42 UTC

Description Irina Petrova 2017-01-25 08:51:58 UTC
Description of problem:

A customer is mapping multiple flat provider networks onto a single bonded interface, each network with its own bridge.
According to our manual [1], this is not the recommended setup:
~~~
If there are multiple flat provider networks, each of them should have separate physical interface and bridge to connect them to external network.
~~~
However, the setup is expected to work.

Upon reboot of the Compute node (running RHOS 10, ovs 2.5), the flows are re-created on only __one bridge__ (out of many). The customer reports that the bridge is random: after a few reboots of the Compute node, the bridge __with__ flows changes to another one.

[1] https://access.redhat.com/documentation/en/red-hat-openstack-platform/10/single/networking-guide/#using_flat_provider_networks
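
To see which bridges came up without flows after a reboot, something like the following might help. It is only a sketch: it assumes that every flow line printed by `ovs-ofctl dump-flows` contains `actions=` and that the reply header does not. `OVS_VSCTL` and `OVS_OFCTL` are hypothetical override variables introduced here for dry runs, not OVS features.

```shell
#!/bin/sh
# Sketch: print every OVS bridge that currently has no OpenFlow entries.
# OVS_VSCTL / OVS_OFCTL are hypothetical override hooks (assumptions of this
# sketch); they default to the real ovs-vsctl and ovs-ofctl binaries.
bridges_without_flows() {
    for br in $("${OVS_VSCTL:-ovs-vsctl}" list-br); do
        # dump-flows prints a reply header followed by one line per flow.
        # Flow lines contain "actions="; the header does not.
        if ! "${OVS_OFCTL:-ovs-ofctl}" dump-flows "$br" | grep -q 'actions='; then
            echo "$br"
        fi
    done
}
```

Sourcing the file and running `bridges_without_flows` as root on the Compute node should list the affected bridges. A bridge on which `ovs-ofctl` fails outright is also reported, since no flow lines are printed for it.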


Version-Release number of selected component (if applicable):

It happens in OSP10 (after an upgrade from a working OSP9, as well as on a fresh OSP10 installation).
We tried downgrading openvswitch to 2.4: it didn't help.
We also tried downgrading ovs to 2.4 and booting kernel 3.10.0-327.36.2.el7: no luck.
The customer tried upgrading ovs to 2.6: no change.


How reproducible:
Set up a Compute node with multiple flat provider networks on a single bonded interface in OSP10, then reboot the node.


Actual results:
There are no ovs flows on the bridges. There is no connectivity on the provider networks.


Expected results:
There are ovs flows on the bridges. Instances are able to ping on the provider network(s).


Workaround that brings back the flows (but there's still __no__ ping on the provider network):

1) Delete all bridges (ovs-vsctl list-br  | xargs -I% ovs-vsctl del-br %)
2) Restart network (systemctl restart network &)
3) Restart neutron-openvswitch-agent (systemctl restart neutron-openvswitch-agent)
4) Look at your marvellous flows (ovs-vsctl list-br | xargs -I% ovs-ofctl dump-flows %)
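
The four steps above could be wrapped into one function, roughly as sketched below. `OVS_VSCTL`, `OVS_OFCTL` and `SYSTEMCTL` are hypothetical override variables added for dry runs. Unlike step 2 above, the sketch runs the network restart in the foreground rather than backgrounding it with `&`.

```shell
#!/bin/sh
# Sketch of the workaround as a single function. The three *_CTL variables
# are hypothetical hooks (assumptions of this sketch) that default to the
# real binaries.
recreate_ovs_flows() {
    vsctl="${OVS_VSCTL:-ovs-vsctl}"
    ofctl="${OVS_OFCTL:-ovs-ofctl}"
    svc="${SYSTEMCTL:-systemctl}"

    # 1) Delete all bridges (this drops any traffic going through them).
    for br in $("$vsctl" list-br); do
        "$vsctl" del-br "$br" || return 1
    done

    # 2) Restart the network service (foreground here, "&" in the original).
    "$svc" restart network || return 1

    # 3) Restart the agent.
    "$svc" restart neutron-openvswitch-agent || return 1

    # 4) Dump the flows of every bridge that exists now.
    for br in $("$vsctl" list-br); do
        echo "== $br"
        "$ofctl" dump-flows "$br"
    done
}
```

The agent may need some time after the restart before all flows show up, so the dump in step 4 might have to be repeated.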


Additional info:

At some point during a troubleshooting session we managed to get connectivity back. However, it was lost again after a reboot. Since then, we haven't been able to restore it.

The setup with multiple flat provider networks on a single interface has been discussed here: http://post-office.corp.redhat.com/archives/rhos-tech/2017-January/msg00130.html

Comment 1 David Hill 2017-01-26 00:45:38 UTC
It's mapped on a bonded interface using VLANs, and each VLAN is mapped to a flat network within the openvswitch-agent. This works fine, but when the compute node is rebooted, for some reason neutron-openvswitch-agent recreates the flows only on the first bridge and skips the others. We have a workaround for that, which is to delete everything and restart everything: neutron-openvswitch-agent will then recreate the required flows for all bridges.

Comment 2 Assaf Muller 2017-01-26 11:40:46 UTC
We're going to need an SOS report from a controller and a compute, as well as the OSPd files that were used to deploy the setup.

Comment 4 Irina Petrova 2017-02-02 11:46:16 UTC
Hey Ajo,

Any additional info needed?

--Irina

Comment 24 Eran Kuris 2017-05-03 10:40:22 UTC
Fix verified:
[root@controller-0 ~]# rpm -qa |grep python-tinyrpc
python-tinyrpc-0.5-3.el7ost.noarch
[root@controller-0 ~]# rpm -qa |grep python-ryu-
python-ryu-common-4.9-2.1.el7ost.noarch
python-ryu-4.9-2.1.el7ost.noarch
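
To check a node against the "Fixed In Version" builds without reading the output by eye, a sketch along these lines might do. It compares version-release strings with GNU `sort -V`, which only approximates rpm's own version comparison. `RPM` is a hypothetical override variable, and `version_at_least` and `has_fix` are names made up for this example.

```shell
#!/bin/sh
# Sketch: succeed if an installed package is at least a given version-release.
# "sort -V" (GNU coreutils) approximates, but is not identical to, rpm's
# version ordering. RPM is a hypothetical override hook for testing.

# version_at_least INSTALLED MINIMUM
version_at_least() {
    # The smaller of the two strings sorts first; INSTALLED is new enough
    # when that smaller string is MINIMUM.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_fix PACKAGE MINIMUM
has_fix() {
    installed=$("${RPM:-rpm}" -q --queryformat '%{VERSION}-%{RELEASE}' "$1") || return 1
    version_at_least "$installed" "$2"
}
```

For this bug the calls would be `has_fix python-ryu 4.9-2.1.el7ost` and `has_fix python-tinyrpc 0.5-3.el7ost`; each returns non-zero if the package is missing or older.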

Comment 27 errata-xmlrpc 2017-06-28 15:27:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1587

