Bug 1372369 - Backport to mitaka: Set secure fail mode for physical bridges
Summary: Backport to mitaka: Set secure fail mode for physical bridges
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Hynek Mlnarik
QA Contact: Eran Kuris
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-09-01 14:05 UTC by Hynek Mlnarik
Modified: 2019-02-17 03:54 UTC
CC List: 8 users

Fixed In Version: openstack-neutron-8.1.2-5.el7ost
Doc Type: Bug Fix
Doc Text:
Prior to this update, the fail mode on OVS physical bridges was not set, so it defaulted to `standalone`. This meant that when `ofctl_interface` was set to `native` and the interface became unavailable (due to heavy load, OVS agent shutdown, or network disruption), the flows on physical bridges could be cleared. Consequently, traffic on the physical bridges was disrupted. With this update, the fail mode of OVS physical bridges is set to `secure`. As a result, flows are retained on physical bridges even when the controller connection is lost.
Clone Of:
Environment:
Last Closed: 2016-10-05 14:23:57 UTC
Target Upstream Version:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1607787 0 None None None 2016-09-01 14:05:49 UTC
OpenStack gerrit 355315 0 None None None 2016-09-01 14:14:14 UTC
Red Hat Product Errata RHBA-2016:2009 0 normal SHIPPED_LIVE openstack-neutron bug fix advisory 2016-10-05 18:23:45 UTC

Description Hynek Mlnarik 2016-09-01 14:05:49 UTC
Description of problem:

Restarting the current OVS neutron agent under heavy load with Ryu (ofctl=native) intermittently disrupts traffic, as manifested by occasional failures in a not-yet-committed fullstack test [1]. More specifically, the reason seems to be a too-slow restart of the Ryu controller in combination with OVS vswitchd timeouts. The traffic disruption always occurs after the following log entry is recorded in ovs-vswitchd.log:

  fail_open|WARN|Could not connect to controller (or switch failed controller's post-connection admission control policy) for 15 seconds, failing open

This issue manifests regardless of network type (VLAN, flat) and vsctl interface (cli, native). It has not occurred with ofctl=cli, though.

The issue occurs for physical bridges because they are in the default fail mode, meaning that once the controller connection is lost, OVS takes over management of the flows and clears them. This conclusion is based on flows dumped just before the traffic was blocked and again after the event, when the flows had been cleared. Before the event:

  ====== br-eth23164fdb4 =======
  Fri Jul 29 10:44:04 UTC 2016
  OFPST_FLOW reply (OF1.3) (xid=0x2):
   cookie=0x921fc02d0b4f49e1, duration=16.137s, table=0, n_packets=16, n_bytes=1400, priority=4,in_port=2,dl_vlan=1 actions=set_field:5330->vlan_vid,NORMAL
   cookie=0x921fc02d0b4f49e1, duration=25.647s, table=0, n_packets=6, n_bytes=508, priority=2,in_port=2 actions=drop
   cookie=0x921fc02d0b4f49e1, duration=26.250s, table=0, n_packets=16, n_bytes=1400, priority=0 actions=NORMAL

After the disruption:

  ====== br-eth23164fdb4 =======
  Fri Jul 29 10:44:05 UTC 2016
  OFPST_FLOW reply (OF1.3) (xid=0x2):

The same bug appears under the following condition (courtesy of Jakub Libosvar): set up a physical bridge, block traffic to the respective bridge controller via iptables until the OVS timeout occurs, and then check the OpenFlow rules of the affected bridge.
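
A minimal sketch of this check, assuming a hypothetical physical bridge named br-eth2 and that ovs-vsctl and ovs-ofctl are available on the PATH:

  #!/usr/bin/env python
  # Hedged sketch of the check described above: query the bridge fail mode
  # and dump its OpenFlow 1.3 flows. "br-eth2" is a hypothetical name.
  import subprocess

  BRIDGE = "br-eth2"

  def run(cmd):
      """Run a command and return its stripped stdout."""
      return subprocess.check_output(cmd, universal_newlines=True).strip()

  # An empty fail mode means the bridge uses the default ("standalone"),
  # in which OVS clears the controller-installed flows once the controller
  # connection times out and the bridge fails open.
  fail_mode = run(["ovs-vsctl", "get-fail-mode", BRIDGE])
  print("fail_mode: %s" % (fail_mode or "<unset, defaults to standalone>"))

  # After blocking the controller with iptables and waiting for the OVS
  # timeout, this dump is expected to show only the reply header, i.e. the
  # flows have been cleared (ovs-ofctl uses the local management socket,
  # so it still works while the controller is unreachable).
  print(run(["ovs-ofctl", "-O", "OpenFlow13", "dump-flows", BRIDGE]))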

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
OVS flows are cleared after controller timeout

Expected results:
OVS flows are retained

Additional info:
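The backported fix, per the Doc Text above, sets the fail mode of physical bridges to `secure`, so that ovs-vswitchd neither takes over switching nor clears the agent-installed flows when the controller is unreachable. A minimal sketch of the equivalent manual operation, again assuming a hypothetical bridge named br-eth2:

  #!/usr/bin/env python
  # Hedged sketch only: set and verify secure fail mode on a physical bridge
  # by hand. The actual fix performs the equivalent inside the OVS agent when
  # it sets up physical bridges; "br-eth2" is a hypothetical name.
  import subprocess

  BRIDGE = "br-eth2"

  # In "secure" mode ovs-vswitchd does not fall back to standalone switching
  # when the controller connection is lost, so existing flows are retained.
  subprocess.check_call(["ovs-vsctl", "set-fail-mode", BRIDGE, "secure"])

  # Verify: should print "secure".
  mode = subprocess.check_output(
      ["ovs-vsctl", "get-fail-mode", BRIDGE],
      universal_newlines=True).strip()
  print("fail_mode on %s: %s" % (BRIDGE, mode))

With secure fail mode set, repeating the iptables reproduction above should leave the flows in place, matching the expected result.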

Comment 2 Eran Kuris 2016-09-25 07:28:39 UTC
Can you please add steps to reproduce?
What is "heavy load"? This is very general.

Comment 3 Hynek Mlnarik 2016-09-25 08:26:06 UTC
See upstream bug: https://bugs.launchpad.net/neutron/+bug/1607787

Comment 4 Eran Kuris 2016-09-26 11:01:47 UTC
Verified on:
[root@overcloud-controller-0 ~]# rpm -qa |grep neutron 
openstack-neutron-lbaas-8.0.0-1.el7ost.noarch
openstack-neutron-common-8.1.2-5.el7ost.noarch
openstack-neutron-metering-agent-8.1.2-5.el7ost.noarch
python-neutron-lib-0.0.2-1.el7ost.noarch
python-neutronclient-4.1.1-2.el7ost.noarch
openstack-neutron-bigswitch-agent-2015.3.8-1.el7ost.noarch
python-neutron-8.1.2-5.el7ost.noarch
openstack-neutron-ml2-8.1.2-5.el7ost.noarch
python-neutron-lbaas-8.0.0-1.el7ost.noarch
openstack-neutron-openvswitch-8.1.2-5.el7ost.noarch
openstack-neutron-8.1.2-5.el7ost.noarch
openstack-neutron-bigswitch-lldp-2015.3.8-1.el7ost.noarch


RHOS-9 with OSPD-9

Comment 6 errata-xmlrpc 2016-10-05 14:23:57 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2009.html

