Bug 1191922 - ovs-agent restart or ovs restart can cause a network storm bringing down the net
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: z1
Target Release: 6.0 (Juno)
Assignee: Miguel Angel Ajo
QA Contact: Nir Magnezi
URL:
Whiteboard:
Depends On: 1185521 1186492 1187257 1191633
Blocks:
Reported: 2015-02-12 09:15 UTC by Miguel Angel Ajo
Modified: 2016-04-26 21:09 UTC (History)

Fixed In Version: openstack-neutron-2014.2.2-3.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the br-tun bridge was reset (OpenFlow rules and ports) during openvswitch-agent restarts and, under some conditions, as a result of neutron-server restarts. Consequently, if a broadcast packet entered br-tun while it had no OpenFlow rules, and br-tun on at least two other hosts had been reset in the same way, the packet triggered a network broadcast storm that drove up network usage and Open vSwitch CPU usage on all hosts. This update fixes the issue by automatically setting br-tun to secure fail mode during the reset. As a result, packets are not forwarded in the absence of OpenFlow rules, and the race condition has been eliminated.
Clone Of: 1185521
Environment:
Last Closed: 2015-03-05 18:21:37 UTC
Target Upstream Version:
Embargoed:
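
To illustrate the Doc Text above, here is a toy Python model of the storm. It is only a sketch under simplifying assumptions that are not taken from the fix itself: hosts form a full tunnel mesh, every br-tun has just been reset and holds no flows, and a non-secure bridge floods a frame to every tunnel except the one it arrived on. The function name simulate_broadcast and the max_hops cutoff are made up for the example.

```python
from collections import deque


def simulate_broadcast(num_hosts, secure, max_hops=6):
    """Count tunnel transmissions caused by one broadcast entering host 0.

    Simplified model (an assumption, not a faithful OVS emulation):
    with secure=False a flow-less bridge floods the frame to every peer
    except the one it came from; with secure=True it drops the frame.
    """
    if secure:
        return 0  # no flows + secure fail mode: nothing is forwarded
    sent = 0
    # Each entry: (host holding the frame, host it came from, hops so far).
    queue = deque([(0, None, 0)])
    while queue:
        host, came_from, hops = queue.popleft()
        if hops == max_hops:
            continue  # stop the simulation here; a real storm would go on
        for peer in range(num_hosts):
            if peer != host and peer != came_from:
                sent += 1
                queue.append((peer, host, hops + 1))
    return sent
```

In this model, simulate_broadcast(2, secure=False) stops after a single transmission because the only peer has nowhere else to forward, which is consistent with the "at least two other hosts" condition. With three or more hosts the count keeps growing as max_hops is raised, and with secure=True it is always 0.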



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1421232 | 0 | None | None | None | Never
OpenStack gerrit | 156177 | 0 | None | MERGED | Setup br-tun in secure fail mode to avoid broadcast storms | 2021-02-19 11:58:29 UTC
Red Hat Product Errata | RHBA-2015:0635 | 0 | normal | SHIPPED_LIVE | openstack-neutron bug fix advisory | 2015-03-05 23:17:09 UTC

Comment 6 lpeer 2015-02-16 12:22:58 UTC
This is a regression in Juno.
It happens on a service restart or an OVS restart.
RDO users have already encountered it, so we suspect our customers will see it as well.

I suggest we include this in our A1 release; a Neutron build with the fix will be available shortly.

Comment 7 Ihar Hrachyshka 2015-02-16 12:43:02 UTC
Why MODIFIED? Moving back to ON_DEV.

Comment 11 Nir Magnezi 2015-02-19 17:56:29 UTC
Verified NVR:
openstack-neutron-2014.2.2-3.el7ost.noarch
openvswitch-2.1.2-2.el7_0.2.x86_64
openstack-neutron-openvswitch-2014.2.2-3.el7ost.noarch

Verification Steps:
===================
1. Deploy[1] a setup with at least 3 nodes running openvswitch + vxlan tunneling[2].

2. For each node running openvswitch, stop neutron ovs agent & restart openvswitch:
   # systemctl stop neutron-openvswitch-agent
   # systemctl restart openvswitch

3. At this point, the br-tun flow table should be empty and look like:
   # ovs-ofctl dump-flows br-tun
   NXST_FLOW reply (xid=0x4):

4. For each node running openvswitch, activate br-tun:
   # ip l s br-tun up

5. On one of the nodes (the networker, for example), install python scapy (via yum or pip):
   # pip install scapy

6. For each node running openvswitch, monitor br-tun:
   # tcpdump -i br-tun -vvv

7. Then run this[3] script from the node you selected to generate the broadcast:
   # python scapy_script.py

Result:
=======
1. Captured from the br-tun interface on the node used in step 7:

18:44:57.448606 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto Options (0), length 20)
    169.254.192.1 > 224.0.0.18:  ip 0

2. Nothing appears on the br-tun interfaces of the rest of the nodes.

3. CPU levels remain normal.
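
For reference, a frame matching the capture in result 1 (bare 20-byte IP header, id 1, ttl 64, proto 0) can be built with the Python standard library alone. This is not the script from [3], only a rough stand-in: the broadcast destination MAC, the source MAC, and the names build_frame and send_frame are assumptions made for the example.

```python
import socket
import struct


def ip_checksum(header):
    """Internet checksum (RFC 1071) over an even-length header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_frame(src_ip="169.254.192.1", dst_ip="224.0.0.18",
                src_mac="02:00:00:00:00:01",    # assumed source MAC
                dst_mac="ff:ff:ff:ff:ff:ff"):   # assumed broadcast MAC
    """Return Ethernet header + 20-byte IPv4 header with no payload."""
    def mac(text):
        return bytes.fromhex(text.replace(":", ""))

    # version/IHL, TOS, total length, id, flags/offset, TTL, proto, checksum
    fields = [0x45, 0, 20, 1, 0, 64, 0, 0,
              socket.inet_aton(src_ip), socket.inet_aton(dst_ip)]
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    fields[7] = ip_checksum(header)  # fill in the checksum, then repack
    header = struct.pack("!BBHHHBBH4s4s", *fields)
    ethernet = mac(dst_mac) + mac(src_mac) + struct.pack("!H", 0x0800)
    return ethernet + header


def send_frame(frame, iface="br-tun"):
    """Linux only, needs root: write the raw frame to the interface."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        sock.send(frame)
```

Calling send_frame(build_frame()) as root on the selected node would put one such frame on br-tun while tcpdump is running on the other nodes.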

[1] http://jenkins-hurricane.scl.lab.tlv.redhat.com:8080/job/rhel-osp6-rhel7.1-neutron-ml2-vxlan/
[2] https://github.com/nmagnezi/hurricane/blob/master/plugins/installer/packstack/templates/packstack-juno-neutron-ml2-vxlan.ini
[3] https://github.com/nmagnezi/scripts/blob/master/scapy_script.py

Comment 13 errata-xmlrpc 2015-03-05 18:21:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0635.html

