Bug 1191922

Summary: ovs-agent restart or ovs restart can cause a network storm bringing down the net
Product: Red Hat OpenStack
Reporter: Miguel Angel Ajo <majopela>
Component: openstack-neutron
Assignee: Miguel Angel Ajo <majopela>
Status: CLOSED ERRATA
QA Contact: Nir Magnezi <nmagnezi>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 6.0 (Juno)
CC: apevec, chrisw, fdinitto, fleitner, ihrachys, jbenc, lhh, lpeer, majopela, mlopes, nyechiel, oblaut, rhos-maint, rkhan, sclewis, scohen, yeylon
Target Milestone: z1
Keywords: Regression, ZStream
Target Release: 6.0 (Juno)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-neutron-2014.2.2-3.el7ost
Doc Type: Bug Fix
Doc Text:
Previously, the br-tun bridge was reset (both its OpenFlow rules and its ports) whenever the Open vSwitch agent restarted and, under some conditions, when neutron-server restarted. Consequently, if a broadcast packet entered br-tun while it had no OpenFlow rules, and br-tun on at least two other hosts had been reset in the same way, the packet triggered a broadcast storm that drove up network usage and Open vSwitch CPU usage on all hosts. This update fixes the issue by automatically setting br-tun to secure mode during the reset. As a result, packets are not forwarded in the absence of OpenFlow rules, and the race condition is eliminated.
Story Points: ---
Clone Of: 1185521
Environment:
Last Closed: 2015-03-05 18:21:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1185521, 1186492, 1187257, 1191633
Bug Blocks:    
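
The Doc Text above says the storm needs br-tun to have been reset on at least two other hosts, that is, three hosts in total. The toy model below is a sketch of why, not a description of Open vSwitch internals. It assumes a full mesh of tunnels between hosts, and that a bridge with no OpenFlow rules floods a broadcast to every tunnel peer except the one it arrived from when in standalone mode, and drops it when in secure mode. The function name packets_per_hop and its parameters are made up for this illustration.

```python
# Toy model only: real Open vSwitch forwarding is far more involved.
def packets_per_hop(num_hosts, fail_mode, hops=5):
    """Count copies of one broadcast sent over the tunnel mesh at each hop.

    fail_mode is "standalone" (assumed: no rules -> flood like a learning
    switch) or "secure" (no rules -> drop).
    """
    if fail_mode not in ("standalone", "secure"):
        raise ValueError("fail_mode must be 'standalone' or 'secure'")

    # Each in-flight copy is (receiving_host, host_it_came_from).
    # The broadcast enters br-tun on host 0 from a local port (None).
    arriving = [(0, None)]
    counts = []
    for _ in range(hops):
        sent = []
        for host, came_from in arriving:
            if fail_mode == "secure":
                continue  # secure mode with no rules: drop the packet
            # Flood to every tunnel peer except the one it arrived from.
            for peer in range(num_hosts):
                if peer != host and peer != came_from:
                    sent.append((peer, host))
        counts.append(len(sent))
        arriving = sent
    return counts


if __name__ == "__main__":
    for hosts in (2, 3, 4):
        print(hosts, "hosts, standalone:", packets_per_hop(hosts, "standalone"))
    print("3 hosts, secure:", packets_per_hop(3, "secure"))
```

In this model a two-host mesh goes quiet after one hop, a three-host mesh keeps two copies circulating indefinitely, and a four-host mesh doubles the number of copies at every hop. In secure mode nothing is forwarded at all.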

Comment 6 lpeer 2015-02-16 12:22:58 UTC
This is a regression in Juno.
It happens on a service restart or an OVS restart.
RDO users have already encountered it, so we suspect our customers will see it as well.

I suggest we include this in our A1 release; a Neutron build with the fix will be available shortly.

Comment 7 Ihar Hrachyshka 2015-02-16 12:43:02 UTC
Why MODIFIED? Moving back to ON_DEV.

Comment 11 Nir Magnezi 2015-02-19 17:56:29 UTC
Verified NVR:
openstack-neutron-2014.2.2-3.el7ost.noarch
openvswitch-2.1.2-2.el7_0.2.x86_64
openstack-neutron-openvswitch-2014.2.2-3.el7ost.noarch

Verification Steps:
===================
1. Deploy[1] a setup with at least 3 nodes running openvswitch + vxlan tunneling[2].

2. On each node running openvswitch, stop the neutron OVS agent and restart openvswitch:
   # systemctl stop neutron-openvswitch-agent
   # systemctl restart openvswitch

3. At this point, the br-tun flow table should be empty:
   # ovs-ofctl dump-flows br-tun
   NXST_FLOW reply (xid=0x4):

4. On each node running openvswitch, bring the br-tun interface up:
   # ip l s br-tun up

5. On one of the nodes (the networker, for example), install Python scapy (via yum or pip):
   # pip install scapy

6. On each node running openvswitch, monitor br-tun:
   # tcpdump -i br-tun -vvv

7. Then run this script[3] from the node you selected, to generate the broadcast:
   # python scapy_script.py
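
The expectation in step 3 (an empty br-tun flow table) can also be checked from a script. This is a minimal sketch, not part of the verification that was run. The helper names count_flows and bridge_is_quiet are hypothetical; the only external commands used are `ovs-ofctl dump-flows` from step 3 and `ovs-vsctl get-fail-mode`. It assumes that the reply header is the only non-flow line in the dump output.

```python
import subprocess


def count_flows(dump_output):
    """Count flow entries in `ovs-ofctl dump-flows` output.

    The reply header line (for example "NXST_FLOW reply (xid=0x4):")
    and blank lines are not counted as flows.
    """
    flows = 0
    for line in dump_output.splitlines():
        line = line.strip()
        if not line or "_FLOW reply" in line:
            continue
        flows += 1
    return flows


def bridge_is_quiet(bridge="br-tun"):
    """Return True if the bridge is in secure mode and has no flows.

    Needs root and a running Open vSwitch, so it is not called on import.
    """
    mode = subprocess.run(
        ["ovs-vsctl", "get-fail-mode", bridge],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    dump = subprocess.run(
        ["ovs-ofctl", "dump-flows", bridge],
        check=True, capture_output=True, text=True,
    ).stdout
    return mode == "secure" and count_flows(dump) == 0
```

Running bridge_is_quiet() as root on each node after step 2 should return True on a fixed build, under the assumptions above.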

Result:
=======
1. Captured on the br-tun interface of the node used in step 7:

18:44:57.448606 IP (tos 0x0, ttl 64, id 1, offset 0, flags [none], proto Options (0), length 20)
    169.254.192.1 > 224.0.0.18:  ip 0

2. Nothing appears on the br-tun interfaces of the other nodes.

3. CPU levels remain normal.
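
For reference, the packet captured in result 1 (a bare IPv4 header with tos 0x0, ttl 64, id 1, proto 0, length 20) can be rebuilt with the Python standard library alone. The sketch below is not the script at [3], which uses scapy and whose contents are not reproduced here. The Ethernet broadcast destination and the source MAC address are assumptions made for illustration, and the function names are hypothetical.

```python
import socket
import struct


def ipv4_checksum(header):
    """Internet checksum: one's complement of the one's-complement sum
    of the header's 16-bit words."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src="169.254.192.1", dst="224.0.0.18"):
    """Build the 20-byte header matching the tcpdump line in result 1."""
    def pack(checksum):
        return struct.pack(
            "!BBHHHBBH4s4s",
            0x45,      # version 4, header length 5 words (20 bytes)
            0,         # tos 0x0
            20,        # total length: header only, no payload
            1,         # id 1
            0,         # flags [none], offset 0
            64,        # ttl 64
            0,         # proto 0
            checksum,
            socket.inet_aton(src),
            socket.inet_aton(dst),
        )
    # Compute the checksum over the header with the checksum field zeroed.
    return pack(ipv4_checksum(pack(0)))


def build_frame(src_mac=b"\x02\x00\x00\x00\x00\x01"):
    """Wrap the header in an Ethernet frame.

    Assumptions: broadcast destination MAC, and an arbitrary locally
    administered source MAC.
    """
    return b"\xff" * 6 + src_mac + struct.pack("!H", 0x0800) + build_ipv4_header()


def send_on(iface="br-tun", count=1):
    """Send the frame on iface. Linux only, needs root; not called on import."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((iface, 0))
        for _ in range(count):
            sock.send(build_frame())
```

Calling send_on("br-tun") as root on the node chosen in step 7 would inject one such frame; the tcpdump sessions from step 6 should then show it only on that node.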

[1] http://jenkins-hurricane.scl.lab.tlv.redhat.com:8080/job/rhel-osp6-rhel7.1-neutron-ml2-vxlan/
[2] https://github.com/nmagnezi/hurricane/blob/master/plugins/installer/packstack/templates/packstack-juno-neutron-ml2-vxlan.ini
[3] https://github.com/nmagnezi/scripts/blob/master/scapy_script.py

Comment 13 errata-xmlrpc 2015-03-05 18:21:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0635.html