Bug 1081159

Summary: L3 agent restart causes network outage
Product: Red Hat OpenStack
Reporter: Dave Sullivan <dsulliva>
Component: openstack-neutron
Assignee: Jakub Libosvar <jlibosva>
Status: CLOSED ERRATA
QA Contact: Ofer Blaut <oblaut>
Severity: high
Docs Contact:
Priority: high
Version: 4.0
CC: breeler, chrisw, dmaley, dsulliva, jlibosva, lpeer, majopela, mnewby, nyechiel, sputhenp, yeylon
Target Milestone: z4
Keywords: ZStream
Target Release: 4.0
Flags: lpeer: needinfo+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-neutron-2013.2.3-4.el6ost
Doc Type: Bug Fix
Doc Text:
Cause: qrouter namespaces were destroyed and recreated during an L3 agent start.
Consequence: Ongoing traffic was lost due to missing NAT rules in the destroyed namespace.
Fix: Namespaces in use are preserved during agent start.
Result: Restarting the L3 agent has no influence on ongoing traffic via router namespaces.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-05-29 20:19:34 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
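
The fix summarized in the Doc Text (preserve namespaces that are in use, instead of destroying every qrouter namespace at agent start) can be illustrated roughly as follows. This is only a sketch of the idea, not the actual neutron patch: the function name and its arguments are hypothetical, and only the "qrouter-" prefix is taken from this report.

```python
# Hypothetical illustration of "preserve namespaces in use"; not neutron code.
NS_PREFIX = "qrouter-"  # prefix seen in the `ip netns` output in this report


def stale_router_namespaces(existing_namespaces, active_router_ids):
    """Return the qrouter namespaces that belong to no active router.

    existing_namespaces: namespace names found on the host at agent start.
    active_router_ids:   IDs of routers the agent is still responsible for
                         (assumed to be known to the caller).
    """
    stale = []
    for ns in existing_namespaces:
        if not ns.startswith(NS_PREFIX):
            continue  # not a router namespace; leave it alone
        router_id = ns[len(NS_PREFIX):]
        if router_id not in active_router_ids:
            stale.append(ns)  # safe to clean up: no active router uses it
    return stale
```

Under this scheme only the returned namespaces would be destroyed at start, so a namespace carrying live NAT rules for an active router is left untouched.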

Description Dave Sullivan 2014-03-26 16:37:18 UTC
Description of problem:

L3 Network Drops - Floating IPs are not Accessible

Even if neutron services are restarted on the management node, floating IPs are not accessible.

Tenants need to restart their instances and then things work.  

We need to determine the cause of the initial L3 outage.

This appears to match the upstream bug noted here:

https://bugs.launchpad.net/neutron/+bug/1175695

Version-Release number of selected component (if applicable):

current RHOS 4

Comment 2 Maru Newby 2014-03-28 23:04:17 UTC
The upstream bug looks like the probable cause. The next step is cherry-picking the fix for inclusion in stable/havana and figuring out whether we can rely on the next sync or need to manually backport to RHOS.

Comment 3 Ofer Blaut 2014-03-31 08:42:11 UTC
I have tested on Havana A3 using a distributed system:

openstack-neutron-2013.2.2-5.el6ost.noarch

1. I stopped the L3 agent.
2. The qrouter namespace is still up, and traffic to the floating IP works.
3. While starting the L3 agent, traffic stops for ~25 seconds and then resumes.

[root@puma05 ~]# ip netns | grep qrouter
qrouter-69cf3535-2960-4b11-8e3a-da37c3331f01
[root@puma05 ~]# service neutron-l3-agent stop
Stopping neutron-l3-agent:                                 [  OK  ]
[root@puma05 ~]# ip netns | grep qrouter
qrouter-69cf3535-2960-4b11-8e3a-da37c3331f01
[root@puma05 ~]# openstack-status 
== neutron services ==
neutron-server:                         inactive  (disabled on boot)
neutron-dhcp-agent:                     active
neutron-l3-agent:                       inactive
neutron-metadata-agent:                 active
neutron-lbaas-agent:                    inactive  (disabled on boot)
neutron-openvswitch-agent:              active
== Support services ==
openvswitch:                            active
messagebus:                             active
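
The namespace check in the transcript above (compare the `ip netns | grep qrouter` output before and after stopping the agent) could be scripted along these lines. This is a minimal sketch: the helper names are hypothetical, and it only parses text that the caller has already captured from `ip netns`.

```python
def qrouter_namespaces(ip_netns_output):
    """Extract qrouter-* namespace names from captured `ip netns` output."""
    names = []
    for line in ip_netns_output.splitlines():
        fields = line.split()
        # Take the first field only, so any trailing annotation is ignored.
        if fields and fields[0].startswith("qrouter-"):
            names.append(fields[0])
    return names


def namespaces_survived(before, after):
    """True if every qrouter namespace seen before is still present after."""
    return set(qrouter_namespaces(before)) <= set(qrouter_namespaces(after))
```

For the session shown above, both captures contain the same single qrouter namespace, so `namespaces_survived` would return True.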

Comment 10 Ofer Blaut 2014-04-22 10:33:30 UTC
Tested with service neutron-l3-agent restart.

I ran ping -ni 0.01 <floating ip> during the restart and no packets were lost.

openstack-neutron-2013.2.3-4.el6ost.noarch
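
A check like the one above can be automated by parsing the summary line that ping prints when it exits. The sketch below assumes the usual "N packets transmitted, M received" summary format. The function name is hypothetical, and the sample strings in the usage note are illustrative rather than output captured for this bug.

```python
import re

# Matches e.g. "100 packets transmitted, 100 received, 0% packet loss"
# and the variant "100 packets transmitted, 100 packets received".
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def lost_packets(ping_output):
    """Return transmitted minus received, read from ping's summary line."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return sent - received
```

With an illustrative summary such as "100 packets transmitted, 75 received, 25% packet loss, time 990ms", the function returns 25. A restart that does not disturb traffic should yield 0.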

Comment 12 errata-xmlrpc 2014-05-29 20:19:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0516.html