Bug 1081159 - L3 agent restart causes network outage
Summary: L3 agent restart causes network outage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-neutron
Version: 4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z4
Target Release: 4.0
Assignee: Jakub Libosvar
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-03-26 16:37 UTC by Dave Sullivan
Modified: 2022-07-09 06:16 UTC

Fixed In Version: openstack-neutron-2013.2.3-4.el6ost
Doc Type: Bug Fix
Doc Text:
Cause: qrouter namespaces were destroyed and recreated whenever the L3 agent started.
Consequence: Ongoing traffic was lost because the NAT rules disappeared along with the destroyed namespace.
Fix: Namespaces that are in use are preserved when the agent starts.
Result: Restarting the L3 agent no longer affects ongoing traffic through router namespaces.
Clone Of:
Environment:
Last Closed: 2014-05-29 20:19:34 UTC
Target Upstream Version:
Embargoed:
lpeer: needinfo+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1175695 | 0 | None | None | None | Never
OpenStack gerrit | 84420 | 0 | None | None | None | Never
Red Hat Product Errata | RHSA-2014:0516 | 0 | normal | SHIPPED_LIVE | Moderate: openstack-neutron security, bug fix, and enhancement update | 2014-05-30 00:15:59 UTC

Description Dave Sullivan 2014-03-26 16:37:18 UTC
Description of problem:

L3 Network Drops - Floating IPs are not accessible

Even after the neutron services are restarted on the management node, floating IPs remain inaccessible.

Tenants have to restart their instances before connectivity returns.

We need to determine the cause of the initial L3 outage.

This appears to match the upstream bug noted here:

https://bugs.launchpad.net/neutron/+bug/1175695

Version-Release number of selected component (if applicable):

current RHOS 4

Comment 2 Maru Newby 2014-03-28 23:04:17 UTC
The upstream bug looks like the probable cause. The next step is to cherry-pick the fix for inclusion in stable/havana and to figure out whether we can rely on the next sync or need to backport it manually to RHOS.

Comment 3 Ofer Blaut 2014-03-31 08:42:11 UTC
I have tested on Havana A3 using a distributed system.

openstack-neutron-2013.2.2-5.el6ost.noarch

1. I stopped the L3 agent.
2. The qrouter namespace is still up, and traffic to the floating IP works.
3. While the L3 agent is starting, traffic stops for ~25 seconds and then resumes.

[root@puma05 ~]# ip netns | grep qrouter
qrouter-69cf3535-2960-4b11-8e3a-da37c3331f01
[root@puma05 ~]# service neutron-l3-agent stop
Stopping neutron-l3-agent:                                 [  OK  ]
[root@puma05 ~]# ip netns | grep qrouter
qrouter-69cf3535-2960-4b11-8e3a-da37c3331f01
[root@puma05 ~]# openstack-status 
== neutron services ==
neutron-server:                         inactive  (disabled on boot)
neutron-dhcp-agent:                     active
neutron-l3-agent:                       inactive
neutron-metadata-agent:                 active
neutron-lbaas-agent:                    inactive  (disabled on boot)
neutron-openvswitch-agent:              active
== Support services ==
openvswitch:                            active
messagebus:                             active
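
The transcript above shows the qrouter namespace surviving an agent stop. According to the Doc Text, the outage came from the agent destroying and recreating such namespaces when it started. A minimal Python sketch of the corrected selection logic is given here for illustration only. It is not the actual neutron patch: the function name stale_router_namespaces and its arguments are hypothetical, and it assumes the agent already has the list of existing namespaces and the IDs of the routers it currently hosts.

```python
# Illustrative sketch only, not the neutron implementation.
# Router namespaces are named "qrouter-<router id>", as in the transcript above.
NS_PREFIX = "qrouter-"


def stale_router_namespaces(existing_namespaces, active_router_ids):
    """Return the qrouter namespaces that are safe to destroy on agent start.

    A namespace is kept whenever it belongs to a router the agent still
    hosts, so its NAT rules and ongoing traffic are left untouched.
    Namespaces of other kinds (for example qdhcp-*) are never returned.
    """
    in_use = {NS_PREFIX + router_id for router_id in active_router_ids}
    return sorted(
        ns
        for ns in existing_namespaces
        if ns.startswith(NS_PREFIX) and ns not in in_use
    )
```

With the namespace from the transcript and its router ID passed as active, the function returns an empty list, so nothing is torn down on start.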

Comment 10 Ofer Blaut 2014-04-22 10:33:30 UTC
Tested with: service neutron-l3-agent restart

I ran ping -ni 0.01 <floating ip> during the restart and no packets were lost.

openstack-neutron-2013.2.3-4.el6ost.noarch
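
The packet-loss check above can be automated by reading the summary line that ping prints on exit. The following is a rough Python sketch under stated assumptions: the helper name parse_ping_summary is hypothetical, and the summary format is assumed to be the usual iputils one ("N packets transmitted, M received, ...").

```python
import re

# Matches the iputils-style summary line, e.g.
# "1000 packets transmitted, 1000 received, 0% packet loss, time 10990ms"
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def parse_ping_summary(output):
    """Return (sent, received, lost) from the text printed by ping."""
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    sent, received = int(match.group(1)), int(match.group(2))
    return sent, received, sent - received


# Illustrative input only (192.0.2.10 is a documentation address,
# not the floating IP used in this test).
sample = (
    "--- 192.0.2.10 ping statistics ---\n"
    "1000 packets transmitted, 1000 received, 0% packet loss, time 10990ms\n"
)
sent, received, lost = parse_ping_summary(sample)
```

A restart would count as verified when lost is 0 for a ping run that spans the whole restart.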

Comment 12 errata-xmlrpc 2014-05-29 20:19:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0516.html

