1181592 – Any change to floating ips causes connectivity issues using HA routers

Summary: Any change to floating ips causes connectivity issues using HA routers

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-neutron
Sub Component:
Version:	6.0 (Juno)
Hardware:	Unspecified
OS:	Unspecified
Priority:	low
Severity:	low
Target Milestone:	rc
Target Release:	10.0 (Newton)
Assignee:	Miguel Angel Ajo
QA Contact:	Toni Freger
Docs Contact:
URL:
Whiteboard:
Depends On:	1181107
Blocks:

Reported:	2015-01-13 12:35 UTC by Miguel Angel Ajo
Modified:	2016-12-14 15:26 UTC
CC List:	12 users
Fixed In Version:	openstack-neutron-9.1.0-1.el7ost
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1181796
Environment:
Last Closed:	2016-12-14 15:26:14 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1511722	None	None	None	Never
OpenStack gerrit	343312	None	None	None	2016-09-16 11:36:13 UTC
OpenStack gerrit	377730	None	None	None	2016-10-03 12:55:46 UTC
Red Hat Product Errata	RHEA-2016:2948	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 10 enhancement update	2016-12-14 19:55:27 UTC

Description Miguel Angel Ajo 2015-01-13 12:35:05 UTC

Description of problem:

A design problem in keepalived causes it to make unnecessary DNS requests during configuration reloads.

If the DNS server configured on the network node is not accessible from the qrouter-* namespace (the external network), keepalived gets stuck for ~60 seconds on every floating IP change to the router served by that keepalived instance.

The MASTER role then flaps between the HA routers, causing connectivity issues with the external network for 1-2 minutes.
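
One way to confirm the symptom is to time a hostname lookup from inside the router namespace. This is only a sketch and not part of the original report: `elapsed_seconds` is a hypothetical helper and `ROUTER_ID` is a placeholder for the affected router's ID.

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]: run a command, discard its output, and
# print how many whole seconds it took.
# (Hypothetical helper; not part of neutron or keepalived.)
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example, run as root on a network node (ROUTER_ID is a placeholder):
#   elapsed_seconds ip netns exec "qrouter-$ROUTER_ID" getent hosts "$(hostname)"
```

A result of many seconds, rather than 0 or 1, would suggest that the resolver is unreachable from the namespace.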


Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. use a DNS server in the management network in the network nodes
2. create a router (with l3 ha routers enabled)
3. assign a FIP to an instance
4. ping the FIP, or the router IP
5. assign another FIP to a different instance, or make any modification to floating IPs over the same router.
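
Steps 2, 3 and 5 could be scripted roughly as follows with the `neutron` command-line client. This is a sketch under assumptions: the router, network, subnet, floating IP and port names are placeholders, and `run`, `create_ha_router` and `associate_fip` are hypothetical helpers. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/bin/sh
# run CMD [ARGS...]: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_ha_router ROUTER EXT_NET SUBNET
# Create an HA router, set its gateway and attach a tenant subnet.
create_ha_router() {
    run neutron router-create --ha True "$1"
    run neutron router-gateway-set "$1" "$2"
    run neutron router-interface-add "$1" "$3"
}

# associate_fip FLOATINGIP_ID PORT_ID
# Associate an existing floating IP with an instance port.
associate_fip() {
    run neutron floatingip-associate "$1" "$2"
}

# Example (all names are placeholders):
#   create_ha_router router-ha ext-net private-subnet
#   associate_fip "$FIP1_ID" "$PORT1_ID"   # step 3
#   associate_fip "$FIP2_ID" "$PORT2_ID"   # step 5, while pinging the first FIP
```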


Actual results:

The external connectivity breaks for 1-2 minutes.

Expected results:

The external connectivity works without issue, the MASTER router doesn't flap.

Additional info:

Comment 1 Miguel Angel Ajo 2015-01-13 12:36:25 UTC

The related keepalived bug is:

https://bugzilla.redhat.com/show_bug.cgi?id=1181107

And we have a workaround:
 echo 127.0.0.1 $( hostname ) >>/etc/hosts

Comment 5 lpeer 2015-01-13 18:46:15 UTC

The impact of this bug is that VRRP is not working and we do not have an HA solution for OSP-6.

The short-term fix should be applied by the installer, as described in comment 1, by running the following on the network/controller nodes:
 echo 127.0.0.1 $( hostname ) >>/etc/hosts

The long-term fix is in keepalived (bz 1181107), in combination with Neutron, which has to provide the router_id parameter.
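
To see whether a given keepalived configuration already sets router_id, a check along these lines may help. It is a sketch only: `has_router_id` is a hypothetical helper, and the location of the keepalived.conf that Neutron generates for each HA router is deployment-specific, so the path in the example is an assumption.

```shell
#!/bin/sh
# has_router_id FILE: succeed if FILE contains a "router_id <value>" line.
# The pattern is anchored so that "virtual_router_id" does not match.
has_router_id() {
    grep -Eq '^[[:space:]]*router_id[[:space:]]+[^[:space:]]+' "$1"
}

# Example (the path is an assumption; adjust it to the deployment):
#   has_router_id "/var/lib/neutron/ha_confs/$ROUTER_ID/keepalived.conf" \
#       && echo "router_id is set" || echo "router_id is not set"
```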

Comment 6 Ryan O'Hara 2015-01-13 19:06:57 UTC

I'd argue that this does not depend on BZ#1181107 since you have a valid workaround.

Comment 7 Miguel Angel Ajo 2015-01-14 15:37:49 UTC

(In reply to Ryan O'Hara from comment #6)
> I'd argue that this does not depend on BZ#1181107 since you have a valid
> workaround.

We're using the bz tracker to handle the proper fix; the workaround is being done in deployment.


By the way, and this is important: after talking with Fabio di Nitto, we found that the workaround provided in the first comment is risky.

Use this one instead, otherwise it could cause problems for pacemaker:

dig A $(hostname) | grep -A1 "ANSWER SEC" | tail -n 1 | awk '{print $NF " " $1}' | sed -e 's/.$//g'  >>/etc/hosts ;   grep $(hostname) /etc/hosts || echo "Failure setting up the hostname entry"
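
For readability, the same idea can be split into two small functions that are also safe to run more than once. This is a sketch, not the command used in deployment: `hosts_line_from_dig` and `add_hosts_line` are hypothetical helpers, and unlike the one-liner above they pick the first A record in the answer section rather than the first line after the section header.

```shell
#!/bin/sh
# hosts_line_from_dig: read `dig A <name>` output on stdin and print
# "<ip> <name>" for the first A record in the answer section.
hosts_line_from_dig() {
    awk '
        /ANSWER SECTION/ { in_answer = 1; next }
        in_answer && $4 == "A" {
            name = $1
            sub(/\.$/, "", name)   # drop the trailing dot of the FQDN
            print $NF " " name
            exit
        }
        in_answer && NF == 0 { exit }   # a blank line ends the section
    '
}

# add_hosts_line FILE LINE: append LINE to FILE unless the exact line is
# already present. Fails if LINE is empty.
add_hosts_line() {
    [ -n "$2" ] || return 1
    grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Example:
#   line=$(dig A "$(hostname)" | hosts_line_from_dig)
#   add_hosts_line /etc/hosts "$line" || echo "Failure setting up the hostname entry"
```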

Comment 8 lpeer 2015-01-29 17:51:31 UTC

The workaround is available via Staypuft; we are currently waiting for the right fix in keepalived. Lowering the priority.

Comment 9 lpeer 2015-03-09 14:38:57 UTC

Pushing this one out a bit; the fix depends on a keepalived fix that is not available yet.

Comment 10 Toni Freger 2015-06-24 15:16:12 UTC

Hi Livnat/Miguel,

We need to prioritize this bug in order to verify the older one: https://bugzilla.redhat.com/show_bug.cgi?id=1181107

Thanks

Comment 11 Giulio Fidente 2015-06-24 15:56:49 UTC

Trying to apply this to TripleO; from what I understand, the workaround is in comment #1:

  echo 127.0.0.1 $( hostname ) >>/etc/hosts

Does the mapping for `hostname` have to be against 127.0.0.1, or can it be against any valid local IP?

Comment 12 Miguel Angel Ajo 2015-06-25 12:38:59 UTC

(In reply to Giulio Fidente from comment #11)
> Trying to apply this to TripleO; from what I understand, the workaround is
> in comment #1:
> 
>   echo 127.0.0.1 $( hostname ) >>/etc/hosts
> 
> Does the mapping for `hostname` have to be against 127.0.0.1, or can it be
> against any valid local IP?

Better to use this workaround (or a modified version of it):

dig A $(hostname) | grep -A1 "ANSWER SEC" | tail -n 1 | awk '{print $NF " " $1}' | sed -e 's/.$//g'  >>/etc/hosts ;   grep $(hostname) /etc/hosts || echo "Failure setting up the hostname entry"

It will work better with a valid IP.

(In reply to Toni Freger from comment #10)
> Hi Livnat/Miguel,
> 
> We need to prioritize this bug in order to verify the older one
> https://bugzilla.redhat.com/show_bug.cgi?id=1181107
> 
> Thanks

Hi Toni, will do after the neutron mid-cycle sprint in Israel. Thanks.

Comment 14 Quentin Armitage 2016-07-29 19:48:31 UTC

keepalived commit https://github.com/acassen/keepalived/commit/9d028acd327e722e6692eaa9d47e3914e16edf3a significantly reduces the number of DNS requests made, and also adds an option that allows no DNS requests to be made.

Comment 16 Miguel Angel Ajo 2016-10-03 12:55:46 UTC

This has merged upstream: https://review.openstack.org/#/c/343312/

I've proposed it for OSP10, but it will be merged after the final upstream release.

Comment 19 Toni Freger 2016-11-21 08:53:19 UTC

Tested on OpenStack/10.0-RHEL-7/2016-11-19.4/RH7-RHOS-10.0/
openstack-neutron-9.1.0-5.el7ost.src.rpm 
With 3 controllers and 1 compute

Steps to Reproduce:
1. use a DNS server in the management network in the network nodes
2. create a router (with l3 ha routers enabled)
3. assign a FIP to an instance
4. ping the FIP, or the router IP
5. assign another FIP to a different instance, or make any modification to floating IPs over the same router.

The issue didn't reproduce; connectivity to the VM remained stable.
The active router didn't flip.
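
To put a number on "remained stable", one option is to keep a ping running against the first FIP while step 5 is performed and read the loss from its summary line. This is a sketch: `packet_loss_percent` is a hypothetical helper and it assumes the usual "N% packet loss" summary format printed by ping.

```shell
#!/bin/sh
# packet_loss_percent: read ping output on stdin and print the packet
# loss percentage taken from the summary line.
packet_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p' | head -n 1
}

# Example (FIP is a placeholder for the first floating IP):
#   ping -c 120 "$FIP" | packet_loss_percent
```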

Comment 20 Miguel Angel Ajo 2016-11-21 09:01:54 UTC

(In reply to Toni Freger from comment #19)
> Tested on OpenStack/10.0-RHEL-7/2016-11-19.4/RH7-RHOS-10.0/
> openstack-neutron-9.1.0-5.el7ost.src.rpm 
> With 3 controllers and 1 compute
> 
> Steps to Reproduce:
> 1. use a DNS server in the management network in the network nodes
> 2. create a router (with l3 ha routers enabled)
> 3. assign a FIP to an instance
> 4. ping the FIP, or the router IP
> 5. assign another FIP to a different instance, or make any modification to
> floating IPs over the same router.
> 
> The issue didn't reproduce; connectivity to the VM remained stable.
> The active router didn't flip.

Pinging you on IRC as well, just in case.

Make sure it was also verified with the workaround that OSPD sets on the system (see Comment 12) removed.
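
A quick way to check whether a hosts file still maps the node's hostname is sketched below. `hosts_has_name` is a hypothetical helper. Note that a hostname entry may exist for reasons unrelated to the workaround, so a match only means the test would not exercise the DNS path.

```shell
#!/bin/sh
# hosts_has_name FILE NAME: succeed if a non-comment line of FILE lists
# NAME as a hostname or alias (any field after the address).
hosts_has_name() {
    awk -v name="$2" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break       # ignore trailing comments
                if ($i == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example:
#   hosts_has_name /etc/hosts "$(hostname)" && echo "hostname entry still present"
```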

Comment 22 errata-xmlrpc 2016-12-14 15:26:14 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html
