Bug 1371948

Summary: Deleting virtual machines with floating IPs causes qrouter interfaces to be deleted and recreated
Product: Red Hat OpenStack Reporter: Robin Cernin <rcernin>
Component: openstack-neutron Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED ERRATA QA Contact: GenadiC <gcheresh>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 8.0 (Liberty) CC: amuller, chrisw, cllewellyn, darryl, ihrachys, jschwarz, majopela, mlopes, nyechiel, oblaut, pneedle, rohara, smulholland, srelf, srevivo
Target Milestone: --- Keywords: Triaged, ZStream
Target Release: 8.0 (Liberty)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-neutron-7.2.0-7.el7ost Doc Type: Bug Fix
Doc Text:
Previously, the default L3 HA implementation (keepalived) sometimes flipped the master router instance to `backup` if it received multiple SIGHUP signals in quick succession. Consequently, L3 connectivity was disrupted until the previous backup keepalived instance took over. With this update, to work around this keepalived behavior, the neutron L3 agent now throttles SIGHUP signals sent to keepalived to make sure keepalived has enough time to reload configuration without being disrupted by failovers. As a result, L3 connectivity implemented using HA routers is not disrupted after router updates arriving in quick succession (for example, with floating IP updates).
Story Points: ---
Clone Of:
: 1398286 1436359 1436363 Environment:
Last Closed: 2017-06-20 12:56:17 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1398286, 1436359, 1436363    

Description Robin Cernin 2016-08-31 13:47:37 UTC
When deleting 1 VM from Horizon everything is fine; however, when selecting 2 or more VMs to be deleted, connectivity to the VMs hosted on the same qrouter is interrupted for ~4 seconds.
 
[heat-admin@overcloud-controller-1 ~]$ nova list
+--------------------------------------+----------+--------+------------+-------------+-------------------------------------+
| ID                                   | Name     | Status | Task State | Power State | Networks                            |
+--------------------------------------+----------+--------+------------+-------------+-------------------------------------+
| 5d9a5238-1872-4b53-8be3-9bdce2563e6e | test1-vm | ACTIVE | -          | Running     | internal=192.168.3.103, 192.0.2.107 |
| 335da0aa-da31-48be-9d66-70c89cc98a04 | test2-vm | ACTIVE | -          | Running     | internal=192.168.3.104, 192.0.2.106 |
| 24e7c7ee-5fe7-4f2b-8076-b63bd5a5d0ce | test3-vm | ACTIVE | -          | Running     | internal=192.168.3.105, 192.0.2.105 |
| 4bf123e9-fef7-4d5f-9666-0a0dcd228e50 | test4-vm | ACTIVE | -          | Running     | internal=192.168.3.106, 192.0.2.104 |
+--------------------------------------+----------+--------+------------+-------------+-------------------------------------+
 
[heat-admin@overcloud-controller-1 ~]$ neutron router-list
+--------------------------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| id                                   | name    | external_gateway_info                                                                                                                                                                    | distributed | ha   |
+--------------------------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
| 2e468f1d-4405-41cb-806b-11c172ef256d | router1 | {"network_id": "7176f07c-47ce-4687-9d77-0afa5fad74ff", "enable_snat": true, "external_fixed_ips": [{"subnet_id": "cd07bc48-dfe7-473c-a740-5cc3767b987f", "ip_address": "192.0.2.103"}]} | False       | True |
+--------------------------------------+---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+------+
 
[heat-admin@overcloud-controller-1 ~]$ neutron l3-agent-list-hosting-router router1
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| b4b09c17-942c-41d1-baf3-af1e26ae0a6b | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| eb3d2d90-6658-4564-9f5d-7cc958e71d96 | overcloud-controller-0.localdomain | True           | :-)   | standby  |
| eadf9303-7273-4336-ab1e-2b81e916a631 | overcloud-controller-2.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
 
[root@overcloud-controller-2 ~]# ip netns exec qrouter-2e468f1d-4405-41cb-806b-11c172ef256d ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
17: ha-5b2470f2-f2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:e7:d1:77 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-5b2470f2-f2
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-5b2470f2-f2
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee7:d177/64 scope link 
       valid_lft forever preferred_lft forever
18: qg-eb594fa1-23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:50:a1:62 brd ff:ff:ff:ff:ff:ff
    inet 192.0.2.103/24 scope global qg-eb594fa1-23
       valid_lft forever preferred_lft forever
    inet 192.0.2.104/32 scope global qg-eb594fa1-23
       valid_lft forever preferred_lft forever
    inet 192.0.2.105/32 scope global qg-eb594fa1-23
       valid_lft forever preferred_lft forever
    inet 192.0.2.106/32 scope global qg-eb594fa1-23
       valid_lft forever preferred_lft forever
    inet 192.0.2.107/32 scope global qg-eb594fa1-23
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe50:a162/64 scope link nodad 
       valid_lft forever preferred_lft forever
20: qr-21e49ce2-81: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:ae:a6:02 brd ff:ff:ff:ff:ff:ff
    inet 192.168.3.1/24 scope global qr-21e49ce2-81
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feae:a602/64 scope link nodad 
       valid_lft forever preferred_lft forever
 
Monitoring in the namespace
 
[heat-admin@overcloud-controller-2 ~]$ sudo ip netns exec qrouter-2e468f1d-4405-41cb-806b-11c172ef256d ip -o monitor address
 
 
Deleted 18: qg-eb594fa1-23    inet 192.0.2.107/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
Deleted 18: qg-eb594fa1-23    inet 192.0.2.106/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
Deleted 17: ha-5b2470f2-f2    inet 169.254.0.1/24 scope global ha-5b2470f2-f2\       valid_lft forever preferred_lft forever
Deleted 18: qg-eb594fa1-23    inet 192.0.2.103/24 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
Deleted 18: qg-eb594fa1-23    inet 192.0.2.104/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
Deleted 18: qg-eb594fa1-23    inet 192.0.2.105/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
Deleted 20: qr-21e49ce2-81    inet 192.168.3.1/24 scope global qr-21e49ce2-81\       valid_lft forever preferred_lft forever
Deleted 18: qg-eb594fa1-23    inet6 fe80::f816:3eff:fe50:a162/64 scope link nodad \       valid_lft forever preferred_lft forever
Deleted 20: qr-21e49ce2-81    inet6 fe80::f816:3eff:feae:a602/64 scope link nodad \       valid_lft forever preferred_lft forever
17: ha-5b2470f2-f2    inet 169.254.0.1/24 scope global ha-5b2470f2-f2\       valid_lft forever preferred_lft forever
18: qg-eb594fa1-23    inet 192.0.2.103/24 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
18: qg-eb594fa1-23    inet 192.0.2.104/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
18: qg-eb594fa1-23    inet 192.0.2.105/32 scope global qg-eb594fa1-23\       valid_lft forever preferred_lft forever
20: qr-21e49ce2-81    inet 192.168.3.1/24 scope global qr-21e49ce2-81\       valid_lft forever preferred_lft forever
18: qg-eb594fa1-23    inet6 fe80::f816:3eff:fe50:a162/64 scope link nodad \       valid_lft forever preferred_lft forever
20: qr-21e49ce2-81    inet6 fe80::f816:3eff:feae:a602/64 scope link nodad \       valid_lft forever preferred_lft forever
 
Initiated ping to test4-vm
 
[root@overcloud-controller-2 ~]# ping 192.0.2.104
PING 192.0.2.104 (192.0.2.104) 56(84) bytes of data.
64 bytes from 192.0.2.104: icmp_seq=1 ttl=64 time=2.83 ms
64 bytes from 192.0.2.104: icmp_seq=2 ttl=64 time=0.893 ms
64 bytes from 192.0.2.104: icmp_seq=3 ttl=64 time=0.499 ms
64 bytes from 192.0.2.104: icmp_seq=4 ttl=64 time=0.640 ms
64 bytes from 192.0.2.104: icmp_seq=5 ttl=64 time=0.644 ms
...
64 bytes from 192.0.2.104: icmp_seq=116 ttl=64 time=0.642 ms
64 bytes from 192.0.2.104: icmp_seq=117 ttl=64 time=0.922 ms
64 bytes from 192.0.2.104: icmp_seq=118 ttl=64 time=12.7 ms
64 bytes from 192.0.2.104: icmp_seq=119 ttl=64 time=0.515 ms
64 bytes from 192.0.2.104: icmp_seq=120 ttl=64 time=1.69 ms
ping: sendmsg: Network is unreachable <=
ping: sendmsg: Network is unreachable <= This is when the other 2 VMs are deleted
ping: sendmsg: Network is unreachable <=
ping: sendmsg: Network is unreachable <=
64 bytes from 192.0.2.104: icmp_seq=125 ttl=64 time=2.36 ms
64 bytes from 192.0.2.104: icmp_seq=126 ttl=64 time=0.746 ms
64 bytes from 192.0.2.104: icmp_seq=127 ttl=64 time=1.11 ms
64 bytes from 192.0.2.104: icmp_seq=128 ttl=64 time=0.646 ms
64 bytes from 192.0.2.104: icmp_seq=129 ttl=64 time=0.661 ms

Comment 2 Robin Cernin 2016-08-31 13:48:37 UTC
Also, the environment setup is:

[stack@undercloud ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| 2d9c97b8-dc5b-4c97-bb0f-50427c91ab90 | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.7  |
| 2a0f9a9d-983e-47b4-9127-b6152996b71e | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
| 60937186-a959-427f-b9eb-d7abcfca54c7 | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
| d2a6a304-bc20-4a44-b698-9bcea58172a7 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.10 |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+

Comment 3 Robin Cernin 2016-08-31 13:50:04 UTC
The router setup is HA, no DVR here.

Comment 4 John Schwarz 2016-08-31 13:51:16 UTC
The VMs were deleted through Horizon (so concurrent deletion is a safe bet).

Can we get the configuration files and the neutron logs for all neutron processes in all nodes, please? :)

Comment 7 Charlie Llewellyn 2016-09-01 08:52:19 UTC
As an additional note, this also occurs if you delete VMs with floating IPs sequentially within a short period via the API.

Comment 9 John Schwarz 2016-09-08 13:42:27 UTC
Observations:

1. For some reason, when this occurs, the ha- interface loses its 169.254.0.1/24 address for 4-5 seconds and then gets it back. The l3 agent/openvswitch logs don't indicate either of them doing the actual deletion, so it's probably hidden somewhere in the code.
2. An easier reproduction is to simply disassociate the floating IPs from the instances; there is no need to delete the entire instance.
3. This seems to occur only when disassociating floating IPs from 2 or more instances - one is not enough. Also, when some kind of sleep is put between the operations, the issue doesn't reproduce. This hints at a race condition (a hypothetical reproduction sketch follows this list).
4. The state-change log indicates this also occurs when associating floating IPs, not only on disassociation.
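
For illustration, a minimal reproduction sketch along the lines of observations 2 and 3, assuming python-neutronclient credentials are available; the auth values and floating IP IDs are placeholders, not values from this environment:

from neutronclient.v2_0 import client

# Placeholder credentials; substitute values for the affected environment.
neutron = client.Client(username='admin', password='<password>',
                        tenant_name='admin',
                        auth_url='http://<keystone-host>:5000/v2.0')

# Placeholder floating IP IDs for two instances behind the same HA router.
fip_ids = ['<floating-ip-id-1>', '<floating-ip-id-2>']

for fip_id in fip_ids:
    # Clearing port_id disassociates the floating IP from its port.
    neutron.update_floatingip(fip_id, {'floatingip': {'port_id': None}})
    # Intentionally no sleep between calls: back-to-back updates make the L3
    # agent reload keepalived repeatedly, which is the race described above.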

Comment 10 John Schwarz 2016-09-08 14:55:35 UTC
The cause of Keepalived switching into backup (failing over) is that the specified actions (association and disassociation of floating IPs) change the configuration of keepalived and, in addition, send SIGHUP to it. If 2 or more SIGHUPs are sent to keepalived in sequence, it will go into BACKUP (removing the IP addresses from the router's interfaces, which causes the connectivity loss) and restart VRRP negotiation.

An approach we are considering on Neutron's side is throttling SIGHUP so that only 1 is sent every X seconds. Before we start working on this, it would be good to hear from Ryan whether there is any way to mitigate this on Keepalived's side.
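
For illustration only, here is a minimal sketch of the throttling idea (not the actual Neutron patch): collapse SIGHUPs that arrive within a threshold into a single deferred reload. The class name and the 3-second threshold are assumptions:

import os
import signal
import threading
import time

class SighupThrottler(object):
    """Sketch only: send at most one SIGHUP per `threshold` seconds,
    deferring and collapsing any extra reload requests."""

    def __init__(self, pid, threshold=3.0):
        self.pid = pid
        self.threshold = threshold
        self._last_sent = 0.0
        self._timer = None
        self._lock = threading.Lock()

    def _send(self):
        self._last_sent = time.time()
        os.kill(self.pid, signal.SIGHUP)

    def reload(self):
        with self._lock:
            elapsed = time.time() - self._last_sent
            if elapsed >= self.threshold:
                self._send()
            elif self._timer is None:
                # Defer a single reload until the threshold has elapsed;
                # further requests arriving meanwhile are absorbed by it.
                self._timer = threading.Timer(self.threshold - elapsed,
                                              self._deferred)
                self._timer.start()

    def _deferred(self):
        with self._lock:
            self._timer = None
            self._send()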

Comment 11 John Schwarz 2016-09-08 14:56:54 UTC
Also, it's important to note that manually sending SIGHUP to a keepalived process twice in a row also triggers this go-to-BACKUP state.
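
A hypothetical illustration of that manual trigger, assuming the keepalived PID for the router is known (the pid-file path below is a placeholder, not taken from this report):

import os
import signal

# Placeholder path; locate the actual keepalived pid file for the HA router.
with open('/var/lib/neutron/ha_confs/<router-id>.pid') as f:
    pid = int(f.read().strip())

os.kill(pid, signal.SIGHUP)
os.kill(pid, signal.SIGHUP)  # a second SIGHUP with no delay reproduces the BACKUP transition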

Comment 12 Ryan O'Hara 2016-09-08 16:01:13 UTC
(In reply to John Schwarz from comment #10)
> The cause for Keepalived switching into backup (failing over) is because the
> specified actions (association and disassociation of floating ips) change
> the configuration of keepalived, and in addition sends SIGHUP to it. If 2 or
> more SIGHUPs are sent to keepalived in sequence, it will go into BACKUP
> (removing IP interfaces from the router's resources, which causes the
> connectivity loss) and restart VRRP negotiation.

What is changing in keepalived.conf when you do this?

> An approach we were thinking on Neutron's side is throttling SIGHUP to only
> send 1 every X seconds. Before we start working on this, it will be good to
> hear from Ryan if is there's any way to mitigate this on Keepalived's side?

First, I don't think throttling the frequency of SIGHUP is going to help in the long run. If keepalived is being signalled with SIGHUP multiple times in quick succession, the service is not going to be sending VRRP advertisements (since it is restarting), meaning another node will take over. Related question, does keepalived.conf have 'nopreempt' keyword declared?

Comment 13 Ryan O'Hara 2016-09-08 20:20:50 UTC
After much discussion and quite a bit of investigation, here is my assessment of how keepalived is behaving:

First, some background. The keepalived master node will periodically send VRRP advertisements, as configured by 'advert_int'. When a backup node does not receive a VRRP advertisement within this interval (plus some skew time), a failover occurs. Now, the other important detail is that the master keepalived node will not send VRRP advertisements while in a signal handler.

When a keepalived node in the master state receives a SIGHUP, it will block other signals while it processes the SIGHUP. While the SIGHUP signal handler is being executed, the VRRP advertisements stop. If just one signal is handled, chances are good that the node will complete the signal handler and resume sending VRRP advertisements before the backup nodes begin a new election. Conversely, if the signal handler takes too long, the backup node(s) will not receive the advertisement on time and will force a new master election. If multiple signals are received in quick succession, they are effectively queued and the signal handler will be executed once per signal, serially. This causes a longer period during which the master is not sending advertisements, since it is busy (overwhelmed) handling signals.

Note that short advertisement intervals and/or multiple SIGHUPs will increase the likelihood of triggering a failover in this manner.
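
To make the timing concrete, here is a small worked example using the standard VRRPv2 timers (RFC 3768); the advert_int and priority values are illustrative, not taken from this environment:

# VRRPv2 timers (RFC 3768); illustrative values, not from this environment.
advert_int = 2.0   # seconds between advertisements from the master
priority = 50      # priority advertised for this virtual router

skew_time = (256 - priority) / 256.0                # ~0.80 s here
master_down_interval = 3 * advert_int + skew_time   # ~6.80 s here

print("backup declares the master dead after %.2f s without advertisements"
      % master_down_interval)
# If queued SIGHUP handlers keep the master from advertising for longer than
# this interval, the backup forces a new election and a failover occurs.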

Comment 14 John Schwarz 2016-09-25 11:45:55 UTC
I've dug into this again. I noticed that in keepalived v1.2.20 this no longer happens (and in all versions up until then it does). The patch which (I believe) prevents this behaviour is https://github.com/acassen/keepalived/commit/6b20916a.

It's important to note that I compiled and manually installed keepalived from sources to check this, as el7 only has 1.2.13. So the 'fix' (if this can be called a fix) is not available in rhel7 at all (yet?).

Assaf, how do you want to proceed with this, given this new information?

Comment 17 Ryan O'Hara 2016-09-26 22:40:05 UTC
After looking more closely at the patch referenced in comment #14, I fail to see how this patch will address the issue. The patch will do two things:

1. If the priority is set to 255 and nopreempt is *not* set, the internal state variable will be tweaked. My understanding is that the Neutron L3 HA agent does use 'nopreempt' and the priority is not set to 255. Please correct me if I am wrong.

2. If nopreempt is set and the default state is set to MASTER, the code will print a warning. I believe you have state set to BACKUP, so you will not see this warning.

I'm questioning if this patch will in fact fix the problem you're seeing.

Comment 28 GenadiC 2017-06-06 08:45:43 UTC
Verified in openstack-neutron-7.2.0-10.el7ost.noarch
Created 5 VMs with floating IPs.
Pinging one of them while removing all the other VMs worked without interruption.

Comment 30 errata-xmlrpc 2017-06-20 12:56:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1540