Bug 1391553

Summary: keepalived doesn't send gratuitous ARP on receiving SIGHUP
Product: Red Hat Enterprise Linux 7 Reporter: Jakub Libosvar <jlibosva>
Component: keepalived Assignee: Ryan O'Hara <rohara>
Status: CLOSED ERRATA QA Contact: Brandon Perkins <bperkins>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.3 CC: cfeist, cluster-maint, djansa, ihrachys, jlibosva, rohara
Target Milestone: rc Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: keepalived-1.3.4-1.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1394291 Environment:
Last Closed: 2017-08-01 19:36:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1419049    
Bug Blocks: 1394291    

Description Jakub Libosvar 2016-11-03 14:24:16 UTC
Description of problem:
We run multiple keepalived instances on several nodes; for a given keepalived instance, one node is always master. All instances have multiple VIPs in their configuration files, but the VIPs are active only on the master instance. We send SIGHUP when the configuration file changes (a VIP is added or removed), but keepalived doesn't send gratuitous ARP for the added VIPs. That causes problems if a VIP has been used previously and the clients' ARP caches haven't been invalidated yet.

Version-Release number of selected component (if applicable):
keepalived-1.2.13-8.el7

How reproducible:
100%

Steps to Reproduce:
1. Configure VIPs in keepalived configuration file
2. Send SIGHUP to parent process of keepalived
3.

Actual results:


Expected results:
Gratuitous ARPs are sent

Additional info:
See also bug 1386718

Comment 1 Ihar Hrachyshka 2016-11-03 14:57:43 UTC
The issue goes as follows.

First, a Neutron HA router is created. It spawns keepalived with
garp_master_delay = 60, which makes it issue GARP requests for the configured
VIPs after a minute; the network's ARP caches are then properly updated with the
relevant MAC addresses.

Then, one of the addresses is removed from the router (the configuration file
is updated; SIGHUP sent). At this point, network participants still believe the
IP address is served by the router because the relevant ARP cache entry is not
invalidated.

Now, another Neutron HA router is created. It spins up a new keepalived process.

The VIP address previously removed from the first router is now assigned to the
newly created HA router. It makes Neutron update keepalived configuration file
with the new address, and SIGHUP the process.

At this point, we expect that the second keepalived will detect a new VIP
address added to the configuration file, and then it will send a GARP request
for the new address including the new MAC address of the serving router. This
should make all network participants update their ARP cache. But tcpdump
executed while sending the signal to the second keepalived does not show any
attempt to send GARPs.
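
For reference, a gratuitous ARP is an ARP packet whose sender and target
protocol addresses are both the announced IP, sent to the Ethernet broadcast
address. Below is a minimal Python sketch of the frame one would expect to see
in tcpdump for the new VIP. It assumes the request form (opcode 1) and a
broadcast target hardware address; the MAC address and function names are
made-up placeholders, not keepalived code. It only builds the bytes; sending
them would need a raw socket and root privileges.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_OP_REQUEST = 1  # assumption: GARP sent as a request, not a reply


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(src_mac: str, vip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP for an IPv4 VIP."""
    sha = mac_to_bytes(src_mac)    # sender hardware address
    spa = socket.inet_aton(vip)    # sender protocol address (the VIP)
    eth_header = BROADCAST_MAC + sha + struct.pack("!H", ETHERTYPE_ARP)
    # hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, then the opcode
    arp_header = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_OP_REQUEST)
    # target protocol address equals sender protocol address: that is
    # what makes the packet "gratuitous"
    return eth_header + arp_header + sha + spa + BROADCAST_MAC + spa


if __name__ == "__main__":
    # Placeholder MAC; the VIP is the one added in this report.
    frame = build_gratuitous_arp("fa:16:3e:00:00:01", "10.0.0.213")
    print(len(frame), frame.hex())
```

Receivers that already hold an ARP cache entry for the VIP are expected to
overwrite it with the sender hardware address from such a frame, which is why
its absence leaves the stale entry in place.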

Since neither garp_master_refresh nor garp_master_repeat is set, the process is
not supposed to re-announce the addresses periodically.

If we actually kill the second keepalived process and restart it with the same
file, then it correctly sends GARP requests for all served addresses (after
garp_master_delay).

The failure mode observed on Neutron side is that a floating IP disassociated
from one HA router and then associated to another HA router loses connectivity
until network participants invalidate their ARP cache entries for the address. In the
meantime, attempts to SSH or ping the address fail.

Comment 2 Ryan O'Hara 2016-11-04 13:30:14 UTC
Please provide an example of the config file and indicate what is changing (VIP added/removed from config file) and which nodes are receiving the change. Thanks.

Comment 3 Jakub Libosvar 2016-11-04 13:46:42 UTC
(In reply to Ryan O'Hara from comment #2)
> Please provide an example of the config file and indicate what is changing
> (VIP added/removed from config file) and which nodes are receiving the
> change. Thanks.

Config file example:
vrrp_instance VR_1 {
    state BACKUP
    interface ha-fd3acedb-be
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    track_interface {
        ha-fd3acedb-be
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-fd3acedb-be
    }
    virtual_ipaddress_excluded {
        10.0.0.211/32 dev qg-394e2eaf-20
        10.0.0.213/24 dev qg-394e2eaf-20
        10.100.0.1/28 dev qr-76579cea-55
        fe80::f816:3eff:fec7:e5b1/64 dev qg-394e2eaf-20 scope link
        fe80::f816:3eff:fed1:503f/64 dev qr-76579cea-55 scope link
    }
    virtual_routes {
        0.0.0.0/0 via 10.0.0.1 dev qg-394e2eaf-20
    }
}

The added line is: 10.0.0.213/24 dev qg-394e2eaf-20
It is added on all nodes.
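
To make the expectation concrete, the addresses that should trigger a GARP
after a reload are the ones present in the new configuration but not in the
old one. The following is a rough Python sketch of that comparison. It assumes
the simple one-address-per-line layout shown in the configuration above and is
not how keepalived itself parses or diffs its configuration; the function names
are illustrative.

```python
import re

# Blocks whose entries are addresses keepalived puts on interfaces.
VIP_BLOCKS = ("virtual_ipaddress", "virtual_ipaddress_excluded")


def extract_vips(config_text: str) -> set:
    """Collect the address/prefix token of every line inside a VIP block."""
    vips = set()
    inside_vip_block = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not inside_vip_block:
            # Only a bare "<keyword> {" line can open a VIP block.
            match = re.fullmatch(r"(\w+)\s*\{", line)
            if match and match.group(1) in VIP_BLOCKS:
                inside_vip_block = True
        elif line == "}":
            inside_vip_block = False
        elif line:
            vips.add(line.split()[0])
    return vips


def added_vips(old_config: str, new_config: str) -> list:
    """Addresses present only in the new configuration."""
    return sorted(extract_vips(new_config) - extract_vips(old_config))
```

Called with the configuration before and after the change described here,
added_vips() would return only 10.0.0.213/24, which is the address for which
a gratuitous ARP is expected.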

Comment 4 Jakub Libosvar 2016-11-09 11:42:24 UTC
Should we keep this bug open as a tracker for backports or as a tracker for packaging a new version of keepalived?

Comment 5 Ryan O'Hara 2016-11-10 17:58:16 UTC
With keepalived-1.2.13-9.el7.x86_64 installed, start keepalived and watch the logs (or tcpdump on the interface where the VIP(s) will live):

# systemctl start keepalived
# journalctl -afu keepalived

Now send SIGHUP to the parent process. Note that this process should be in MASTER state.

# kill -HUP $(cat /var/run/keepalived.pid)
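
When the reload is triggered from a management process rather than a shell,
the same step can be sketched in Python. This is only an illustration: the
helper names are hypothetical, and the pid file path is the one used in the
command above. The kill callable is injectable so the logic can be exercised
without signalling a real process.

```python
import os
import signal


def read_pid(pidfile: str) -> int:
    """Read the parent keepalived PID from its pid file."""
    with open(pidfile) as handle:
        return int(handle.read().strip())


def reload_keepalived(pidfile: str = "/var/run/keepalived.pid", kill=os.kill) -> int:
    """Send SIGHUP to the process named in the pid file; return its PID."""
    pid = read_pid(pidfile)
    kill(pid, signal.SIGHUP)
    return pid
```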

In this case there are three VIPs associated with the VRID in MASTER state. The logs should show that a gratuitous ARP is sent for each VIP:

Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.203
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:01 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: VRRP_Instance(VRRP) Sending/queueing gratuitous ARPs on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.201
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.202
Nov 10 11:55:06 mesa-01 Keepalived_vrrp[46237]: Sending gratuitous ARP on br2 for 192.168.102.203
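
A quick way to check such output is to count the "Sending gratuitous ARP"
lines per interface and address. The Python sketch below assumes the log line
format shown above (it may differ between keepalived versions) and skips the
"Sending/queueing" summary lines; the function name is illustrative.

```python
import re
from collections import Counter

# Matches only the per-packet lines, not the "Sending/queueing" summaries.
GARP_LINE = re.compile(
    r"Keepalived_vrrp\[\d+\]: Sending gratuitous ARP on (\S+) for (\S+)$"
)


def count_garps(log_text: str) -> Counter:
    """Count gratuitous ARP log lines per (interface, address) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = GARP_LINE.search(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

For the excerpt above this gives 10 lines for each of the three VIPs on br2.
An address added through a reload that is missing from the result would point
at the problem described in this bug.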

If you modify the virtual_ipaddress list to add/remove a VIP from the VRID, it will pick up the changes correctly.

Also note that when you send SIGHUP to keepalived in MASTER state, the existing VIPs are removed and recreated on the interface. This is by design.

Comment 9 errata-xmlrpc 2017-08-01 19:36:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2169