Bug 1792160

Summary: keepalived 2.0.10 goes into FAULT STATE when an interface is renamed
Product: Red Hat Enterprise Linux 8
Reporter: Gregory Thiemonge <gthiemon>
Component: keepalived
Assignee: Ryan O'Hara <rohara>
Status: CLOSED ERRATA
QA Contact: Brandon Perkins <bperkins>
Severity: high
Docs Contact:
Priority: high
Version: 8.1
CC: atragler, cfeist, cgoncalves, cluster-maint, redhat-bugzilla, rohara
Target Milestone: rc
Keywords: ZStream
Target Release: 8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: keepalived-2.0.10-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1801895
Environment:
Last Closed: 2020-04-28 16:05:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1792157, 1801895

Description Gregory Thiemonge 2020-01-17 07:50:37 UTC
Description of problem:
The OpenStack Octavia project uses keepalived with VRRP inside VMs to provide HA for load balancers.

keepalived runs in a network namespace and uses eth1 as the VRRP port.
The Octavia amphora-agent updates the namespace by adding new interfaces and renaming interfaces (these interfaces are used by haproxy). These interfaces are not related to VRRP and are not part of the keepalived configuration.
However, when keepalived detects that an interface has been renamed, it goes into FAULT STATE.

Version-Release number of selected component (if applicable):
RHEL 8.1/keepalived 2.0.10

How reproducible:
100%

Steps to Reproduce:

I reproduced the issue using one Octavia Load Balancer with HA enabled, one listener and some iproute commands that simulate what amphora-agent does:

Connect to the MASTER amphora, check the VRRP IP address and the VRRP state in the logs -> everything is OK

bash-4.4# ip -n amphora-haproxy a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.0.0.161/32 scope global eth1
       valid_lft forever preferred_lft forever

bash-4.4# journalctl -le | grep Keep
[..]
Jan 15 14:56:25 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering BACKUP STATE
Jan 15 14:56:28 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) received lower priority (90) advert from 10.0.0.173 - discarding
Jan 15 14:56:29 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering MASTER STATE


Create a dummy interface (a veth pair), move one end (veth1) to the amphora-haproxy namespace, check the VRRP IP address -> still OK

bash-4.4# ip link add veth0 type veth peer veth1
bash-4.4# ip link set veth1 up
bash-4.4# ip link set veth1 netns amphora-haproxy
bash-4.4# ip -n amphora-haproxy a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1
       valid_lft forever preferred_lft forever
    inet 10.0.0.161/32 scope global eth1
       valid_lft forever preferred_lft forever
4: veth1@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:0f:5b:55:44:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0


Rename the dummy interface, check the VRRP IP address and the logs -> the VRRP address (10.0.0.161/32) has been removed and keepalived is in FAULT STATE

bash-4.4# ip -n amphora-haproxy link set veth1 name foo0
bash-4.4# ip -n amphora-haproxy a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether fa:16:3e:1f:f5:b5 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.189/26 brd 10.0.0.191 scope global eth1
       valid_lft forever preferred_lft forever
4: foo0@if5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether fa:0f:5b:55:44:f5 brd ff:ff:ff:ff:ff:ff link-netnsid 0

bash-4.4# journalctl -le | grep Keep
[..]
Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: Interface name has changed from veth1 to foo0
Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) Entering FAULT STATE
Jan 15 16:31:27 amphora-f3af9d57-790c-4471-ae1f-82ee48ec2f68.novalocal Keepalived_vrrp[1346]: (2877477fd523485ebb7edbb2bf0967e4) sent
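
For repeated runs of this check, the last state transition can be pulled out of the journal with a small helper. This is only a sketch: last_vrrp_state is a hypothetical name, not part of keepalived, and it assumes a single VRRP instance that logs "Entering <STATE> STATE" lines as in the output above.

```shell
# Hypothetical helper, not part of keepalived.
# Reads keepalived log lines on stdin and prints the most recent VRRP
# state that was entered (MASTER, BACKUP or FAULT). Prints nothing if
# no state transition is found.
last_vrrp_state() {
    grep -oE 'Entering (MASTER|BACKUP|FAULT) STATE' \
        | tail -n 1 \
        | awk '{ print $2 }'
}
```

It can be fed from the same command used above, for example: journalctl -le | grep Keep | last_vrrp_state. On an affected build this should print FAULT right after the rename.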


Actual results:
keepalived goes into FAULT STATE and the VRRP IP address is removed from eth1
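
The missing address can also be checked by a script instead of reading the ip output by eye. A minimal sketch, assuming the VRRP address is 10.0.0.161/32 as in the output above; has_inet_addr is a hypothetical helper name.

```shell
# Hypothetical helper: reads `ip addr` output on stdin and returns 0 if
# the given address/prefix (e.g. 10.0.0.161/32) is listed as an inet
# address, 1 otherwise.
has_inet_addr() {
    awk -v want="$1" '
        $1 == "inet" && $2 == want { found = 1 }
        END { exit !found }
    '
}
```

Example use in the amphora: ip -n amphora-haproxy a | has_inet_addr 10.0.0.161/32 || echo "VRRP address missing"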

Expected results:
Renaming an interface that keepalived does not use should not trigger any state change in keepalived
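
To spell out the expectation: a rename should only matter when the old name is one of the interfaces the VRRP configuration refers to. The function below only illustrates that rule. It is not keepalived's code and not the upstream fix; rename_affects_vrrp is a hypothetical name and the list of configured interfaces is passed in by hand.

```shell
# Illustration only: models the expected rule, not keepalived's implementation.
# usage: rename_affects_vrrp OLD_NAME CONFIGURED_IFACE...
# Returns 0 if OLD_NAME is one of the configured interfaces, 1 otherwise.
rename_affects_vrrp() {
    old_name=$1
    shift
    for configured in "$@"; do
        if [ "$configured" = "$old_name" ]; then
            return 0
        fi
    done
    return 1
}
```

With the setup from this report, rename_affects_vrrp veth1 eth1 returns non-zero, so the veth1 -> foo0 rename should leave the VRRP state untouched.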

Additional info:
I found out that the bug is not present in keepalived>=2.0.11 and that the commit that fixes the issue is https://github.com/acassen/keepalived/commit/30eeb48b1a0737dc7443fd421fd6613e0d55fd17

Can we backport this commit to 2.0.10?

Comment 3 Ryan O'Hara 2020-01-30 21:35:54 UTC
Here is a scratch build with the recommended patch. Please test ASAP. Assuming it works, I will do a proper build for 8.2 and 8.1.z.

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=26135578

Comment 4 Gregory Thiemonge 2020-01-31 14:18:08 UTC
Ryan, I've tested the package and I confirm that this scratch build fixes the issue.
Thanks

Comment 9 errata-xmlrpc 2020-04-28 16:05:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1753