Bug 1751978

Summary: [IPI baremetal] restart keepalived when default route is changed
Product: OpenShift Container Platform Reporter: Petr Horáček <phoracek>
Component: Machine Config Operator Assignee: Antoni Segura Puimedon <asegurap>
Status: CLOSED ERRATA QA Contact: Victor Voronkov <vvoronko>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.2.0 CC: achernet, asegurap, augol, bperkins, miabbott, smilner, wsun, wzheng, xtian
Target Milestone: --- Flags: dkholodo: needinfo-
vvoronko: needinfo-
vvoronko: needinfo-
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Keepalived 1.x binds to specific interfaces, and users can dynamically place those interfaces in bonds or bridges.
Consequence: When the interface keepalived uses is bridged, keepalived is unable to resume operation, which disrupts Virtual IP management.
Fix: Monitor interface changes and reload keepalived so that it reads the new configuration.
Result: Virtual IP management operates with minimal disruption.
Story Points: ---
Clone Of:
: 1817988 Environment:
Last Closed: 2020-07-13 17:11:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1741265, 1817988    

Description Petr Horáček 2019-09-13 10:51:19 UTC
Description of problem:
When the default route is changed (moved from a NIC to a Linux bridge), keepalived doesn't update its configuration, so its IPs are not moved on top of the bridge. As a result, we lose connectivity to the API server.


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Move IP from NIC to bridge

Actual results:
The bridge doesn't get the keepalived IPs. Connectivity to the API server is lost.

Expected results:
The API server is still reachable. The keepalived IPs are moved to the bridge.

Additional notes:
Antoni Segura Puimedon suggested adding a liveness probe to the keepalived pod, so that it is restarted and reconfigured if the linked interface no longer has an IP.
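
A minimal sketch of the check such a liveness probe could perform, assuming the probe can run `ip -j addr show` on the host and knows which interface keepalived is bound to. The function names `interface_has_ipv4` and `probe` are hypothetical and are not taken from the actual fix.

```python
import json
import subprocess


def interface_has_ipv4(ip_json: str, ifname: str) -> bool:
    """Return True if `ifname` carries at least one IPv4 address.

    `ip_json` is the text printed by `ip -j addr show`.
    """
    for link in json.loads(ip_json):
        if link.get("ifname") != ifname:
            continue
        # Each entry in addr_info describes one address on the link.
        return any(a.get("family") == "inet" for a in link.get("addr_info", []))
    return False  # the interface is not present at all


def probe(ifname: str) -> int:
    """Hypothetical probe body: 0 means healthy, 1 means restart is needed."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show"], check=True, capture_output=True, text=True
    ).stdout
    return 0 if interface_has_ipv4(out, ifname) else 1
```

Under these assumptions, once the IP has moved from the NIC to the bridge, a probe bound to the NIC name would return 1 and the pod would be restarted with a freshly rendered configuration.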

Comment 9 Steven Hardy 2019-12-13 11:18:54 UTC
*** Bug 1744560 has been marked as a duplicate of this bug. ***

Comment 15 Victor Voronkov 2020-01-30 05:17:02 UTC
Bug fix verified by setting up a Linux bridge over the external interface:

nmcli con add type bridge ifname br10
nmcli con add type bridge-slave ifname ens4 master br10
nmcli con up bridge-slave-ens4

As expected, the keepalived monitor observed the network change and rendered a valid config file;
meanwhile, DNS_VIP and INGRESS_VIP migrated to another host in the cluster.
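
The monitor behaviour described in this comment (notice a network change, then render a new config) can be sketched roughly as follows. This is an illustration only and not the monitor's real code: `VipConfig`, `ConfigWatcher`, and the field names are hypothetical, and the callback stands in for whatever renders keepalived.conf and asks keepalived to reload.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VipConfig:
    """Hypothetical subset of the values rendered into keepalived.conf."""
    interface: str
    ingress_vip: str
    prefix_len: int
    router_id: int


class ConfigWatcher:
    """Calls `on_change` only when the observed config differs from the last one."""

    def __init__(self, on_change: Callable[[VipConfig], None]) -> None:
        self._last: Optional[VipConfig] = None
        self._on_change = on_change

    def observe(self, current: VipConfig) -> bool:
        """Record `current`; return True if it triggered a render/reload."""
        if current == self._last:
            return False  # nothing changed, leave keepalived alone
        self._last = current
        self._on_change(current)
        return True
```

Comparing whole config values rather than reloading on every poll keeps keepalived from being restarted when nothing relevant has changed, which matters because each reload can briefly disturb VIP ownership.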

Comment 16 Petr Horáček 2020-03-27 11:13:37 UTC
I observed this issue again on OpenShift 4.4.

I reconfigured the host so that the interface that originally carried the VIPs is attached to an OVS bridge.

Although keepalived-monitor observed the change and rendered a new config, the keepalived container failed with 'permanent error CONFIG'.

I have a cluster available in case you want to debug it there.

keepalived-monitor logs:
time="2020-03-27T10:54:03Z" level=info msg="Config change detected" new config="{{ostest test.metalkube.org 192.168.111.5 14 192.168.111.2 10 192.168.111.4 93 24 0 } {0 0 0 [] } 192.168.111.23 worker-0  brcnv [192.168.111.1]}"
time="2020-03-27T10:54:03Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf

keepalived.conf:
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://0:1936/healthz"
    interval 1
    weight 50
}

vrrp_instance ostest_INGRESS {
    state BACKUP
    interface brcnv
    virtual_router_id 93
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass cluster_uuid_ingress_vip
    }
    virtual_ipaddress {
        192.168.111.4/24
    }
    track_script {
        chk_ingress
    }
}

keepalived logs:
The client sent: reload
Opening file '/etc/keepalived/keepalived.conf'.
Stopped
Keepalived_vrrp exited with permanent error CONFIG. Terminating
Stopping
Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2

Comment 18 Petr Horáček 2020-03-27 11:43:46 UTC
Sorry for the noise. I will clone this instead.

Comment 21 errata-xmlrpc 2020-07-13 17:11:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409