Bug 1751978 - [IPI baremetal] restart keepalived when default route is changed
Summary: [IPI baremetal] restart keepalived when default route is changed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
URL:
Whiteboard:
Duplicates: 1744560 (view as bug list)
Depends On:
Blocks: 1741265 1817988
Reported: 2019-09-13 10:51 UTC by Petr Horáček
Modified: 2020-07-13 17:11 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Keepalived 1.x binds to specific interfaces, and users can dynamically place those interfaces in bonds or bridges. Consequence: When the interface keepalived uses is bridged, keepalived is unable to resume operation, which disrupts Virtual IP management. Fix: Monitor interface changes and reload keepalived so that it reads the new configuration. Result: Virtual IP management operates with minimal disruption.
Clone Of:
Clones: 1817988 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:11:28 UTC
Target Upstream Version:
dkholodo: needinfo-
vvoronko: needinfo-
vvoronko: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Github openshift baremetal-runtimecfg pull 20 0 'None' closed Add dynkeepalived to solve interface changes 2021-02-19 14:34:02 UTC
Github openshift machine-config-operator pull 1124 0 'None' closed Bug 1751978: templates/baremetal: Fix keepalived dysfunction on vrrp iface change 2021-02-19 14:34:02 UTC
Red Hat Product Errata RHBA-2020:2409 0 None None None 2020-07-13 17:11:52 UTC

Description Petr Horáček 2019-09-13 10:51:19 UTC
Description of problem:
When the default route is changed (moved from a NIC to a Linux bridge), keepalived doesn't update its configuration, so its IPs are not moved on top of the bridge. As a result, we lose connectivity to the API server.


Version-Release number of selected component (if applicable):


How reproducible:
Always


Steps to Reproduce:
1. Move IP from NIC to bridge

Actual results:
Bridge doesn't get KeepAlive IPs. Connectivity to the API server is lost.

Expected results:
API server is still reachable. KeepAlive IPs are moved to the bridge.

Additional notes:
Antoni Segura Puimedon suggested adding a liveness probe to the KeepAlive pod, so that it is restarted and reconfigured if the linked interface doesn't have an IP anymore.
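
A minimal sketch of what such a probe could check, not the implementation that was merged. It assumes the probe is fed the JSON output of `ip -j addr show` and is told which interface keepalived is bound to; the function names `interface_has_ipv4` and `probe_exit_code` are hypothetical.

```python
import json


def interface_has_ipv4(ip_addr_json: str, ifname: str) -> bool:
    """Return True if `ifname` is present and carries at least one IPv4 address.

    `ip_addr_json` is expected to be the output of `ip -j addr show`:
    a JSON list of links, each with "ifname" and an "addr_info" list.
    """
    for link in json.loads(ip_addr_json):
        if link.get("ifname") != ifname:
            continue
        return any(a.get("family") == "inet" for a in link.get("addr_info", []))
    # The interface is not present at all.
    return False


def probe_exit_code(ip_addr_json: str, ifname: str) -> int:
    """Map the check to a liveness-probe style exit code: 0 healthy, 1 unhealthy."""
    return 0 if interface_has_ipv4(ip_addr_json, ifname) else 1
```

In a pod, the JSON could come from running `ip -j addr show` and the interface name from the rendered keepalived.conf. A non-zero exit would make the kubelet restart the container, which is the restart-and-reconfigure behaviour suggested above.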

Comment 9 Steven Hardy 2019-12-13 11:18:54 UTC
*** Bug 1744560 has been marked as a duplicate of this bug. ***

Comment 15 Victor Voronkov 2020-01-30 05:17:02 UTC
Bug fix verified by setting up a Linux bridge over the external interface:

nmcli con add type bridge ifname br10
nmcli con add type bridge-slave ifname ens4 master br10
nmcli con up bridge-slave-ens4

As expected, the keepalived monitor observed the network change and rendered a valid config file;
meanwhile, DNS_VIP and INGRESS_VIP migrated to another host in the cluster.
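
For readers unfamiliar with the monitor, the observe-and-render step can be sketched roughly as follows. This is an illustration under assumptions, not the baremetal-runtimecfg code: it assumes the right interface is the one whose subnet contains the VIP, and the helpers `find_vip_interface`, `render_instance` and `needs_reload` are hypothetical names.

```python
import ipaddress
from typing import Dict, List, Optional


def find_vip_interface(vip: str, interfaces: Dict[str, List[str]]) -> Optional[str]:
    """Return the first interface with an address whose subnet contains `vip`.

    `interfaces` maps an interface name to its addresses in CIDR form,
    for example {"br10": ["192.168.111.23/24"]}.
    """
    vip_addr = ipaddress.ip_address(vip)
    for ifname, cidrs in interfaces.items():
        for cidr in cidrs:
            if vip_addr in ipaddress.ip_interface(cidr).network:
                return ifname
    return None


def render_instance(name: str, ifname: str, router_id: int, vip_cidr: str) -> str:
    """Render a reduced vrrp_instance stanza bound to `ifname`."""
    return (
        f"vrrp_instance {name} {{\n"
        f"    state BACKUP\n"
        f"    interface {ifname}\n"
        f"    virtual_router_id {router_id}\n"
        f"    virtual_ipaddress {{\n"
        f"        {vip_cidr}\n"
        f"    }}\n"
        f"}}\n"
    )


def needs_reload(previous_ifname: Optional[str], current_ifname: Optional[str]) -> bool:
    """A reload is only useful when a usable interface exists and it changed."""
    return current_ifname is not None and current_ifname != previous_ifname
```

With the bridge from the nmcli commands above, the address moves from ens4 to br10, so the lookup returns br10 on the next pass and the stanza is re-rendered with `interface br10`.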

Comment 16 Petr Horáček 2020-03-27 11:13:37 UTC
I observed this issue again on OpenShift 4.4.

I reconfigured the host so that the interface that was originally carrying the VIPs is attached to an OVS bridge.

While the keepalived-monitor observed the change and rendered a new config, the keepalived container failed with 'permanent error CONFIG'.

I have a cluster available in case you want to debug it there.

keepalived-monitor logs:
time="2020-03-27T10:54:03Z" level=info msg="Config change detected" new config="{{ostest test.metalkube.org 192.168.111.5 14 192.168.111.2 10 192.168.111.4 93 24 0 } {0 0 0 [] } 192.168.111.23 worker-0  brcnv [192.168.111.1]}"
time="2020-03-27T10:54:03Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf

keepalived.conf:
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://0:1936/healthz"
    interval 1
    weight 50
}

vrrp_instance ostest_INGRESS {
    state BACKUP
    interface brcnv
    virtual_router_id 93
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass cluster_uuid_ingress_vip
    }
    virtual_ipaddress {
        192.168.111.4/24
    }
    track_script {
        chk_ingress
    }
}

keepalived logs:
The client sent: reload
Opening file '/etc/keepalived/keepalived.conf'.
Stopped
Keepalived_vrrp exited with permanent error CONFIG. Terminating
Stopping
Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
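
The logs above do not say why keepalived rejected the config. One thing that may be worth ruling out when debugging is whether every interface named in a vrrp_instance block actually exists in the network namespace keepalived runs in. The sketch below is a diagnostic aid only, with hypothetical helper names; it does not claim to identify the root cause of this failure.

```python
import re
from typing import Dict, Set

_INSTANCE_RE = re.compile(r"vrrp_instance\s+(\S+)\s*\{")
# Anchored at line start so that e.g. "track_interface" is not matched.
_INTERFACE_RE = re.compile(r"^\s*interface\s+(\S+)\s*$", re.MULTILINE)


def vrrp_interfaces(conf_text: str) -> Dict[str, str]:
    """Map each vrrp_instance name to the interface it is configured on."""
    result = {}
    matches = list(_INSTANCE_RE.finditer(conf_text))
    for i, match in enumerate(matches):
        # An instance body runs until the next vrrp_instance or end of file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(conf_text)
        iface = _INTERFACE_RE.search(conf_text[match.end():end])
        if iface:
            result[match.group(1)] = iface.group(1)
    return result


def missing_interfaces(conf_text: str, present: Set[str]) -> Dict[str, str]:
    """Return the instances whose interface is not in `present`."""
    return {
        name: ifname
        for name, ifname in vrrp_interfaces(conf_text).items()
        if ifname not in present
    }
```

On a Linux host, `present` can be built with `set(os.listdir("/sys/class/net"))`. An empty result means all referenced interfaces exist, in which case the CONFIG error has some other cause.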

Comment 18 Petr Horáček 2020-03-27 11:43:46 UTC
Sorry for the noise. I will clone this instead.

Comment 21 errata-xmlrpc 2020-07-13 17:11:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

