Bug 1817988

Summary: [IPI baremetal] restart keepalived when default route is changed
Product: OpenShift Container Platform Reporter: Petr Horáček <phoracek>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Status: CLOSED NEXTRELEASE QA Contact: Victor Voronkov <vvoronko>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.4 CC: achernet, amurdaca, asegurap, augol, bperkins, danken, kgarriso, miabbott, smilner, vvoronko, wsun, wzheng, xtian, yboaron
Target Milestone: ---   
Target Release: 4.4.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1751978 Environment:
Last Closed: 2020-07-02 07:45:31 UTC Type: ---
Bug Depends On: 1751978    
Bug Blocks: 1741265    

Comment 1 Petr Horáček 2020-03-27 11:47:49 UTC
I observed the issue from https://bugzilla.redhat.com/show_bug.cgi?id=1751978 again on OpenShift 4.4.

I reconfigured the host so that the interface originally carrying the VIPs is now attached to an OVS bridge.

While the keepalived-monitor observed the change and rendered a new config, the keepalived container failed with 'permanent error CONFIG'.

I have a cluster available in case you want to debug it there.

keepalived-monitor logs:
time="2020-03-27T10:54:03Z" level=info msg="Config change detected" new config="{{ostest test.metalkube.org 192.168.111.5 14 192.168.111.2 10 192.168.111.4 93 24 0 } {0 0 0 [] } 192.168.111.23 worker-0  brcnv [192.168.111.1]}"
time="2020-03-27T10:54:03Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf

keepalived.conf:
vrrp_script chk_ingress {
    script "/usr/bin/curl -o /dev/null -kLs http://0:1936/healthz"
    interval 1
    weight 50
}

vrrp_instance ostest_INGRESS {
    state BACKUP
    interface brcnv
    virtual_router_id 93
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass cluster_uuid_ingress_vip
    }
    virtual_ipaddress {
        192.168.111.4/24
    }
    track_script {
        chk_ingress
    }
}

keepalived logs:
The client sent: reload
Opening file '/etc/keepalived/keepalived.conf'.
Stopped
Keepalived_vrrp exited with permanent error CONFIG. Terminating
Stopping
Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
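
The logs above show keepalived terminating after a reload once the rendered config names a different interface. As a rough illustration of what the summary asks for, here is a minimal Python sketch that picks a restart instead of a reload when the interface changes. `KeepalivedConfig`, `choose_action`, and the `eth0` starting interface are hypothetical names and values introduced for illustration only; they are not taken from keepalived-monitor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeepalivedConfig:
    """Hypothetical subset of the rendered values; not the monitor's real type."""
    interface: str
    virtual_router_id: int
    vip: str


def choose_action(old: KeepalivedConfig, new: KeepalivedConfig) -> str:
    """Return "none", "reload" or "restart" for a config transition."""
    if old == new:
        return "none"
    # Assumption: an interface change is the case a reload did not survive
    # in the logs above, so a full restart is chosen for it.
    if old.interface != new.interface:
        return "restart"
    # Any other difference is assumed to be safe to apply with a reload.
    return "reload"


# "eth0" is a placeholder; the report does not name the original interface.
before = KeepalivedConfig("eth0", 93, "192.168.111.4/24")
after = KeepalivedConfig("brcnv", 93, "192.168.111.4/24")
print(choose_action(before, after))  # prints "restart"
```

Whether the actual fix distinguishes these cases this way is not stated in this report.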

Comment 2 Victor Voronkov 2020-03-29 17:50:09 UTC
Deployed a cluster with 4.4.0-rc.4 and tested the same command sequence there:

nmcli con add type bridge ifname br10
sudo nmcli con add type bridge-slave ifname enp5s0 master br10
sudo nmcli con up bridge-slave-enp5s0

keepalived.conf rendered with interface br10 and container restarted, no errors detected, so...

works for me

Comment 3 Petr Horáček 2020-03-30 07:56:36 UTC
Has it applied the VIPs on the new interface br10?
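
One possible way to check this on the node is sketched below in Python, assuming the `ip` tool from iproute2 is available. `interface_has_address` and `vip_on_interface` are hypothetical helper names, and the VIP to look for has to be supplied by whoever runs it.

```python
import subprocess


def interface_has_address(ip_output: str, ifname: str, address: str) -> bool:
    """Scan `ip -o -4 addr show` output for `address` on `ifname`."""
    for line in ip_output.splitlines():
        fields = line.split()
        # One-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[1] == ifname and fields[2] == "inet":
            if fields[3].split("/")[0] == address:
                return True
    return False


def vip_on_interface(ifname: str, address: str) -> bool:
    """Run `ip` for one interface and report whether it carries `address`."""
    out = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", ifname],
        capture_output=True, text=True, check=True,
    ).stdout
    return interface_has_address(out, ifname, address)
```

Calling `vip_on_interface("br10", vip)` with the cluster's ingress or API VIP would then answer the question for that node.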

Comment 5 Steve Milner 2020-04-08 13:55:32 UTC
Is this meant for 4.4?

Comment 6 Kirsten Garrison 2020-04-08 17:44:40 UTC
This is duped from a 4.5 bz so this should probably be the 4.4 bz... @Yossi can you confirm?

Comment 7 Yossi Boaron 2020-04-12 13:48:06 UTC
Well, this bug's target release should be 4.5; the target release of the original bug (the one this bug was cloned from) was 4.2.

Comment 8 Antonio Murdaca 2020-04-15 08:33:18 UTC
(In reply to Yossi Boaron from comment #7)
> Well, this bug's target release should be 4.5, the original bug's (the one
> this bug cloned from) target release was 4.2.

Nope, the 4.5 BZ is the one you cloned from and it's already VERIFIED, so I can't see how you would ship another fix to that. This one must target something lower than 4.5. I'm setting 4.4, but please fix it if that's wrong.