Bug 1885867

Summary: [IPI baremetal] Post upgrade Keepalived mode flip to unicast isn't synced
Product: OpenShift Container Platform Reporter: Yossi Boaron <yboaron>
Component: Machine Config Operator Assignee: Yossi Boaron <yboaron>
Machine Config Operator sub component: platform-baremetal QA Contact: Aleksandra Malykhin <amalykhi>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: low    
Priority: unspecified CC: aconstan, amalykhi, aos-bugs, bperkins, jdelft, tsedovic, vlaad
Version: 4.6   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-10-04 10:24:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1958390, 2003445    
Bug Blocks:    

Description Yossi Boaron 2020-10-07 08:34:44 UTC
Description of problem:

Starting with 4.6, the Keepalived mode changed from multicast to unicast, so after a 4.5 to 4.6 upgrade completes, the Keepalived mode should flip to unicast automatically.

Since unicast and multicast Keepalived VRRP instances are treated as separate domains, the mode flip was designed to run at the same time on all nodes.

After a 4.5 to 4.6 upgrade, the Keepalived mode was changed to unicast automatically on all nodes, but not at the same time.

Although the nodes didn't flip mode at the same time, nodes with multicast config were able to communicate with nodes with unicast config and vice versa, so functionality-wise everything was fine and only a single master node held the API-VIP during this period.
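
To illustrate how such a flip can end up out of sync, here is a minimal sketch. It is not the actual keepalived-monitor code. It assumes, based only on the timestamps in the log excerpts under "Additional info", that each node looks for a mode-update request on shared wall-clock ticks and schedules the flip a fixed delay after the tick on which it first sees the request. The 10-minute tick, the 5-minute delay, the detection times, and the function name planned_flip_time are all assumptions made for this illustration.

```python
from datetime import datetime, timedelta, timezone

TICK = timedelta(minutes=10)   # assumed interval between checks (guess from the logs)
DELAY = timedelta(minutes=5)   # assumed delay between detection tick and flip
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def planned_flip_time(ready_at):
    """Return the flip time for a node that becomes ready at ready_at.

    The node acts on the first wall-clock tick at or after ready_at,
    then schedules the flip DELAY later (hypothetical model).
    """
    ticks = -(-(ready_at - EPOCH) // TICK)  # ceiling division on timedeltas
    return EPOCH + ticks * TICK + DELAY


if __name__ == "__main__":
    # Hypothetical readiness times: two nodes are ready before the 22:10 tick,
    # one node only becomes ready after it and has to wait for the 22:20 tick.
    early = planned_flip_time(datetime(2020, 10, 5, 22, 3, tzinfo=timezone.utc))
    late = planned_flip_time(datetime(2020, 10, 5, 22, 12, tzinfo=timezone.utc))
    print("early node flips at", early)
    print("late node flips at", late)
    print("skew", late - early)
```

Under this model every node that acts on the same tick flips at the same instant, but a node that misses a tick is delayed by a whole tick interval, which would be consistent with the 10-minute gap seen in the logs.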

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy an OCP 4.5 cluster (I used 4.5.0-0.ci-2020-10-02-233843).
2. After the deployment completes successfully, upgrade to 4.6 like so:
oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.ci-2020-10-03-024755 --allow-explicit-upgrade --force


Actual results:
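Post upgrade, the Keepalived mode was flipped to unicast on all nodes, but not at the same time: per the logs below, the planned time was 22:25:00 UTC on master-1 versus 22:15:00 UTC on master-0 and master-2.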

Expected results:

The Keepalived mode flip should happen at the same time on all nodes.

Additional info:

Relevant part of the Keepalived-monitor container logs from the master nodes:


Master-1


time="2020-10-05T22:20:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:20:00.000282045 +0000 UTC m=+831.149437584"
time="2020-10-05T22:20:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC"
time="2020-10-05T22:24:30Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:24:30Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.21 master-1 etcd-1 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:24:30Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:25:00Z" level=info msg="After sleep, before sending reload request " curTime="2020-10-05 22:25:00.000112512 +0000 UTC m=+1131.149268060"



Master-2

time="2020-10-05T21:58:11Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:10:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:10:00.000297833 +0000 UTC m=+728.334501622"
time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"
time="2020-10-05T22:14:33Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:14:33Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.22 master-2 etcd-2 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:14:33Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:15:00Z" level=info msg="After sleep, before sending reload request " curTime="2020-10-05 22:15:00.000164153 +0000 UTC m=+1028.334367953"


Master-0
-------------

time="2020-10-05T21:54:32Z" level=info msg="Monitor conf file doesn't exist" file=/etc/keepalived/unsupported-monitor.conf
time="2020-10-05T21:54:32Z" level=info msg="Config change detected" configChangeCtr=1 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:42Z" level=info msg="Config change detected" configChangeCtr=2 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Config change detected" configChangeCtr=3 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Apply config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:10:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:10:00.000358486 +0000 UTC m=+927.228777821"
time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"
time="2020-10-05T22:14:34Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:14:34Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:14:34Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
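
The skew between nodes can be read off the "Planned time for Mode update" lines above. As a rough sketch (the helper names planned_times and flip_skew are made up for this example, and the regular expression assumes the log format stays exactly as shown in this report), the following compares the planned flip times across nodes:

```python
import re
from datetime import datetime, timedelta

# Matches the "Planned time for Mode update" lines as they appear in this report.
PLANNED_RE = re.compile(
    r'msg="Planned time for Mode update" desiredModeInfo\.Time="([^"]+)"'
)


def planned_times(log_text):
    """Return every planned mode-update time found in one node's log text."""
    times = []
    for match in PLANNED_RE.finditer(log_text):
        # "2020-10-05 22:25:00 +0000 UTC" -> drop the trailing zone name
        stamp = match.group(1).rsplit(" ", 1)[0]
        times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z"))
    return times


def flip_skew(logs_by_node):
    """Return the gap between the earliest and latest planned flip across nodes."""
    latest = []
    for text in logs_by_node.values():
        times = planned_times(text)
        if times:
            latest.append(max(times))
    if not latest:
        return timedelta(0)
    return max(latest) - min(latest)


# The three "Planned time" lines quoted in this report.
SAMPLE_LOGS = {
    "master-0": 'time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"',
    "master-1": 'time="2020-10-05T22:20:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC"',
    "master-2": 'time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"',
}

if __name__ == "__main__":
    print("planned flip skew:", flip_skew(SAMPLE_LOGS))
```

For the excerpts in this report the skew is 10 minutes; a synchronized flip would give a skew of zero.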

Comment 2 Yossi Boaron 2021-09-01 13:07:54 UTC
For some reason, the BZ status was not automatically set to ON_QA after the PR merged.

I will change the BZ status to ON_QA.

Comment 11 Yossi Boaron 2021-10-13 07:02:09 UTC
@aconstan, could you please explain the dependency between this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1958390 ?

BZ 1958390 is for GCP UPI, while this BZ is relevant only to IPI bare metal.

Comment 12 Red Hat Bugzilla 2023-09-15 00:49:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days