Bug 1885867 - [IPI baremetal] Post upgrade Keepalived mode flip to unicast isn't synced
Summary: [IPI baremetal] Post upgrade Keepalived mode flip to unicast isn't synced
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assignee: Yossi Boaron
QA Contact: Aleksandra Malykhin
URL:
Whiteboard:
Depends On: 1958390 2003445
Blocks:
Reported: 2020-10-07 08:34 UTC by Yossi Boaron
Modified: 2023-09-15 00:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-04 10:24:08 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-runtimecfg pull 103 | 0 | None | closed | Bug 1885867: Flip to unicast only when MCO set to desired version in all nodes | 2021-02-21 07:01:44 UTC

Description Yossi Boaron 2020-10-07 08:34:44 UTC
Description of problem:

Starting with 4.6, the Keepalived mode changed from multicast to unicast, so once a 4.5 to 4.6 upgrade completes, the Keepalived mode should flip to unicast automatically.

Since unicast and multicast Keepalived VRRP instances are considered separate domains, the mode flip was designed to run at the same time on all nodes.

After the 4.5 to 4.6 upgrade, the Keepalived mode changed to unicast automatically on all nodes, but not at the same time.

Although the nodes did not flip at the same time, nodes with the multicast config were able to communicate with nodes with the unicast config and vice versa, so functionality-wise everything was fine and only a single master node held the API-VIP during this period.
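
To illustrate why the flip can drift between nodes, here is a minimal sketch. It assumes each node plans its flip a fixed delay after the tick on which it detected the request; the 5-minute delay is read off the logs in this report (detected at 22:20:00, planned for 22:25:00) and is not taken from the runtimecfg source. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

# Assumption: fixed delay between detection and the planned flip,
# inferred from the log timestamps in this report.
FLIP_DELAY = timedelta(minutes=5)


def planned_flip_time(detected_at: datetime, delay: timedelta = FLIP_DELAY) -> datetime:
    """Planned flip time for a node that detected the request at detected_at."""
    # Drop sub-minute jitter so nodes that detect on the same tick agree exactly.
    aligned = detected_at.replace(second=0, microsecond=0)
    return aligned + delay


def flip_skew(detections: Dict[str, datetime]) -> timedelta:
    """Gap between the earliest and the latest planned flip across nodes."""
    planned = [planned_flip_time(t) for t in detections.values()]
    return max(planned) - min(planned)


# Detection times taken from the tickerTime fields in the logs of this report.
detections = {
    "master-0": datetime(2020, 10, 5, 22, 10, 0, 358, tzinfo=timezone.utc),
    "master-1": datetime(2020, 10, 5, 22, 20, 0, 282, tzinfo=timezone.utc),
    "master-2": datetime(2020, 10, 5, 22, 10, 0, 297, tzinfo=timezone.utc),
}
skew = flip_skew(detections)
print(skew)
```

Under this assumption, nodes that detect the request on the same tick flip together, while a node that detects it one tick later flips a full tick later, which matches the behaviour described here.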

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Deploy an OCP 4.5 cluster (I used 4.5.0-0.ci-2020-10-02-233843)
2. After the deployment completes successfully, upgrade to 4.6 like so:
oc adm upgrade --to-image registry.svc.ci.openshift.org/ocp/release:4.6.0-0.ci-2020-10-03-024755 --allow-explicit-upgrade --force


Actual results:
Post upgrade, the Keepalived mode flipped to unicast on all nodes, but not at the same time.

Expected results:

The Keepalived mode flip should happen at the same time on all nodes.
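
The linked pull request is titled "Flip to unicast only when MCO set to desired version in all nodes". A rough sketch of such a gate follows. It is not the code from that pull request: the helper names are hypothetical, and the annotation keys and the "Done" state value are assumed to match what the Machine Config Operator writes on nodes.

```python
from typing import Dict, Mapping

# Assumed annotation keys and state value; verify them against a real cluster.
CURRENT_CONFIG = "machineconfiguration.openshift.io/currentConfig"
DESIRED_CONFIG = "machineconfiguration.openshift.io/desiredConfig"
STATE = "machineconfiguration.openshift.io/state"


def node_is_settled(annotations: Mapping[str, str]) -> bool:
    """True when the node runs its desired config and reports no update in progress."""
    current = annotations.get(CURRENT_CONFIG)
    return (
        bool(current)
        and current == annotations.get(DESIRED_CONFIG)
        and annotations.get(STATE) == "Done"
    )


def may_flip_to_unicast(nodes: Dict[str, Mapping[str, str]]) -> bool:
    """Allow the flip only when every known node has settled on its desired config."""
    # An empty node list means nothing could be verified, so do not flip.
    return bool(nodes) and all(node_is_settled(a) for a in nodes.values())


settled = {CURRENT_CONFIG: "rendered-master-b", DESIRED_CONFIG: "rendered-master-b", STATE: "Done"}
updating = {CURRENT_CONFIG: "rendered-master-a", DESIRED_CONFIG: "rendered-master-b", STATE: "Working"}

mid_upgrade = {"master-0": settled, "master-1": updating, "master-2": settled}
upgraded = {"master-0": settled, "master-1": settled, "master-2": settled}
print(may_flip_to_unicast(mid_upgrade), may_flip_to_unicast(upgraded))
```

Because every node evaluates the same cluster-wide condition, no node should start planning the flip while another node is still being updated.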

Additional info:

Relevant parts of the Keepalived-monitor container logs from the master nodes:


Master-1


time="2020-10-05T22:20:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:20:00.000282045 +0000 UTC m=+831.149437584"
time="2020-10-05T22:20:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC"
time="2020-10-05T22:24:30Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:24:30Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.21 master-1 etcd-1 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:24:30Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:25:00Z" level=info msg="After sleep, before sending reload request " curTime="2020-10-05 22:25:00.000112512 +0000 UTC m=+1131.149268060"



Master-2

time="2020-10-05T21:58:11Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:10:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:10:00.000297833 +0000 UTC m=+728.334501622"
time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"
time="2020-10-05T22:14:33Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:14:33Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.22 master-2 etcd-2 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:14:33Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:15:00Z" level=info msg="After sleep, before sending reload request " curTime="2020-10-05 22:15:00.000164153 +0000 UTC m=+1028.334367953"


Master-0
-------------

time="2020-10-05T21:54:32Z" level=info msg="Monitor conf file doesn't exist" file=/etc/keepalived/unsupported-monitor.conf
time="2020-10-05T21:54:32Z" level=info msg="Config change detected" configChangeCtr=1 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:42Z" level=info msg="Config change detected" configChangeCtr=2 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Config change detected" configChangeCtr=3 current config="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Apply config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {0 0 0 [] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[]} false}"
time="2020-10-05T21:54:52Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
time="2020-10-05T22:10:00Z" level=info msg="Update Mode request detected, verify that upgrade process completed" desiredModeInfo.Mode=unicast tickerTime="2020-10-05 22:10:00.000358486 +0000 UTC m=+927.228777821"
time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"
time="2020-10-05T22:14:34Z" level=info msg="Update Mode from newConfig.EnableUnicast to desiredModeInfo.Mode" desiredModeInfo.Mode=unicast desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC" newConfig.EnableUnicast=true
time="2020-10-05T22:14:34Z" level=info msg="Mode Update config change" curConfig="{{ostest test.metalkube.org 192.168.111.5 14 A AAAA  10 192.168.111.4 93 A AAAA 32 0} {123 123 123 [{master-0 192.168.111.20 123} {master-1 192.168.111.21 123} {master-2 192.168.111.22 123}] } 192.168.111.20 master-0 etcd-0 enp2s0 [192.168.111.1]  {[192.168.111.20 192.168.111.21 192.168.111.22 192.168.111.23 192.168.111.24]} true}"
time="2020-10-05T22:14:34Z" level=info msg="Runtimecfg rendering template" path=/etc/keepalived/keepalived.conf
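
The skew can be read directly from the "Planned time for Mode update" lines above. The sketch below shows one possible way to extract and compare them; the regular expression and helper names are illustrative and only assume the log format shown in this report.

```python
import re
from datetime import datetime, timedelta
from typing import Dict, Optional

PLANNED_RE = re.compile(
    r'msg="Planned time for Mode update" desiredModeInfo\.Time="([^"]+)"'
)


def planned_time(log_line: str) -> Optional[datetime]:
    """Extract the planned flip time from one log line, or None if absent."""
    match = PLANNED_RE.search(log_line)
    if match is None:
        return None
    # The value looks like "2020-10-05 22:25:00 +0000 UTC"; drop the trailing zone name.
    stamp = match.group(1).rsplit(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S %z")


def nodes_out_of_step(planned: Dict[str, datetime]) -> Dict[str, timedelta]:
    """Map each node that flips later than the earliest node to its delay."""
    earliest = min(planned.values())
    return {node: t - earliest for node, t in planned.items() if t != earliest}


# One line per master, copied from the logs in this report.
LOGS = {
    "master-0": 'time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"',
    "master-1": 'time="2020-10-05T22:20:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:25:00 +0000 UTC"',
    "master-2": 'time="2020-10-05T22:10:00Z" level=info msg="Planned time for Mode update" desiredModeInfo.Time="2020-10-05 22:15:00 +0000 UTC"',
}

planned = {node: planned_time(line) for node, line in LOGS.items()}
late = nodes_out_of_step(planned)
print(late)
```

For these logs the only late node is master-1, which planned its flip 10 minutes after master-0 and master-2.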

Comment 2 Yossi Boaron 2021-09-01 13:07:54 UTC
For some reason the BZ status was not automatically set to ON_QA after the PR merged.

I will change the BZ status to ON_QA.

Comment 11 Yossi Boaron 2021-10-13 07:02:09 UTC
@aconstan, could you please explain the dependency between this bug and https://bugzilla.redhat.com/show_bug.cgi?id=1958390 ?

BZ 1958390 is GCP UPI, whereas this BZ is relevant only to IPI bare metal.

Comment 12 Red Hat Bugzilla 2023-09-15 00:49:17 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

