Bug 1821667 - keepalived virtual routerids can easily clash when running several clusters
Summary: keepalived virtual routerids can easily clash when running several clusters
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
Depends On:
Blocks: 1823465
Reported: 2020-04-07 11:31 UTC by Karim Boumedhel
Modified: 2020-07-13 17:26 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Using VRRP to manage the Virtual IPs for OCP IPI clusters means that only 8 bits are available for a virtual router ID on a given broadcast domain, and some virtual router IDs may already be in use in the broadcast domain we deploy to.
Consequence: Collisions prevent nodes from taking on their Virtual IPs.
Fix: Add a tool (and document its usage) that lets the user check which virtual router IDs will be used for the chosen cluster name.
Result: Users now have a way to know the Virtual Router IDs before deploying.
Clone Of:
: 1823465
Last Closed: 2020-07-13 17:25:52 UTC
Target Upstream Version:


System | ID | Priority | Status | Summary | Last Updated
Github | openshift baremetal-runtimecfg pull 54 | None | closed | bug 1821667: runtimecfg: tool to show the Virtual Router IDs | 2020-09-06 22:28:37 UTC
Github | openshift installer pull 3463 | None | closed | bug 1821667: baremetal IPI: Document Virtual Router IDs | 2020-09-06 22:28:36 UTC
Red Hat Product Errata | RHBA-2020:2409 | None | None | None | 2020-07-13 17:26:05 UTC

Description Karim Boumedhel 2020-04-07 11:31:45 UTC
Description of problem:
keepalived virtual router IDs clash when running several clusters, causing some VIPs not to come up.

Version-Release number of selected component (if applicable):

How reproducible:
Always, given specific cluster names.

Steps to Reproduce:
1. Deploy a cluster named cnf10 and a second one named cnf11.
2. The API virtual router ID of the first cluster conflicts with the ingress virtual router ID of the second one, as both are computed with this function: https://github.com/openshift/baremetal-runtimecfg/pull/54/files#diff-3b5c896aef01987443b23dc503e418eaR147

Actual results:
Conflicts, resulting in the ingress VIP not coming up on the workers.

Expected results:
No conflicts.

Additional info:
A tool should at least predict the generated IDs, to warn the end user not to use those two cluster names together. For instance, something like this:

package main

import "fmt"

// FletcherChecksum8 computes an 8-bit Fletcher checksum of inp: two running
// sums modulo 15 (0xf), each fitting in 4 bits, packed as (ckB << 4) | ckA.
func FletcherChecksum8(inp string) uint8 {
	var ckA, ckB uint8
	for i := 0; i < len(inp); i++ {
		ckA = (ckA + inp[i]) % 0xf
		ckB = (ckB + ckA) % 0xf
	}
	return (ckB << 4) | ckA
}

func main() {
	cluster1 := "cnf10"
	cluster2 := "cnf11"
	// Each VIP's ID is the checksum of "<cluster>-<vip>" plus one
	// (VRRP virtual router IDs range from 1 to 255, so 0 is never produced).
	apiID1 := FletcherChecksum8(cluster1+"-api") + 1
	dnsID1 := FletcherChecksum8(cluster1+"-dns") + 1
	ingressID1 := FletcherChecksum8(cluster1+"-ingress") + 1
	apiID2 := FletcherChecksum8(cluster2+"-api") + 1
	dnsID2 := FletcherChecksum8(cluster2+"-dns") + 1
	ingressID2 := FletcherChecksum8(cluster2+"-ingress") + 1
	fmt.Printf("cluster: %s api: %d dns: %d ingress: %d\n", cluster1, apiID1, dnsID1, ingressID1)
	fmt.Printf("cluster: %s api: %d dns: %d ingress: %d\n", cluster2, apiID2, dnsID2, ingressID2)
}

Comment 1 Yossi Boaron 2020-04-13 08:56:53 UTC
Just to clarify, keepalived virtual router IDs clash only if the clusters are deployed on the same L2 domain.

Comment 7 Victor Voronkov 2020-04-20 08:39:55 UTC
Verified on 4.5.0-0.nightly-2020-04-14-031010

Checked from a master node:

[master-0-0 ~]$ sudo crictl exec $(sudo crictl ps --name keepalived-monitor | awk 'FNR==2{ print $1}') runtimecfg vr-ids cnf10
APIVirtualRouterID: 147
DNSVirtualRouterID: 158
IngressVirtualRouterID: 2
[core@master-0-0 ~]$ sudo crictl exec $(sudo crictl ps --name keepalived-monitor | awk 'FNR==2{ print $1}') runtimecfg vr-ids cnf11
APIVirtualRouterID: 228
DNSVirtualRouterID: 239
IngressVirtualRouterID: 147

Checked on an external host, following the documentation at https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[~]# podman run quay.io/openshift/origin-baremetal-runtimecfg:4.5 vr-ids cnf11
APIVirtualRouterID: 228
DNSVirtualRouterID: 239
IngressVirtualRouterID: 147
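These values can also be reproduced offline from the formula in the description, which is a quick cross-check of the tool. A sketch, assuming that derivation (checksum of "<cluster>-<vip>" plus one); `vrIDsReport` is a hypothetical helper that only imitates the output layout of `runtimecfg vr-ids` and does not call runtimecfg.

```go
package main

import "fmt"

// FletcherChecksum8 is copied from the description above.
func FletcherChecksum8(inp string) uint8 {
	var ckA, ckB uint8
	for i := 0; i < len(inp); i++ {
		ckA = (ckA + inp[i]) % 0xf
		ckB = (ckB + ckA) % 0xf
	}
	return (ckB << 4) | ckA
}

// vrIDsReport (hypothetical helper) recomputes the three IDs locally and
// formats them like the `runtimecfg vr-ids` output shown above.
func vrIDsReport(cluster string) string {
	id := func(vip string) uint8 { return FletcherChecksum8(cluster+"-"+vip) + 1 }
	return fmt.Sprintf("APIVirtualRouterID: %d\nDNSVirtualRouterID: %d\nIngressVirtualRouterID: %d\n",
		id("api"), id("dns"), id("ingress"))
}

func main() {
	fmt.Print(vrIDsReport("cnf11"))
}
```

For cnf11 it prints 228, 239 and 147, and for cnf10 147, 158 and 2, the same values as the runtimecfg output above.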

Comment 8 errata-xmlrpc 2020-07-13 17:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

