Bug 1821667 - keepalived virtual routerids can easily clash when running several clusters
Summary: keepalived virtual routerids can easily clash when running several clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Antoni Segura Puimedon
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On:
Blocks: 1823465
Reported: 2020-04-07 11:31 UTC by Karim Boumedhel
Modified: 2020-07-13 17:26 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Using VRRP to manage the virtual IPs for OCP IPI clusters means that only 8 bits are available for a virtual router ID on a given broadcast domain. Virtual router IDs may already be in use in the broadcast domain we deploy to.
Consequence: Collisions end up preventing nodes from taking on their virtual IPs.
Fix: Add a tool (and document its usage) that lets the user check which virtual router IDs will be used for the chosen cluster name.
Result: Users now have a way to find out the virtual router IDs before deploying.
Clone Of:
: 1823465 (view as bug list)
Environment:
Last Closed: 2020-07-13 17:25:52 UTC
Target Upstream Version:
Embargoed:



Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift baremetal-runtimecfg pull 54 | 0 | None | closed | bug 1821667: runtimecfg: tool to show the Virtual Router IDs | 2021-01-25 23:40:53 UTC
Github | openshift installer pull 3463 | 0 | None | closed | bug 1821667: baremetal IPI: Document Virtual Router IDs | 2021-01-25 23:40:52 UTC
Red Hat Product Errata | RHBA-2020:2409 | 0 | None | None | None | 2020-07-13 17:26:05 UTC

Description Karim Boumedhel 2020-04-07 11:31:45 UTC
Description of problem:
keepalived virtual router IDs clash when running several clusters, so some VIPs do not come up

Version-Release number of selected component (if applicable):
>=4.4

How reproducible:
always, given one uses specific cluster names

Steps to Reproduce:
1. Deploy a cluster named cnf10 and a second one named cnf11.
2. The API virtual router ID of the first cluster conflicts with the ingress virtual router ID of the second one, because both are computed with this function: https://github.com/openshift/baremetal-runtimecfg/pull/54/files#diff-3b5c896aef01987443b23dc503e418eaR147

Actual results:
Conflicts, which prevent the ingress VIP from coming up on the workers

Expected results:
no conflicts

Additional info:
A tool should at least compute the generated IDs in advance and warn the end user not to use those two cluster names together.
Something like this, for instance:

```
package main

import "fmt"

// FletcherChecksum8 folds the input into two running sums, each reduced
// modulo 15 (0xf), and packs them into one byte: ckB in the high nibble,
// ckA in the low nibble.
func FletcherChecksum8(inp string) uint8 {
	var ckA, ckB uint8
	for i := 0; i < len(inp); i++ {
		ckA = (ckA + inp[i]) % 0xf
		ckB = (ckB + ckA) % 0xf
	}
	return (ckB << 4) | ckA
}

func main() {
	cluster1 := "cnf10"
	cluster2 := "cnf11"
	// Each virtual router ID is the checksum of "<cluster>-<role>" plus one.
	apiID1 := FletcherChecksum8(cluster1+"-api") + 1
	dnsID1 := FletcherChecksum8(cluster1+"-dns") + 1
	ingressID1 := FletcherChecksum8(cluster1+"-ingress") + 1
	apiID2 := FletcherChecksum8(cluster2+"-api") + 1
	dnsID2 := FletcherChecksum8(cluster2+"-dns") + 1
	ingressID2 := FletcherChecksum8(cluster2+"-ingress") + 1
	fmt.Printf("cluster: %s api: %d dns: %d ingress: %d\n", cluster1, apiID1, dnsID1, ingressID1)
	fmt.Printf("cluster: %s api: %d dns: %d ingress: %d\n", cluster2, apiID2, dnsID2, ingressID2)
}
```

Comment 1 Yossi Boaron 2020-04-13 08:56:53 UTC
Just to clarify, keepalived virtual router IDs clash only if the clusters are deployed on the same L2 domain.

Comment 7 Victor Voronkov 2020-04-20 08:39:55 UTC
Verified on 4.5.0-0.nightly-2020-04-14-031010

checked from master node:

[master-0-0 ~]$ sudo crictl exec $(sudo crictl ps --name keepalived-monitor | awk 'FNR==2{ print $1}') runtimecfg vr-ids cnf10
APIVirtualRouterID: 147
DNSVirtualRouterID: 158
IngressVirtualRouterID: 2
[core@master-0-0 ~]$ sudo crictl exec $(sudo crictl ps --name keepalived-monitor | awk 'FNR==2{ print $1}') runtimecfg vr-ids cnf11
APIVirtualRouterID: 228
DNSVirtualRouterID: 239
IngressVirtualRouterID: 147

Checked from an external host, following the documentation provided here: https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md
[~]# podman run quay.io/openshift/origin-baremetal-runtimecfg:4.5 vr-ids cnf11
APIVirtualRouterID: 228
DNSVirtualRouterID: 239
IngressVirtualRouterID: 147

Comment 8 errata-xmlrpc 2020-07-13 17:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

