Bug 1886450

Summary: Keepalived router id check not documented for RHV/VMware IPI
Product: OpenShift Container Platform
Reporter: Andrew Downs <adowns>
Component: Installer
Assignee: Donna DaCosta <ddacosta>
Installer sub component: OpenShift on RHV
QA Contact: Michael Burman <mburman>
Status: CLOSED ERRATA
Severity: medium
Priority: medium
CC: aos-bugs, bperkins, ctomasko, ddacosta, dougsland, eslutsky, jpasztor, mburman, mkalinin, shardy
Version: 4.5
Keywords: Documentation
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:02:33 UTC
Type: Bug

Description Andrew Downs 2020-10-08 13:35:03 UTC
Document URL: 

N/A

Section Number and Name: 

RHV and VMware docs (the list below may not be exhaustive):


https://docs.openshift.com/container-platform/4.5/installing/installing_rhv/installing-rhv-default.html
https://docs.openshift.com/container-platform/4.5/installing/installing_rhv/installing-rhv-customizations.html
https://docs.openshift.com/container-platform/4.5/installing/installing_vsphere/installing-vsphere-installer-provisioned.html
https://docs.openshift.com/container-platform/4.5/installing/installing_vsphere/installing-vsphere-installer-provisioned-customizations.html
https://docs.openshift.com/container-platform/4.5/installing/installing_vsphere/installing-vsphere-installer-provisioned-network-customizations.html

Describe the issue: 

BZ[1] describes a workaround that lets customers determine whether the virtual router IDs assigned to a cluster will clash with existing ones. This workaround is documented only for bare metal, but the same issue affects RHV and VMware IPI installations.

Customers who are unaware of this can end up with clusters that fail to install for a non-obvious reason.

Suggestions for improvement: 

Add the matching content from the bare metal install docs[2] so customers can run the check.

Additional information: 



[1] https://bugzilla.redhat.com/show_bug.cgi?id=1821667
[2] https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md

Comment 1 Gal Zaidman 2021-03-31 09:16:16 UTC
As far as I can see, this is documented only in the developer docs [1], not in the OCP bare metal docs.
Do you think we need to add this to our docs?
If so, does it make sense to reference the quay.io/openshift/origin-baremetal-runtimecfg tool in the official docs?

[1] https://github.com/openshift/installer/blob/master/docs/user/metal/install_ipi.md

Comment 2 Andrew Downs 2021-03-31 09:52:05 UTC
I had assumed these changes would be flowing into the main docs, so yes, I think it should be added.

I think it is a key piece of information for any customer installing with the IPI methods, and it is difficult to debug if you don't know about it. Fundamentally, we are asking customers to work out a way to manage these IDs across all of their clusters, so we should tell them how to gather the information and how it could affect them. Arguably, the installer should present this information as well.

Comment 3 Steve Goodman 2021-05-10 12:45:11 UTC
(In reply to Andrew Downs from comment #0)
 

Is this the content that you're talking about?:

----

When the Virtual IPs are managed using multicast (VRRPv2 or VRRPv3), there is a limit of 255 unique virtual routers per multicast domain. If you have pre-existing virtual routers using the standard IPv4 or IPv6 multicast groups, you can find out which virtual router IDs the installation will choose by running the following command:

$ podman run quay.io/openshift/origin-baremetal-runtimecfg:TAG vr-ids cnf10
APIVirtualRouterID: 147
DNSVirtualRouterID: 158
IngressVirtualRouterID: 2

Where TAG is the release you are going to install, for example 4.5. Another example:

$ podman run quay.io/openshift/origin-baremetal-runtimecfg:TAG vr-ids cnf11
APIVirtualRouterID: 228
DNSVirtualRouterID: 239
IngressVirtualRouterID: 147

The example output above shows that installing two clusters named cnf10 and cnf11 in the same multicast domain would lead to a conflict: the API virtual router ID of cnf10 and the Ingress virtual router ID of cnf11 are both 147. You should also make sure that none of these IDs is already in use by other, independent VRRP virtual routers running in the same broadcast domain.

----
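The check described above amounts to collecting the virtual router IDs (VRIDs) that each planned cluster name maps to and looking for any ID claimed twice. A minimal Python sketch, assuming you have already captured the `vr-ids` output of each cluster as text (it does not call podman itself); `parse_vr_ids`, `find_conflicts`, and the `in_use` mapping of IDs taken by independent VRRP routers are hypothetical names, not part of runtimecfg.

```python
# Hypothetical helper (not part of runtimecfg): detect VRRP virtual router ID
# (VRID) clashes between planned clusters and existing virtual routers.
from __future__ import annotations

from collections import defaultdict


def parse_vr_ids(output: str) -> dict[str, int]:
    """Parse 'APIVirtualRouterID: 147'-style lines into {name: vrid}."""
    ids: dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().endswith("VirtualRouterID"):
            ids[key.strip()] = int(value)  # int() tolerates surrounding spaces
    return ids


def find_conflicts(
    outputs: dict[str, str], in_use: dict[int, str] | None = None
) -> dict[int, list[str]]:
    """Return {vrid: [owners]} for every VRID claimed by more than one owner.

    outputs maps a cluster name to the text printed by `vr-ids <cluster-name>`;
    in_use maps VRIDs already taken by independent VRRP routers to a label.
    """
    owners: dict[int, list[str]] = defaultdict(list)
    for vrid, label in (in_use or {}).items():
        owners[vrid].append(label)
    for cluster, output in outputs.items():
        for name, vrid in parse_vr_ids(output).items():
            # The VRRP VRID field is 8 bits; valid IDs are 1-255.
            if not 1 <= vrid <= 255:
                raise ValueError(f"{cluster}: {name}={vrid} is outside 1-255")
            owners[vrid].append(f"{cluster}/{name}")
    return {vrid: who for vrid, who in owners.items() if len(who) > 1}


if __name__ == "__main__":
    # Outputs copied from the cnf10 and cnf11 vr-ids examples above.
    cnf10 = "APIVirtualRouterID: 147\nDNSVirtualRouterID: 158\nIngressVirtualRouterID: 2"
    cnf11 = "APIVirtualRouterID: 228\nDNSVirtualRouterID: 239\nIngressVirtualRouterID: 147"
    print(find_conflicts({"cnf10": cnf10, "cnf11": cnf11}))
    # {147: ['cnf10/APIVirtualRouterID', 'cnf11/IngressVirtualRouterID']}
```

Keying the result by VRID rather than by cluster makes it easy to see every owner of a contested ID at once, including a pre-existing router passed in through `in_use`.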

Comment 6 Andrew Downs 2021-05-11 15:57:36 UTC
(In reply to Steve Goodman from comment #3)
> Is this the content that you're talking about?:

Yes, that is the information. However, depending on where it ends up in the RHV/VMware sections, I think the opening "When the Virtual IPs are managed" should not be conditional; it should read more like "With IPI installation, the Virtual IPs are managed".

Comment 14 Gal Zaidman 2021-08-08 14:54:48 UTC
I commented on the PR; let's move the discussion there.

Comment 26 errata-xmlrpc 2022-03-10 16:02:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056