Bug 1905233

Summary: Incorrect interface selection when creating keepalived configuration for IPv6
Product: OpenShift Container Platform Reporter: Ori Amizur <oamizur>
Component: Networking Assignee: Yossi Boaron <yboaron>
Networking sub component: runtime-cfg QA Contact: Rei <rhalle>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: urgent CC: alazar, asegurap, bbennett, bperkins, vvoronko, yboaron
Version: 4.6 Keywords: Triaged
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: OCP-Metal-External-Blocker
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Overlapping IPv6 subnets were set for different NICs, for example: Interface A subnet: 1001:db8::/120, Interface B subnet: 1001:db8::f00/120, VIP address: 1001:db8::64.
Consequence: The wrong interface was set in the Keepalived configuration file and, as a result, the deployment failed.
Fix: Select the NIC whose address is L2-connected with the VIP address.
Result: The correct interface is selected for the Keepalived configuration.
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-03-10 11:24:01 UTC Type: Bug
Bug Depends On: 1925291, 1927068    
Bug Blocks:    

Description Ori Amizur 2020-12-07 19:27:59 UTC
Description of problem:

When installing a cluster on hosts with two interfaces, the wrong interface is selected for the VIPs. The two subnets are 1001:db8::/120 and 1001:db8::f00/120.
The VIPs are API 1001:db8::64 and Ingress 1001:db8::65. When the wrong interface is selected, the installation gets stuck.

How reproducible:

Try to install a cluster with these two subnets. It happened with the Assisted Installer.


Actual results:

The interface on the second subnet is selected. As a result, the VIPs are configured on this interface on every master node.


Expected results:

The selected interface should be on the first subnet.
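
To make the expected result concrete, here is a minimal sketch (not the runtimecfg code) that checks which of the two reported subnets contains each VIP. The names "interface-a" and "interface-b" are hypothetical labels for the NICs on the first and second subnet; only the subnets and VIP addresses come from this report.

```python
import ipaddress

# Subnets from this report; the interface names are hypothetical labels.
SUBNETS = {
    "interface-a": ipaddress.ip_network("1001:db8::/120"),
    "interface-b": ipaddress.ip_network("1001:db8::f00/120"),
}

# VIPs from this report.
VIPS = {
    "api": ipaddress.ip_address("1001:db8::64"),
    "ingress": ipaddress.ip_address("1001:db8::65"),
}


def interfaces_for_vip(vip, subnets):
    """Return the names of all interfaces whose subnet contains the VIP."""
    return [name for name, network in subnets.items() if vip in network]


if __name__ == "__main__":
    for vip_name, vip in VIPS.items():
        print(vip_name, vip, interfaces_for_vip(vip, SUBNETS))
```

With the real /120 prefix lengths, only the first subnet contains the VIPs, so a selection based on the actual prefix is unambiguous.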


Additional info:

The problem is probably here:

https://github.com/openshift/baremetal-runtimecfg/blob/a9f2b4411e93d10e5459f6b07cc490c39da7c8b2/pkg/config/net.go#L36-#L39

Instead of replacing the prefix, the prefix should be calculated from the RA routes.
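
The following sketch illustrates why replacing the prefix can make the choice ambiguous. It assumes, purely for illustration, that the prefix length is replaced with /64. The linked code is not reproduced here, so both that value and the helper name widen_prefix are hypothetical.

```python
import ipaddress

# Subnets and API VIP from this report.
SUBNET_A = ipaddress.ip_network("1001:db8::/120")
SUBNET_B = ipaddress.ip_network("1001:db8::f00/120")
API_VIP = ipaddress.ip_address("1001:db8::64")


def widen_prefix(network, new_prefix=64):
    """Hypothetical stand-in for 'replacing the prefix': widen to /new_prefix.

    The /64 default is an assumption made for this illustration only.
    """
    return network.supernet(new_prefix=new_prefix)


if __name__ == "__main__":
    wide_a = widen_prefix(SUBNET_A)
    wide_b = widen_prefix(SUBNET_B)
    # After widening, both subnets collapse into the same network, and the
    # VIP appears to belong to both interfaces.
    print(wide_a, wide_b, wide_a == wide_b)
    print(API_VIP in wide_a, API_VIP in wide_b)
```

Under that assumption, the two interfaces become indistinguishable for VIP matching, which is consistent with the wrong interface being picked.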

Comment 7 errata-xmlrpc 2021-03-10 11:24:01 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.7.1 bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0678