Bug 1873955

Summary:	[IPI baremetal] Keepalived.conf cannot use new interface name after SDN migration
Product:	OpenShift Container Platform	Reporter:	Peng Liu <pliu>
Component:	Machine Config Operator	Assignee:	Ben Nemec <bnemec>
Status:	CLOSED ERRATA	QA Contact:	Victor Voronkov <vvoronko>
Severity:	urgent	Docs Contact:
Priority:	urgent
Version:	4.6	CC:	asegurap, bperkins, jerzhang, yboaron
Target Milestone:	---	Keywords:	Triaged
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: After SDN migration from openshift-SDN to OVN-K8S, the node's control plane IP and the VIP are assigned to the OVS bridge instead of the physical NIC. Once the VIP is on the OVS bridge, an orphan route to the control plane network still points to the physical NIC instead of the OVS bridge. Consequence: Nodes cannot communicate with other nodes on the control plane network, so Keepalived wrongly sets the API VIP on multiple nodes and, as a result, the API is unavailable. Fix: Set the network mask of the VIPs to the host netmask (e.g. /32 for IPv4). Result: The SDN plugin migrates to OVN-K8S successfully.	Story Points:	---
Clone Of:
Clones:	1878905 (view as bug list)		Environment:
Last Closed:	2020-10-27 16:36:11 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:	1854306
Bug Blocks:
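
The fix named in the Doc Text above (giving each VIP a host netmask) can be illustrated with a minimal sketch. The helper name `vip_with_host_prefix` and the example addresses are hypothetical and are not taken from the Machine Config Operator code; the sketch only shows what "host netmask" means for IPv4 and IPv6.

```python
import ipaddress


def vip_with_host_prefix(vip):
    """Return the VIP in CIDR form with a host-only prefix length.

    Hypothetical helper for illustration; not the operator's actual code.
    """
    addr = ipaddress.ip_address(vip)
    # max_prefixlen is 32 for IPv4 addresses and 128 for IPv6 addresses.
    return f"{addr}/{addr.max_prefixlen}"


if __name__ == "__main__":
    # Example addresses are made up for this sketch.
    print(vip_with_host_prefix("192.168.111.5"))
    print(vip_with_host_prefix("fd2e:6f44:5dd8::5"))
```

With a host prefix, adding the VIP to an interface does not imply a connected route for the whole control plane subnet through that interface.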

Description Peng Liu 2020-08-31 07:54:17 UTC

Description of problem:
During SDN migration (migrating the cluster network provider from openshift-sdn to ovn-kube), the node IP is assigned to the OVS bridge interface `br-ex` instead of the physical interface. However, keepalived.conf is not regenerated accordingly and still uses the name of the physical interface. This makes the cluster inaccessible from the cluster network.

Version-Release number of selected component (if applicable):
4.6.0-0.ci-2020-08-30-084452

How reproducible:


Steps to Reproduce:
1. Create a baremetal cluster
2. Allow migration operation by `oc annotate Network.operator.openshift.io cluster "networkoperator.openshift.io/network-migration"=""`
3. Start migration by `oc patch Network.config.openshift.io cluster --type='merge' --patch '{"spec":{"networkType":"OVNKubernetes"}}'`
4. Wait for the MCO to apply the new Machine Config with 'ovs-configuration.service' on the master and worker nodes. After a master/worker node reboots, log in to the node and check keepalived.conf.
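
Steps 2 and 3 above can be sketched as a small script that assembles the two `oc` invocations as argument lists, so that the JSON patch is generated rather than hand-quoted. The function name `migration_commands` is hypothetical; the resource names, annotation, and flags are the ones given in the steps. The script only prints the commands and does not run them.

```python
import json
import shlex


def migration_commands(network_type="OVNKubernetes"):
    """Build the annotate and patch commands from steps 2 and 3 as argv lists."""
    annotate = [
        "oc", "annotate", "Network.operator.openshift.io", "cluster",
        # Same annotation as in step 2, with an empty value.
        "networkoperator.openshift.io/network-migration=",
    ]
    # json.dumps avoids manual quoting mistakes in the merge patch.
    patch = json.dumps({"spec": {"networkType": network_type}})
    start = [
        "oc", "patch", "Network.config.openshift.io", "cluster",
        "--type=merge", "--patch", patch,
    ]
    return [annotate, start]


if __name__ == "__main__":
    for argv in migration_commands():
        # shlex.join (Python 3.8+) renders a shell-safe command line.
        print(shlex.join(argv))
```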

Actual results:
The 'interface' field of the vrrp instance is still the physical interface, e.g. 'enp2s0'.

Expected results:
The 'interface' is changed to 'br-ex', which is the current default interface of the node.
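
The check described under actual and expected results can be sketched as a small parser that reads the `interface` field of every `vrrp_instance` block and reports instances that do not use `br-ex`. This is a simplified sketch, not a full keepalived.conf parser: it assumes one statement per line and treats only `#` as a comment marker. The instance names and sample configuration below are made up for illustration.

```python
import re

INSTANCE_RE = re.compile(r"^vrrp_instance\s+(\S+)\s*\{")
INTERFACE_RE = re.compile(r"^interface\s+(\S+)")

# Hypothetical sample; real files live at /etc/keepalived/keepalived.conf.
SAMPLE_CONF = """
vrrp_instance example_API {
    state BACKUP
    interface br-ex
    virtual_ipaddress {
        192.168.111.5/32
    }
}
vrrp_instance example_INGRESS {
    state BACKUP
    interface enp2s0
}
"""


def vrrp_interfaces(conf_text):
    """Map each vrrp_instance name to the value of its `interface` field."""
    interfaces = {}
    current = None  # name of the vrrp_instance being read, if any
    depth = 0       # brace nesting depth inside that instance
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if current is None:
            match = INSTANCE_RE.match(line)
            if match:
                current, depth = match.group(1), 1
            continue
        match = INTERFACE_RE.match(line)
        if depth == 1 and match:
            interfaces[current] = match.group(1)
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            current = None
    return interfaces


def stale_instances(conf_text, expected="br-ex"):
    """Return names of instances whose interface differs from `expected`."""
    found = vrrp_interfaces(conf_text)
    return [name for name, iface in found.items() if iface != expected]


if __name__ == "__main__":
    print(stale_instances(SAMPLE_CONF))
```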

Additional info:

Comment 3 Peng Liu 2020-09-16 13:54:50 UTC

Moved back to 4.6, because the workaround no longer works after https://github.com/openshift/ovn-kubernetes/pull/269. Without a fix for this issue, SDN migration cannot work.

Comment 4 Victor Voronkov 2020-10-04 08:27:05 UTC

[kni@provisionhost-0-0 ~]$ oc version
Client Version: 4.6.0-0.nightly-2020-10-02-065738
Server Version: 4.6.0-0.ci-2020-10-02-054056

Cluster deployed with OpenShiftSDN
 
After migration to OVN, keepalived.conf switched from the physical interface to br-ex:

[core@master-0-0 ~]$ cat /etc/keepalived/keepalived.conf | grep interface
    interface br-ex
    interface br-ex

Also, even after this change, it took a long time for the Kube-API to come up: more than a few hours.

Comment 7 errata-xmlrpc 2020-10-27 16:36:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196