Bug 1873955 - [IPI baremetal] Keepalived.conf cannot use new interface name after SDN migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.6.0
Assignee: Ben Nemec
QA Contact: Victor Voronkov
URL:
Whiteboard:
Depends On: 1854306
Blocks:
 
Reported: 2020-08-31 07:54 UTC by Peng Liu
Modified: 2020-10-27 16:36 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: After SDN migration from openshift-sdn to OVN-Kubernetes, the node's control plane IP and the VIP are assigned to the OVS bridge instead of the physical NIC. Once the VIP is assigned to the OVS bridge, an orphan route to the control plane network still points to the physical NIC instead of the OVS bridge.
Consequence: Nodes cannot communicate with other nodes in the control plane network, which leads Keepalived to wrongly set the API VIP on multiple nodes. As a result, the API is unavailable.
Fix: Set the network mask of the VIPs to the host netmask (e.g. /32 for IPv4).
Result: The SDN plugin migrates to OVN-Kubernetes successfully.
Clone Of:
: 1878905 (view as bug list)
Environment:
Last Closed: 2020-10-27 16:36:11 UTC
Target Upstream Version:
Embargoed:
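
As a rough illustration of the fix described in the Doc Text (giving each VIP a host netmask), the sketch below picks the prefix length from the address family. The helper names `host_prefix_len` and `vip_with_host_mask` are hypothetical and are not taken from baremetal-runtimecfg; the sample addresses come from documentation ranges, and the IPv6 value of /128 is an assumption extending the /32 IPv4 example.

```shell
#!/bin/sh
# Hypothetical helper: print the host prefix length for an IP address.
# Assumption: any address containing ':' is IPv6, everything else is IPv4.
host_prefix_len() {
    case "$1" in
        *:*) echo 128 ;;
        *)   echo 32 ;;
    esac
}

# Hypothetical helper: print the VIP in CIDR form with a host netmask,
# the form a keepalived virtual_ipaddress entry could take.
vip_with_host_mask() {
    echo "$1/$(host_prefix_len "$1")"
}

vip_with_host_mask 192.0.2.5      # IPv4 VIP
vip_with_host_mask 2001:db8::5    # IPv6 VIP
```

With a host netmask, adding the VIP to an interface should not also create a connected route for the whole control plane subnet, which is the kind of stray route the Doc Text describes.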


Links
System ID Private Priority Status Summary Last Updated
Github openshift baremetal-runtimecfg pull 100/ 0 None None None 2020-10-07 13:16:53 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 16:36:39 UTC

Description Peng Liu 2020-08-31 07:54:17 UTC
Description of problem:
During SDN migration (migrating the cluster network provider from openshift-sdn to ovn-kube), the node IP is allocated to the OVS bridge interface `br-ex` instead of the physical interface. However, keepalived.conf is not regenerated accordingly and still uses the name of the physical interface. This makes the cluster inaccessible from the cluster network.

Version-Release number of selected component (if applicable):
4.6.0-0.ci-2020-08-30-084452

How reproducible:


Steps to Reproduce:
1. Create a baremetal cluster
2. Allow the migration operation with `oc annotate Network.operator.openshift.io cluster "networkoperator.openshift.io/network-migration"=""`
3. Start the migration with `oc patch Network.config.openshift.io cluster --type='merge' --patch '{"spec":{"networkType":"OVNKubernetes"}}'`
4. Wait for the MCO to apply the new Machine Config with 'ovs-configuration.service' on the master and worker nodes. After the master/worker nodes reboot, log in to a node and check keepalived.conf.

Actual results:
The 'interface' field of each vrrp instance is still the physical interface, e.g. 'enp2s0'.

Expected results:
The 'interface' is changed to 'br-ex', which is the current default interface of the node.
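
The expected result can be checked on a node with a small script. This is a minimal sketch: the function names `vrrp_interfaces` and `check_vrrp_interface` are hypothetical, and it assumes each vrrp instance declares its interface on a line of the form `interface <name>`, as in the keepalived.conf output quoted in this report.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct interface names used by
# "interface <name>" lines in a keepalived configuration file.
vrrp_interfaces() {
    awk '$1 == "interface" { print $2 }' "$1" | sort -u
}

# Hypothetical check: succeed only if all "interface" lines name the
# expected interface (and at least one such line exists).
check_vrrp_interface() {
    conf=$1
    expected=$2
    [ "$(vrrp_interfaces "$conf")" = "$expected" ]
}

# Example usage on a node after the migration reboot:
# check_vrrp_interface /etc/keepalived/keepalived.conf br-ex && echo "OK"
```

The check fails when the file still names a physical interface such as 'enp2s0', or when different vrrp instances name different interfaces.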

Additional info:

Comment 3 Peng Liu 2020-09-16 13:54:50 UTC
Moving this back to 4.6, as the workaround no longer works after https://github.com/openshift/ovn-kubernetes/pull/269. Without a fix for this issue, the SDN migration cannot work.

Comment 4 Victor Voronkov 2020-10-04 08:27:05 UTC
[kni@provisionhost-0-0 ~]$ oc version
Client Version: 4.6.0-0.nightly-2020-10-02-065738
Server Version: 4.6.0-0.ci-2020-10-02-054056

Cluster deployed with OpenShiftSDN
 
After migration to OVN, keepalived.conf switched from the physical interface to br-ex

[core@master-0-0 ~]$ cat /etc/keepalived/keepalived.conf | grep interface
    interface br-ex
    interface br-ex

Also, even after this change, it took a long time for the Kube API to come up: more than a few hours

Comment 7 errata-xmlrpc 2020-10-27 16:36:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

