Bug 1937829

Summary: ovn-kube must handle single-stack to dual-stack migration
Product: OpenShift Container Platform
Reporter: Antonio Ojea <aojeagar>
Component: Networking
Assignee: Antonio Ojea <aojeagar>
Networking sub component: ovn-kubernetes
QA Contact: zhaozhanqi <zzhao>
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
CC: anbhat, anusaxen, aojeagar, bbennett, danw, dcbw, trozet, vvoronko, zzhao
Version: 4.7
Target Release: 4.7.z
Hardware: Unspecified
OS: Unspecified
Doc Type: If docs needed, set a value
Clone Of: 1924171
: 1956352
Last Closed: 2021-03-30 04:46:37 UTC
Bug Depends On: 1924171

Description Antonio Ojea 2021-03-11 16:03:03 UTC
+++ This bug was initially created as a clone of Bug #1924171 +++

When migrating a cluster from single-stack to dual-stack, CNO will eventually update the ovnkube config and restart the ovnkube-masters. This will currently fail with:

panic: failed to set gateway chassis 35d1489e-e7f2-494a-99fb-0c4ae0419690 for distributed gateway port rtos-node_local_switch: stdout: "", stderr: "ovn-nbctl: rtos-node_local_switch: port already exists with different network\n", error: OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist lrp-add ovn_cluster_router rtos-node_local_switch 0a:58:a9:fe:00:02 169.254.0.2/20 fd99::2/64 -- --id=@gw create gateway_chassis chassis_name=35d1489e-e7f2-494a-99fb-0c4ae0419690 external_ids:dgp_name=rtos-node_local_switch name=rtos-node_local_switch_35d1489e-e7f2-494a-99fb-0c4ae0419690 priority=100 -- set logical_router_port rtos-node_local_switch gateway_chassis=@gw' failed: exit status 1

The code currently assumes that the logical router port will either not exist or already have the correct config; it doesn't deal with a port that has a "half-correct" config, such as one left over from the single-stack setup.
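
To illustrate the missing case, here is a minimal sketch of the three-way decision the reconcile logic needs. It is not the actual ovn-kubernetes code (which is written in Go); the function name `plan_port_action` and the action strings are hypothetical, and the network values are taken from the failing command above.

```python
from typing import Optional, Sequence


def plan_port_action(existing: Optional[Sequence[str]],
                     desired: Sequence[str]) -> str:
    """Decide how to reconcile a port's networks (hypothetical helper).

    `existing` is None when the port does not exist yet; otherwise it is
    the list of networks currently configured on the port.
    """
    if existing is None:
        # Port is missing: create it with the desired networks.
        return "create"
    if set(existing) == set(desired):
        # Port already matches: nothing to do.
        return "keep"
    # Port exists with a different network list, e.g. IPv4 only after a
    # single-stack install: update it in place instead of failing.
    return "update"


# A port created for single-stack, reconciled against a dual-stack config.
action = plan_port_action(["169.254.0.2/20"],
                          ["169.254.0.2/20", "fd99::2/64"])
```

With only the first two branches, the third case surfaces as the "port already exists with different network" error shown above.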


We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)


We only support going from a single-stack config to a dual-stack config that is a superset of it, e.g., from

    --cluster-subnets="10.128.0.0/14/23"
    --service-cidrs="172.30.0.0/16"

to

    --cluster-subnets="10.128.0.0/14/23,fd01::/48/64"
    --service-cidrs="172.30.0.0/16,fd02::/112"
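
As a rough sketch of what "superset" means here, the check below parses the flag values shown above and verifies that every old entry is still present in the new config. The helper names `parse_cidrs` and `is_superset_config` are hypothetical and are not part of ovn-kubernetes or CNO; the sketch assumes the trailing number in `--cluster-subnets` entries is a per-node host prefix.

```python
import ipaddress


def parse_cidrs(value: str, has_host_prefix: bool = False) -> set:
    """Parse a comma-separated flag value into (network, host_prefix) tuples."""
    entries = set()
    for item in value.split(","):
        item = item.strip()
        host_prefix = None
        if has_host_prefix:
            # "10.128.0.0/14/23" -> network "10.128.0.0/14", host prefix 23
            item, _, suffix = item.rpartition("/")
            host_prefix = int(suffix)
        entries.add((ipaddress.ip_network(item), host_prefix))
    return entries


def is_superset_config(old: str, new: str, has_host_prefix: bool = False) -> bool:
    """Return True if every entry of the old config also appears in the new one."""
    return parse_cidrs(old, has_host_prefix) <= parse_cidrs(new, has_host_prefix)


# The supported migration from the example above.
subnets_ok = is_superset_config("10.128.0.0/14/23",
                                "10.128.0.0/14/23,fd01::/48/64",
                                has_host_prefix=True)
services_ok = is_superset_config("172.30.0.0/16",
                                 "172.30.0.0/16,fd02::/112")
```

A migration that also changes the existing IPv4 range (for example to `10.132.0.0/14/23`) would fail this check and is outside the supported case.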

--- Additional comment from Dan Williams on 2021-02-02 20:05:48 UTC ---

-> danw since he's actually working on it right now

--- Additional comment from Antonio Ojea on 2021-02-03 09:52:47 UTC ---

> We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)

I'd appreciate it if you could describe the CNO steps in more detail so we can "mimic" the same scenario in the test:

1. Create single stack cluster
2. enable dualstack feature gate and restart apiservers?
3. modify ovn-kube with dual stack parameters?
...

--- Additional comment from Dan Winship on 2021-02-03 13:30:24 UTC ---

(In reply to Antonio Ojea from comment #2)
> 1. Create single stack cluster

More specifically: create docker hosts that already have both IPv4 and IPv6 addresses, but then install a cluster onto them that only uses IPv4 in its config.

> 2. enable dualstack feature gate and restart apiservers?

(restarting the apiservers both to enable the feature gate and to set the dual-stack service cidr)

And you need to restart the kubelets to enable the feature gate there too.

> 3. modify ovn-kube with dual stack parameters?

Yup. And both masters and nodes will need to be restarted, I think.

--- Additional comment from Antonio Ojea on 2021-02-24 16:20:47 UTC ---

merged upstream https://github.com/ovn-org/ovn-kubernetes/pull/2013

Downstream PR https://github.com/openshift/ovn-kubernetes/pull/440

--- Additional comment from Antonio Ojea on 2021-03-11 16:01:08 UTC ---

Merged in openshift https://github.com/openshift/ovn-kubernetes/pull/440
Needs backports to 4.7

Comment 3 zhaozhanqi 2021-03-22 03:30:41 UTC
@vvoronko Could you help verify this bug? Thanks.

Comment 4 zhaozhanqi 2021-03-22 07:55:40 UTC
(In reply to zhaozhanqi from comment #3)
> @vvoronko Could you help verify this bug? Thanks.

Sorry, please ignore this; I commented on the wrong bug.

Comment 14 errata-xmlrpc 2021-03-30 04:46:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.4 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0957