Bug 1937829 - ovn-kube must handle single-stack to dual-stack migration
Summary: ovn-kube must handle single-stack to dual-stack migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.7.z
Assignee: Antonio Ojea
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On: 1924171
Blocks:
Reported: 2021-03-11 16:03 UTC by Antonio Ojea
Modified: 2021-05-03 14:20 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1924171
Clones: 1956352
Environment:
Last Closed: 2021-03-30 04:46:37 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-network-operator pull 1017 | 0 | None | open | Bug 1937829: Cherry-pick dual stack migration | 2021-03-15 16:17:38 UTC
Github | openshift ovn-kubernetes pull 460 | 0 | None | open | [WIP] Cherry-pick dual-stack conversion logic | 2021-03-11 16:03:03 UTC
Red Hat Product Errata | RHSA-2021:0957 | 0 | None | None | None | 2021-03-30 04:46:56 UTC

Description Antonio Ojea 2021-03-11 16:03:03 UTC
+++ This bug was initially created as a clone of Bug #1924171 +++

When migrating a cluster from single-stack to dual-stack, CNO will eventually update the ovnkube config and restart the ovnkube-masters. This will currently fail with:

panic: failed to set gateway chassis 35d1489e-e7f2-494a-99fb-0c4ae0419690 for distributed gateway port rtos-node_local_switch: stdout: "", stderr: "ovn-nbctl: rtos-node_local_switch: port already exists with different network\n", error: OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist lrp-add ovn_cluster_router rtos-node_local_switch 0a:58:a9:fe:00:02 169.254.0.2/20 fd99::2/64 -- --id=@gw create gateway_chassis chassis_name=35d1489e-e7f2-494a-99fb-0c4ae0419690 external_ids:dgp_name=rtos-node_local_switch name=rtos-node_local_switch_35d1489e-e7f2-494a-99fb-0c4ae0419690 priority=100 -- set logical_router_port rtos-node_local_switch gateway_chassis=@gw' failed: exit status 1

The code currently assumes that the switch will either not exist, or it will have the correct config; it doesn't deal with it having a "half-correct" config.
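The panic comes from `ovn-nbctl --may-exist lrp-add` rejecting an existing port whose networks differ from the requested ones. Below is a minimal sketch of the decision that has to happen before that call, assuming the port's current networks have already been read from the northbound database. `classify_router_port` and its return labels are hypothetical and are not the function used in ovn-kubernetes; how the "add-networks" case is applied (for example, by updating the port's `networks` column) is left out.

```python
import ipaddress


def classify_router_port(existing, desired):
    """Decide how to reconcile a logical router port's networks (hypothetical helper).

    existing: "addr/prefix" strings currently on the port, or None if the port is absent.
    desired:  "addr/prefix" strings the (possibly dual-stack) config asks for.
    """
    want = {ipaddress.ip_interface(n) for n in desired}
    if existing is None:
        return "create"  # port does not exist yet: a plain lrp-add is enough
    have = {ipaddress.ip_interface(n) for n in existing}
    if have == want:
        return "unchanged"  # already correct: the existing --may-exist path handles this
    if have < want:
        # "Half-correct": the port carries only the old single-stack address(es),
        # e.g. after migrating to dual-stack. This is the case that panicked.
        return "add-networks"
    return "conflict"  # networks differ in some other way; not a supported migration
```

For the port in the log above, `classify_router_port(["169.254.0.2/20"], ["169.254.0.2/20", "fd99::2/64"])` returns `"add-networks"` rather than failing. Comparing parsed `ip_interface` objects instead of raw strings keeps the check independent of address spelling and ordering.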


We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)


We are only supporting going from a single-stack config to a dual-stack config which is a superset of it, e.g., from

    --cluster-subnets="10.128.0.0/14/23"
    --service-cidrs="172.30.0.0/16"

to

    --cluster-subnets="10.128.0.0/14/23,fd01::/48/64"
    --service-cidrs="172.30.0.0/16,fd02::/112"
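The "superset" rule above can be checked mechanically. Below is a minimal sketch, assuming the flag values are available as the plain strings shown and that the third `/`-separated field of a `--cluster-subnets` entry is the per-node host prefix. `parse_entries` and `is_supported_migration` are hypothetical helpers, not part of ovn-kubernetes or CNO.

```python
import ipaddress


def parse_entries(value):
    """Parse a comma-separated flag value into (network, host_prefix) tuples.

    Accepts "--service-cidrs" entries ("172.30.0.0/16") and
    "--cluster-subnets" entries ("10.128.0.0/14/23", i.e. CIDR/hostPrefix).
    """
    entries = []
    for item in value.split(","):
        parts = item.strip().split("/")
        # An address never contains "/", so "fd01::/48/64" -> ["fd01::", "48", "64"].
        host_prefix = int(parts[2]) if len(parts) == 3 else None
        entries.append((ipaddress.ip_network(f"{parts[0]}/{parts[1]}"), host_prefix))
    return entries


def is_supported_migration(old_value, new_value):
    """True if new_value is a dual-stack superset of the single-stack old_value."""
    old, new = parse_entries(old_value), parse_entries(new_value)
    old_families = {net.version for net, _ in old}
    added = [entry for entry in new if entry not in old]
    return (
        len(old_families) == 1                                  # start is single-stack
        and {net.version for net, _ in new} == {4, 6}           # target is dual-stack
        and all(entry in new for entry in old)                  # old entries kept as-is
        and all(net.version not in old_families for net, _ in added)  # only the other family added
    )
```

Both example pairs above pass this check. Replacing `10.128.0.0/14/23` with a different IPv4 range, or adding a second IPv4 range alongside the IPv6 one, would be rejected as outside the supported migration.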

--- Additional comment from Dan Williams on 2021-02-02 20:05:48 UTC ---

-> danw since he's actually working on it right now

--- Additional comment from Antonio Ojea on 2021-02-03 09:52:47 UTC ---

> We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)

I'd appreciate it if you could be more descriptive about the CNO steps so we can "mimic" the same scenario in the test:

1. Create single stack cluster
2. enable dualstack feature gate and restart apiservers?
3. modify ovn-kube with dual stack parameters?
...

--- Additional comment from Dan Winship on 2021-02-03 13:30:24 UTC ---

(In reply to Antonio Ojea from comment #2)
> 1. Create single stack cluster

More specifically: create docker hosts that already have both IPv4 and IPv6 addresses, but then install a cluster onto them that only uses IPv4 in its config.

> 2. enable dualstack feature gate and restart apiservers?

(restarting the apiservers both to enable the feature gate and to set the dual-stack service cidr)

And you need to restart the kubelets to enable the feature gate there too.

> 3. modify ovn-kube with dual stack parameters?

Yup. And both masters and nodes will need to be restarted I think.
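The test scenario from this exchange can be written down as a small dependency check for a CI job. Below is a sketch under the assumption that the order in the comments is a hard requirement: apiservers and kubelets are restarted with the feature gate before ovnkube gets dual-stack parameters. The step names, `PREREQS`, and `order_violations` are hypothetical and are not taken from any existing CI job.

```python
# Prerequisites for each step of the single-to-dual migration scenario.
# Step names are hypothetical labels; dependencies follow the order in the comments.
PREREQS = {
    "install-ipv4-only-cluster": {"hosts-with-ipv4-and-ipv6"},
    "restart-apiservers": {"install-ipv4-only-cluster"},  # feature gate + dual-stack service CIDR
    "restart-kubelets": {"install-ipv4-only-cluster"},    # feature gate on the nodes
    "update-ovnkube-config": {"restart-apiservers", "restart-kubelets"},
    "restart-ovnkube-masters": {"update-ovnkube-config"},
    "restart-ovnkube-nodes": {"update-ovnkube-config"},
}


def order_violations(plan):
    """Return (step, missing prerequisites) for each step run before its prerequisites."""
    done = set()
    violations = []
    for step in plan:
        missing = PREREQS.get(step, set()) - done
        if missing:
            violations.append((step, sorted(missing)))
        done.add(step)
    return violations
```

A plan that updates the ovnkube config before restarting the kubelets would be reported as `[("update-ovnkube-config", ["restart-kubelets"])]`. No ordering is imposed between the master and node restarts, since the comments only say both are needed.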

--- Additional comment from Antonio Ojea on 2021-02-24 16:20:47 UTC ---

merged upstream https://github.com/ovn-org/ovn-kubernetes/pull/2013

Downstream PR https://github.com/openshift/ovn-kubernetes/pull/440

--- Additional comment from Antonio Ojea on 2021-03-11 16:01:08 UTC ---

Merged in openshift https://github.com/openshift/ovn-kubernetes/pull/440
Needs backports to 4.7

Comment 3 zhaozhanqi 2021-03-22 03:30:41 UTC
@vvoronko@redhat.com Could you help verify this bug? Thanks.

Comment 4 zhaozhanqi 2021-03-22 07:55:40 UTC
(In reply to zhaozhanqi from comment #3)
> @vvoronko@redhat.com Could you help verify this bug? Thanks.

Sorry, please ignore this; I commented on the wrong bug.

Comment 14 errata-xmlrpc 2021-03-30 04:46:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.4 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:0957

