Bug 1924171 - ovn-kube must handle single-stack to dual-stack migration
Summary: ovn-kube must handle single-stack to dual-stack migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: low
Target Milestone: ---
Target Release: 4.8.0
Assignee: Dan Winship
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks: 1937829 1956352
 
Reported: 2021-02-02 18:04 UTC by Dan Winship
Modified: 2021-07-27 22:39 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1937829
Environment:
Last Closed: 2021-07-27 22:37:56 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHSA-2021:2438  0        None      None    None     2021-07-27 22:39:28 UTC

Description Dan Winship 2021-02-02 18:04:39 UTC
When migrating a cluster from single-stack to dual-stack, CNO will eventually update the ovnkube config and restart the ovnkube-masters. This will currently fail with:

panic: failed to set gateway chassis 35d1489e-e7f2-494a-99fb-0c4ae0419690 for distributed gateway port rtos-node_local_switch: stdout: "", stderr: "ovn-nbctl: rtos-node_local_switch: port already exists with different network\n", error: OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist lrp-add ovn_cluster_router rtos-node_local_switch 0a:58:a9:fe:00:02 169.254.0.2/20 fd99::2/64 -- --id=@gw create gateway_chassis chassis_name=35d1489e-e7f2-494a-99fb-0c4ae0419690 external_ids:dgp_name=rtos-node_local_switch name=rtos-node_local_switch_35d1489e-e7f2-494a-99fb-0c4ae0419690 priority=100 -- set logical_router_port rtos-node_local_switch gateway_chassis=@gw' failed: exit status 1

The code currently assumes that the port will either not exist or already have the correct config; it doesn't handle a "half-correct" config, such as a port that already exists with only its IPv4 network when the new config also gives it an IPv6 one.
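To make the three cases concrete, here is a minimal sketch of the decision the code would need to make. It is not the actual ovn-kubernetes fix; the function name `plan_port_update` and the returned action strings are hypothetical, and the only inputs assumed are the networks currently configured on the port (or `None` if the port does not exist) and the networks wanted by the new config.

```python
import ipaddress
from typing import Optional, Sequence


def _normalize(networks: Sequence[str]) -> frozenset:
    """Parse 'address/prefix' strings so that textual variants compare equal."""
    return frozenset(ipaddress.ip_interface(n) for n in networks)


def plan_port_update(existing: Optional[Sequence[str]],
                     desired: Sequence[str]) -> str:
    """Return a hypothetical action name for reconciling a router port.

    existing: networks currently on the port, or None if the port is absent.
    desired:  networks the new (possibly dual-stack) config asks for.
    """
    want = _normalize(desired)
    if existing is None:
        return "create"    # port does not exist yet
    have = _normalize(existing)
    if have == want:
        return "noop"      # port already has the correct config
    if have < want:
        return "update"    # "half-correct": add the missing networks
    return "conflict"      # port has networks the new config does not want


# The situation from the panic above: IPv4 is present, IPv6 is missing.
print(plan_port_update(["169.254.0.2/20"], ["169.254.0.2/20", "fd99::2/64"]))
```

The "update" branch is the case the panic above falls into: the port exists with a strict subset of the wanted networks, so it needs to be modified rather than re-added.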


We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)


We are only supporting migration from a single-stack config to a dual-stack config that is a superset of it, e.g., from

    --cluster-subnets="10.128.0.0/14/23"
    --service-cidrs="172.30.0.0/16"

to

    --cluster-subnets="10.128.0.0/14/23,fd01::/48/64"
    --service-cidrs="172.30.0.0/16,fd02::/112"
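The "superset" constraint can be sketched as a small check. This is an illustration only, not code from ovn-kubernetes or CNO; the helper names `parse_subnets` and `is_superset_migration` are hypothetical, and it assumes the flag values use the `CIDR` and `CIDR/hostPrefix` forms shown above.

```python
import ipaddress


def parse_subnets(value: str) -> list:
    """Split a comma-separated flag value into (network, host_prefix) pairs.

    Accepts 'CIDR' (as in --service-cidrs) and 'CIDR/hostPrefix'
    (as in --cluster-subnets); host_prefix is None when absent.
    """
    entries = []
    for item in value.split(","):
        parts = item.strip().split("/")
        if len(parts) not in (2, 3):
            raise ValueError(f"unexpected subnet entry: {item!r}")
        network = ipaddress.ip_network(f"{parts[0]}/{parts[1]}")
        host_prefix = int(parts[2]) if len(parts) == 3 else None
        entries.append((network, host_prefix))
    return entries


def is_superset_migration(old: str, new: str) -> bool:
    """True if old is single-stack and new is dual-stack and contains old."""
    old_entries = parse_subnets(old)
    new_entries = parse_subnets(new)
    old_families = {net.version for net, _ in old_entries}
    new_families = {net.version for net, _ in new_entries}
    return (
        len(old_families) == 1
        and new_families == {4, 6}
        and all(entry in new_entries for entry in old_entries)
    )


print(is_superset_migration("10.128.0.0/14/23",
                            "10.128.0.0/14/23,fd01::/48/64"))
```

Under this check, changing the existing IPv4 range while adding IPv6 is rejected, which matches the restriction described above.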

Comment 1 Dan Williams 2021-02-02 20:05:48 UTC
-> danw since he's actually working on it right now

Comment 2 Antonio Ojea 2021-02-03 09:52:47 UTC
> We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)

I'd appreciate it if you could describe the CNO steps in more detail so we can mimic the same scenario in the test:

1. Create single stack cluster
2. enable dualstack feature gate and restart apiservers?
3. modify ovn-kube with dual stack parameters?
...

Comment 3 Dan Winship 2021-02-03 13:30:24 UTC
(In reply to Antonio Ojea from comment #2)
> 1. Create single stack cluster

More specifically: create docker hosts that already have both IPv4 and IPv6 addresses, but then install a cluster onto them that only uses IPv4 in its config.

> 2. enable dualstack feature gate and restart apiservers?

(restarting the apiservers both to enable the feature gate and to set the dual-stack service cidr)

And you need to restart the kubelets to enable the feature gate there too.

> 3. modify ovn-kube with dual stack parameters?

Yup. And both masters and nodes will need to be restarted, I think.

Comment 5 Antonio Ojea 2021-03-11 16:01:08 UTC
Merged in openshift https://github.com/openshift/ovn-kubernetes/pull/440
Needs backports to 4.7

Comment 6 Dan Winship 2021-03-15 16:17:22 UTC
This is only partially implemented, but it is all that we're backporting to 4.7 for now (the customer will need to use a very manual process), so I'm marking it VERIFIED so that we can get the 4.7 backport in.

Comment 9 errata-xmlrpc 2021-07-27 22:37:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

