When migrating a cluster from single-stack to dual-stack, CNO will eventually update the ovnkube config and restart the ovnkube-masters. This currently fails with:

    panic: failed to set gateway chassis 35d1489e-e7f2-494a-99fb-0c4ae0419690 for distributed gateway port rtos-node_local_switch:
    stdout: ""
    stderr: "ovn-nbctl: rtos-node_local_switch: port already exists with different network\n"
    error: OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist lrp-add ovn_cluster_router rtos-node_local_switch 0a:58:a9:fe:00:02 169.254.0.2/20 fd99::2/64 -- --id=@gw create gateway_chassis chassis_name=35d1489e-e7f2-494a-99fb-0c4ae0419690 external_ids:dgp_name=rtos-node_local_switch name=rtos-node_local_switch_35d1489e-e7f2-494a-99fb-0c4ae0419690 priority=100 -- set logical_router_port rtos-node_local_switch gateway_chassis=@gw' failed: exit status 1

The code currently assumes that the switch will either not exist or will already have the correct config; it doesn't handle a "half-correct" config.

We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)

We are only supporting going from a single-stack config to a dual-stack config that is a superset of it, e.g. from

    --cluster-subnets="10.128.0.0/14/23" --service-cidrs="172.30.0.0/16"

to

    --cluster-subnets="10.128.0.0/14/23,fd01::/48/64" --service-cidrs="172.30.0.0/16,fd02::/112"
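The panic happens because `--may-exist lrp-add` refuses to touch an existing router port whose networks differ from the requested ones. A minimal manual workaround sketch, using the router, port, MAC, and networks taken from the panic above (this is what the fix would have to do programmatically: detect the half-configured port and reconcile it rather than assume it's absent or fully correct):

```shell
# Inspect the existing (IPv4-only) distributed gateway port to confirm
# it only carries the v4 network.
ovn-nbctl list logical_router_port rtos-node_local_switch

# Remove the half-configured port, then recreate it with both the IPv4
# and IPv6 networks so the subsequent gateway-chassis setup succeeds.
# (Names/addresses are the ones from the panic in this bug; a real fix
# would derive them from the cluster config.)
ovn-nbctl lrp-del rtos-node_local_switch
ovn-nbctl lrp-add ovn_cluster_router rtos-node_local_switch \
    0a:58:a9:fe:00:02 169.254.0.2/20 fd99::2/64
```

These commands operate against a live OVN northbound database, so they are shown as a sketch of the recovery procedure rather than something runnable in isolation.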
-> danw since he's actually working on it right now
> We probably need to set up a CI job upstream to test single-to-dual migration there, and then once that's working, pull the changes downstream. (This will be needed in 4.7.z.)

It would be appreciated if you could be more descriptive with the CNO steps, so we can "mimic" the same scenario in the test:

1. Create single-stack cluster
2. Enable dual-stack feature gate and restart apiservers?
3. Modify ovn-kube with dual-stack parameters?
...
(In reply to Antonio Ojea from comment #2)

> 1. Create single stack cluster

More specifically: create docker hosts that already have both IPv4 and IPv6 addresses, but then install a cluster onto them that only uses IPv4 in its config.

> 2. enable dualstack feature gate and restart apiservers?

(Restarting the apiservers both to enable the feature gate and to set the dual-stack service CIDR.) And you need to restart the kubelets to enable the feature gate there too.

> 3. modify ovn-kube with dual stack parameters?

Yup. And both masters and nodes will need to be restarted, I think.
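Putting the steps above together, the flag changes could be sketched roughly as follows. This assumes the pre-GA IPv6DualStack feature gate and uses the CIDRs from the bug description; exact component invocations (`...` stands for each component's other flags) will differ in a real deployment:

```shell
# Step 1: install a single-stack IPv4 cluster on hosts that already
# have both IPv4 and IPv6 addresses.

# Step 2: restart the apiservers with the feature gate enabled and the
# dual-stack service CIDR...
kube-apiserver ... \
    --feature-gates=IPv6DualStack=true \
    --service-cluster-ip-range=172.30.0.0/16,fd02::/112

# ...and restart the kubelets with the feature gate as well:
kubelet ... --feature-gates=IPv6DualStack=true

# Step 3: restart ovnkube (both masters and nodes) with the dual-stack
# parameters from the description:
ovnkube ... \
    --cluster-subnets="10.128.0.0/14/23,fd01::/48/64" \
    --service-cidrs="172.30.0.0/16,fd02::/112"
```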
Merged upstream: https://github.com/ovn-org/ovn-kubernetes/pull/2013
Downstream PR: https://github.com/openshift/ovn-kubernetes/pull/440
Merged in openshift: https://github.com/openshift/ovn-kubernetes/pull/440. Needs backports to 4.7.
This is only partially implemented, but it is all that we're backporting to 4.7 for now (the customer will need to use a very manual process), so I'm marking it VERIFIED so we can get the 4.7 backport in.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438