Bug 1907872
Summary: | dual stack with an ipv6 network fails on bootstrap phase | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Karim Boumedhel <kboumedh> |
Component: | Etcd | Assignee: | Dan Mace <dmace> |
Status: | CLOSED ERRATA | QA Contact: | ge liu <geliu> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.6 | CC: | bbennett, danw, dmace, dphillip, lwan, skolicha, yprokule |
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: A parsing bug in reading machine network CIDR.
Consequence: The bootstrap rendering logic fails to detect a usable machine network CIDR when using IPv6 dual stack mode unless the IPv4 CIDR is the first element in the install config's machine network CIDR array.
Fix: Fix the parsing logic to loop through all machine network CIDRs.
Result: The IPv4 address is correctly located amongst the machine network CIDRs in dual stack mode.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:43:55 UTC | Type: | Bug |
Description
Karim Boumedhel
2020-12-15 12:23:06 UTC
> Dec 14 16:10:38 qct8-bootstrap bootkube.sh[2590]: Rendering CEO Manifests...
> Dec 14 16:10:40 qct8-bootstrap bootkube.sh[2590]: F1214 16:10:40.104291 1 render.go:66] machineNetwork is not found in install-config

This is a bug in cluster-etcd-operator. It is trying to find the IPv4 value in machineNetwork, but it accidentally only looks at the first element of machineNetwork rather than the entire list. (https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/cmd/render/render.go#L450).

If you swapped the two elements around to be

    machineNetwork:
    - cidr: 10.0.0.0/16
    - cidr: 2620:52:0:1302::/64

then it would work, though of course that would change your cluster in other ways by making everything IPv4-primary rather than IPv6-primary.

In theory cluster-etcd-operator should just use the first value in machineNetwork, rather than trying to find the IPv4 value when the cluster is dual-stack. That is, rather than implementing "etcd listens on IPv6-only when the cluster is single-stack IPv6, and IPv4-only when the cluster is single-stack IPv4 or dual-stack", it should instead implement "etcd listens on the IP family of whatever the first element of machineNetwork is".

Failing that, the machineNetwork-parsing code in render.go needs to be fixed. I tried doing that but parsing YAML by hand like that is just gross and my first few attempts got it wrong (so I guess I can't blame the current code for having gotten it wrong too...)

Thanks for the research, Dan. I believe the parsing fix would be to change this line:

https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/cmd/render/render.go#L451

    machineCIDR := fmt.Sprintf("%v", network)

to

    machineCIDR := fmt.Sprintf("%v", network["cidr"])

Which I think we agree would technically work but would in this case result in the bootstrap etcd member binding to the IPv4 addr even though IPv6 is probably more consistent with the rest of the setup.
Picking the family based on the first CIDR in the machine network list sounds like it could produce the more consistent effect of the etcd bootstrap member binding to IPv6, but would be a more significant behavioral change. Either way I think this code needs some test coverage, so I'm not opposed to either approach and would defer to you (or anybody else) with a strong opinion on the matter.

No, the line before that one is buggy too.

    for _, network := range networking["machineNetwork"].([]interface{})[0].(map[string]interface{}) {
        machineCIDR := fmt.Sprintf("%v", network)

networking["machineNetwork"] is an array of objects. The code ought to be looping over each object in the array and checking the value of its "cidr" property, but instead it's looping over each key/value pair of only the first object in the array, ignoring the keys and assuming all the values are CIDRs. That is, if the install config looked like:

    machineNetwork:
    - cidr: 2620:52:0:1302::/64
      totallyFakeCIDR: 99.99.0.0/16
    - cidr: 10.0.0.0/16

then it would return "99.99.0.0/16". What it should be doing is something like

    for _, network := range networking["machineNetwork"].([]interface{}) {
        networkMap := network.(map[string]interface{})
        machineCIDR := networkMap["cidr"].(string)

I think? I tried a few things before and they kept not working...

I have contacted the edge team to help set up a dual-stack environment, since we do not have an environment that can simulate this cluster setup, and they are trying it. Thanks. cc: @yprokule

Hi Dan, according to comment 7, it seems another issue has appeared. Could you help investigate whether the new issue is the same as the original issue in this bug? If yes, we may change the bug status back; if no, perhaps we can verify this bug and file a new bug to track the new issue. Thanks.

According to comments 7 and 9, QE has not hit this issue but did hit another one and filed a new bug to track it, so I am closing this bug; the dual-stack installer issue will be tracked in the new bug.
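For readers following the parsing discussion above, here is a minimal Go sketch of the two behaviors that were debated. It is not the code that landed in cluster-etcd-operator. The function names findIPv4CIDR and firstEntryIsIPv6 are hypothetical, and the sketch assumes the install config's networking section has already been decoded into a map[string]interface{}, as the type assertions in the quoted render.go lines imply.

```go
package main

import (
	"fmt"
	"net"
)

// findIPv4CIDR (hypothetical name) loops over every machineNetwork entry and
// returns the first "cidr" value that parses as an IPv4 network. Keys other
// than "cidr" are ignored, so stray values are never mistaken for CIDRs.
func findIPv4CIDR(networking map[string]interface{}) (string, error) {
	entries, ok := networking["machineNetwork"].([]interface{})
	if !ok {
		return "", fmt.Errorf("machineNetwork is missing or not a list")
	}
	for _, entry := range entries {
		networkMap, ok := entry.(map[string]interface{})
		if !ok {
			continue // skip entries that are not mappings
		}
		cidr, ok := networkMap["cidr"].(string)
		if !ok {
			continue // skip entries without a string "cidr" property
		}
		ip, _, err := net.ParseCIDR(cidr)
		if err != nil {
			return "", fmt.Errorf("invalid machineNetwork CIDR %q: %w", cidr, err)
		}
		if ip.To4() != nil {
			return cidr, nil
		}
	}
	return "", fmt.Errorf("no IPv4 CIDR found in machineNetwork")
}

// firstEntryIsIPv6 (hypothetical name) sketches the alternative raised in the
// thread: choose the IP family from the first machineNetwork entry only.
func firstEntryIsIPv6(networking map[string]interface{}) (bool, error) {
	entries, ok := networking["machineNetwork"].([]interface{})
	if !ok || len(entries) == 0 {
		return false, fmt.Errorf("machineNetwork is missing or empty")
	}
	networkMap, ok := entries[0].(map[string]interface{})
	if !ok {
		return false, fmt.Errorf("first machineNetwork entry is not a mapping")
	}
	cidr, ok := networkMap["cidr"].(string)
	if !ok {
		return false, fmt.Errorf("first machineNetwork entry has no cidr string")
	}
	ip, _, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, fmt.Errorf("invalid machineNetwork CIDR %q: %w", cidr, err)
	}
	return ip.To4() == nil, nil
}

func main() {
	// Mirrors the example install config from the thread, including the
	// bogus extra key on the first entry.
	networking := map[string]interface{}{
		"machineNetwork": []interface{}{
			map[string]interface{}{"cidr": "2620:52:0:1302::/64", "totallyFakeCIDR": "99.99.0.0/16"},
			map[string]interface{}{"cidr": "10.0.0.0/16"},
		},
	}
	cidr, err := findIPv4CIDR(networking)
	if err != nil {
		panic(err)
	}
	fmt.Println(cidr) // prints 10.0.0.0/16

	v6, err := firstEntryIsIPv6(networking)
	if err != nil {
		panic(err)
	}
	fmt.Println(v6) // prints true
}
```

With the IPv6-first config from this report, the first function still locates 10.0.0.0/16, while the second reports an IPv6-primary cluster. That difference is the behavioral trade-off discussed in the comments.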
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.