Bug 1991282
| Summary: | After CNI config, Windows node IP is replaced by hybrid overlay IP on UPI cluster | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | gaoshang <sgao> |
| Component: | Windows Containers | Assignee: | Sebastian Soto <ssoto> |
| Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> |
| Severity: | high | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.8 | CC: | aos-bugs, chernand, mankulka, mohashai, ssoto, team-winc |
| Target Milestone: | --- | ||
| Target Release: | 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-10-28 17:41:17 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2000351 | ||
|
Description
gaoshang
2021-08-08 21:40:34 UTC
I've created an upstream issue for work that needs to be done in the kubelet: https://github.com/kubernetes/kubernetes/issues/104269

This issue does not affect all UPI clusters; it is only present in clusters with platform none. It is entirely possible to have a UPI cluster with a platform such as vSphere. I was able to add a BYOH node to a vSphere UPI cluster with no issue.

When the cloud provider is set to none, the kubelet picks the first DNS entry that meets the node IP criteria. Here's what an example DNS lookup from a VM looks like:

```
PS C:\Users\Administrator> Resolve-DnsName -Name winhost

Name      Type   TTL    Section    IPAddress
----      ----   ---    -------    ---------
winhost   AAAA   1200   Question   fe80::4d3a:3fc1:320a:6b
winhost   AAAA   1200   Question   fe80::51b3:e88:9465:abfd
winhost   AAAA   1200   Question   fe80::c825:26be:4a2:308f
winhost   A      1200   Question   10.132.0.153
winhost   A      1200   Question   172.31.251.232
winhost   A      1200   Question   172.29.144.1
```

The IP of the VM is 172.31.251.232, and that is the IP that the kubelet should set nodeIP to. The IP 10.132.0.153 is the IP given to the hybrid overlay HNS endpoint. When the kubelet goes to pick the IP, it chooses the hybrid overlay HNS endpoint IP because it is the first IPv4 result. This is happening in the code here: https://github.com/openshift/kubernetes/blob/9b1230e88478e693f3a3a9a19fdecd3ec524788b/pkg/kubelet/nodestatus/setters.go#L224-L236

A possible solution is to remove the hybrid overlay IP from the DNS entry. That will make things better, but it doesn't completely solve the issue. If another network interface is added, or the ordering of DNS entries changes for whatever reason, the node's IP is likely to change.

I think the only way this can be truly fixed is by prescribing the node's IP via the `node-ip` flag. There's a lot to take into account with this, so whether it should be done right now remains to be seen.
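To make the selection behavior described above concrete, here is a minimal Python sketch. It is a simplified model written for illustration, not the kubelet's Go code: the function names `is_usable` and `pick_node_ip` are hypothetical, and the validity check (skip loopback, multicast, link-local and unspecified addresses) is an assumption that stands in for the kubelet's real node IP criteria. The input list is the `Resolve-DnsName` output from this comment.

```python
import ipaddress
from typing import Iterable, Optional, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]

# DNS answers for "winhost", in the order shown in the lookup above.
RESOLVED = [
    "fe80::4d3a:3fc1:320a:6b",
    "fe80::51b3:e88:9465:abfd",
    "fe80::c825:26be:4a2:308f",
    "10.132.0.153",    # hybrid overlay HNS endpoint
    "172.31.251.232",  # the VM's real address
    "172.29.144.1",
]


def is_usable(addr: IPAddress) -> bool:
    """Assumed stand-in for the kubelet's node IP validity criteria."""
    return not (
        addr.is_loopback
        or addr.is_multicast
        or addr.is_link_local
        or addr.is_unspecified
    )


def pick_node_ip(resolved: Iterable[str], node_ip: Optional[str] = None) -> Optional[IPAddress]:
    """Return the explicit node IP if given, else the first usable IPv4
    answer, else the first usable IPv6 answer, else None."""
    if node_ip is not None:
        # An explicitly prescribed address bypasses DNS ordering entirely.
        return ipaddress.ip_address(node_ip)

    fallback = None
    for raw in resolved:
        addr = ipaddress.ip_address(raw)
        if not is_usable(addr):
            continue
        if addr.version == 4:
            return addr  # first usable IPv4 answer wins
        if fallback is None:
            fallback = addr  # remember the first usable IPv6 answer
    return fallback


if __name__ == "__main__":
    print(pick_node_ip(RESOLVED))                            # 10.132.0.153
    print(pick_node_ip(RESOLVED, node_ip="172.31.251.232"))  # 172.31.251.232
```

With no explicit address, the result depends only on the order of the DNS answers, so the overlay endpoint wins here. Dropping 10.132.0.153 from the list makes the sketch return 172.31.251.232, but any reordering or new interface can change the result again, which is why prescribing the address is the more robust option.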
I have also run into this on vSphere using `platform: none`.

Marking the bug VERIFIED so that the release-4.8 PR can merge; will move it back to ON_QA.

This bug has been verified on OCP 4.9.0-0.nightly-2021-09-05-204238 and passed, thanks. On a baremetal cluster with `platform: none`, the BYOH Windows node bootstrapped with the correct IP address.

`# oc get nodes -owide -l kubernetes.io/os=windows`

| NAME | STATUS | ROLES | AGE | VERSION | INTERNAL-IP | EXTERNAL-IP | OS-IMAGE | KERNEL-VERSION | CONTAINER-RUNTIME |
|---|---|---|---|---|---|---|---|---|---|
| sgao-win1 | Ready | worker | 17m | v1.21.1-1398+98073871f173ba | 10.0.55.187 | &lt;none&gt; | Windows Server 2019 Datacenter | 10.0.17763.2061 | docker://20.10.6 |

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.0 product release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3702