
Bug 1991282

Summary: After CNI config, Windows node IP is replaced by hybrid overlay IP on UPI cluster
Product: OpenShift Container Platform
Component: Windows Containers
Reporter: gaoshang <sgao>
Assignee: Sebastian Soto <ssoto>
QA Contact: gaoshang <sgao>
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Version: 4.8
Target Release: 4.9.0
Hardware: Unspecified
OS: Unspecified
CC: aos-bugs, chernand, mankulka, mohashai, ssoto, team-winc
Type: Bug
Last Closed: 2021-10-28 17:41:17 UTC
Bug Blocks: 2000351

Description gaoshang 2021-08-08 21:40:34 UTC
Description of problem:
On a UPI cluster, after running `wmcb initialize-kubelet`, the Windows worker uses its node IP (10.0.75.176, see [1]). After WMCO configures OVNKubernetesHybridOverlayNetwork, an overlay network IP (10.132.0.51, see [3]) is added to the Windows machine. Then, after running `wmcb.exe configure-cni`, the Windows node IP is replaced by the hybrid overlay IP (see [2]). This leaves the Windows node in SchedulingDisabled status, and WMCO keeps reconciling it on the UPI cluster.

[1]
# oc get node -owide
NAME             STATUS                     ROLES    AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
...
sgao-winworker   Ready,SchedulingDisabled   worker   3m14s   v1.21.1-1397+a678cfd2c37e87   10.0.75.176   <none>        Windows Server 2019 Datacenter                                 10.0.17763.2061

[2]
# oc get node -owide
NAME             STATUS                     ROLES    AGE     VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
...
sgao-winworker   Ready,SchedulingDisabled   worker   3m26s   v1.21.1-1397+a678cfd2c37e87   10.132.0.51   <none>        Windows Server 2019 Datacenter                                 10.0.17763.2061                docker://20.10.6

[3]
PS C:\Users\Administrator> ipconfig

Windows IP Configuration


Ethernet adapter vEthernet (Ethernet 2):

   Connection-specific DNS Suffix  . : us-east-2.compute.internal
   Link-local IPv6 Address . . . . . : fe80::1932:9ff0:36d3:8b02%15
   IPv4 Address. . . . . . . . . . . : 10.0.75.176
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . : 10.0.64.1

Ethernet adapter vEthernet (VIPEndpoint):

   Connection-specific DNS Suffix  . : us-east-2.compute.internal
   Link-local IPv6 Address . . . . . : fe80::819b:4b41:708:cb05%31
   IPv4 Address. . . . . . . . . . . : 10.132.0.51
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Ethernet adapter vEthernet (nat):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::5cb9:4b8e:63ec:d3c%10
   IPv4 Address. . . . . . . . . . . : 192.168.192.1
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :

Ethernet adapter vEthernet (nat):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::19c2:2df3:8584:173%10
   IPv4 Address. . . . . . . . . . . : 172.19.16.1
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :


Version-Release number of selected component (if applicable):
OCP version: 4.8.0-0.nightly-2021-08-05-031749
WMCO master commit: ccae1dd992a0f34702df23c76f3659f796ec64e0

How reproducible:
Always

Steps to Reproduce:
1. Install a UPI cluster on bare metal
2. Manually create a Windows machine, change its hostname to lowercase, and install OpenSSH
3. Add the Windows machine's IP address to the windows-instances ConfigMap
4. Wait for WMCO to bootstrap the Windows machine, then check the node

Actual results:
Windows node IP is replaced by hybrid overlay IP

Expected results:
Windows node IP should not be replaced by hybrid overlay IP

Additional info:

Comment 3 Sebastian Soto 2021-08-11 18:05:56 UTC
I've created an upstream issue for work that needs to be done in the kubelet: https://github.com/kubernetes/kubernetes/issues/104269

This issue does not affect all UPI clusters; it is only present in clusters with platform `none`.
It is quite possible to have a UPI cluster with a platform such as vSphere. I was able to add a BYOH node to a vSphere UPI cluster with no issue.

Comment 4 Sebastian Soto 2021-08-11 20:08:17 UTC
When the cloud provider is set to none, the kubelet picks the first DNS entry that meets the node IP criteria.

Here's what an example DNS lookup from a VM looks like:
```
PS C:\Users\Administrator> Resolve-DnsName -Name winhost

Name                                           Type   TTL   Section    IPAddress
----                                           ----   ---   -------    ---------
winhost                                        AAAA   1200  Question   fe80::4d3a:3fc1:320a:6b
winhost                                        AAAA   1200  Question   fe80::51b3:e88:9465:abfd
winhost                                        AAAA   1200  Question   fe80::c825:26be:4a2:308f
winhost                                        A      1200  Question   10.132.0.153
winhost                                        A      1200  Question   172.31.251.232
winhost                                        A      1200  Question   172.29.144.1
```

The IP of the VM is 172.31.251.232, and that is the IP that the kubelet should set nodeIP to.
The IP 10.132.0.153 is the IP given to the hybrid overlay HNS endpoint.

When the kubelet picks the IP, it chooses the hybrid overlay HNS endpoint IP because it is the first IPv4 result.
This is happening in the code here: https://github.com/openshift/kubernetes/blob/9b1230e88478e693f3a3a9a19fdecd3ec524788b/pkg/kubelet/nodestatus/setters.go#L224-L236

A possible solution to this is the removal of the hybrid overlay IP from the DNS entry.
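The selection behavior described above can be modeled with a short Go sketch. This is a simplified illustration, not the kubelet's actual code (see the setters.go link above for that): `usableNodeIP`, `firstIPv4`, and `parseAll` are hypothetical names, and the validation here only rejects loopback, link-local, multicast, and unspecified addresses, whereas the real kubelet validation does more. Fed the `winhost` results in the order shown, it picks the hybrid overlay address; with that entry removed from DNS, as proposed, it picks the VM's address.

```go
package main

import (
	"fmt"
	"net"
)

// usableNodeIP is a hypothetical, simplified stand-in for the kubelet's node IP
// validation: it only rejects loopback, link-local unicast, multicast, and
// unspecified addresses.
func usableNodeIP(ip net.IP) bool {
	return !(ip.IsLoopback() || ip.IsLinkLocalUnicast() || ip.IsMulticast() || ip.IsUnspecified())
}

// firstIPv4 walks the DNS results in the order they were returned and takes the
// first usable IPv4 address. It returns nil if none qualifies.
func firstIPv4(addrs []net.IP) net.IP {
	for _, ip := range addrs {
		if ip.To4() != nil && usableNodeIP(ip) {
			return ip
		}
	}
	return nil
}

// parseAll converts textual addresses to net.IP values, skipping unparsable ones.
func parseAll(ss []string) []net.IP {
	var out []net.IP
	for _, s := range ss {
		if ip := net.ParseIP(s); ip != nil {
			out = append(out, ip)
		}
	}
	return out
}

func main() {
	// Same addresses, in the same order, as the Resolve-DnsName output for "winhost".
	results := []string{
		"fe80::4d3a:3fc1:320a:6b",
		"fe80::51b3:e88:9465:abfd",
		"fe80::c825:26be:4a2:308f",
		"10.132.0.153",   // hybrid overlay HNS endpoint
		"172.31.251.232", // the VM's actual IP
		"172.29.144.1",
	}
	fmt.Println(firstIPv4(parseAll(results))) // prints 10.132.0.153

	// Drop the hybrid overlay entry (index 3) to model the proposed DNS fix.
	withoutOverlay := append(append([]string{}, results[:3]...), results[4:]...)
	fmt.Println(firstIPv4(parseAll(withoutOverlay))) // prints 172.31.251.232
}
```

The outcome depends entirely on the order of the DNS results, which is why removing one entry changes the chosen address.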

Comment 5 Sebastian Soto 2021-08-12 20:56:51 UTC
The removal of the hybrid overlay IP from the DNS entry will make things better, but it doesn't completely solve this issue.
If another network interface is added, or the ordering of DNS entries changes for whatever reason, the node's IP is likely to be changed.

I think the only way this can truly be fixed is by prescribing the node's IP via the `node-ip` flag.
There is a lot to take into account with this approach, so it remains to be seen whether it should be done right now.

Comment 6 Christian Hernandez 2021-08-17 19:00:52 UTC
I have also run into this running on vSphere using `platform: none`.

Comment 7 Mansi Kulkarni 2021-09-02 20:21:07 UTC
Marking the bug VERIFIED so that the release-4.8 PR can merge; I will move it back to ON_QA afterwards.

Comment 8 gaoshang 2021-09-06 04:24:53 UTC
This bug has been verified on OCP 4.9.0-0.nightly-2021-09-05-204238 and passed, thanks.

On a bare-metal cluster with `platform: none`, the BYOH Windows node bootstrapped with the correct IP address.


# oc get nodes -owide -l kubernetes.io/os=windows
NAME        STATUS   ROLES    AGE   VERSION                       INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION    CONTAINER-RUNTIME
sgao-win1   Ready    worker   17m   v1.21.1-1398+98073871f173ba   10.0.55.187   <none>        Windows Server 2019 Datacenter   10.0.17763.2061   docker://20.10.6

Comment 13 errata-xmlrpc 2021-10-28 17:41:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.0 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3702