+++ This bug was initially created as a clone of Bug #2001547 +++

Description of problem:
After configuring a BYOH Windows instance with a DNS name in the config map, the instance was deconfigured immediately. This happens on a UPI cluster on baremetal; a BYOH Windows instance configured with an IP address does not have this issue.

The DNS name and config map used are as follows:

PS C:\Users\Administrator> nslookup 10.0.55.187
Server:  ip-10-0-0-2.us-east-2.compute.internal
Address:  10.0.0.2

Name:    ip-10-0-55-187.us-east-2.compute.internal
Address:  10.0.55.187

# cat configmap_byoh.yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  ip-10-0-55-187.us-east-2.compute.internal: |-
    username=Administrator

# oc get node -l kubernetes.io/os=windows
NAME        STATUS                     ROLES    AGE     VERSION
sgao-win1   Ready,SchedulingDisabled   worker   8m15s   v1.22.1-1660+bbcc9aea9e4bef

# oc describe node sgao-win1
Name:               sgao-win1
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=sgao-win1
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/windows-build=10.0.17763
                    node.openshift.io/os_id=Windows
                    windowsmachineconfig.openshift.io/byoh=true
Annotations:        k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-C9-F5-8A
                    k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.10.0/24
                    volumes.kubernetes.io/controller-managed-attach-detach: true
                    windowsmachineconfig.openshift.io/pub-key-hash: 1df2c166b1c401180523270e9cf6bc2cd2724b9279ea65668a3b95298525a0f5
                    windowsmachineconfig.openshift.io/username: -----BEGIN ENCRYPTED DATA-----<wmcoMarker><wmcoMarker>wx4EBwMIGyHM95CxsERgtbij7q4k3mYrEsaFVNoTO8jS5gF07WsxBH7z0Xp/aegs<wmcoMarker>VVx3CEY4...
                    windowsmachineconfig.openshift.io/version: 3.1.0+d5fd8c8
CreationTimestamp:  Mon, 06 Sep 2021 06:36:54 -0400
Taints:             node.kubernetes.io/unschedulable:NoSchedule
                    os=Windows:NoSchedule
Unschedulable:      true
Lease:
  HolderIdentity:  sgao-win1
  AcquireTime:     <unset>
  RenewTime:       Mon, 06 Sep 2021 06:44:08 -0400
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Mon, 06 Sep 2021 06:41:56 -0400   Mon, 06 Sep 2021 06:36:54 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Mon, 06 Sep 2021 06:41:56 -0400   Mon, 06 Sep 2021 06:36:54 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Mon, 06 Sep 2021 06:41:56 -0400   Mon, 06 Sep 2021 06:36:54 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Mon, 06 Sep 2021 06:41:56 -0400   Mon, 06 Sep 2021 06:37:04 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.0.55.187
  Hostname:    sgao-win1
Capacity:
  cpu:                2
  ephemeral-storage:  31455228Ki
  memory:             8125980Ki
  pods:               250
Allocatable:
  cpu:                1500m
  ephemeral-storage:  27915396253
  memory:             6975004Ki
  pods:               250
System Info:
  Machine ID:                 sgao-win1
  System UUID:                EC277FBA-77CD-9B78-69E1-578CDC479EFA
  Boot ID:
  Kernel Version:             10.0.17763.2061
  OS Image:                   Windows Server 2019 Datacenter
  Operating System:           windows
  Architecture:               amd64
  Container Runtime Version:  docker://20.10.6
  Kubelet Version:            v1.22.1-1660+bbcc9aea9e4bef
  Kube-Proxy Version:         v1.22.1-1660+bbcc9aea9e4bef
Non-terminated Pods:          (0 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------  ----  ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests  Limits
  --------           --------  ------
  cpu                0 (0%)    0 (0%)
  memory             0 (0%)    0 (0%)
  ephemeral-storage  0 (0%)    0 (0%)
Events:
  Type     Reason                    Age                   From     Message
  ----     ------                    ----                  ----     -------
  Normal   NodeNotSchedulable        164m (x2 over 5h18m)  kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  139m                  kubelet  Starting kubelet.
  Normal   NodeHasSufficientMemory   139m (x2 over 139m)   kubelet  Node sgao-win1 status is now: NodeHasSufficientPID
  Warning  CheckLimitsForResolvConf  139m                  kubelet  open c:\k\etc\resolv.conf: The system cannot find the file specified.
  Normal   NodeReady                 139m                  kubelet  Node sgao-win1 status is now: NodeReady
  Normal   NodeNotSchedulable        139m                  kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  136m                  kubelet  Starting kubelet.
  Normal   NodeHasSufficientMemory   136m                  kubelet  Node sgao-win1 status is now: NodeHasSufficientMemory
  Normal   NodeSchedulable           135m                  kubelet  Node sgao-win1 status is now: NodeSchedulable
  Normal   NodeNotSchedulable        135m (x2 over 136m)   kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  134m                  kubelet  Starting kubelet.
  Warning  CheckLimitsForResolvConf  134m                  kubelet  open c:\k\etc\resolv.conf: The system cannot find the file specified.
  Normal   NodeHasSufficientMemory   134m                  kubelet  Node sgao-win1 status is now: NodeHasSufficientMemory
  Normal   NodeNotSchedulable        134m                  kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  131m                  kubelet  Starting kubelet.
  Normal   NodeHasSufficientMemory   131m                  kubelet  Node sgao-win1 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID      131m                  kubelet  Node sgao-win1 status is now: NodeHasSufficientPID
  Normal   NodeSchedulable           130m                  kubelet  Node sgao-win1 status is now: NodeSchedulable
  Normal   NodeNotSchedulable        130m (x2 over 131m)   kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  124m                  kubelet  Starting kubelet.
  Warning  CheckLimitsForResolvConf  124m                  kubelet  open c:\k\etc\resolv.conf: The system cannot find the file specified.
  Normal   NodeHasSufficientMemory   124m (x2 over 124m)   kubelet  Node sgao-win1 status is now: NodeHasSufficientPID
  Normal   NodeReady                 123m                  kubelet  Node sgao-win1 status is now: NodeReady
  Normal   NodeNotSchedulable        123m                  kubelet  Node sgao-win1 status is now: NodeNotSchedulable
  Normal   Starting                  121m                  kubelet  Starting kubelet.

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-09-05-204238
WMCO master branch commit: d5fd8c8d9b7ed21f4dc5eac1f410e893c305e840

How reproducible:
Always

Steps to Reproduce:
1, Install OCP 4.9 UPI on baremetal
2, Build WMCO locally with the latest commit in the master branch and install it on the cluster
4, Manually install a BYOH Windows instance and configure it with a DNS name in the configmap

Actual results:
The BYOH Windows instance is deconfigured immediately once it becomes Ready and gets stuck in a "Configure" - "Deconfigure" cycle.

Expected results:
The BYOH Windows instance should be configured as a node and be in "Ready" status.

Additional info:
It looks like this happens because the DNS name does not exist in node.Status.Addresses, see https://github.com/openshift/windows-machine-config-operator/blob/d5fd8c8d9b7ed21f4dc5eac1f410e893c305e840/controllers/configmap_controller.go#L215

# oc describe node sgao-win1
...
Addresses:
  InternalIP:  10.0.55.187
  Hostname:    sgao-win1
Capacity:

--- Additional comment from mohashai on 2021-09-08 15:45:18 UTC ---

@sgao We are currently unable to reproduce this bug as we do not have access to a baremetal UPI setup (or any platform=none environment). Would it be possible for you to give me access to a QE cluster where this bug was seen?

--- Additional comment from sgao on 2021-09-09 11:58:19 UTC ---

@mohashai Sure, I'll prepare a cluster later today and DM it to you, thanks.
--- Additional comment from mohashai on 2021-09-13 20:27:05 UTC ---

Current status:
- QE has seen this across multiple platforms
- I have a fix up, PR is reviewable
- successfully tested locally on 4.9 vSphere (and other platforms)

--- Additional comment from mohashai on 2021-09-15 16:28:04 UTC ---

Fix is approved and ready to merge, but a change/deprecation in how to reference images to test is holding up our CI.
This bug has been verified on OCP 4.8.0-0.nightly-2021-09-17-214908, thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Windows Container Support for Red Hat OpenShift 3.1.0 product release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3215