Bug 2081825

Summary: [WMCO] BYOH instance fails to remove when deleting config-map in Azure
Product: OpenShift Container Platform Reporter: jvaldes
Component: Windows Containers Assignee: jvaldes
Status: CLOSED ERRATA QA Contact: Jose Luis Franco <jfrancoa>
Severity: medium Docs Contact:
Priority: high    
Version: 4.10 CC: aos-bugs, asakthiv, jfrancoa, jvaldes, team-winc-bot
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: Node external IP addresses without reverse lookup records were used to associate the Windows instance.
Consequence: Windows instance de-configuration failed when the node external IP was present without a PTR record.
Fix: Do not fail if a PTR record is not present for the first node address; keep looking among all the node addresses until a reverse lookup record is found.
Result: Windows instance de-configuration succeeds when the node external IP is present without a PTR record.
Story Points: ---
Clone Of: 2070892 Environment:
Last Closed: 2022-06-13 07:07:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2070892    
Bug Blocks:    
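
The fix described in the Doc Text can be illustrated with a minimal Go sketch. This is not the WMCO implementation: `findHostname`, `lookupFunc`, and `fakeLookup` are hypothetical names introduced here, the address ordering (external IP first) is an assumption, and the fake resolver stands in for real DNS so the example is deterministic. `lookupFunc` has the same signature as `net.LookupAddr`, so the standard resolver could be passed instead.

```go
package main

import (
	"errors"
	"fmt"
)

// lookupFunc matches the signature of net.LookupAddr, so the real resolver
// can be swapped in for the fake one used below.
type lookupFunc func(addr string) (names []string, err error)

// findHostname walks every address and returns the first name that a
// reverse lookup yields. A failed lookup is skipped instead of aborting.
func findHostname(addrs []string, lookup lookupFunc) (string, error) {
	for _, addr := range addrs {
		names, err := lookup(addr)
		if err != nil || len(names) == 0 {
			continue // no PTR record for this address, try the next one
		}
		return names[0], nil
	}
	return "", errors.New("no reverse lookup record found for any node address")
}

// fakeLookup is a hypothetical stand-in for DNS: only the internal IP resolves.
func fakeLookup(addr string) ([]string, error) {
	records := map[string][]string{"10.0.128.9": {"windows-byoh"}}
	if names, ok := records[addr]; ok {
		return names, nil
	}
	return nil, fmt.Errorf("lookup %s: no such host", addr)
}

func main() {
	// Assumed ordering: external IP first (no PTR record), then the internal IP.
	name, err := findHostname([]string{"20.237.250.192", "10.0.128.9"}, fakeLookup)
	fmt.Println(name, err)
}
```

Failing on the first address reproduces the reported behaviour; continuing through the list lets the internal IP resolve the instance.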

Comment 2 Jose Luis Franco 2022-05-09 13:21:57 UTC
Built a new WMCO image based on the latest commit from release-4.10: https://github.com/openshift/windows-machine-config-operator/commit/d457afd8896c1da31e1cdcbb8ec7681cf56ace84

Created a BYOH instance using the Azure CLI with both an internal and an external IP, and added its IP to the config map:


[cloud-user@preserve-jfrancoa ~]$ oc create -f config-map.yml                                                                                            
configmap/windows-instances created  

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS     ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready      master   130m   v1.23.5+b463d71                    10.0.0.7      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready      master   129m   v1.23.5+b463d71                    10.0.0.8      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready      master   130m   v1.23.5+b463d71                    10.0.0.6      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.5    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.6    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.4    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready      worker   92m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-byoh                                    NotReady   <none>   9s     v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.9    20.237.250.192   Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready      worker   87m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9

Waited for the instance to reach the Ready state and then deleted the config map:

[cloud-user@preserve-jfrancoa ~]$ oc delete -f config-map.yml
configmap "windows-instances" deleted

After a few seconds the BYOH node starts being deconfigured:

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS                        ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready                         master   143m   v1.23.5+b463d71                    10.0.0.7      <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready                         master   142m   v1.23.5+b463d71                    10.0.0.8      <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready                         master   143m   v1.23.5+b463d71                    10.0.0.6      <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.5    <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.6    <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.4    <none>      Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready                         worker   106m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>      Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-byoh                                    NotReady,SchedulingDisabled   worker   13m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.9    20.237.250.192   Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready                         worker   100m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>      Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9

The node is finally removed after a few minutes:

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS   ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready    master   146m   v1.23.5+b463d71                    10.0.0.7      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready    master   145m   v1.23.5+b463d71                    10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready    master   146m   v1.23.5+b463d71                    10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.5    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.6    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.4    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready    worker   109m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>        Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready    worker   103m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>        Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9


Relevant WMCO logs:

1.6520852211520228e+09  INFO    wc 10.0.128.9   removing directories
1.6520852212266784e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\ rmdir C:\\k\\ /s /q", "out": ""}
1.6520852212441494e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\Temp\\ rmdir C:\\Temp\\ /s /q", "out": ""}
1.6520852212592874e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\cni\\ rmdir C:\\k\\cni\\ /s /q", "out": ""}
1.6520852212737815e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\cni\\config\\ rmdir C:\\k\\cni\\config\\ /s /q", "out": ""}
1.6520852212917967e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\ rmdir C:\\var\\log\\ /s /q", "out": ""}
1.652085221309175e+09   DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\kube-proxy\\ rmdir C:\\var\\log\\kube-proxy\\ /s /q", "out": ""}
1.6520852213246267e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\hybrid-overlay\\ rmdir C:\\var\\log\\hybrid-overlay\\ /s /q", "out": ""}
1.6520852213246655e+09  INFO    wc 10.0.128.9   removing HNS networks
1.6520852376439564e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'BaseOVNKubernetesHybridOverlayNetwork'} | Remove-HnsNetwork;\"", "out": ""}
1.652085239098755e+09   DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'BaseOVNKubernetesHybridOverlayNetwork'}\"", "out": ""}
1.6520854092172823e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'OVNKubernetesHybridOverlayNetwork'}\"", "out": ""}
1.6520854092399719e+09  INFO    nc 10.0.128.9   instance has been deconfigured  {"node": "windows-byoh"}
1.6520854092501373e+09  DEBUG   events  Normal  {"object": {"kind":"ConfigMap","namespace":"openshift-windows-machine-config-operator","name":"windows-instances","apiVersion":"v1"}, "reason": "InstanceTeardown", "message": "Deconfigured node with addresses [{Hostname windows-byoh} {InternalIP 10.0.128.9} {ExternalIP 20.237.250.192}]"}

[cloud-user@preserve-jfrancoa ~]$ oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-05-07-205137   True        False         6h45m   Cluster version is 4.10.0-0.nightly-2022-05-07-205137

Comment 4 jvaldes 2022-06-09 17:54:28 UTC
Added doc type and text.

Comment 6 errata-xmlrpc 2022-06-13 07:07:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift support for Windows Containers 5.1.0 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4989