Bug 2081825 - [WMCO] BYOH instance fails to remove when deleting config-map in Azure
Summary: [WMCO] BYOH instance fails to remove when deleting config-map in Azure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: jvaldes
QA Contact: Jose Luis Franco
URL:
Whiteboard:
Depends On: 2070892
Blocks:
Reported: 2022-05-04 18:17 UTC by jvaldes
Modified: 2022-06-13 07:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Node external IP addresses without reverse lookup (PTR) records were used to associate the Windows instance. Consequence: Windows instance deconfiguration failed when a node external IP address was present without a PTR record. Fix: Deconfiguration no longer fails if a PTR record is missing for the first node address; the lookup continues through all the node addresses until a reverse lookup record is found. Result: Windows instance deconfiguration succeeds when a node external IP address is present without a PTR record.
Clone Of: 2070892
Environment:
Last Closed: 2022-06-13 07:07:47 UTC
Target Upstream Version:
Embargoed:
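
The behavior described in the Doc Text can be pictured with a minimal Go sketch. This is not the operator's actual code: the names findResolvableAddress and lookupFunc, the PTR name windows-byoh.internal.example. and the order of the addresses are assumptions made for illustration. In real use, net.LookupAddr from the Go standard library could be passed as the lookup function.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc has the same signature as net.LookupAddr, so a fake resolver
// can be injected for testing (hypothetical type, not taken from WMCO).
type lookupFunc func(addr string) ([]string, error)

// findResolvableAddress returns the first address that has at least one
// reverse lookup (PTR) record. Addresses whose lookup fails are skipped
// instead of aborting the whole search.
func findResolvableAddress(addrs []string, lookup lookupFunc) (string, []string, error) {
	for _, addr := range addrs {
		names, err := lookup(addr)
		if err != nil || len(names) == 0 {
			// No PTR record for this address: keep looking.
			continue
		}
		return addr, names, nil
	}
	return "", nil, fmt.Errorf("no reverse lookup record found for any of %v", addrs)
}

func main() {
	// Fake resolver: only the internal IP has a PTR record (assumed name).
	fake := func(addr string) ([]string, error) {
		if addr == "10.0.128.9" {
			return []string{"windows-byoh.internal.example."}, nil
		}
		return nil, &net.DNSError{Err: "no such host", Name: addr, IsNotFound: true}
	}

	// The external IP comes first and has no PTR record; it is skipped.
	addr, names, err := findResolvableAddress([]string{"20.237.250.192", "10.0.128.9"}, fake)
	fmt.Println(addr, names, err)
}
```

Passing the resolver as a parameter keeps the sketch testable without real DNS; an error is returned only when none of the node addresses resolves.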




Links
System ID Private Priority Status Summary Last Updated
Github openshift windows-machine-config-operator pull 1054 0 None open [release-4.10] Bug 2081825: Skip node addresses without PTR record during instance de-configuration 2022-05-06 11:54:07 UTC
Red Hat Product Errata RHBA-2022:4989 0 None None None 2022-06-13 07:07:50 UTC

Comment 2 Jose Luis Franco 2022-05-09 13:21:57 UTC
Built a new WMCO image from the latest commit on release-4.10: https://github.com/openshift/windows-machine-config-operator/commit/d457afd8896c1da31e1cdcbb8ec7681cf56ace84

Created a BYOH instance with the Azure CLI, with both an internal and an external IP address, and added its IP address to the config map:

[cloud-user@preserve-jfrancoa ~]$ oc create -f config-map.yml
configmap/windows-instances created

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS     ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready      master   130m   v1.23.5+b463d71                    10.0.0.7      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready      master   129m   v1.23.5+b463d71                    10.0.0.8      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready      master   130m   v1.23.5+b463d71                    10.0.0.6      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.5    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.6    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready      worker   118m   v1.23.5+b463d71                    10.0.128.4    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready      worker   92m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-byoh                                    NotReady   <none>   9s     v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.9    20.237.250.192   Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready      worker   87m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9

Waited for the instance to reach the Ready state and then deleted the config map:

[cloud-user@preserve-jfrancoa ~]$ oc delete -f config-map.yml
configmap "windows-instances" deleted

After a few seconds, the BYOH node starts being deconfigured:

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS                        ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP      OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready                         master   143m   v1.23.5+b463d71                    10.0.0.7      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready                         master   142m   v1.23.5+b463d71                    10.0.0.8      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready                         master   143m   v1.23.5+b463d71                    10.0.0.6      <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.5    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.6    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready                         worker   131m   v1.23.5+b463d71                    10.0.128.4    <none>           Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready                         worker   106m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-byoh                                    NotReady,SchedulingDisabled   worker   13m    v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.9    20.237.250.192   Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready                         worker   100m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>           Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9

The node is finally removed after a few minutes:

[cloud-user@preserve-jfrancoa ~]$ oc get nodes -o wide
NAME                                            STATUS   ROLES    AGE    VERSION                            INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
jfrancoa-0905-azure-gklk7-master-0              Ready    master   146m   v1.23.5+b463d71                    10.0.0.7      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-1              Ready    master   145m   v1.23.5+b463d71                    10.0.0.8      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-master-2              Ready    master   146m   v1.23.5+b463d71                    10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-qr5tc   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.5    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-v6kk9   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.6    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
jfrancoa-0905-azure-gklk7-worker-westus-xf4db   Ready    worker   134m   v1.23.5+b463d71                    10.0.128.4    <none>        Red Hat Enterprise Linux CoreOS 410.84.202205031645-0 (Ootpa)   4.18.0-305.45.1.el8_4.x86_64   cri-o://1.23.2-9.rhaos4.10.git526d7a7.el8
windows-4zkkd                                   Ready    worker   109m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.7    <none>        Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9
windows-dlzpf                                   Ready    worker   103m   v1.23.5-rc.0.2060+1f952b3ba7b9ff   10.0.128.8    <none>        Windows Server 2019 Datacenter                                  10.0.17763.2803                docker://20.10.9


Relevant wmco logs:

1.6520852211520228e+09  INFO    wc 10.0.128.9   removing directories
1.6520852212266784e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\ rmdir C:\\k\\ /s /q", "out": ""}
1.6520852212441494e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\Temp\\ rmdir C:\\Temp\\ /s /q", "out": ""}
1.6520852212592874e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\cni\\ rmdir C:\\k\\cni\\ /s /q", "out": ""}
1.6520852212737815e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\k\\cni\\config\\ rmdir C:\\k\\cni\\config\\ /s /q", "out": ""}
1.6520852212917967e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\ rmdir C:\\var\\log\\ /s /q", "out": ""}
1.652085221309175e+09   DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\kube-proxy\\ rmdir C:\\var\\log\\kube-proxy\\ /s /q", "out": ""}
1.6520852213246267e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "if exist C:\\var\\log\\hybrid-overlay\\ rmdir C:\\var\\log\\hybrid-overlay\\ /s /q", "out": ""}
1.6520852213246655e+09  INFO    wc 10.0.128.9   removing HNS networks
1.6520852376439564e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'BaseOVNKubernetesHybridOverlayNetwork'} | Remove-HnsNetwork;\"", "out": ""}
1.652085239098755e+09   DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'BaseOVNKubernetesHybridOverlayNetwork'}\"", "out": ""}
1.6520854092172823e+09  DEBUG   wc 10.0.128.9   run     {"cmd": "powershell.exe -NonInteractive -ExecutionPolicy Bypass \"Get-HnsNetwork | where { $_.Name -eq 'OVNKubernetesHybridOverlayNetwork'}\"", "out": ""}
1.6520854092399719e+09  INFO    nc 10.0.128.9   instance has been deconfigured  {"node": "windows-byoh"}
1.6520854092501373e+09  DEBUG   events  Normal  {"object": {"kind":"ConfigMap","namespace":"openshift-windows-machine-config-operator","name":"windows-instances","apiVersion":"v1"}, "reason": "InstanceTeardown", "message": "Deconfigured node with addresses [{Hostname windows-byoh} {InternalIP 10.0.128.9} {ExternalIP 20.237.250.192}]"}
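
The first field of each log line above appears to be a Unix epoch timestamp in seconds, written in scientific notation. The following Go sketch, which assumes that interpretation, converts two of the timestamps to UTC and computes the time between "removing HNS networks" and "instance has been deconfigured". The helper name parseLogTime is hypothetical and is not part of WMCO.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseLogTime converts the leading field of a log line, assumed to be
// epoch seconds such as "1.6520852211520228e+09", into a UTC time.
func parseLogTime(line string) (time.Time, error) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return time.Time{}, fmt.Errorf("empty log line")
	}
	secs, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return time.Time{}, fmt.Errorf("parsing timestamp %q: %w", fields[0], err)
	}
	// Split into whole seconds and the fractional part in nanoseconds.
	whole := int64(secs)
	nanos := int64((secs - float64(whole)) * 1e9)
	return time.Unix(whole, nanos).UTC(), nil
}

func main() {
	start, err := parseLogTime("1.6520852213246655e+09  INFO    wc 10.0.128.9   removing HNS networks")
	if err != nil {
		panic(err)
	}
	end, err := parseLogTime("1.6520854092399719e+09  INFO    nc 10.0.128.9   instance has been deconfigured")
	if err != nil {
		panic(err)
	}
	fmt.Println("HNS network removal started:", start.Format(time.RFC3339))
	fmt.Println("instance deconfigured:      ", end.Format(time.RFC3339))
	fmt.Println("elapsed:                    ", end.Sub(start).Round(time.Second))
}
```

Under that assumption, the HNS network removal accounts for about three minutes of the deconfiguration, which is consistent with the node disappearing from the node list a few minutes after the config map was deleted.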

[cloud-user@preserve-jfrancoa ~]$ oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-05-07-205137   True        False         6h45m   Cluster version is 4.10.0-0.nightly-2022-05-07-205137

Comment 4 jvaldes 2022-06-09 17:54:28 UTC
Added doc type and text.

Comment 6 errata-xmlrpc 2022-06-13 07:07:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift support for Windows Containers 5.1.0 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:4989

