This bug was initially created as a light copy of Bug #1991739

I am copying this bug because:
The resulting symptom still shows up. It is less frequent because the root cause is slightly different, but it can still be reproduced on vSphere with a 4.7 cluster running WMCO 2.0.3.

Description of problem:
WMCO ignores the `Deleting` phase notification event for Windows machines with a missing or invalid IPv4 address.

Version-Release number of selected component (if applicable):
WMCO 2.0.3 running on a cluster with version 4.7.24

How reproducible:
Sometimes; it depends on platform performance while removing a virtual machine.

Steps to Reproduce:
1. Have WMCO configured and running.
2. Create a valid machineSet with 1 replica.
3. Observe the node information in the `windows-exporter` metrics endpoint object. Note the IP addresses, for example: 172.31.251.250
4. Delete the machineSet.
5. Wait for the Windows machine to disappear.
6. Check the `windows-exporter` metrics endpoint object one more time. If there is still an entry in `Subsets` mapped to the IP address of a deleted machine, you have reproduced the bug. Metrics are no longer available for a deleted machine.

Actual results:
WMCO with DEBUG logging enabled shows:
```
DEBUG controller.windowsmachine invalid Machine {
  "name": "winworker-rh5cr",
  "error": "no internal IP address associated",
  "errorVerbose": "no internal IP address associated, ..." ...
}
```

The `windows-exporter` metrics endpoint object contains Subsets with the IP address of a deleted machine:
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
Name:         windows-exporter
Namespace:    openshift-windows-machine-config-operator
Labels:       name=windows-exporter
Annotations:  <none>
Subsets:
  Addresses:          172.31.251.250
  NotReadyAddresses:  <none>
  Ports:
    Name     Port  Protocol
    ----     ----  --------
    metrics  9182  TCP

Events:  <none>
```

Expected results:
WMCO with DEBUG logging enabled shows:
```
DEBUG controller.windowsmachine machine not provisioned {
  "windowsmachine": "openshift-machine-api/winworker-vdmnd",
  "phase": "Deleting"
}
INFO metrics Prometheus configured {
  "endpoints": "windows-exporter",
  "port": 9182,
  "name": "metrics"
}
```

The IP address of the deleted machine does not appear in the `windows-exporter` metrics endpoint object. With replicas set to 1, `Subsets` must be empty.
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
Name:         windows-exporter
Namespace:    openshift-windows-machine-config-operator
Labels:       name=windows-exporter
Annotations:  <none>
Subsets:
Events:  <none>
```
Marking as VERIFIED to allow the release-4.8/4.9 PRs to merge. Will update it to ON_QA once that PR merges.
The IP of the previous machine hasn't been deleted, since the ConfigMap still retains the machine endpoint.

    oc describe configmaps windows-instances
    Name:         windows-instances
    Namespace:    openshift-windows-machine-config-operator
    Labels:       <none>
    Annotations:  <none>

    Data
    ====
    10.0.136.148:
    ----
    username=Administrator
    Events:
      Type     Reason                Age                  From       Message
      ----     ------                ----                 ----       -------
      Warning  InstanceSetupFailure  11m (x10 over 160m)  configmap  error configuring host with address 10.0.136.148: failed to create new nodeconfig: error instantiating Windows instance from VM: unable to setup VM 10.0.136.148 sshConnectivity: error instantiating SSH client: unable to connect to Windows VM 10.0.136.148: timed out waiting for the condition

    oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
      Addresses:          10.0.131.216,10.0.136.148,10.0.158.108
      NotReadyAddresses:  <none>
      Ports:
        Name     Port  Protocol
        ----     ----  --------
        metrics  9182  TCP
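For anyone re-checking this, the comparison can be done mechanically rather than by eye. The following is a minimal sketch, not WMCO code: `stale_endpoint_addresses` is a hypothetical helper, and the set of live node IPs is an assumption made for illustration (the output above only shows which addresses the endpoints object lists, not which nodes still exist).

```python
from typing import Iterable, List, Set


def stale_endpoint_addresses(endpoint_ips: Iterable[str], live_node_ips: Set[str]) -> List[str]:
    """Return endpoint addresses that no live node backs, preserving input order."""
    return [ip for ip in endpoint_ips if ip not in live_node_ips]


# Addresses copied from the `Subsets` field of the windows-exporter endpoints object.
endpoint_ips = ["10.0.131.216", "10.0.136.148", "10.0.158.108"]

# Assumption for illustration only: these two addresses belong to Windows
# nodes that still exist, and 10.0.136.148 belongs to the removed machine.
live_node_ips = {"10.0.131.216", "10.0.158.108"}

print(stale_endpoint_addresses(endpoint_ips, live_node_ips))
```

Any address this returns is a candidate stale entry: the endpoints object still advertises it for metrics scraping although no node serves it.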
Created attachment 1829808 [details] WMCO log
Deletion of a non-BYOH node tested successfully. Verified on {"version": "4.0.0+7cdce8b"}
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Windows Container Support for Red Hat OpenShift 5.0.0 [security update]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0577