Bug 2008994

Summary: WMCO ignores delete events for machines with invalid IP addresses
Product: OpenShift Container Platform Reporter: jvaldes
Component: Windows Containers Assignee: jvaldes
Status: CLOSED ERRATA QA Contact: gaoshang <sgao>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.10 CC: aos-bugs, rrasouli, sgao, team-winc
Target Milestone: ---   
Target Release: 4.8.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: The windows-exporter metrics endpoint object contains a reference to a deleted machine.
Consequence: WMCO ignores delete events for machines with invalid IP addresses.
Fix: Remove the validation of the machine object from the event filtering.
Result: The windows-exporter metrics endpoint object is correctly updated even when the machine is still in the Deleting phase.
Story Points: ---
Clone Of: 2008992 Environment:
Last Closed: 2021-12-08 22:07:43 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2008992    
Bug Blocks:    

Description jvaldes 2021-09-29 16:19:52 UTC
+++ This bug was initially created as a clone of Bug #2008992 +++

This bug was initially created as a light copy of Bug #1991739

I am copying this bug because: 
The resulting symptom still shows up. It is less frequent because the root cause is slightly different, but it can still be reproduced on vSphere with a 4.7 cluster running WMCO 2.0.3.

Description of problem:
WMCO ignores the `Deleting` phase notification event for Windows machines with a missing or invalid IPv4 address.
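
The behaviour can be modelled with a small sketch. This is not WMCO's code (WMCO is written in Go); it is a hypothetical Python model that assumes the event filter validated the machine's internal IPv4 address before forwarding an event, as the Doc Text describes. The function names and the dict layout (mirroring `status.phase` and `status.addresses` of a Machine object) are illustrative assumptions.

```python
import ipaddress


def has_valid_internal_ipv4(machine):
    """Return True if the machine reports at least one parseable InternalIP IPv4 address."""
    status = machine.get("status") or {}
    for addr in status.get("addresses") or []:
        if addr.get("type") != "InternalIP":
            continue
        try:
            ipaddress.IPv4Address(str(addr.get("address") or ""))
            return True
        except ValueError:
            # Empty strings and IPv6 addresses both fail IPv4 parsing.
            continue
    return False


def filter_with_validation(machine):
    """Hypothetical pre-fix filter: drop the event when validation fails."""
    return has_valid_internal_ipv4(machine)


def filter_without_validation(machine):
    """Hypothetical post-fix filter: forward every event and let the
    reconciler decide what to do based on the machine phase."""
    return True


if __name__ == "__main__":
    # A machine being deleted may already have lost its addresses.
    deleting = {
        "metadata": {"name": "winworker-rh5cr"},
        "status": {"phase": "Deleting", "addresses": []},
    }
    print("with validation:   ", filter_with_validation(deleting))
    print("without validation:", filter_without_validation(deleting))
```

In this model the `Deleting` event is dropped by the validating filter, so nothing downstream gets a chance to remove the machine's IP from the `windows-exporter` endpoints object.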

Version-Release number of selected component (if applicable):
WMCO 2.0.3 running on cluster with version 4.7.24 

How reproducible:
Sometimes; it depends on platform performance while removing a virtual machine.

Steps to Reproduce:
1. WMCO configured and running
2. Create a valid machineSet with 1 replica
3. Observe the node information in the `windows-exporter` metrics endpoint object.
    Note the IP Addresses, for example: 172.31.251.250
4. Delete the machineSet
5. Wait for the Windows machine to disappear
6. Check the `windows-exporter` metrics endpoint object again. If there is still an entry in `Subsets` mapped to the IP address of a deleted machine, you have reproduced the bug, because metrics are no longer available for a deleted machine.
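
The check in step 6 can be sketched as a small script. This is a minimal sketch, assuming the endpoints object has been exported as JSON (for example with `oc get endpoints windows-exporter -n openshift-windows-machine-config-operator -o json`) and that the set of IP addresses of the machines that still exist has been collected separately. The function names are hypothetical.

```python
import json


def endpoint_ips(endpoints_json):
    """Collect the IPs listed under subsets[].addresses[] of an Endpoints object."""
    obj = json.loads(endpoints_json)
    ips = set()
    # `subsets` is missing or null when the object has no entries.
    for subset in obj.get("subsets") or []:
        for addr in subset.get("addresses") or []:
            ips.add(addr["ip"])
    return ips


def stale_ips(endpoints_json, live_machine_ips):
    """Return endpoint IPs that do not belong to any machine that still exists."""
    return endpoint_ips(endpoints_json) - set(live_machine_ips)


if __name__ == "__main__":
    # Sample data built from the address noted in step 3.
    sample = json.dumps({
        "metadata": {"name": "windows-exporter"},
        "subsets": [{
            "addresses": [{"ip": "172.31.251.250"}],
            "ports": [{"name": "metrics", "port": 9182, "protocol": "TCP"}],
        }],
    })
    # After the machineSet is deleted no machines remain, so any
    # address still listed is stale and the bug is reproduced.
    print(stale_ips(sample, set()))
```

A non-empty result after the Windows machine has disappeared corresponds to the reproduced bug.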

Actual results:
WMCO with DEBUG logging enabled shows:
```
DEBUG   controller.windowsmachine   invalid Machine {
	"name": "winworker-rh5cr",
	 "error": "no internal IP address associated",
	 "errorVerbose": "no internal IP address associated, ..."
	...
}

```

The `windows-exporter` metrics endpoint object contains `Subsets` with the IP address of a deleted machine:
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
      Addresses:          172.31.251.250
      NotReadyAddresses:  <none>
      Ports:
        Name     Port  Protocol
        ----     ----  --------
        metrics  9182  TCP

    Events:  <none>
```


Expected results:

WMCO with DEBUG logging enabled shows:
```
DEBUG controller.windowsmachine   machine not provisioned {
 	"windowsmachine": "openshift-machine-api/winworker-vdmnd",
	"phase": "Deleting"
}

INFO	metrics	Prometheus configured	{
 	 "endpoints": "windows-exporter",
	 "port": 9182,
	 "name": "metrics"
}

```

The IP address of the deleted machine does not appear in the `windows-exporter` metrics endpoint object.
With replicas set to 1, `Subsets` must be empty.
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
    Events:  <none>
```

Comment 1 Ronnie Rasouli 2021-11-03 12:39:36 UTC
Verified on 3.1.0+06e96071

Comment 4 errata-xmlrpc 2021-12-08 22:07:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Windows Container Support for Red Hat OpenShift 3.1.1 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4710