Bug 2008992

Summary: WMCO ignores delete events for machines with invalid IP addresses
Product: OpenShift Container Platform Reporter: jvaldes
Component: Windows Containers Assignee: jvaldes
Status: CLOSED ERRATA QA Contact: Ronnie Rasouli <rrasouli>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.10 CC: anusaxen, aos-bugs, rrasouli, sgao, team-winc
Target Milestone: ---   
Target Release: 4.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Cause: The windows-exporter metrics endpoint object contains a reference to a deleted machine.
Consequence: WMCO ignores delete events for machines with invalid IP addresses.
Fix: Remove the validation of the machine object from the event filtering.
Result: The windows-exporter metrics endpoint object is correctly updated even when the machine is still in the Deleting phase.
Story Points: ---
Clone Of: 2008601
: 2008994 Environment:
Last Closed: 2021-12-13 12:46:10 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 2008601    
Bug Blocks: 2008994    

Description jvaldes 2021-09-29 16:07:57 UTC
+++ This bug was initially created as a clone of Bug #2008601 +++

This bug was initially created as a light copy of Bug #1991739

I am copying this bug because: 
The resulting symptom still shows up, although less frequently because the root cause is slightly different. It is still reproducible on vSphere with a 4.7 cluster running WMCO 2.0.3.

Description of problem:
WMCO ignores the `Deleting` phase notification event for Windows machines with a missing or invalid IPv4 address.
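
To illustrate the failure mode, here is a minimal Python sketch of an event filter that validates the machine before looking at its phase, next to one that does not. WMCO itself is written in Go; the `Machine` class and the function names below are hypothetical stand-ins, not the operator's actual code.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    # Hypothetical stand-in for a Machine object; not the real API type.
    name: str
    phase: str
    internal_ip: Optional[str] = None


def has_valid_ipv4(machine: Machine) -> bool:
    """Return True only if the machine reports a parseable IPv4 address."""
    if not machine.internal_ip:
        return False
    try:
        parsed = ipaddress.ip_address(machine.internal_ip)
    except ValueError:
        return False
    return isinstance(parsed, ipaddress.IPv4Address)


def filter_with_validation(machine: Machine) -> bool:
    """Behaviour described in this bug: a machine without a valid IP is
    dropped by the filter, so its Deleting event never gets processed."""
    return has_valid_ipv4(machine)


def filter_without_validation(machine: Machine) -> bool:
    """Behaviour described by the fix: the filter no longer validates the
    machine, so the Deleting event is passed on regardless of its IP."""
    return True
```

With a machine such as `Machine("winworker-rh5cr", "Deleting")`, the first filter rejects the event and the second lets it through, which is the difference this report describes.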

Version-Release number of selected component (if applicable):
WMCO 2.0.3 running on cluster with version 4.7.24 

How reproducible:
Sometimes. It depends on platform performance while the virtual machine is being removed.

Steps to Reproduce:
1. WMCO configured and running
2. Create a valid machineSet with 1 replica
3. Observe the node information in the `windows-exporter` metrics endpoint object.
    Note the IP Addresses, for example: 172.31.251.250
4. Delete the machineSet
5. Wait for the Windows machine to disappear
6. Check the `windows-exporter` metrics endpoint object again. If `Subsets` still contains an entry mapped to the IP address of a deleted machine, you have reproduced the bug. Metrics are no longer available for a deleted machine, so the entry is stale.
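
The check in step 6 can be scripted. The sketch below is a hypothetical helper, not part of WMCO: it assumes the endpoints object has been fetched as JSON (for example with `oc get endpoints windows-exporter -n openshift-windows-machine-config-operator -o json`) and parsed into a dict, and it only inspects ready `addresses`, not `notReadyAddresses`.

```python
from typing import Iterable, List


def endpoint_ips(endpoints: dict) -> List[str]:
    """Collect every ready address IP listed in the object's subsets."""
    ips = []
    # "subsets" may be missing or null when the object has no entries.
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            ips.append(address["ip"])
    return ips


def stale_ips(endpoints: dict, deleted_ips: Iterable[str]) -> List[str]:
    """Return the IPs of deleted machines that are still listed."""
    deleted = set(deleted_ips)
    return [ip for ip in endpoint_ips(endpoints) if ip in deleted]
```

Passing the IP noted in step 3 as `deleted_ips` gives a non-empty list when the bug is reproduced and an empty list otherwise.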

Actual results:
WMCO with DEBUG logging enabled shows:
```
DEBUG   controller.windowsmachine   invalid Machine {
	"name": "winworker-rh5cr",
	 "error": "no internal IP address associated",
	 "errorVerbose": "no internal IP address associated, ..."
	...
}

```

The `windows-exporter` metrics endpoint object contains a `Subsets` entry with the IP address of a deleted machine:
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
      Addresses:          172.31.251.250
      NotReadyAddresses:  <none>
      Ports:
        Name     Port  Protocol
        ----     ----  --------
        metrics  9182  TCP

    Events:  <none>
```


Expected results:

WMCO with DEBUG logging enabled shows:
```
DEBUG controller.windowsmachine   machine not provisioned {
 	"windowsmachine": "openshift-machine-api/winworker-vdmnd",
	"phase": "Deleting"
}

INFO	metrics	Prometheus configured	{
 	 "endpoints": "windows-exporter",
	 "port": 9182,
	 "name": "metrics"
}

```

The IP address of the deleted machine does not appear in the `windows-exporter` metrics endpoint object.
With replicas set to 1, `Subsets` must be empty.
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
    Events:  <none>
```

Comment 1 jvaldes 2021-09-30 17:56:03 UTC
Marking as VERIFIED to allow the release-4.8 PRs to merge. Will update it to ON_QA once that PR merges.

Comment 3 jvaldes 2021-10-05 14:38:55 UTC
Marking as POST since PR#716 is on hold.

Comment 5 Ronnie Rasouli 2021-11-22 09:36:51 UTC
Verified on 4.0.0+a18a1d26

Comment 7 errata-xmlrpc 2021-12-13 12:46:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.1 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4757