Bug 2008994 - WMCO ignores delete events for machines with invalid IP addresses
Summary: WMCO ignores delete events for machines with invalid IP addresses
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.8.z
Assignee: jvaldes
QA Contact: gaoshang
URL:
Whiteboard:
Depends On: 2008992
Blocks:
 
Reported: 2021-09-29 16:19 UTC by jvaldes
Modified: 2021-12-08 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The windows-exporter metrics endpoint object contains a reference to a deleted machine.
Consequence: WMCO ignores delete events for machines with invalid IP addresses.
Fix: Remove the validation of the machine object from the event filtering.
Result: The windows-exporter metrics endpoint object is correctly updated even when the machine is still in the Deleting phase.
Clone Of: 2008992
Environment:
Last Closed: 2021-12-08 22:07:43 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift windows-machine-config-operator pull 717 | 0 | None | open | [release-4.8] Bug 2008994: Fix delete event subscription | 2021-09-30 17:56:44 UTC
Red Hat Product Errata | RHBA-2021:4710 | 0 | None | None | None | 2021-12-08 22:07:49 UTC

Description jvaldes 2021-09-29 16:19:52 UTC
+++ This bug was initially created as a clone of Bug #2008992 +++

This bug was initially created as a light copy of Bug #1991739

I am copying this bug because:
The resulting symptom still shows up. It is less frequent because the root cause is slightly different, but it can still be reproduced on vSphere with a 4.7 cluster running WMCO 2.0.3.

Description of problem:
WMCO ignores the `Deleting` phase notification event for Windows machines that have no IPv4 address or an invalid one.
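
To illustrate the behavior described above, here is a minimal Python model of an event filter. It is only a sketch: the `Machine` class, its fields, and both filter functions are hypothetical and are not taken from the WMCO code base. It assumes that the filter drops any event whose machine fails an IPv4 validation, which matches the `invalid Machine` DEBUG message shown under "Actual results".

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    """Hypothetical stand-in for a Machine object; not the WMCO type."""
    name: str
    phase: str
    is_windows: bool = True
    internal_ip: Optional[str] = None


def has_valid_ipv4(machine: Machine) -> bool:
    """Return True only if the machine reports a parseable IPv4 address."""
    if not machine.internal_ip:
        return False
    try:
        ipaddress.IPv4Address(machine.internal_ip)
    except ValueError:
        return False
    return True


def filter_with_validation(machine: Machine) -> bool:
    """Model of the reported behavior: events for machines that fail
    validation never reach the reconciler, including delete events."""
    return machine.is_windows and has_valid_ipv4(machine)


def filter_without_validation(machine: Machine) -> bool:
    """Model of the fix described in the Doc Text: the filter no longer
    validates the machine, so a Deleting machine without an IP passes."""
    return machine.is_windows


# A machine that is being deleted and has already lost its IP address.
deleting = Machine(name="winworker-rh5cr", phase="Deleting", internal_ip=None)
print(filter_with_validation(deleting))     # event dropped
print(filter_without_validation(deleting))  # event delivered
```

In this model, the first filter drops the event for a machine that has already lost its address, so nothing triggers an update of the `windows-exporter` endpoints object. The second filter lets the event through.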

Version-Release number of selected component (if applicable):
WMCO 2.0.3 running on cluster with version 4.7.24 

How reproducible:
Sometimes; it depends on platform performance while removing a virtual machine.

Steps to Reproduce:
1. WMCO configured and running
2. Create a valid machineSet with 1 replica
3. Observe the node information in the `windows-exporter` metrics endpoint object.
    Note the IP Addresses, for example: 172.31.251.250
4. Delete the machineSet
5. Wait for the Windows machine to disappear
6. Check the `windows-exporter` metrics endpoint object again. If `Subsets` still contains an entry mapped to the IP address of a deleted machine, you have reproduced the bug. Metrics are no longer available for a deleted machine, so the entry is stale.
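
The check in step 6 can be scripted. The sketch below assumes the endpoints object has been exported as JSON (for example with `oc get endpoints windows-exporter -n openshift-windows-machine-config-operator -o json`) and that the IP addresses of the machines that still exist are collected separately. The function names and the sample data are illustrative only.

```python
from typing import Iterable, List


def endpoint_ips(endpoints: dict) -> List[str]:
    """Collect every IP listed in the Endpoints object's subsets,
    covering both ready and not-ready addresses."""
    ips = []
    for subset in endpoints.get("subsets") or []:
        for key in ("addresses", "notReadyAddresses"):
            for addr in subset.get(key) or []:
                ips.append(addr["ip"])
    return ips


def stale_ips(endpoints: dict, live_machine_ips: Iterable[str]) -> List[str]:
    """Return endpoint IPs that do not belong to any existing machine."""
    live = set(live_machine_ips)
    return [ip for ip in endpoint_ips(endpoints) if ip not in live]


# Sample shaped like the object in this report, after the machine is gone.
sample = {
    "metadata": {"name": "windows-exporter"},
    "subsets": [
        {
            "addresses": [{"ip": "172.31.251.250"}],
            "ports": [{"name": "metrics", "port": 9182, "protocol": "TCP"}],
        }
    ],
}

# No Windows machines remain, so any listed address is stale.
print(stale_ips(sample, live_machine_ips=[]))
```

A non-empty result after the machineSet has been deleted corresponds to the bug being reproduced; an empty list corresponds to the expected result.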

Actual results:
WMCO with DEBUG logging enabled shows:
```
DEBUG   controller.windowsmachine   invalid Machine {
	"name": "winworker-rh5cr",
	 "error": "no internal IP address associated",
	 "errorVerbose": "no internal IP address associated, ..."
	...
}

```

The `windows-exporter` metrics endpoint object contains a Subsets entry with the IP address of a deleted machine:
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
      Addresses:          172.31.251.250
      NotReadyAddresses:  <none>
      Ports:
        Name     Port  Protocol
        ----     ----  --------
        metrics  9182  TCP

    Events:  <none>
```


Expected results:

WMCO with DEBUG logging enabled shows:
```
DEBUG controller.windowsmachine   machine not provisioned {
 	"windowsmachine": "openshift-machine-api/winworker-vdmnd",
	"phase": "Deleting"
}

INFO	metrics	Prometheus configured	{
 	 "endpoints": "windows-exporter",
	 "port": 9182,
	 "name": "metrics"
}

```

The IP address of the deleted machine does not appear in the `windows-exporter` metrics endpoint object.
With replicas set to 1, Subsets must be empty after the machine is deleted.
```
$ oc describe endpoints -n openshift-windows-machine-config-operator
    Name:         windows-exporter
    Namespace:    openshift-windows-machine-config-operator
    Labels:       name=windows-exporter
    Annotations:  <none>
    Subsets:
    Events:  <none>
```

Comment 1 Ronnie Rasouli 2021-11-03 12:39:36 UTC
verified on 3.1.0+06e96071

Comment 4 errata-xmlrpc 2021-12-08 22:07:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Windows Container Support for Red Hat OpenShift 3.1.1 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4710

