Must gather logs:

1. Issue:
The command oc get endpoints -n openshift-windows-machine-config-operator prints ENDPOINTS <none> in WMCO 6.0.0 on the vSphere cloud provider:

    [cloud-user@preserve-jfrancoa openshift-tests-private]$ oc get endpoints -n openshift-windows-machine-config-operator
    NAME               ENDPOINTS   AGE
    windows-exporter   <none>      22m

Describing the windows-exporter service confirms that the endpoints are really missing:

    [cloud-user@preserve-jfrancoa 119919]$ oc describe service/windows-exporter -n openshift-windows-machine-config-operator
    Name:              windows-exporter
    Namespace:         openshift-windows-machine-config-operator
    Labels:            name=windows-exporter
                       operators.coreos.com/windows-machine-config-operator.openshift-windows-machine-confi=
    Annotations:       <none>
    Selector:          <none>
    Type:              ClusterIP
    IP Family Policy:  SingleStack
    IP Families:       IPv4
    IP:                172.30.72.109
    IPs:               172.30.72.109
    Port:              metrics  9182/TCP
    TargetPort:        9182/TCP
    Endpoints:         <none>
    Session Affinity:  None
    Events:            <none>

2. WMCO & OpenShift Version

    [cloud-user@preserve-jfrancoa 119919]$ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.11.0-0.nightly-2022-07-11-080250   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-07-11-080250
    [cloud-user@preserve-jfrancoa 119919]$ oc get csv -n openshift-windows-machine-config-operator
    NAME                                     DISPLAY                            VERSION   REPLACES   PHASE
    elasticsearch-operator.v5.5.0            OpenShift Elasticsearch Operator   5.5.0                Succeeded
    windows-machine-config-operator.v6.0.0   Windows Machine Config Operator    6.0.0                Succeeded
    [cloud-user@preserve-jfrancoa 119919]$ oc get cm -n openshift-windows-machine-config-operator
    NAME                                   DATA   AGE
    kube-root-ca.crt                       1      7h54m
    openshift-service-ca.crt               1      7h54m
    windows-machine-config-operator-lock   0      44m
    windows-services-6.0.0-9a1eca1         2      7h53m

3. Platform: vSphere

4. If the platform is vSphere, what is the VMware tools version?

5. Is it a new test case or an old test case?
Old test case.
If it is an old test case, is it a regression or first-time tested?
It is a regression.
Is it platform-specific or consistent across all platforms?
So far it has occurred only on vSphere; on Azure both endpoints are present.

6. Steps to Reproduce
   1. Deploy a 4.11 OCP cluster on vSphere.
   2. Install WMCO 6.0.0 and create some Windows MachineSets.
   3. Restart the WMCO container by deleting the WMCO pod:
      oc delete pod <wmco-pod-id> -n openshift-windows-machine-config-operator
   4. Run:
      oc get endpoints -n openshift-windows-machine-config-operator
   5. Check the ENDPOINTS field.

7. Actual Result and Expected Result
Actual: the windows-exporter endpoints show <none>.
Expected: the windows-exporter endpoints list the two IPs of the Windows nodes.

8. Has a possible workaround been tried? Is there a way to recover from the issue?
Scaling the MachineSet down and back up made the windows-exporter endpoints appear. Even though the Windows workers were up and running and workloads could run successfully, WMCO did not update the windows-exporter endpoints until then. Once the scale-down and scale-up happened, the following entry was observed in the WMCO logs:

    1.6576336929887342e+09 INFO metrics Prometheus configured {"endpoints": "windows-exporter", "port": 9182, "name": "metrics"}

9. Logs
Must-gather-windows-node-logs (https://github.com/openshift/must-gather/blob/master/collection-scripts/gather_windows_node_logs#L24)
    oc get network.operator cluster -o yaml
    oc logs -f deployment/windows-machine-config-operator -n openshift-windows-machine-config-operator
Windows MachineSet yaml or windows-instances ConfigMap:
    oc get machineset <windows_machineSet_name> -n openshift-machine-api -o yaml
    oc get configmaps <windows_configmap_name> -n <namespace_name> -o yaml
Optional logs: anything that can be useful to debug the issue.
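The windows-exporter Service reports Selector: <none>, so the Kubernetes endpoints controller never fills in its Endpoints object; whoever owns the Service (here WMCO) has to write an Endpoints object with the same name itself, which is why a stale or missing write shows up as <none>. The following is a minimal Python sketch of what such an object looks like, assuming the exporter listens on port 9182 under the port name metrics (both taken from the oc describe output and the "Prometheus configured" log line). The helper name exporter_endpoints_manifest and the example IPs are hypothetical illustrations; this is not WMCO's actual code.

```python
import json


def exporter_endpoints_manifest(node_ips, namespace="openshift-windows-machine-config-operator"):
    """Build a v1 Endpoints object for the selector-less windows-exporter Service.

    Hypothetical helper: node_ips is assumed to be the list of Windows node
    IPs on which windows-exporter listens on port 9182.
    """
    subsets = []
    if node_ips:
        subsets.append({
            "addresses": [{"ip": ip} for ip in node_ips],
            # Name and port mirror the Service port shown by `oc describe`
            # ("metrics 9182/TCP") and the WMCO "Prometheus configured" log.
            "ports": [{"name": "metrics", "port": 9182, "protocol": "TCP"}],
        })
    return {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # Must carry the same name and namespace as the Service; because the
        # Service has no selector, the endpoints controller will not populate it.
        "metadata": {"name": "windows-exporter", "namespace": namespace},
        # An empty subsets list is what `oc get endpoints` reports as <none>.
        "subsets": subsets,
    }


if __name__ == "__main__":
    example_ips = ["172.31.249.42", "172.31.249.201"]  # example values only
    print(json.dumps(exporter_endpoints_manifest(example_ips), indent=2))
```

Keeping the empty-subsets case explicit makes the failure mode in this report easy to reason about: if the owner never rewrites the object after a restart, the Service keeps whatever subsets were last written, or none at all.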
WMCO VERSION
==============

    [jfrancoa@localhost wmco]$ oc get cm -n openshift-windows-machine-config-operator
    NAME                                   DATA   AGE
    kube-root-ca.crt                       1      21m
    openshift-service-ca.crt               1      21m
    windows-machine-config-operator-lock   0      20m
    windows-services-6.0.0-07ebdd7         2      20m
    [jfrancoa@localhost wmco]$ oc get csv -n openshift-windows-machine-config-operator
    NAME                                     DISPLAY                            VERSION   REPLACES   PHASE
    elasticsearch-operator.v5.5.0            OpenShift Elasticsearch Operator   5.5.0                Succeeded
    windows-machine-config-operator.v6.0.0   Windows Machine Config Operator    6.0.0                Succeeded

VALIDATION
============

    [jfrancoa@localhost wmco]$ oc get endpoints -n openshift-windows-machine-config-operator
    NAME               ENDPOINTS                                AGE
    windows-exporter   172.31.249.42:9182,172.31.249.201:9182   21m
    [jfrancoa@localhost wmco]$ oc get pods -n openshift-windows-machine-config-operator
    NAME                                               READY   STATUS    RESTARTS   AGE
    windows-machine-config-operator-554d8d85f4-4pqtj   1/1     Running   0          21m
    [jfrancoa@localhost wmco]$ oc delete pods windows-machine-config-operator-554d8d85f4-4pqtj -n openshift-windows-machine-config-operator
    pod "windows-machine-config-operator-554d8d85f4-4pqtj" deleted
    [jfrancoa@localhost wmco]$ oc get endpoints -n openshift-windows-machine-config-operator
    NAME               ENDPOINTS                                AGE
    windows-exporter   172.31.249.42:9182,172.31.249.201:9182   4s
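The validation above boils down to one comparison: the windows-exporter endpoints captured before deleting the WMCO pod must be non-empty and identical to those captured afterwards. A minimal Python sketch of that check follows, assuming the default NAME / ENDPOINTS / AGE column layout of oc get endpoints. The function names parse_endpoints and survived_restart are hypothetical and not part of any WMCO or oc tooling. Endpoint order is ignored because the comparison uses sets.

```python
def parse_endpoints(output):
    """Map each row of default `oc get endpoints` output to a set of (ip, port)."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    if not rows or rows[0][:2] != ["NAME", "ENDPOINTS"]:
        raise ValueError("not `oc get endpoints` output")
    result = {}
    for fields in rows[1:]:
        # Only NAME and ENDPOINTS matter; AGE and any "+ N more..." suffix
        # that oc appends for long lists are ignored.
        name, column = fields[0], fields[1]
        pairs = set()
        if column != "<none>":
            for item in column.split(","):
                ip, _, port = item.rpartition(":")
                pairs.add((ip, int(port)))
        result[name] = pairs
    return result


def survived_restart(before, after, service="windows-exporter"):
    """True if `service` had endpoints before the WMCO restart and the same ones after it."""
    old = parse_endpoints(before).get(service, set())
    new = parse_endpoints(after).get(service, set())
    return bool(old) and old == new


if __name__ == "__main__":
    before = """NAME               ENDPOINTS                                AGE
windows-exporter   172.31.249.42:9182,172.31.249.201:9182   21m"""
    after = """NAME               ENDPOINTS                                AGE
windows-exporter   172.31.249.42:9182,172.31.249.201:9182   4s"""
    print(survived_restart(before, after))  # True
```

With the buggy 6.0.0 behavior, the "after" capture contains windows-exporter <none>, which parses to an empty set, so survived_restart returns False.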
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift support for Windows Containers 7.0.0 [security update]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:9096