Bug 2015415
Summary: | WMCO pod recreation cause windows-exporter endpoint cleaned | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | gaoshang <sgao> |
Component: | Windows Containers | Assignee: | Mohammad Saif Shaikh <mohashai> |
Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> |
Severity: | medium | Docs Contact: | |
Priority: | urgent | ||
Version: | 4.9 | CC: | aos-bugs, mohashai, rrasouli |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | No Doc Update | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2021-12-13 12:46:10 UTC | Type: | Bug |
Description
gaoshang
2021-10-19 07:17:44 UTC
I was not able to reproduce this issue. After deleting the operator pod, the windows-exporter endpoints were quickly repopulated with all Windows IP:port pairs.

Environment specs:
- OCP version: latest-4.9
- WMCO version: 4.0.0+7991f6f0
- Platform: Azure

Results with a Windows MachineSet having 2 replicas, both configured as nodes:

$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS                         AGE
windows-exporter   10.0.128.7:9182,10.0.128.8:9182   76m
$ oc delete pod windows-machine-config-operator-5486449875-6lzzs -n openshift-windows-machine-config-operator
pod "windows-machine-config-operator-5486449875-6lzzs" deleted
$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS   AGE
windows-exporter   <none>      0s
$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS                         AGE
windows-exporter   10.0.128.7:9182,10.0.128.8:9182   4s

@sgao I also did not see this on vSphere. The cluster was installed using a 4.9 nightly build and the WMCO version was 4.0.0+ba09417.

$ oc get ep -n openshift-windows-machine-config-operator
NAME                                              ENDPOINTS                                 AGE
windows-exporter                                  172.31.251.205:9182,172.31.251.132:9182   13m
windows-machine-config-operator-registry-server   10.129.2.14:50051                         178m
$ oc get pods -A | grep windows
openshift-windows-machine-config-operator   windows-machine-config-operator-74db66f78f-vmfn4                  1/1   Running   0   16m
openshift-windows-machine-config-operator   windows-machine-config-operator-registry-server-d75f9658d-885rl   1/1   Running   0   179m
$ oc delete pod windows-machine-config-operator-74db66f78f-vmfn4 -n openshift-windows-machine-config-operator
pod "windows-machine-config-operator-74db66f78f-vmfn4" deleted
$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS                                 AGE
windows-exporter   172.31.251.205:9182,172.31.251.132:9182   1s
$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS                                 AGE
windows-exporter   172.31.251.205:9182,172.31.251.132:9182   56s

The team has triaged this bug and decided it is not a blocker to the v4.0.0 release of WMCO. The endpoints object seems to be properly repopulated with Windows node IPs at most a few minutes after the operator pod is restarted.

@mohashai That's strange. On vSphere here, unless a WMCO reconcile is triggered (by scaling nodes up or down), the endpoints stay empty (checked after 30 minutes). I'll keep my environment up tonight, thanks.

$ oc get node -l kubernetes.io/os=windows
NAME              STATUS   ROLES    AGE   VERSION
winworker-2f7np   Ready    worker   52m   v1.22.1-1676+af080cb8d127b3
winworker-flw7c   Ready    worker   56m   v1.22.1-1676+af080cb8d127b3
$ oc get ep -n openshift-windows-machine-config-operator
NAME               ENDPOINTS   AGE
windows-exporter   <none>      30m

@mohashai I found that with the template windows-server-2004-template-nics-vmtoolsv11333, this bug no longer exists on OCP 4.9.0-0.nightly-2021-10-22-102153 + vSphere.

Thanks for identifying a solution @sgao, I've moved this to on QA. You can mark it as verified when it is good on your end. We'll try to get this into WMCO v4.0.0 (OCP 4.9), though it remains not a blocker for the release.

oc delete pod/windows-machine-config-operator-67d8b7d6d6-bcfhd
pod "windows-machine-config-operator-67d8b7d6d6-bcfhd" deleted
rrasouli@rrasouli-mac openshift-tests-private % oc get pod
NAME                                               READY   STATUS    RESTARTS   AGE
windows-machine-config-operator-67d8b7d6d6-fcnld   1/1     Running   0          5s
rrasouli@rrasouli-mac openshift-tests-private % oc get ep
NAME               ENDPOINTS                             AGE
windows-exporter   10.0.154.207:9182,10.0.159.181:9182   5s

Verified on 3.1.0+8ffe65a.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.1 product release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4757
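
The check repeated throughout this thread (delete the operator pod, then watch whether the windows-exporter endpoints come back) could be scripted roughly as follows. This is only a sketch, not part of WMCO or of the fix: the helper names `endpoint_count` and `wait_for_endpoints`, the 300-second default timeout, and the 5-second polling interval are assumptions made for illustration. It reads the Endpoints object with `-o jsonpath` instead of parsing the table output shown above.

```shell
#!/usr/bin/env bash
# Namespace taken from the commands in this bug report.
NAMESPACE="openshift-windows-machine-config-operator"

# Hypothetical helper: print how many addresses the windows-exporter
# Endpoints object currently lists (0 if it is empty or cannot be read).
endpoint_count() {
  local ips
  ips="$(oc get endpoints windows-exporter -n "$NAMESPACE" \
        -o jsonpath='{.subsets[*].addresses[*].ip}' 2>/dev/null)" || ips=""
  # jsonpath prints the IPs separated by spaces; word splitting counts them.
  set -- $ips
  echo "$#"
}

# Hypothetical helper: poll until at least <expected> addresses are present.
# Usage: wait_for_endpoints <expected> [timeout_seconds] [interval_seconds]
wait_for_endpoints() {
  local expected="$1" timeout="${2:-300}" interval="${3:-5}" waited=0
  while [ "$(endpoint_count)" -lt "$expected" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s waiting for ${expected} endpoints" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "windows-exporter has $(endpoint_count) endpoints after about ${waited}s"
}

# Example run against a logged-in cluster (pod name is a placeholder):
#   oc delete pod <operator-pod-name> -n "$NAMESPACE"
#   wait_for_endpoints 2
```

With the two-node MachineSet used above, `wait_for_endpoints 2` would return as soon as both addresses reappear, and fail with a non-zero status if the endpoints stay at `<none>` for the whole timeout, which is the symptom originally reported on vSphere.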