Description of problem:
In previous WMCO versions, deleting the version annotation triggered reconfiguration of the Windows node, and the version annotation was added back once reconfiguration finished. Now, on vSphere, WMCO keeps reporting a `no internal IP address associated` error, does not trigger reconciliation of the Windows node, and never adds the version annotation back.

{"level":"error","ts":1634565502.8201108,"logger":"controller-runtime.manager.controller.machine","msg":"Reconciler error","reconciler group":"machine.openshift.io","reconciler kind":"Machine","name":"winworker-hqh94","namespace":"openshift-machine-api","error":"invalid machine winworker-hqh94: no internal IP address associated","errorVerbose":"no internal IP address associated\ngithub.com/openshift/windows-machine-config-operator/controllers.getInternalIPAddress\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:497\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371\ninvalid machine winworker-hqh94\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:299\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0

How reproducible:
Always

Steps to Reproduce:
1. Scale up a Windows node created by a MachineSet
2. Delete the version annotation of the Windows node
3. Check the WMCO log

Actual results:
WMCO keeps reporting the `no internal IP address associated` error

Expected results:
WMCO should trigger reconciliation of the Windows node and add the version annotation back

Additional info:
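The log entry in the description is a single JSON object whose `errorVerbose` field holds an escaped stack trace, which is hard to read inline. The following is a minimal Python sketch for pulling out the fields useful for triage. The `summarize` helper and the shortened `LOG_LINE` are illustrative assumptions, not part of WMCO; the sample keeps only the first stack frame from the entry above.

```python
import json

# Shortened copy of the WMCO log entry from the description; the stack trace
# is trimmed to its first frame for brevity.
LOG_LINE = (
    '{"level":"error","ts":1634565502.8201108,'
    '"logger":"controller-runtime.manager.controller.machine",'
    '"msg":"Reconciler error","name":"winworker-hqh94",'
    '"namespace":"openshift-machine-api",'
    '"error":"invalid machine winworker-hqh94: no internal IP address associated",'
    '"errorVerbose":"no internal IP address associated\\n'
    'github.com/openshift/windows-machine-config-operator/controllers.getInternalIPAddress\\n'
    '\\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:497"}'
)


def summarize(line: str) -> dict:
    """Return the fields most useful for triage from one JSON log line."""
    entry = json.loads(line)
    # json.loads turns the escaped "\n" sequences into real newlines.
    frames = entry.get("errorVerbose", "").split("\n")
    return {
        "machine": f'{entry["namespace"]}/{entry["name"]}',
        "error": entry["error"],
        # Frame 0 is the message; frame 1 is the function that produced it.
        "origin": frames[1] if len(frames) > 1 else None,
    }


if __name__ == "__main__":
    for key, value in summarize(LOG_LINE).items():
        print(f"{key}: {value}")
```

Run against the full entry, the same helper should point at `getInternalIPAddress` as the origin of the error, which matches the first frame of the trace.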
I was able to reproduce this issue locally. @jose identified a VMware Tools configuration issue and fixed it with a temporary vSphere Windows golden image. I tried the `jvaldes/windows-server-2004-template-nics-vmtoolsv11333` image, which is present in the vCenter used by the dev team. With that image I do not see the problem or its underlying cause. Basically, the IP address information in the Windows Machine object is getting wiped out, and that is what causes this issue. It does not happen with @Jose's image.

1) Run `oc annotate node winworker-dt6ck windowsmachineconfig.openshift.io/version-`
2) Check the node info; the version annotation is missing:

esiva:/home/esiva/go/src/windows-machine-config-operator $ oc describe node winworker-dt6ck
Name:               winworker-dt6ck
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=winworker-dt6ck
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/windows-build=10.0.19041
                    node.openshift.io/os_id=Windows
Annotations:        k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
                    k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
                    machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
                    volumes.kubernetes.io/controller-managed-attach-detach: true
                    windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019

3) Wait a while for the reconciler to kick in
4) The node comes back to the Ready state
5) Verify that the version annotation is back:

$ oc describe node winworker-dt6ck
Name:               winworker-dt6ck
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=winworker-dt6ck
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/windows-build=10.0.19041
                    node.openshift.io/os_id=Windows
Annotations:        k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
                    k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
                    machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
                    volumes.kubernetes.io/controller-managed-attach-detach: true
                    windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019
                    windowsmachineconfig.openshift.io/version: 4.0.0+ba09417

If the QE team uses the same vCenter, `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333` can be used instead of `windows-golden-images/windows-server-2004-template` in the template.
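The comment above attributes the failure to the IP addresses being wiped from the Windows Machine object. A rough Python model of that condition is sketched below, assuming the Machine JSON (for example from `oc get machine <name> -n openshift-machine-api -o json`) lists addresses under `status.addresses` with a `type` of `InternalIP`. The `internal_ip` helper and the sample address `10.0.0.5` are hypothetical; the actual check is the Go function `getInternalIPAddress` named in the stack trace and may differ in detail.

```python
from typing import Optional


def internal_ip(machine: dict) -> Optional[str]:
    """Return the first InternalIP in status.addresses, or None if absent."""
    status = machine.get("status") or {}
    for addr in status.get("addresses") or []:
        if addr.get("type") == "InternalIP" and addr.get("address"):
            return addr["address"]
    return None


# Hypothetical Machine objects: one with addresses populated, one where the
# address list has been wiped, as described in the comment above.
healthy = {
    "metadata": {"name": "winworker-dt6ck"},
    "status": {
        "addresses": [
            {"type": "InternalIP", "address": "10.0.0.5"},  # made-up address
            {"type": "InternalDNS", "address": "winworker-dt6ck"},
        ]
    },
}
wiped = {"metadata": {"name": "winworker-hqh94"}, "status": {"addresses": []}}

if __name__ == "__main__":
    for machine in (healthy, wiped):
        name = machine["metadata"]["name"]
        ip = internal_ip(machine)
        if ip is None:
            print(f"invalid machine {name}: no internal IP address associated")
        else:
            print(f"{name}: internal IP {ip}")
```

A check like this can help tell whether a given golden image leaves the Machine without addresses before looking at the WMCO log.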
I tried `jvaldes/windows-server-2004-template-nics-vmtoolsv11333`, which is the same as `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333`. Jose placed it in the proper folder.
With the template windows-server-2004-template-nics-vmtoolsv11333, this bug no longer exists on OCP 4.9.0-0.nightly-2021-10-22-102153, thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.1 product release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:4757