Bug 2015773
| Summary: | Deleting version annotation failed to trigger Windows node reconcile on vSphere | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | gaoshang <sgao> |
| Component: | Windows Containers | Assignee: | elango siva <esiva> |
| Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.9 | CC: | aos-bugs, esiva, team-winc |
| Target Milestone: | --- | ||
| Target Release: | 4.9.z | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-12-13 12:46:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
I was able to reproduce this issue locally.
There was a VMware Tools configuration issue, identified by @jose, which he fixed with a temporary vSphere Windows golden image.
I tried the jvaldes/windows-server-2004-template-nics-vmtoolsv11333 image, which is present in the vCenter used by the dev team. With it I do not see the problem, and the underlying cause does not occur either. Basically, the IP address information in the Windows Machine object is getting wiped out, and that is causing this issue. It doesn't happen with @Jose's image.
1) run oc annotate node winworker-dt6ck windowsmachineconfig.openshift.io/version-
2) check node info where version is missing
esiva:/home/esiva/go/src/windows-machine-config-operator
$ oc describe node winworker-dt6ck
Name: winworker-dt6ck
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=windows
kubernetes.io/arch=amd64
kubernetes.io/hostname=winworker-dt6ck
kubernetes.io/os=windows
node-role.kubernetes.io/worker=
node.kubernetes.io/windows-build=10.0.19041
node.openshift.io/os_id=Windows
Annotations: k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
volumes.kubernetes.io/controller-managed-attach-detach: true
windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019
3) wait for a while for reconciler to kick in
4) Node comes back to ready state
5) Verified and made sure version is back.
$ oc describe node winworker-dt6ck
Name: winworker-dt6ck
Roles: worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=windows
kubernetes.io/arch=amd64
kubernetes.io/hostname=winworker-dt6ck
kubernetes.io/os=windows
node-role.kubernetes.io/worker=
node.kubernetes.io/windows-build=10.0.19041
node.openshift.io/os_id=Windows
Annotations: k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
volumes.kubernetes.io/controller-managed-attach-detach: true
windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019
windowsmachineconfig.openshift.io/version: 4.0.0+ba09417
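The manual check in steps 1) to 5) above could be automated along the following lines. This is only a sketch: the function names, the timeout, and the polling interval are assumptions made for illustration, and it relies on a logged-in `oc` client being on the PATH. It polls `oc get node <name> -o json` until the `windowsmachineconfig.openshift.io/version` annotation reappears.

```python
import json
import subprocess
import time

VERSION_ANNOTATION = "windowsmachineconfig.openshift.io/version"


def version_annotation(node: dict):
    """Return the WMCO version annotation of a parsed Node object, or None."""
    annotations = node.get("metadata", {}).get("annotations") or {}
    return annotations.get(VERSION_ANNOTATION)


def fetch_node(name: str) -> dict:
    """Fetch a Node as a dict; assumes a logged-in `oc` client is available."""
    result = subprocess.run(
        ["oc", "get", "node", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def wait_for_version(name, timeout_s=1800, interval_s=30,
                     fetch=fetch_node, sleep=time.sleep):
    """Poll the node until the version annotation is back, then return it.

    timeout_s and interval_s are arbitrary illustrative defaults, not values
    taken from WMCO. Raises TimeoutError if the annotation never reappears.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        version = version_annotation(fetch(name))
        if version is not None:
            return version
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{VERSION_ANNOTATION} not restored on {name}")
        sleep(interval_s)
```

After removing the annotation with the `oc annotate` command from step 1), calling `wait_for_version("winworker-dt6ck")` would return the restored version string on a healthy cluster, or raise `TimeoutError` in the failing case described in this bug.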
If the QE team uses the same vCenter, one can use `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333` instead of `windows-golden-images/windows-server-2004-template` in the template.
I tried `jvaldes/windows-server-2004-template-nics-vmtoolsv11333`, and this is the same as `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333`. Jose placed it in the proper folder.
With template windows-server-2004-template-nics-vmtoolsv11333, this bug no longer exists on OCP 4.9.0-0.nightly-2021-10-22-102153, thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.1 product release), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2021:4757
Description of problem:
In previous WMCO versions, deleting the version annotation caused the Windows node to be reconfigured, after which the version annotation was added back. Now, on vSphere, WMCO keeps reporting a `no internal IP address associated` error, does not trigger a Windows node reconcile, and never adds the version back.
{"level":"error","ts":1634565502.8201108,"logger":"controller-runtime.manager.controller.machine","msg":"Reconciler error","reconciler group":"machine.openshift.io","reconciler kind":"Machine","name":"winworker-hqh94","namespace":"openshift-machine-api","error":"invalid machine winworker-hqh94: no internal IP address associated","errorVerbose":"no internal IP address associated\ngithub.com/openshift/windows-machine-config-operator/controllers.getInternalIPAddress\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:497\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371\ninvalid machine winworker-hqh94\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:299\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0
How reproducible:
Always
Steps to Reproduce:
1) Scale up a Windows node created by a machineset
2) Delete the version annotation of the Windows node
3) Check the WMCO log
Actual results:
WMCO keeps on reporting the `no internal IP address associated` error
Expected results:
WMCO should trigger a Windows node reconcile and add the version annotation back
Additional info:
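To illustrate the failure mode behind the log message above, here is a minimal Python sketch of an internal-IP lookup over a Machine object's `status.addresses` list. It is not the WMCO code (which is written in Go); the function names and the dict layout are assumptions, loosely modelled on the Kubernetes `NodeAddress` shape with `type` and `address` fields. It shows why a Machine whose address information has been wiped out ends in `no internal IP address associated` instead of a reconcile.

```python
def get_internal_ip(addresses):
    """Return the first InternalIP entry; raise if none is present."""
    for entry in addresses or []:
        if entry.get("type") == "InternalIP" and entry.get("address"):
            return entry["address"]
    raise ValueError("no internal IP address associated")


def reconcile_target(machine: dict) -> str:
    """Resolve the address a reconciler would connect to (hypothetical helper).

    Wraps lookup failures with the machine name, mirroring the shape of the
    error message seen in the WMCO log.
    """
    name = machine["metadata"]["name"]
    addresses = machine.get("status", {}).get("addresses")
    try:
        return get_internal_ip(addresses)
    except ValueError as err:
        raise ValueError(f"invalid machine {name}: {err}") from err


if __name__ == "__main__":
    # A Machine whose status carries no addresses, as described in this bug.
    wiped = {"metadata": {"name": "winworker-hqh94"}, "status": {}}
    try:
        reconcile_target(wiped)
    except ValueError as err:
        print(err)
```

In this sketch, every retry against a Machine without addresses fails the same way, which matches the repeated reconciler errors reported here: the node is never reached, so the version annotation is never restored.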