Bug 2015773 - Deleting version annotation failed to trigger Windows node reconcile on vSphere
Summary: Deleting version annotation failed to trigger Windows node reconcile on vSphere
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.9.z
Assignee: elango siva
QA Contact: gaoshang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-10-20 05:21 UTC by gaoshang
Modified: 2021-12-13 12:46 UTC (History)
CC List: 3 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-12-13 12:46:10 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2021:4757 0 None None None 2021-12-13 12:46:23 UTC

Description gaoshang 2021-10-20 05:21:03 UTC
Description of problem: In previous WMCO versions, deleting the version annotation caused the Windows node to be reconfigured, and the version annotation was added back once reconfiguration finished. Now, on vSphere, WMCO keeps reporting the `no internal IP address associated` error, does not trigger a Windows node reconcile, and never adds the version annotation back.

{"level":"error","ts":1634565502.8201108,"logger":"controller-runtime.manager.controller.machine","msg":"Reconciler error","reconciler group":"machine.openshift.io","reconciler kind":"Machine","name":"winworker-hqh94","namespace":"openshift-machine-api","error":"invalid machine winworker-hqh94: no internal IP address associated","errorVerbose":"no internal IP address associated\ngithub.com/openshift/windows-machine-config-operator/controllers.getInternalIPAddress\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:497\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371\ninvalid machine 
winworker-hqh94\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/remote-source/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:299\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1371","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214"}
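
When checking the WMCO log for this failure, the relevant fields of an entry like the one above can be pulled out with a small script. This is a minimal sketch for triage, not part of WMCO; the function name `summarize_reconciler_error` is hypothetical, and the sample is a trimmed copy of the entry above with `errorVerbose` and `stacktrace` omitted.

```python
import json


def summarize_reconciler_error(line: str):
    """Return (namespace/name, error) for a 'Reconciler error' entry, else None."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        # Not a structured log line; ignore it.
        return None
    if not isinstance(entry, dict):
        return None
    if entry.get("level") != "error" or entry.get("msg") != "Reconciler error":
        return None
    target = f'{entry.get("namespace", "")}/{entry.get("name", "")}'
    return target, entry.get("error", "")


# Trimmed copy of the log entry above (errorVerbose and stacktrace omitted).
sample = (
    '{"level":"error","ts":1634565502.8201108,'
    '"logger":"controller-runtime.manager.controller.machine",'
    '"msg":"Reconciler error","reconciler kind":"Machine",'
    '"name":"winworker-hqh94","namespace":"openshift-machine-api",'
    '"error":"invalid machine winworker-hqh94: no internal IP address associated"}'
)

print(summarize_reconciler_error(sample))
```

Feeding each log line through the function and keeping the non-`None` results gives a quick list of which Machine objects are failing and why.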

Version-Release number of selected component (if applicable):
OCP version: 4.9.0-0.nightly-2021-10-16-173626
WMCO version: 4.0.0+7991f6f0

How reproducible:
Always

Steps to Reproduce:
1. Scale up a Windows node created by a MachineSet
2. Delete the version annotation of the Windows node
3. Check the WMCO log

Actual results:
WMCO keeps reporting the `no internal IP address associated` error
Expected results:
WMCO should trigger a Windows node reconcile and add the version annotation back

Additional info:

Comment 1 elango siva 2021-10-20 22:03:14 UTC
I was able to reproduce this issue locally. 

There was a VMware Tools configuration issue, identified by @jose, which he fixed with a temporary vSphere Windows golden image.
I tried the `jvaldes/windows-server-2004-template-nics-vmtoolsv11333` image, which is present in the vCenter used by the dev team. With it I do not see the problem, and the root cause does not occur. Basically, the IP address information in the Windows Machine object is getting wiped out, and that is what causes this issue. It does not happen with @Jose's image.
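
To confirm whether a Machine object has lost its addresses, the output of `oc get machine <name> -n openshift-machine-api -o json` can be inspected. The sketch below is a hypothetical diagnostic helper, not the operator's own code; it assumes the Machine status carries an `addresses` list whose entries have `type` and `address` fields, and the sample objects and IP value are made up for illustration.

```python
def internal_ips(machine: dict) -> list:
    """Return the InternalIP addresses recorded in a Machine object's status."""
    addresses = (machine.get("status") or {}).get("addresses") or []
    return [
        a["address"]
        for a in addresses
        if a.get("type") == "InternalIP" and a.get("address")
    ]


# Hypothetical Machine objects: one healthy, one with its addresses wiped.
healthy = {
    "status": {
        "addresses": [
            {"type": "InternalIP", "address": "10.0.0.5"},
            {"type": "InternalDNS", "address": "winworker-dt6ck"},
        ]
    }
}
wiped = {"status": {}}

print(internal_ips(healthy))  # non-empty list: an internal IP is recorded
print(internal_ips(wiped))    # empty list: matches the failing condition
```

An empty result corresponds to the state in which the reconciler reports `no internal IP address associated`.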

1) Run `oc annotate node winworker-dt6ck windowsmachineconfig.openshift.io/version-`
2) Check the node info; the version annotation is missing
    esiva:/home/esiva/go/src/windows-machine-config-operator
    $ oc describe node winworker-dt6ck
    Name:               winworker-dt6ck
    Roles:              worker
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=windows
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=winworker-dt6ck
                        kubernetes.io/os=windows
                        node-role.kubernetes.io/worker=
                        node.kubernetes.io/windows-build=10.0.19041
                        node.openshift.io/os_id=Windows
    Annotations:        k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
                        k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
                        machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
                        volumes.kubernetes.io/controller-managed-attach-detach: true
                        windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019

3) Wait a while for the reconciler to kick in

4) The node comes back to the Ready state

5) Verified that the version annotation is back.

$ oc describe node winworker-dt6ck 
Name:               winworker-dt6ck
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=windows
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=winworker-dt6ck
                    kubernetes.io/os=windows
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/windows-build=10.0.19041
                    node.openshift.io/os_id=Windows
Annotations:        k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac: 00-15-5D-74-48-03
                    k8s.ovn.org/hybrid-overlay-node-subnet: 10.132.1.0/24
                    machine.openshift.io/machine: openshift-machine-api/winworker-dt6ck
                    volumes.kubernetes.io/controller-managed-attach-detach: true
                    windowsmachineconfig.openshift.io/pub-key-hash: 7c00ba8122aa764a192fe7d2d9ac4d3627b9c443c09480b18c055c2e178a6019
                    windowsmachineconfig.openshift.io/version: 4.0.0+ba09417
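
The manual check in steps 2 and 5 can also be done against `oc get node <name> -o json` output. This is a minimal sketch under that assumption; the helper name `wmco_version` is hypothetical, and the sample dictionaries only reuse annotation values shown in the output above.

```python
VERSION_ANNOTATION = "windowsmachineconfig.openshift.io/version"


def wmco_version(node: dict):
    """Return the WMCO version annotation of a node object, or None if absent."""
    annotations = (node.get("metadata") or {}).get("annotations") or {}
    return annotations.get(VERSION_ANNOTATION)


# Node state right after the annotation was deleted (step 2).
before = {
    "metadata": {
        "annotations": {
            "machine.openshift.io/machine": "openshift-machine-api/winworker-dt6ck",
        }
    }
}
# Node state after the reconciler has run (step 5).
after = {
    "metadata": {
        "annotations": {
            "machine.openshift.io/machine": "openshift-machine-api/winworker-dt6ck",
            VERSION_ANNOTATION: "4.0.0+ba09417",
        }
    }
}

print(wmco_version(before))
print(wmco_version(after))
```

A `None` result that persists after the reconciler has had time to run would indicate the behavior reported in this bug.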


If the QE team uses the same vCenter, they can use `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333` instead of `windows-golden-images/windows-server-2004-template` as the template.

Comment 3 elango siva 2021-10-20 22:22:38 UTC
The `jvaldes/windows-server-2004-template-nics-vmtoolsv11333` image I tried is the same as `windows-golden-images/windows-server-2004-template-nics-vmtoolsv11333`; Jose placed it in the proper folder.

Comment 4 gaoshang 2021-10-25 12:35:11 UTC
With the template windows-server-2004-template-nics-vmtoolsv11333, this bug no longer exists on OCP 4.9.0-0.nightly-2021-10-22-102153, thanks.

Comment 7 errata-xmlrpc 2021-12-13 12:46:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Windows Container Support for Red Hat OpenShift 4.0.1 product release), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:4757

