Bug 1953692
Summary: | WMCO incorrectly shows node as ready after a failed configuration | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Aravindh Puthiyaparambil <aravindh> | |
Component: | Windows Containers | Assignee: | Aravindh Puthiyaparambil <aravindh> | |
Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> | |
Severity: | high | Docs Contact: | ||
Priority: | high | |||
Version: | 4.7 | CC: | aos-bugs, gfontana, mankulka | |
Target Milestone: | --- | |||
Target Release: | 4.8.0 | |||
Hardware: | x86_64 | |||
OS: | Windows | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: WMCO was not cordoning the nodes after initial kubelet setup, prematurely making the node available for scheduling
Consequence: Pods were scheduled onto the nodes but never reached the Running state
Fix: Cordon the nodes after initial kubelet setup and uncordon after full configuration
Result: Node no longer prematurely accepts workloads
|
Story Points: | --- | |
Clone Of: | ||||
: | 1956412 | Environment: | |
Last Closed: | 2021-08-03 20:29:16 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1956412 |
Description
Aravindh Puthiyaparambil
2021-04-26 16:43:14 UTC
This bug has been verified on OCP 4.8 + vSphere + Windows Server 2019 and passed, thanks.

Version-Release number of selected component (if applicable):
WMCO built from https://github.com/openshift/windows-machine-config-operator/commit/1ca41c250ff937d1543559ba19e805a7473d45bf
OCP version 4.8.0-0.nightly-2021-04-30-201824

Steps:
1. Install OCP 4.8 on vSphere, build WMCO and install it, refer to https://github.com/openshift/windows-machine-config-operator/blob/master/docs/HACKING.md
2. Create Windows machineset with Windows Server 2019
3. Check WMCO log and watch Windows node status

1), When kubelet service started, Windows node would be Ready but cordoned.

$ oc logs -f deployment.apps/windows-machine-config-operator -n openshift-windows-machine-config-operator
...
2021-05-06T11:59:48.281Z INFO VM 172.31.249.149 configured kubelet {"cmd": "C:\\k\\\\wmcb.exe initialize-kubelet --ignition-file C:\\Windows\\Temp\\worker.ign --kubelet-path C:\\k\\kubelet.exe", "output": "Bootstrapping completed successfully"}

$ oc get nodes -l kubernetes.io/os=windows -owide
NAME              STATUS                     ROLES    AGE   VERSION                           INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
winworker-zk6s4   Ready,SchedulingDisabled   worker   14s   v1.21.0-rc.0.1190+e22a836a8b2659   172.31.249.149   172.31.249.149   Windows Server 2019 Standard   10.0.17763.1697   docker://19.3.14

2), Wait until running hybrid-overlay-node service failed, Windows node would be NotReady and cordoned.

2021-05-06T12:13:04.920Z ERROR controller-runtime.manager.controller.machine Reconciler error {"reconciler group": "machine.openshift.io", "reconciler kind": "Machine", "name": "winworker-zk6s4", "namespace": "openshift-machine-api", "error": "failed to configure Windows VM 422c050e-a0bc-b215-2a89-3986cbc84aab: configuring node network failed: error waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation for winworker-zk6s4: timeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation: timed out waiting for the condition", "errorVerbose": "timed out waiting for the condition\ntimeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).waitForNodeAnnotation\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:306\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).configureNetwork\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:225\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure.func1\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:170\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:193\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).addWorkerNode\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:440\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:374\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/intern
al/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nerror waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation for winworker-zk6s4\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).configureNetwork\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:226\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure.func1\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:170\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:193\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).addWorkerNode\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:440\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:374\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/contro
ller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nconfiguring node network failed\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure.func1\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:171\ngithub.com/openshift/windows-machine-config-operator/pkg/nodeconfig.(*nodeConfig).Configure\n\t/build/windows-machine-config-operator/pkg/nodeconfig/nodeconfig.go:193\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).addWorkerNode\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:440\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:374\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\nfailed to configure Windows VM 
422c050e-a0bc-b215-2a89-3986cbc84aab\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).addWorkerNode\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:442\ngithub.com/openshift/windows-machine-config-operator/controllers.(*WindowsMachineReconciler).Reconcile\n\t/build/windows-machine-config-operator/controllers/windowsmachine_controller.go:374\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:253
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /build/windows-machine-config-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:214

$ oc get nodes -l kubernetes.io/os=windows -owide
NAME              STATUS                        ROLES    AGE   VERSION                           INTERNAL-IP      EXTERNAL-IP      OS-IMAGE                       KERNEL-VERSION    CONTAINER-RUNTIME
winworker-zk6s4   NotReady,SchedulingDisabled   worker   53m   v1.21.0-rc.0.1190+e22a836a8b2659   172.31.249.149   172.31.249.149   Windows Server 2019 Standard   10.0.17763.1697   docker://19.3.14

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: Red Hat OpenShift Container Platform for Windows Containers 3.0.0 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3001
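
For readers who want the ordering from the Doc Text (cordon after initial kubelet setup, uncordon only after full configuration) in a compact form, here is a minimal sketch. It is not WMCO code: the `Node` class, `configure_node`, and the step functions are hypothetical names made up for illustration, and the node is modeled in memory rather than through the Kubernetes API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a Kubernetes node."""
    name: str
    ready: bool = False
    unschedulable: bool = False  # True corresponds to "SchedulingDisabled"


class ConfigError(Exception):
    """Raised by a configuration step that fails."""


def schedulable(node: Node) -> bool:
    # A node accepts workloads only when it is Ready and not cordoned.
    return node.ready and not node.unschedulable


def configure_node(
    node: Node,
    setup_kubelet: Callable[[Node], None],
    remaining_steps: Iterable[Callable[[Node], None]],
) -> None:
    setup_kubelet(node)          # node may report Ready from here on
    node.unschedulable = True    # cordon right after initial kubelet setup
    for step in remaining_steps:
        step(node)               # a failure here leaves the node cordoned
    node.unschedulable = False   # uncordon only after full configuration


# Hypothetical steps, loosely mirroring the verification scenario above.
def start_kubelet(node: Node) -> None:
    node.ready = True


def network_ok(node: Node) -> None:
    pass


def network_timeout(node: Node) -> None:
    raise ConfigError("timeout waiting for node annotation")


ok_node = Node("win-ok")
configure_node(ok_node, start_kubelet, [network_ok])

bad_node = Node("win-bad")
try:
    configure_node(bad_node, start_kubelet, [network_timeout])
except ConfigError:
    pass  # configuration failed; the node stays cordoned
```

In this sketch `ok_node` ends up schedulable, while `bad_node` stays cordoned after the failed network step, which corresponds to the `SchedulingDisabled` status seen in the verification output.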