
Bug 1937930

Summary: windows node fails to become ready: ERROR controller-runtime.controller Reconciler error {"controller": "windowsmachine-controller", "request": "openshift-machine-api/...", "error": "failed to configure Windows VM
Product: OpenShift Container Platform
Reporter: milti leonard <mleonard>
Component: Documentation
Assignee: Cody Hoag <choag>
Status: CLOSED CURRENTRELEASE
QA Contact: gaoshang <sgao>
Severity: urgent
Docs Contact: Vikram Goyal <vigoyal>
Priority: medium
Version: 4.6.z
CC: aos-bugs, aravindh, jokerman, sgao
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Windows
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-03-18 17:55:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description milti leonard 2021-03-11 18:21:57 UTC
Description of problem:
Windows node fails to become ready. The WMCO log shows:

2021-03-08T19:53:58.392657710Z 2021-03-08T19:53:58.392Z ERROR   controller-runtime.controller   Reconciler error        {"controller": "windowsmachine-controller", "request": "openshift-machine-api/facos581win-6fns4-windows-worker-us-west-2a-cch9r", "error": "failed to configure Windows VM i-0b3c065b4d424ba67: configuring node network failed: error waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation for ip-10-0-15-148.us-west-2.compute.internal: timed out waiting for the condition timeout waiting for k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac node annotation
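
The error above says the controller timed out waiting for the k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac annotation on the node. One way to confirm whether that annotation is present is to inspect the node object directly. The following is only a minimal sketch: it assumes the node JSON was obtained with `oc get node <name> -o json`, and the helper name `has_hybrid_overlay_mac` and the sample node are hypothetical, not part of WMCO.

```python
import json

# Annotation name taken from the error message in this bug.
ANNOTATION = "k8s.ovn.org/hybrid-overlay-distributed-router-gateway-mac"


def has_hybrid_overlay_mac(node: dict, key: str = ANNOTATION) -> bool:
    """Return True if the node object carries a non-empty value for `key`."""
    metadata = node.get("metadata") or {}
    annotations = metadata.get("annotations") or {}
    return bool(annotations.get(key))


# Hypothetical sample; real input would come from `oc get node <name> -o json`.
node_json = '{"metadata": {"name": "example-node", "annotations": {}}}'
print(has_hybrid_overlay_mac(json.loads(node_json)))
```

If the function returns False for the affected Windows node, the hybrid overlay has not annotated it, which matches the timeout reported by the controller.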


Version-Release number of selected component (if applicable):
4.6

How reproducible:
unsure

Steps to Reproduce:
1.
2.
3.

Actual results:
The Windows node fails to become ready to run workloads.

Expected results:
The Windows node becomes ready.

Additional info:

Comment 1 milti leonard 2021-03-11 18:28:43 UTC
I've requested a must-gather from the customer; the WMCO log can be accessed from this logBundle attachment [1] in supportshell. I also requested an inspection of several cluster operators (COs) and the WMCO namespace.


[1] https://attachments.access.redhat.com/hydra/rest/cases/02880518/attachments/43e023cf-1e2b-4c25-922b-8f3fcf746b3b?usePresignedUrl=true

Comment 3 Aravindh Puthiyaparambil 2021-03-11 23:43:27 UTC
The customer is using a cluster with the custom VXLAN port option and trying to use a Windows Server 2019 Datacenter image which does not support this. The customer can try using the unsupported Windows Server 1909 image (AMI ID for us-west-2: ami-0e40d9dd911bc83ef) if this is just a POC. In general we are running into https://bugzilla.redhat.com/show_bug.cgi?id=1905950 on AWS and they need to be made aware of that. We have opened https://github.com/microsoft/Windows-Containers/issues/78 against Microsoft to investigate this.

We need to clarify this in our docs, so I have opened https://github.com/openshift/windows-machine-config-operator/pull/338, which needs to be reflected in the OpenShift docs. I will therefore make this a docs bug.

Comment 4 Aravindh Puthiyaparambil 2021-03-12 16:34:46 UTC
Alternatively, the customer can bring up a hybrid OVN cluster WITHOUT the custom VXLAN port option and use Windows Server 2019 version 10.0.17763.1457 or earlier.
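
The "10.0.17763.1457 or earlier" condition can be checked by comparing the four-part version numerically rather than as a string. This is a rough sketch under stated assumptions: the version is assumed to be available as a dotted four-part string, the function names are hypothetical, and the two sample versions are made up for illustration. It covers only the case without the custom VXLAN port.

```python
# Upper bound taken from the comment above (no custom VXLAN port case).
MAX_BUILD = "10.0.17763.1457"


def parse_build(version: str) -> tuple:
    """Turn a dotted version such as '10.0.17763.1457' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def build_within_limit(version: str, limit: str = MAX_BUILD) -> bool:
    """Return True if `version` is at or below `limit`, compared field by field."""
    return parse_build(version) <= parse_build(limit)


# Hypothetical example versions.
print(build_within_limit("10.0.17763.1397"))  # True
print(build_within_limit("10.0.17763.1817"))  # False
```

Comparing integer tuples avoids the pitfall of plain string comparison, where "10.0.17763.999" would sort after "10.0.17763.1457".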

Comment 5 Cody Hoag 2021-03-12 18:38:25 UTC
Please review these doc updates: https://github.com/openshift/openshift-docs/pull/30388. Thanks!

Comment 13 Cody Hoag 2021-03-18 14:46:27 UTC
QE verified in Slack and PR

Comment 14 Cody Hoag 2021-03-18 14:47:43 UTC
This has been merged. I'll provide the live links when they're available.