Bug 1859537
Summary: | Connection timed out when WMCO reconciling Windows node | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | gaoshang <sgao> |
Component: | Windows Containers | Assignee: | Aravindh Puthiyaparambil <aravindh> |
Status: | CLOSED ERRATA | QA Contact: | gaoshang <sgao> |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | 4.6 | CC: | aos-bugs, aravindh, gmarkley, rgudimet |
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 16:16:41 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
gaoshang
2020-07-22 11:08:03 UTC
@sgao, please test with 4.6 clusters. We don't support WMCO with 4.5. You are also using go 1.14. Please use go 1.13 to match with what we are using. Is https://github.com/openshift/windows-machine-config-operator/pull/77 included in your run? That PR has a fix for the scenario you are running into. Moreover we are moving away from WNI for creating the Windows VMs and switching to MachineSets. Please see https://github.com/openshift/windows-machine-config-operator/pull/81. I suggest closing this for now and keeping an eye out if this happens once we do the switch over. This definitely is an issue. Seeing it on the PR where I am disabling the kube version check: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_windows-machine-config-operator/83/pull-ci-openshift-windows-machine-config-operator-master-e2e-operator/1286027984758640640. @sgao, please keep this open. (In reply to Aravindh Puthiyaparambil from comment #2) > This definitely is an issue. Seeing it on the PR where I am disabling the > kube version check: > https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/ > openshift_windows-machine-config-operator/83/pull-ci-openshift-windows- > machine-config-operator-master-e2e-operator/1286027984758640640. > > @sgao, please keep this open. Sure, thanks for the info, I'll test it with 4.6 cluster + go 1.13 when fixed. This bug has been verified with OCP 4.6.0-0.nightly-2020-07-24-002417 by skipping clusterconfig.validate Steps: 1. Install a fresh OCP cluster and run WMCO operator 2. Create a Windows machineset, check WMCO log, when SSH dial connection timed out, operator will retry and finally bootstrapped. # oc logs -f deployment.apps/windows-machine-config-operator 2020-07-24T07:37:56.520Z INFO cmd operator {"version": "0.0.1"} 2020-07-24T07:37:56.520Z INFO cmd go {"version": "go1.13.14", "os": "linux", "arch": "amd64"} 2020-07-24T07:37:56.520Z INFO cmd operator-sdk {"version": "v0.18.1"} 2020-07-24T07:37:56.529Z INFO leader Trying to become the leader. 2020-07-24T07:37:56.529Z DEBUG k8sutil Found namespace {"Namespace": "windows-machine-config-operator"} I0724 07:37:57.579949 1 request.go:621] Throttling request took 1.0457767s, request: GET:https://172.30.0.1:443/apis/cloudcredential.openshift.io/v1?timeout=32s 2020-07-24T07:37:58.682Z DEBUG k8sutil Found podname {"Pod.Name": "windows-machine-config-operator-664986f85b-sp94n"} 2020-07-24T07:37:58.692Z DEBUG k8sutil Found Pod {"Pod.Namespace": "windows-machine-config-operator", "Pod.Name": "windows-machine-config-operator-664986f85b-sp94n"} 2020-07-24T07:37:58.694Z INFO leader No pre-existing lock was found. 2020-07-24T07:37:58.700Z INFO leader Became the leader. 2020-07-24T07:38:00.855Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": "0.0.0.0:8383"} 2020-07-24T07:38:00.856Z INFO cmd registering Components. 2020-07-24T07:38:03.012Z DEBUG k8sutil Found namespace {"Namespace": "windows-machine-config-operator"} 2020-07-24T07:38:03.012Z DEBUG kubemetrics Starting collecting operator types 2020-07-24T07:38:03.012Z DEBUG kubemetrics Generating metric families {"apiVersion": "machine.openshift.io/v1beta1", "kind": "Machine"} 2020-07-24T07:38:05.167Z DEBUG kubemetrics Starting serving custom resource metrics 2020-07-24T07:38:07.321Z DEBUG k8sutil Found namespace {"Namespace": "windows-machine-config-operator"} 2020-07-24T07:38:07.321Z DEBUG k8sutil Found podname {"Pod.Name": "windows-machine-config-operator-664986f85b-sp94n"} 2020-07-24T07:38:07.325Z DEBUG k8sutil Found Pod {"Pod.Namespace": "windows-machine-config-operator", "Pod.Name": "windows-machine-config-operator-664986f85b-sp94n"} 2020-07-24T07:38:07.334Z DEBUG metrics Pods owner found {"Kind": "Deployment", "Name": "windows-machine-config-operator", "Namespace": "windows-machine-config-operator"} 2020-07-24T07:38:07.353Z INFO metrics Metrics Service object created {"Service.Name": "windows-machine-config-operator-metrics", "Service.Namespace": "windows-machine-config-operator"} I0724 07:38:08.403861 1 request.go:621] Throttling request took 1.043664612s, request: GET:https://172.30.0.1:443/apis/storage.k8s.io/v1?timeout=32s 2020-07-24T07:38:09.526Z INFO cmd starting the Cmd. 2020-07-24T07:38:09.526Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"} 2020-07-24T07:38:09.526Z INFO controller-runtime.controller Starting EventSource {"controller": "windowsmachine-controller", "source": "kind source: /, Kind="} 2020-07-24T07:38:09.727Z INFO controller-runtime.controller Starting Controller {"controller": "windowsmachine-controller"} 2020-07-24T07:38:09.727Z INFO controller-runtime.controller Starting workers {"controller": "windowsmachine-controller", "worker count": 1} 2020-07-24T08:27:37.699Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:37.707Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:37.775Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:37.782Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:39.514Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:39.523Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:39.526Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:39.533Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.832Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.843Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.843Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.851Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.854Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:27:59.862Z DEBUG windowsmachine-controller configuring the Windows VM {"ID": "i-0181b2e1f467b1298"} ----------------------------------------------------------------------- 2020-07-24T08:30:11.982Z DEBUG VM i-0181b2e1f467b1298 SSH dial {"IP Address": "10.0.151.105", "error": "dial tcp 10.0.151.105:22: connect: connection timed out"} 2020-07-24T08:33:20.398Z DEBUG VM i-0181b2e1f467b1298 SSH dial {"IP Address": "10.0.151.105", "error": "dial tcp 10.0.151.105:22: connect: connection timed out"} ----------------------------------------------------------------------- 2020-07-24T08:34:53.685Z DEBUG VM i-0181b2e1f467b1298 ignition file download {"cmd": "C:\\Temp\\wget-ignore-cert.ps1 -server https://api-int.sgao.qe.devcluster.openshift.com:22623/config/worker -output C:\\Windows\\Temp\\worker.ign -useragent Ignition/0.35.0", "output": ""} 2020-07-24T08:34:58.255Z DEBUG VM i-0181b2e1f467b1298 output from wmcb {"output": "Bootstrapping completed successfully"} 2020-07-24T08:35:13.261Z DEBUG nodeconfig worker label %s already present on node %s {"node-role.kubernetes.io/worker": "ip-10-0-151-105.us-east-2.compute.internal"} 2020-07-24T08:38:08.664Z DEBUG VM i-0181b2e1f467b1298 started service {"name": "kube-proxy", "binary": "C:\\k\\kube-proxy.exe", "args": "--windows-service --v=4 --proxy-mode=kernelspace --feature-gates=WinOverlay=true --hostname-override=ip-10-0-151-105.us-east-2.compute.internal --kubeconfig=c:\\k\\kubeconfig --cluster-cidr=10.132.0.0/24 --log-dir=C:\\k\\log\\kube-proxy\\ --logtostderr=false --network-name=OVNKubernetesHybridOverlayNetwork --source-vip=10.132.0.23 --enable-dsr=false"} 2020-07-24T08:38:08.664Z INFO windowsmachine-controller Windows VM has joined the cluster as a worker node {"ID": "i-0181b2e1f467b1298"} 2020-07-24T08:38:08.664Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:38:08.664Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T08:38:08.665Z DEBUG controller-runtime.manager.events Normal {"object": {"kind":"Machine","namespace":"openshift-machine-api","name":"sgao-d2wtc-windows-worker-us-east-2a-6nr8v","uid":"637fa39c-1c09-476f-a21a-e68c202fddfa","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"297231"}, "reason": "WMCO Setup", "message": "Machine sgao-d2wtc-windows-worker-us-east-2a-6nr8v Configured Successfully"} 2020-07-24T08:38:08.673Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T09:19:23.010Z INFO windowsmachine-controller reconciling {"namespace": "openshift-machine-api", "name": "sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} 2020-07-24T09:19:23.020Z DEBUG controller-runtime.controller Successfully Reconciled {"controller": "windowsmachine-controller", "request": "openshift-machine-api/sgao-d2wtc-windows-worker-us-east-2a-6nr8v"} Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196 |