Description of problem:
After bootstrapping a Windows node with wmcb.exe, the Windows kubelet continuously reports the error "failed to list vm sizes in GetVolumeLimits".

PS C:\k> Get-Content .\kubelet.log -tail 1 -wait
...
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus

Version-Release number of selected component (if applicable):
OCP version: 4.2 GA
# ./openshift-install version
./openshift-install v4.2.0
built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb
release image quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129

windows-machine-config-operator version:
# git tag
0.1

Windows instance: MicrosoftWindowsServer:WindowsServer:2019-Datacenter-with-Containers:latest

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.2 GA in Azure
2. Launch and configure a Windows instance
3. Bootstrap the Windows node via wmcb
> .\wmcb.exe initialize-kubelet --ignition-file worker.ign --kubelet-path kubelet.exe
{"level":"info","ts":1572947207.8846965,"logger":"wmcb","msg":"Bootstrapping completed successfully"}
4. Check the log file C:\k\kubelet.log

Actual results:
The Windows kubelet continuously reports the error "failed to list vm sizes in GetVolumeLimits".

PS C:\k> Get-Content .\kubelet.log -tail 1 -wait
...
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus

Expected results:
No such error is reported.

Additional info:
I believe the error is related to https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits. In short, the kubelet queries the Azure API to find the dynamic volume limits for the given Windows VM, but since we didn't provide the cloud config, which contains the service principal with access to query the Azure API, the kubelet reports this error.

What's the output of `oc get nodes`? Do you see that the Windows node is not ready?
(In reply to ravig from comment #1)
> I believe the error is related to
> https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits.
> In short, it means we're querying azure api to find the dynamic
> volume limits for the given Windows VM but since we didn't provide the cloud
> config which has service principal with access to query the azure api,
> kubelet is reporting this error.

We do provide the cloud.conf for Azure. Without the config, the kubelet will fail to come up on Azure worker nodes. Please see https://github.com/openshift/windows-machine-config-operator/pull/58

> What's the output of
> `oc get nodes`?
>
> Do you see that the Windows node is not ready?

This does not prevent the Windows node from becoming ready.
(In reply to ravig from comment #1)
> I believe the error is related to
> https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits.
> In short, it means we're querying azure api to find the dynamic
> volume limits for the given Windows VM but since we didn't provide the cloud
> config which has service principal with access to query the azure api,
> kubelet is reporting this error.
>
> What's the output of
> `oc get nodes`?
>
> Do you see that the Windows node is not ready?

The Windows node is in the Ready state and is not affected. This is a low-severity log message bug; I just want to make sure it is tracked.

# oc get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
sgao-cluster-win-bg7pz-master-0               Ready    master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-master-1               Ready    master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-master-2               Ready    master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus1-4l29r   Ready    worker   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus2-wvk59   Ready    worker   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus3-k2xqp   Ready    worker   27h    v1.14.6+c07e432da
winnode                                       Ready    <none>   151m   v1.14.0
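For anyone re-checking node readiness on a larger cluster, here is a minimal sketch that scans saved `oc get nodes` text output and lists nodes whose STATUS does not start with Ready. The function name `not_ready_nodes` is hypothetical, and the sketch assumes the default column layout shown above (a header row containing NAME and STATUS, whitespace-separated columns).

```python
def not_ready_nodes(oc_get_nodes_output):
    """Return the names of nodes whose STATUS column does not start with Ready.

    Assumes the default text layout of `oc get nodes`: a header row that
    contains NAME and STATUS, followed by one whitespace-separated row per node.
    """
    rows = [line.split() for line in oc_get_nodes_output.strip().splitlines()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")

    result = []
    for row in body:
        if len(row) <= status_idx:
            continue  # skip malformed or truncated rows
        # STATUS can be a comma-separated list such as "Ready,SchedulingDisabled".
        conditions = row[status_idx].split(",")
        if conditions[0] != "Ready":
            result.append(row[name_idx])
    return result


sample = """NAME STATUS ROLES AGE VERSION
sgao-cluster-win-bg7pz-master-0 Ready master 27h v1.14.6+c07e432da
winnode Ready <none> 151m v1.14.0
"""
print(not_ready_nodes(sample))
```

For the output in this comment the list is empty, which matches the observation that the Windows node is Ready despite the logged error.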
@gaoshang, is this error also seen in the kubelet logs on the Linux worker nodes?
(In reply to aravindh from comment #4)
> @gaoshang is this error seen in the kubelet logs in Linux worker node.

No, this error is not seen in the kubelet logs on a Linux worker node. Both searches return nothing:

sh-4.4# journalctl -u kubelet | grep "failed to list vm sizes"
sh-4.4# journalctl -u kubelet | grep "GetVolumeLimits"
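The same check can be sketched in Python so that it runs unchanged against the Windows C:\k\kubelet.log or against saved journalctl output from a Linux node. This is only an illustration: the function name `count_errors_by_source` and the regular expression are assumptions written for this thread, not part of kubelet or wmcb. The pattern follows the klog header layout (severity letter, mmdd date, time, thread id, file:line, then the message).

```python
import re
from collections import Counter

# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
# The pattern is not anchored at the start of the line, so it also matches
# lines that carry a journald prefix in front of the klog header.
KLOG_LINE = re.compile(
    r"(?P<severity>[IWEF])(?P<date>\d{4}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<source>[^\s:]+:\d+)\] (?P<msg>.*)$"
)


def count_errors_by_source(lines, needle="failed to list vm sizes in GetVolumeLimits"):
    """Count error-level klog lines containing `needle`, grouped by file:line."""
    counts = Counter()
    for line in lines:
        match = KLOG_LINE.search(line.strip())
        if match and match.group("severity") == "E" and needle in match.group("msg"):
            counts[match.group("source")] += 1
    return counts


sample = [
    "E1105 11:02:49.233810 3360 azure_dd.go:172] failed to list vm sizes in "
    "GetVolumeLimits, plugin.host: winnode, location: eastus",
] * 3
print(count_errors_by_source(sample))
```

An empty counter corresponds to the empty grep output above; a non-empty one also shows which source location emits the message.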
This error does not occur in an OCP 4.3 cluster on Azure.

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-11-24-183610   True        False         5m35s   Cluster version is 4.3.0-0.nightly-2019-11-24-183610

windows-machine-config-operator commit:
# git show
commit 1eb1f983774101b5077828fd2efb4dfb711d5886
This error cannot be reproduced in OpenShift 4.3 or 4.4 clusters on Azure. It looks like the error was thrown by https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/volume/azure_dd/azure_dd.go#L172 in 4.2, and the message was subsequently removed in https://github.com/kubernetes/kubernetes/commit/b962361536f9ad4c847ccbbcd3fc7614b3f50b84

Since we do not support 4.2, we should close this bug.
@sumehta I agree. Since this bug no longer exists, I'll move the bug status to VERIFIED. Thanks.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581