Bug 1768839 - Log Error - windows kubelet report error failed to list vm sizes in GetVolumeLimits
Summary: Log Error - windows kubelet report error failed to list vm sizes in GetVolume...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Windows Containers
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 4.4.0
Assignee: sumehta
QA Contact: gaoshang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-11-05 11:10 UTC by gaoshang
Modified: 2020-05-04 11:15 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 11:14:43 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:0581 0 None None None 2020-05-04 11:15:11 UTC

Description gaoshang 2019-11-05 11:10:11 UTC
Description of problem:

After bootstrapping a Windows node with wmcb.exe, the Windows kubelet continuously reports the error "failed to list vm sizes in GetVolumeLimits".

PS C:\k> Get-Content .\kubelet.log -tail 1 -wait
...
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
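
To track how often this message recurs, each log line can be split into its header fields. The following is a minimal Go sketch, assuming the usual klog header layout (severity letter, MMDD date, timestamp, process ID, then file:line followed by "]"). The names klogEntry and parseKlogLine are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// klogEntry holds the fields of one klog-formatted line (hypothetical type).
type klogEntry struct {
	Severity string // I, W, E or F
	Date     string // MMDD
	Time     string // hh:mm:ss.uuuuuu
	PID      string
	File     string
	Line     string
	Message  string
}

// klogHeader matches "Lmmdd hh:mm:ss.uuuuuu pid file:line] msg".
var klogHeader = regexp.MustCompile(
	`^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^:\s]+):(\d+)\] (.*)$`)

// parseKlogLine splits a line into its fields. ok is false when the line
// does not look like a klog entry.
func parseKlogLine(line string) (entry klogEntry, ok bool) {
	m := klogHeader.FindStringSubmatch(line)
	if m == nil {
		return klogEntry{}, false
	}
	return klogEntry{
		Severity: m[1], Date: m[2], Time: m[3], PID: m[4],
		File: m[5], Line: m[6], Message: m[7],
	}, true
}

func main() {
	line := "E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus"
	if e, ok := parseKlogLine(line); ok {
		fmt.Printf("%s %s:%s %q\n", e.Severity, e.File, e.Line, e.Message)
	}
}
```

Grouping parsed entries by File and Line would show whether all repeats come from the same call site, as they do in the excerpt above.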

Version-Release number of selected component (if applicable):
OCP version: 4.2 GA
# ./openshift-install version
./openshift-install v4.2.0
built from commit 90ccb37ac1f85ae811c50a29f9bb7e779c5045fb
release image quay.io/openshift-release-dev/ocp-release@sha256:c5337afd85b94c93ec513f21c8545e3f9e36a227f55d41bc1dfb8fcc3f2be129

windows-machine-config-operator version:
# git tag
0.1

windows instance:
MicrosoftWindowsServer:WindowsServer:2019-Datacenter-with-Containers:latest

How reproducible:
Always

Steps to Reproduce:
1. Install OCP 4.2 GA on Azure
2. Launch and configure a Windows instance
3. Bootstrap the Windows node with wmcb
> .\wmcb.exe initialize-kubelet --ignition-file worker.ign --kubelet-path kubelet.exe
{"level":"info","ts":1572947207.8846965,"logger":"wmcb","msg":"Bootstrapping completed successfully"}
4. Check log file in C:\k\kubelet.log

Actual results:

The Windows kubelet continuously reports the error "failed to list vm sizes in GetVolumeLimits".

PS C:\k> Get-Content .\kubelet.log -tail 1 -wait
...
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus
E1105 11:02:49.233810    3360 azure_dd.go:172] failed to list vm sizes in GetVolumeLimits, plugin.host: winnode, location: eastus

Expected results:

No such error is reported.

Additional info:

Comment 1 ravig 2019-11-05 14:19:53 UTC
I believe the error is related to https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits. In short, the kubelet queries the Azure API to find the dynamic volume limits for the given Windows VM, but because we didn't provide the cloud config containing a service principal with access to the Azure API, the query fails and the kubelet reports this error.

What's the output of 
`oc get nodes`?

Do you see that the Windows node is not ready?

Comment 2 Aravindh Puthiyaparambil 2019-11-05 15:37:18 UTC
(In reply to ravig from comment #1)
> I believe the error is related to
> https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits.
> In short, it means we're querying azure api to find the dynamic
> volume limits for the given Windows VM but since we didn't provide the cloud
> config which has service principal with access to query the azure api,
> kubelet is reporting this error. 

We do provide the cloud.conf for Azure. Without it, the kubelet fails to come up on Azure worker nodes. Please see https://github.com/openshift/windows-machine-config-operator/pull/58

> What's the output of 
> `oc get nodes`?
> 
> Do you see that the Windows node is not ready?

This does not prevent the Windows node from becoming ready.

Comment 3 gaoshang 2019-11-06 03:13:54 UTC
(In reply to ravig from comment #1)
> I believe the error is related to
> https://kubernetes.io/docs/concepts/storage/storage-limits/#dynamic-volume-limits.
> In short, it means we're querying azure api to find the dynamic
> volume limits for the given Windows VM but since we didn't provide the cloud
> config which has service principal with access to query the azure api,
> kubelet is reporting this error. 
> 
> What's the output of 
> `oc get nodes`?
> 
> Do you see that the Windows node is not ready?

The Windows node is in the Ready state and is not affected. This is a low-severity log message bug; I just want to make sure it is tracked.

# oc get nodes
NAME                                          STATUS     ROLES    AGE    VERSION
sgao-cluster-win-bg7pz-master-0               Ready      master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-master-1               Ready      master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-master-2               Ready      master   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus1-4l29r   Ready      worker   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus2-wvk59   Ready      worker   27h    v1.14.6+c07e432da
sgao-cluster-win-bg7pz-worker-eastus3-k2xqp   Ready      worker   27h    v1.14.6+c07e432da
winnode                                       Ready      <none>   151m   v1.14.0

Comment 4 Aravindh Puthiyaparambil 2019-11-12 15:35:05 UTC
@gaoshang Is this error also seen in the kubelet logs on a Linux worker node?

Comment 5 gaoshang 2019-11-13 11:53:54 UTC
(In reply to aravindh from comment #4)
> @gaoshang is this error seen in the kubelet logs in Linux worker node.

No, this error is not seen in the kubelet logs on the Linux worker node.

sh-4.4# journalctl -u kubelet | grep "failed to list vm sizes"
sh-4.4# journalctl -u kubelet | grep "GetVolumeLimits"

Comment 6 gaoshang 2019-11-27 03:57:28 UTC
This error does not occur in an OCP 4.3 cluster on Azure.

Version-Release number of selected component (if applicable):
# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.0-0.nightly-2019-11-24-183610   True        False         5m35s   Cluster version is 4.3.0-0.nightly-2019-11-24-183610
windows-machine-config-operator commit:
# git show
commit 1eb1f983774101b5077828fd2efb4dfb711d5886

Comment 7 sumehta 2020-01-16 20:01:19 UTC
This error cannot be reproduced in OpenShift 4.3 or 4.4 clusters on Azure.
It looks like the error was logged by https://github.com/kubernetes/kubernetes/blob/release-1.14/pkg/volume/azure_dd/azure_dd.go#L172 in 4.2, and the message was subsequently removed in https://github.com/kubernetes/kubernetes/commit/b962361536f9ad4c847ccbbcd3fc7614b3f50b84
Since we do not support 4.2, we should close this bug.

Comment 8 gaoshang 2020-01-19 02:49:33 UTC
@sumehta@redhat.com I agree. Since this bug no longer exists, I'll move the bug status to VERIFIED. Thanks.

Comment 10 errata-xmlrpc 2020-05-04 11:14:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

