Description of problem:
When continuing to scale past ~2500 VMIs on the system, VMIs start to fail or get stuck in Scheduling. When we reach ~5000 VMIs we have 4755 in Running state, 119 in Failed state, and 176 in Scheduling state.

Version-Release number of selected component (if applicable):
1.4

How reproducible:
Always

Steps to Reproduce:
1. Create 5000 VMs with 115 nodes
2. Start the VMIs

Actual results:
Some of the VMIs start to fail.

Expected results:
All VMIs start successfully.

Additional info:

VMI counts on the system:
root@master-0: ~ # oc get vmi --all-namespaces | grep -c Running
4755
root@master-0: ~ # oc get vmi --all-namespaces | grep -c Failed
119
root@master-0: ~ # oc get vmi --all-namespaces | grep -c Scheduling
176

kubelet restarts on the node:
root@master-0: ~ # oc describe node app-node-6.scale-ci.example.com
  Type    Reason                   Age              From                                      Message
  ----    ------                   ----             ----                                      -------
  Normal  NodeReady                1h (x3 over 2h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeReady
  Normal  NodeHasSufficientMemory  1h (x3 over 2h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    1h (x3 over 2h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasNoDiskPressure
  Normal  NodeNotReady             1h (x3 over 1h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeNotReady
  Normal  NodeHasSufficientDisk    1h (x3 over 2h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientPID     1h               kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasSufficientPID
  Normal  Starting                 1h               kubelet, app-node-6.scale-ci.example.com  Starting kubelet.
  Normal  NodeHasSufficientDisk    1h (x2 over 1h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasSufficientDisk
  Normal  NodeHasSufficientMemory  1h (x2 over 1h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    1h (x2 over 1h)  kubelet, app-node-6.scale-ci.example.com  Node app-node-6.scale-ci.example.com status is now: NodeHasNoDiskPressure

Error message on kvm limits:
root@master-0: ~ # oc describe po virt-launcher-vm-cirros-41-t2h25scrowed
Node-Selectors:  kubernetes.io/hostname=app-node-40.scale-ci.example.com
                 kubevirt.io/schedulable=true
                 node-role.kubernetes.io/compute=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  39s (x9190 over 5h)  default-scheduler  0/131 nodes are available: 12 Insufficient devices.kubevirt.io/kvm, 131 node(s) didn't match node selector.

kvm limits went to 0:
root@master-0: ~ # oc get node | grep app-node | awk '{print $1}' | xargs oc describe node | grep -E "devices.kubevirt.io/kvm:|Hostname:"
Hostname:    app-node-0.scale-ci.example.com
 devices.kubevirt.io/kvm:  0
 devices.kubevirt.io/kvm:  0
Hostname:    app-node-1.scale-ci.example.com
 devices.kubevirt.io/kvm:  0
 devices.kubevirt.io/kvm:  0
Added logs of app-node-12 showing that devices.kubevirt.io/kvm went from 110 to 0, collected with: oc get event | grep app-node-12.scale-ci.example.com

Hostname:    app-node-12.scale-ci.example.com
 devices.kubevirt.io/kvm:  0
 devices.kubevirt.io/kvm:  0
Guy, in the last comment: How many VMs were running on app-node-12.scale-ci.example.com when it had 0 kvm devices?
I cannot say exactly, but I estimate around 30-40 VMs based on when it failed.
It looks like it's not a load-related issue. With no VMs running on the system, more and more nodes' KVM capacity changes to 0.

Attached is the virt-handler log of app-node-2. I see there the error "server was unable to return a response in the time allotted". According to the issue below, could it be network issues in the lab that cause the error?
https://github.com/kubernetes/kubernetes/issues/60957
Guy, I see in the environment that on some nodes virt-handler restarted one or multiple times. Is there a correlation between these virt-handlers and the nodes where you see the capacity of zero? Also, did the kubelet restart on the nodes with KVM capacity 0?

There can be a bug in the device plugin manager (like this one: https://github.com/kubernetes/kubernetes/issues/62773), or a bug in the kubevirt device plugins.

Further, I investigated these warnings:

> {"component":"virt-handler","level":"warning","msg":"kubevirt.io/kubevirt/pkg/virt-handler/vm.go:514: watch of *v1.VirtualMachineInstance ended with: The resourceVersion for the provided watch is too old.","pos":"reflector.go:341","timestamp":"2019-02-28T07:23:13.502518Z"}

They only appear every now and then (minutes in between). That seems to be pretty normal and not a concern if it does not happen frequently: https://github.com/kubernetes/kubernetes/issues/22024.
No correlation between the restarts of virt-handler and zero kvm capacity. All nodes except app-node-44.scale-ci.example.com currently have zero kvm capacity. Only 40 out of 122 virt-handler pods were restarted:

root@master-0: ~ # oc get pods --all-namespaces -o wide | grep virt-handler | grep -v "0 8d" | grep -c virt-handler
40
root@master-0: ~ # oc get node | grep -c app-node
122
Guy, thanks for checking. I think I found it. It is related to kubelet restarts. To quote the Kubernetes documentation at https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#device-plugin-implementation:

> A device plugin is expected to detect kubelet restarts and re-register itself with the new kubelet instance. In the current implementation, a new kubelet instance deletes all the existing Unix sockets under /var/lib/kubelet/device-plugins when it starts. A device plugin can monitor the deletion of its Unix socket and re-register itself upon such an event.

virt-handler does not re-register itself.
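For illustration only, here is a minimal sketch of the re-registration pattern the documentation describes, written against the v1beta1 device plugin API and fsnotify. This is not the virt-handler code and not the actual fix; the plugin socket name, resource wiring, and error handling are simplified assumptions. It only shows the idea: watch the device-plugin directory, and when the kubelet wipes the plugin's socket on restart, register again.

// Minimal sketch, not the actual virt-handler implementation: the plugin
// socket name and error handling below are simplified assumptions.
package main

import (
	"context"
	"log"
	"net"
	"path/filepath"
	"time"

	"github.com/fsnotify/fsnotify"
	"google.golang.org/grpc"
	pluginapi "k8s.io/kubernetes/pkg/kubelet/apis/deviceplugin/v1beta1"
)

const pluginSocketName = "kubevirt-kvm.sock" // illustrative name only

// register announces the plugin's endpoint and resource name to the kubelet
// via the registration socket (/var/lib/kubelet/device-plugins/kubelet.sock).
func register() error {
	conn, err := grpc.Dial(pluginapi.KubeletSocket,
		grpc.WithInsecure(),
		grpc.WithBlock(),
		grpc.WithTimeout(5*time.Second),
		grpc.WithDialer(func(addr string, timeout time.Duration) (net.Conn, error) {
			return net.DialTimeout("unix", addr, timeout)
		}))
	if err != nil {
		return err
	}
	defer conn.Close()

	client := pluginapi.NewRegistrationClient(conn)
	_, err = client.Register(context.Background(), &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     pluginSocketName,
		ResourceName: "devices.kubevirt.io/kvm",
	})
	return err
}

// watchKubeletRestarts re-registers whenever the kubelet deletes the plugin's
// unix socket, which is what happens on every kubelet restart.
func watchKubeletRestarts() error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	if err := watcher.Add(pluginapi.DevicePluginPath); err != nil {
		return err
	}

	ownSocket := filepath.Join(pluginapi.DevicePluginPath, pluginSocketName)
	for event := range watcher.Events {
		if event.Name == ownSocket && event.Op&fsnotify.Remove != 0 {
			log.Println("kubelet removed the plugin socket, re-registering")
			// A real plugin must first restart its gRPC server (ListAndWatch,
			// Allocate) on a fresh socket, then register again.
			if err := register(); err != nil {
				log.Printf("re-registration failed: %v", err)
			}
		}
	}
	return nil
}

func main() {
	if err := register(); err != nil {
		log.Fatalf("initial registration failed: %v", err)
	}
	if err := watchKubeletRestarts(); err != nil {
		log.Fatalf("socket watcher failed: %v", err)
	}
}

Without a watch loop like this, a plugin registers once at startup, the kubelet forgets it on restart, and the node's advertised capacity for that resource stays at 0 until the plugin process itself is restarted, which matches the behavior observed here.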
I cannot find the kubelet log, neither at /var/log/kubelet.log nor in the journal. Can you share how to obtain it?
Also, there was no load on the system, so I wonder why the kubelet restarts...
Pan, can we add a known issue for this bug to CNV 1.4 (including the workaround)? Details are provided in "Doc Text".
Fix is here: https://github.com/kubevirt/kubevirt/pull/2068
> I cannot find the kubelet log, neither at /var/log/kubelet.log nor in the journal. Can you share how to obtain it?
> Also, there was no load on the system, so I wonder why the kubelet restarts...

@Guy, yes, there is nothing specific in the logs. I could finally connect the dots because of this:

* The virt-handler logs just showed that it started the device plugins.
* The device-plugin socket files were gone (that happens when the kubelet starts again).
* The kubelet reported 0 devices.
* When you restart virt-handler, the socket files reappear and the devices are offered again.
Hi Fabian, thanks for the info. Can you review the following text?

* If the kubelet on a node crashes or restarts, the kubelet incorrectly reports 0 KVM devices. Virtual machines are not properly scheduled on affected nodes.
+
Verify the number of devices that the kubelet reports by running `$ oc get node $NODE | grep devices.kubevirt`. The output on an affected node shows `devices.kubevirt.io/kvm: 0`. https://bugzilla.redhat.com/show_bug.cgi?id=1681175[(BZ#1681175)]
+
[NOTE]
====
As a workaround, kill the relevant `virt-handler` pod on the affected node. The pod automatically restarts, and the kubelet reports the correct number of available KVM devices.
====
Passing the needinfo on to Roman. Roman, is this correct?
Looks correct.
Known Issue merged: https://github.com/openshift/openshift-docs/pull/13989 It is published at the bottom of the release notes here: https://docs.openshift.com/container-platform/3.11/cnv_release_notes/cnv_release_notes.html Let me know if anything else is needed. Thanks!
https://github.com/kubevirt/kubevirt/pull/2068 is merged.
Tested by restarting the node and verifying that the kvm devices stayed at 110 and did not become zero.

Versions tested:
virt-operator:v2.1.0-12
hyperconverged-cluster-operator:v2.1.0-16