Bug 1731819 - Migration pod become "Evicted" due to low on resource: ephemeral-storage on node after running quite a few migrations [NEEDINFO]
Summary: Migration pod become "Evicted" due to low on resource: ephemeral-storage on n...
Keywords:
Status: POST
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Itamar Holder
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-22 06:49 UTC by Yan Du
Modified: 2021-02-12 15:03 UTC (History)
CC List: 11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
sgott: needinfo? (kbidarka)


Attachments
describe node (48.93 KB, text/plain)
2019-07-22 06:50 UTC, Yan Du
fedora vm (1.14 KB, text/plain)
2019-08-07 12:40 UTC, Yan Du
vm spec (1.92 KB, text/plain)
2019-08-07 13:45 UTC, Yan Du

Description Yan Du 2019-07-22 06:49:12 UTC
Description of problem:
The migration pod becomes "Evicted" because the node is low on resource: ephemeral-storage, after running quite a few migrations.

Version-Release number of selected component (if applicable):
hyperconverged-cluster-operator:v2.0.0-32
virt-api:v2.0.0-39


How reproducible:
Always

Steps to Reproduce:
1. Create a namespace
2. Create a fedora vm
3. Run migrations
4. Delete the migration/vm/namespace
5. Repeat steps 1-4 quite a few times
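
Step 3 can be expressed as a VirtualMachineInstanceMigration object. The following is a minimal Python sketch, not the reporter's actual automation script. It assumes the `kubevirt.io/v1alpha3` API version (an assumption for this release) and reuses the names `l2-migration`, `network-migration-test`, and `vmb` that appear in the pod description in this report. The helper name `migration_manifest` is hypothetical.

```python
import json


def migration_manifest(name, namespace, vmi_name,
                       api_version="kubevirt.io/v1alpha3"):
    """Build a VirtualMachineInstanceMigration manifest as a plain dict."""
    return {
        # Assumption: adjust the API version to the installed KubeVirt release.
        "apiVersion": api_version,
        "kind": "VirtualMachineInstanceMigration",
        "metadata": {"name": name, "namespace": namespace},
        # spec.vmiName selects the running VMI to migrate.
        "spec": {"vmiName": vmi_name},
    }


if __name__ == "__main__":
    manifest = migration_manifest("l2-migration", "network-migration-test", "vmb")
    # The JSON output can be piped to `oc create -f -`.
    print(json.dumps(manifest, indent=2))
```

Generating the object per iteration makes it easy to give each migration a unique name when the loop in step 5 is scripted.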

Actual results:
# oc get pod
virt-launcher-vmb-rz8df   2/2     Running   0          10m
virt-launcher-vmb-sr8xx   0/2     Evicted   0          8m8s

# oc describe pod virt-launcher-vmb-sr8xx
Name:               virt-launcher-vmb-sr8xx
Namespace:          network-migration-test
Priority:           0
PriorityClassName:  <none>
Node:               working-jjh2c-worker-0-7fc89/
Start Time:         Sun, 21 Jul 2019 22:46:29 -0400
Labels:             kubevirt.io=virt-launcher
                    kubevirt.io/created-by=8e286d81-ac2a-11e9-9ac4-664f163f5f0f
                    kubevirt.io/migrationJobUID=e78e0e81-ac2a-11e9-9ac4-664f163f5f0f
                    kubevirt.io/nodeName=working-jjh2c-worker-0-rphvm
                    kubevirt.io/vm=fedora-vm
Annotations:        k8s.v1.cni.cncf.io/networks: [{"interface":"net1","name":"br1test","namespace":"network-migration-test"}]
                    k8s.v1.cni.cncf.io/networks-status: 
                    kubevirt.io/domain: vmb
                    kubevirt.io/migrationJobName: l2-migration
                    openshift.io/scc: privileged
                    traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0. 
IP:                 
Controlled By:      VirtualMachineInstance/vmb
Containers:
  volumecontainerdisk:
    Image:      quay.io/redhat/cnv-tests-fedora:30
    Port:       <none>
    Host Port:  <none>
    Command:
      /entry-point.sh
    Readiness:  exec [cat /tmp/healthy] delay=2s timeout=5s period=5s #success=2 #failure=5
    Environment:
      COPY_PATH:  /var/run/kubevirt-ephemeral-disks/container-disk-data/network-migration-test/vmb/disk_containerdisk/disk-image
    Mounts:
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
  compute:
    Image:      brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/virt-launcher:v2.0.0-39
    Port:       <none>
    Host Port:  <none>
    Command:
      /usr/bin/virt-launcher
      --qemu-timeout
      5m
      --name
      vmb
      --uid
      8e286d81-ac2a-11e9-9ac4-664f163f5f0f
      --namespace
      network-migration-test
      --kubevirt-share-dir
      /var/run/kubevirt
      --ephemeral-disk-dir
      /var/run/kubevirt-ephemeral-disks
      --readiness-file
      /var/run/kubevirt-infra/healthy
      --grace-period-seconds
      15
      --hook-sidecars
      0
      --less-pvc-space-toleration
      10
    Limits:
      bridge.network.kubevirt.io/br1test:  1
      devices.kubevirt.io/kvm:             1
      devices.kubevirt.io/tun:             1
      devices.kubevirt.io/vhost-net:       1
    Requests:
      bridge.network.kubevirt.io/br1test:  1
      cpu:                                 100m
      devices.kubevirt.io/kvm:             1
      devices.kubevirt.io/tun:             1
      devices.kubevirt.io/vhost-net:       1
      memory:                              1208392Ki
    Readiness:                             exec [cat /var/run/kubevirt-infra/healthy] delay=2s timeout=5s period=2s #success=1 #failure=5
    Environment:
      KUBEVIRT_RESOURCE_NAME_br1test:  bridge.network.kubevirt.io/br1test
    Mounts:
      /var/run/kubevirt from virt-share-dir (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-infra from infra-ready-mount (rw)
      /var/run/libvirt from libvirt-runtime (rw)
Volumes:
  infra-ready-mount:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  virt-share-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubevirt
    HostPathType:  
  libvirt-runtime:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  ephemeral-disks:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age    From                                   Message
  ----     ------     ----   ----                                   -------
  Normal   Scheduled  8m32s  default-scheduler                      Successfully assigned network-migration-test/virt-launcher-vmb-sr8xx to working-jjh2c-worker-0-7fc89
  Normal   Pulled     8m24s  kubelet, working-jjh2c-worker-0-7fc89  Container image "quay.io/redhat/cnv-tests-fedora:30" already present on machine
  Normal   Created    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Created container volumecontainerdisk
  Normal   Started    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Started container volumecontainerdisk
  Normal   Pulled     8m23s  kubelet, working-jjh2c-worker-0-7fc89  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/virt-launcher:v2.0.0-39" already present on machine
  Normal   Created    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Created container compute
  Normal   Started    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Started container compute
  Warning  Unhealthy  8m20s  kubelet, working-jjh2c-worker-0-7fc89  Readiness probe failed: cat: /var/run/kubevirt-infra/healthy: No such file or directory
  Warning  Evicted    6m25s  kubelet, working-jjh2c-worker-0-7fc89  The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0.
  Normal   Killing    6m25s  kubelet, working-jjh2c-worker-0-7fc89  Stopping container volumecontainerdisk
  Normal   Killing    6m25s  kubelet, working-jjh2c-worker-0-7fc89  Stopping container compute


Expected results:
Ephemeral storage on the node should be reclaimed after the resources are deleted.

Additional info:

Comment 1 Yan Du 2019-07-22 06:50:02 UTC
Created attachment 1592523 [details]
describe node

Comment 2 Adam Litke 2019-07-31 12:35:40 UTC
Moving back to virtualization. This is happening during live migration. From a cursory look, it seems that Kubernetes is responding in a somewhat surprising way to the namespace being deleted during the migration.

Comment 3 sgott 2019-08-07 12:10:51 UTC
Yan, could you please provide the VM definition used for this?

Comment 4 Yan Du 2019-08-07 12:40:08 UTC
Created attachment 1601356 [details]
fedora vm

Comment 5 Yan Du 2019-08-07 12:42:26 UTC
Hi Stuart, please see the attached VM YAML file.

Comment 6 Yan Du 2019-08-07 13:45:18 UTC
Created attachment 1601367 [details]
vm spec

Stuart, I actually hit the issue while debugging the network migration automation script, and the VM is configured with a Multus network based on the Fedora template in comment #4. I attached it as well.

Comment 7 sgott 2019-08-07 20:21:24 UTC
Vladik,

Could you investigate this?

Comment 11 sgott 2020-01-22 13:45:07 UTC
Kedar, is this still an issue?

Comment 12 Fabian Deutsch 2020-08-28 08:14:57 UTC
"The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0."

This means that pods got evicted because they were using ephemeral storage that they did not request.
KubeVirt should reflect that it actually is consuming ephemeral resources in these two containers by requesting an educated-guess amount of ephemeral storage, let's say: 1Mi.
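
To illustrate the point, here is a small Python sketch that compares the usage figures from the eviction message against a request of 0 and against the suggested 1Mi. When a node is under pressure, the kubelet considers whether a pod's usage exceeds its requests when ranking eviction candidates. The helpers `parse_quantity` and `exceeds_request` are hypothetical and handle only the binary suffixes needed here.

```python
import re

# Subset of the binary suffixes used by Kubernetes resource quantities.
_BINARY = {"": 1, "Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_quantity(quantity):
    """Convert a quantity such as '212Ki' or '0' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi)?", quantity.strip())
    if not match:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _BINARY[match.group(2) or ""]


def exceeds_request(usage, request):
    """True if the observed usage is strictly above the request."""
    return parse_quantity(usage) > parse_quantity(request)


# Usage values taken from the eviction message quoted above.
usage = {"volumecontainerdisk": "4Ki", "compute": "212Ki"}
for request in ("0", "1Mi"):
    over = {name: exceeds_request(used, request) for name, used in usage.items()}
    print(f"request={request}: {over}")
```

With a request of 0, both containers are over their request. With 1Mi per container, neither is.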

Comment 13 sgott 2020-11-18 13:38:09 UTC
Is this something that we've addressed in code already?

Comment 15 Roman Mohr 2021-01-26 08:17:56 UTC
(In reply to Fabian Deutsch from comment #12)
> "The node was low on resource: ephemeral-storage. Container
> volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container
> compute was using 212Ki, which exceeds its request of 0."
> 
> This means that pods got evicted because they were using ephemeral storage
> that they did not request.
> KubeVirt should reflect that it actually is consuming ephemeral resources in
> these two containers by requesting an educated-guess amount of ephemeral
> storage, let's say: 1Mi.

I like Fabian's suggestion. Sounds like that is the proper thing to do.

Comment 16 David Vossel 2021-02-11 22:14:39 UTC
Set evictionStrategy: LiveMigrate on the VMI spec and the migration pod likely won't get evicted.
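
For reference, a minimal Python sketch of where that field sits in the manifest. The stub VMI below is hypothetical and omits everything except the fields needed to show the placement. The helper name `with_eviction_strategy` is also an assumption. Whether this setting helps with kubelet-initiated evictions is questioned in later comments.

```python
import json


def with_eviction_strategy(vmi, strategy="LiveMigrate"):
    """Return a copy of a VMI manifest with spec.evictionStrategy set."""
    # Round-trip through JSON to deep-copy the JSON-compatible manifest.
    updated = json.loads(json.dumps(vmi))
    updated.setdefault("spec", {})["evictionStrategy"] = strategy
    return updated


# Hypothetical stub; a real VMI also needs apiVersion, domain devices, volumes, etc.
vmi = {
    "kind": "VirtualMachineInstance",
    "metadata": {"name": "vmb"},
    "spec": {"domain": {}},
}
print(json.dumps(with_eviction_strategy(vmi), indent=2))
```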

Comment 17 David Vossel 2021-02-11 22:18:00 UTC
> I like Fabian's suggestion. Sounds like that is the proper thing to do.


@rmohr@redhat.com you saw the vmi yaml didn't have EvictionStrategy set, right?

Comment 18 Vladik Romanovsky 2021-02-12 02:47:03 UTC
This was reported against 2.0... did we even have an evictionStrategy back then?

We also copied the container disk then. Not sure if this is relevant today.

I was debating whether we should still request a minor amount of ephemeral-storage, we do have temporary files, iirc. It's probably not enough to cause an eviction.

Comment 19 Roman Mohr 2021-02-12 09:06:05 UTC
(In reply to David Vossel from comment #17)
> > I like Fabian's suggestion. Sounds like that is the proper thing to do.
> 
> 
> @rmohr@redhat.com you saw the vmi yaml didn't have EvictionStrategy set,
> right?

Yes, the attached YAML does not have it set. I think this eviction is coming from the kubelet and it will happen independent of PDBs. I think that if you are in the Burstable QoS class, you can get deleted if you are above your requests, independent of PDBs. I am almost sure but would have to try it again.

Independent of that, I think that our workloads should by default never get into a situation where they are evicted due to infra overhead (regardless of whether it is via /evict or a delete from the kubelet).
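
As a reference for the QoS point, a simplified Python sketch of how Kubernetes assigns a QoS class from CPU and memory requests and limits. The function name `qos_class` is hypothetical, and the sketch compares quantities as plain strings, so equivalent values written differently (for example `1Gi` and `1024Mi`) are not treated as equal. The container data mirrors the pod description in this report.

```python
QOS_RESOURCES = ("cpu", "memory")  # only these two resources determine the QoS class


def qos_class(containers):
    """Classify a pod from its containers' cpu/memory requests and limits."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in QOS_RESOURCES:
            limit = limits.get(resource)
            # An unset request defaults to the limit, if there is one.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Requests as shown in the `oc describe pod` output; device resources are ignored.
pod = [
    {"name": "volumecontainerdisk"},
    {"name": "compute", "requests": {"cpu": "100m", "memory": "1208392Ki"}},
]
print(qos_class(pod))
```

The result matches the `QoS Class: Burstable` line in the pod description: one container sets CPU and memory requests without matching limits.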

Comment 20 Roman Mohr 2021-02-12 09:46:50 UTC
(In reply to Vladik Romanovsky from comment #18)
> This was reported against 2.0... did we even have an evictionStrategy back
> then?
> 
> We also copied the container disk then. Not sure if this is relevant today.
> 
> I was debating whether we should still request a minor amount of
> ephemeral-storage, we do have temporary files, iirc. It's probably not
> enough to cause an eviction.

I think that every Ki above the request will make the pod a candidate for eviction.
We have the issue that some of our disks (`ephemeral`, `containerDisk` and `emptyDisk`, for instance) just write into emptyDirs. In that case I agree that users have to add the request to the VMI to be guarded against this type of eviction.

However, I would still want to add a request of e.g. 5Mi in general to cover our overhead, since I am pretty sure that every Ki above the request is too much.
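
A rough Python sketch of that idea, under stated assumptions: it is not KubeVirt's implementation, the helper name `add_ephemeral_storage_request` is hypothetical, and 5Mi is only the example value from this comment. A default ephemeral-storage request is added to a container's resources unless the user already set one.

```python
OVERHEAD_REQUEST = "5Mi"  # example value from the comment above, not a measured figure


def add_ephemeral_storage_request(resources, default=OVERHEAD_REQUEST):
    """Return container resources with an ephemeral-storage request.

    A value the user already set is kept; the input dict is not modified.
    """
    requests = dict(resources.get("requests", {}))
    requests.setdefault("ephemeral-storage", default)
    return {**resources, "requests": requests}


# Requests of the compute container as shown in the pod description.
compute = {"requests": {"cpu": "100m", "memory": "1208392Ki"}}
print(add_ephemeral_storage_request(compute))
```

Keeping a user-supplied value matters for VMIs whose disks write into emptyDirs, where the required amount depends on the disk size rather than on infra overhead.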

Comment 21 David Vossel 2021-02-12 15:03:40 UTC
> I think this eviction is coming from the kubelet and it will happen independent of PDBs. I think that if you are in the QoS setting of Burstable, you can get deleted if you are above your requests independent of PDBs. I am almost sure but would have to try it again.

I understand now and agree that adding a storage request is a good first step.

