Bug 1731819
| Field | Value |
|---|---|
| Summary: | Migration pod becomes "Evicted" because the node runs low on ephemeral-storage after running quite a few migrations |
| Product: | Container Native Virtualization (CNV) |
| Component: | Virtualization |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | high |
| Version: | 2.0 |
| Target Release: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Reporter: | Yan Du <yadu> |
| Assignee: | Itamar Holder <iholder> |
| QA Contact: | Israel Pinto <ipinto> |
| CC: | cnv-qe-bugs, dvossel, fdeutsch, iholder, ipinto, kbidarka, ncredi, rmohr, roman, sgordon, sgott, vromanso, zpeng |
| Fixed In Version: | hco-bundle-registry-container-v4.8.0-347 virt-operator-container-v4.8.0-58 |
| Type: | Bug |
| Last Closed: | 2021-07-27 14:20:49 UTC |
Description

Yan Du, 2019-07-22 06:49:12 UTC

Created attachment 1592523 [details]: describe node

---

Moving back to virtualization. This is happening during live migration. From a cursory look it seems that Kubernetes is responding in a somewhat surprising way to the namespace being deleted during the migration. Yan, could you please provide the VM definition used for this?

---

Created attachment 1601356 [details]: fedora vm
Hi Stuart, the VM yaml file is in the attachment.

---

Created attachment 1601367 [details]: vm spec

Stuart, I actually hit the issue while debugging the network migration automation script, and the VM is configured with a multus network based on the fedora template in comment #4. I attached it as well.

---

Vladik, could you investigate this?

---

Kedar, is this still an issue?

---

"The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0."

This means that pods got evicted because they were using ephemeral storage that they did not request. KubeVirt should reflect that it actually is consuming ephemeral resources in these two containers by requesting an educated-guess amount of ephemeral storage, let's say 1Mi. Is this something that we've addressed in code already?

---

(In reply to Fabian Deutsch from comment #12)
> "The node was low on resource: ephemeral-storage. Container
> volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container
> compute was using 212Ki, which exceeds its request of 0."
>
> This means that pods got evicted because they were using ephemeral storage
> that they did not request.
> KubeVirt should reflect that it actually is consuming ephemeral resources in
> these two containers by requesting an educated-guess amount of ephemeral
> storage, let's say: 1Mi.

I like Fabian's suggestion. Sounds like that is the proper thing to do.

---

Set evictionStrategy: LiveMigrate on the VMI spec and the migration pod likely won't get evicted.

---

> I like Fabian's suggestion. Sounds like that is the proper thing to do.

@rmohr, you saw that the VMI yaml didn't have evictionStrategy set, right?
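For context, a minimal sketch of where this field sits on a VMI, assuming the current `kubevirt.io/v1` API (older releases used `v1alpha3`; the VMI name here is illustrative):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: vm-fedora                   # illustrative name
spec:
  evictionStrategy: LiveMigrate     # evictions trigger a live migration instead of killing the pod
  domain:
    resources:
      requests:
        memory: 1Gi
```

With this set, an eviction request against the virt-launcher pod is intercepted and turned into a live migration rather than an outright delete.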
This was reported against 2.0... did we even have an evictionStrategy back then?

We also copied the container disk then; not sure if this is relevant today.

I was debating whether we should still request a minor amount of ephemeral-storage; we do have temporary files, IIRC. It's probably not enough to cause an eviction.

---

(In reply to David Vossel from comment #17)
> > I like Fabian's suggestion. Sounds like that is the proper thing to do.
>
> @rmohr, you saw that the VMI yaml didn't have evictionStrategy set, right?

Yes, the attached yaml does not have it set. I think this eviction is coming from the kubelet and will happen independent of PDBs. I think that if you are in the Burstable QoS class, you can get deleted if you are above your requests, independent of PDBs. I am almost sure, but would have to try it again. Independent of that, I think that our workloads should by default never get into the situation of being evicted (whether via /evict or a delete from the kubelet) due to infra overhead.

---

(In reply to Vladik Romanovsky from comment #18)
> This was reported against 2.0... did we even have an evictionStrategy back
> then?
>
> We also copied the container disk then; not sure if this is relevant today.
>
> I was debating whether we should still request a minor amount of
> ephemeral-storage; we do have temporary files, IIRC. It's probably not
> enough to cause an eviction.

I think that every Ki above the request will make the pod a candidate for eviction. We have the issue that some of our disks, `ephemeral`, `containerDisk` and `emptyDisk` for instance, just write into emptyDirs. In that case I agree that users have to add the request to the VMI to be guarded against this type of eviction. However, I would still want to add a request of e.g. 5Mi in general to cover our overhead, since I am pretty sure that every Ki above the limit is too much.

---

> I think this eviction is coming from the kubelet and will happen independent of PDBs.
> I think that if you are in the Burstable QoS class, you can get deleted if you are above your requests, independent of PDBs. I am almost sure, but would have to try it again.

I understand now and agree that adding a storage request is a good first step.
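The kind of request discussed above would land on the containers of the generated virt-launcher pod. A sketch of what that fragment could look like (container names are taken from the eviction message; the 5Mi figure is the educated guess from this thread, not a measured value):

```yaml
# Fragment of a virt-launcher pod spec (generated by KubeVirt, not written by users).
# Requesting a small amount of ephemeral-storage keeps actual usage below the
# request, so the kubelet does not single the pod out under disk pressure.
spec:
  containers:
  - name: compute
    resources:
      requests:
        ephemeral-storage: 5Mi   # assumed overhead figure from the discussion above
  - name: volumecontainerdisk
    resources:
      requests:
        ephemeral-storage: 5Mi
```

As long as actual usage stays below the request, the pod is no longer a preferred eviction candidate when the node comes under ephemeral-storage pressure.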
Do you have a link to the PR that fixes this bug?

---

(In reply to Kedar Bidarkar from comment #22)
> Do you have a link to the PR that fixes this bug?

Yes I do: https://github.com/kubevirt/kubevirt/pull/5013

---

Verified with build hco-bundle-registry-container-v4.8.0-367 virt-operator-container-v4.8.0-58.

Steps:
1. Create a namespace
2. Create a fedora vm
3. Do migrations
4. Delete the migration/vm/namespace
5. Repeat steps 1-4 quite a few times

Ran steps 1-4 more than 10 times; no Evicted pod found:

    NAME                            READY   STATUS      RESTARTS   AGE
    virt-launcher-vm-fedora-k55nl   1/1     Running     0          7m43s
    virt-launcher-vm-fedora-lvkmj   0/1     Completed   0          19m
    virt-launcher-vm-fedora-rxjgb   0/1     Completed   0          18m

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Virtualization 4.8.0 Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2920
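For reference, the migration in step 3 can be triggered declaratively with a VirtualMachineInstanceMigration object, roughly like this (the object name, namespace, and VMI name are illustrative):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job        # illustrative
  namespace: test-ns         # illustrative
spec:
  vmiName: vm-fedora         # must match a running VMI in the same namespace
```

Applying this object makes virt-controller schedule a target virt-launcher pod and live-migrate the VMI to it.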