Bug 1731819 - Migration pod becomes "Evicted" due to low ephemeral-storage on the node after running quite a few migrations [NEEDINFO]
Keywords:
Status: NEW
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: future
Assignee: Vladik Romanovsky
QA Contact: Israel Pinto
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-07-22 06:49 UTC by Yan Du
Modified: 2020-01-22 13:45 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
sgott: needinfo? (vromanso)
sgott: needinfo? (kbidarka)


Attachments
describe node (48.93 KB, text/plain), 2019-07-22 06:50 UTC, Yan Du, no flags
fedora vm (1.14 KB, text/plain), 2019-08-07 12:40 UTC, Yan Du, no flags
vm spec (1.92 KB, text/plain), 2019-08-07 13:45 UTC, Yan Du, no flags

Description Yan Du 2019-07-22 06:49:12 UTC
Description of problem:
The migration pod becomes "Evicted" because the node runs low on the ephemeral-storage resource after running quite a few migrations.

Version-Release number of selected component (if applicable):
hyperconverged-cluster-operator:v2.0.0-32
virt-api:v2.0.0-39


How reproducible:
Always

Steps to Reproduce:
1. Create a namespace
2. Create a fedora vm
3. Do migrations 
4. Delete the migration/VM/namespace
5. Repeat steps 1-4 quite a few times (a command-level sketch of these steps is below)
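
For reference, a minimal command-level sketch of steps 1-4, assuming the attached VM spec is saved as fedora-vm.yaml (hypothetical filename), the resulting VMI is named vmb as in the describe output below, and the kubevirt.io/v1alpha3 API version and the vmim short name apply to this CNV 2.0 build:

# oc create namespace network-migration-test
# oc create -f fedora-vm.yaml -n network-migration-test    # attached VM spec, with the VM set to running
# cat <<EOF | oc create -n network-migration-test -f -     # the "do migrations" step
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: l2-migration
spec:
  vmiName: vmb
EOF
# oc delete vmim,vm --all -n network-migration-test
# oc delete namespace network-migration-test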

Actual results:
# oc get pod
virt-launcher-vmb-rz8df   2/2     Running   0          10m
virt-launcher-vmb-sr8xx   0/2     Evicted   0          8m8s

# oc describe pod virt-launcher-vmb-sr8xx
Name:               virt-launcher-vmb-sr8xx
Namespace:          network-migration-test
Priority:           0
PriorityClassName:  <none>
Node:               working-jjh2c-worker-0-7fc89/
Start Time:         Sun, 21 Jul 2019 22:46:29 -0400
Labels:             kubevirt.io=virt-launcher
                    kubevirt.io/created-by=8e286d81-ac2a-11e9-9ac4-664f163f5f0f
                    kubevirt.io/migrationJobUID=e78e0e81-ac2a-11e9-9ac4-664f163f5f0f
                    kubevirt.io/nodeName=working-jjh2c-worker-0-rphvm
                    kubevirt.io/vm=fedora-vm
Annotations:        k8s.v1.cni.cncf.io/networks: [{"interface":"net1","name":"br1test","namespace":"network-migration-test"}]
                    k8s.v1.cni.cncf.io/networks-status: 
                    kubevirt.io/domain: vmb
                    kubevirt.io/migrationJobName: l2-migration
                    openshift.io/scc: privileged
                    traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
Status:             Failed
Reason:             Evicted
Message:            The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0. 
IP:                 
Controlled By:      VirtualMachineInstance/vmb
Containers:
  volumecontainerdisk:
    Image:      quay.io/redhat/cnv-tests-fedora:30
    Port:       <none>
    Host Port:  <none>
    Command:
      /entry-point.sh
    Readiness:  exec [cat /tmp/healthy] delay=2s timeout=5s period=5s #success=2 #failure=5
    Environment:
      COPY_PATH:  /var/run/kubevirt-ephemeral-disks/container-disk-data/network-migration-test/vmb/disk_containerdisk/disk-image
    Mounts:
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
  compute:
    Image:      brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/virt-launcher:v2.0.0-39
    Port:       <none>
    Host Port:  <none>
    Command:
      /usr/bin/virt-launcher
      --qemu-timeout
      5m
      --name
      vmb
      --uid
      8e286d81-ac2a-11e9-9ac4-664f163f5f0f
      --namespace
      network-migration-test
      --kubevirt-share-dir
      /var/run/kubevirt
      --ephemeral-disk-dir
      /var/run/kubevirt-ephemeral-disks
      --readiness-file
      /var/run/kubevirt-infra/healthy
      --grace-period-seconds
      15
      --hook-sidecars
      0
      --less-pvc-space-toleration
      10
    Limits:
      bridge.network.kubevirt.io/br1test:  1
      devices.kubevirt.io/kvm:             1
      devices.kubevirt.io/tun:             1
      devices.kubevirt.io/vhost-net:       1
    Requests:
      bridge.network.kubevirt.io/br1test:  1
      cpu:                                 100m
      devices.kubevirt.io/kvm:             1
      devices.kubevirt.io/tun:             1
      devices.kubevirt.io/vhost-net:       1
      memory:                              1208392Ki
    Readiness:                             exec [cat /var/run/kubevirt-infra/healthy] delay=2s timeout=5s period=2s #success=1 #failure=5
    Environment:
      KUBEVIRT_RESOURCE_NAME_br1test:  bridge.network.kubevirt.io/br1test
    Mounts:
      /var/run/kubevirt from virt-share-dir (rw)
      /var/run/kubevirt-ephemeral-disks from ephemeral-disks (rw)
      /var/run/kubevirt-infra from infra-ready-mount (rw)
      /var/run/libvirt from libvirt-runtime (rw)
Volumes:
  infra-ready-mount:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  virt-share-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubevirt
    HostPathType:  
  libvirt-runtime:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  
  ephemeral-disks:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      
QoS Class:       Burstable
Node-Selectors:  kubevirt.io/schedulable=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age    From                                   Message
  ----     ------     ----   ----                                   -------
  Normal   Scheduled  8m32s  default-scheduler                      Successfully assigned network-migration-test/virt-launcher-vmb-sr8xx to working-jjh2c-worker-0-7fc89
  Normal   Pulled     8m24s  kubelet, working-jjh2c-worker-0-7fc89  Container image "quay.io/redhat/cnv-tests-fedora:30" already present on machine
  Normal   Created    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Created container volumecontainerdisk
  Normal   Started    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Started container volumecontainerdisk
  Normal   Pulled     8m23s  kubelet, working-jjh2c-worker-0-7fc89  Container image "brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/container-native-virtualization/virt-launcher:v2.0.0-39" already present on machine
  Normal   Created    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Created container compute
  Normal   Started    8m23s  kubelet, working-jjh2c-worker-0-7fc89  Started container compute
  Warning  Unhealthy  8m20s  kubelet, working-jjh2c-worker-0-7fc89  Readiness probe failed: cat: /var/run/kubevirt-infra/healthy: No such file or directory
  Warning  Evicted    6m25s  kubelet, working-jjh2c-worker-0-7fc89  The node was low on resource: ephemeral-storage. Container volumecontainerdisk was using 4Ki, which exceeds its request of 0. Container compute was using 212Ki, which exceeds its request of 0.
  Normal   Killing    6m25s  kubelet, working-jjh2c-worker-0-7fc89  Stopping container volumecontainerdisk
  Normal   Killing    6m25s  kubelet, working-jjh2c-worker-0-7fc89  Stopping container compute


Expected results:
The ephemeral storage consumed on the node should be reclaimed after the migration/VM/namespace resources are deleted.

Additional info:
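For reference, one way to check how much ephemeral-storage the node believes it has left and what is consuming the kubelet's disk (a sketch only; the node name is taken from the describe output above, and the oc debug / chroot /host pattern assumes an OCP 4.x RHCOS node with the usual /var/lib/kubelet and /var/lib/containers paths):

# oc describe node working-jjh2c-worker-0-7fc89 | grep -i -A 2 ephemeral-storage
# oc debug node/working-jjh2c-worker-0-7fc89 -- chroot /host df -h /var/lib/kubelet /var/lib/containers
# oc debug node/working-jjh2c-worker-0-7fc89 -- chroot /host du -sh /var/lib/kubelet/pods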

Comment 1 Yan Du 2019-07-22 06:50:02 UTC
Created attachment 1592523 [details]
describe node

Comment 2 Adam Litke 2019-07-31 12:35:40 UTC
Moving back to virtualization.  This is happening during live migration.  From a cursory look it seems that kubernetes is responding in a somewhat surprising way to the namespace being deleted during the migration.

Comment 3 sgott 2019-08-07 12:10:51 UTC
Yan, could you please provide the VM definition used for this?

Comment 4 Yan Du 2019-08-07 12:40:08 UTC
Created attachment 1601356 [details]
fedora vm

Comment 5 Yan Du 2019-08-07 12:42:26 UTC
Hi Stuart, please find the VM YAML file in the attachment.

Comment 6 Yan Du 2019-08-07 13:45:18 UTC
Created attachment 1601367 [details]
vm spec

Stuart, I actually hit the issue while debugging the network migration automation script; the VM is configured with a Multus network based on the Fedora template in Comment 4. I have attached it as well. The relevant network shape is sketched below.
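
For context, the Multus part of that spec has roughly the following shape (a sketch only; the interface name net1 and the br1test NetworkAttachmentDefinition come from the pod annotations above, and the exact definition is in the attached vm spec):

spec:
  domain:
    devices:
      interfaces:
      - name: net1
        bridge: {}        # bridge binding on the secondary interface; default pod network omitted here
  networks:
  - name: net1
    multus:
      networkName: br1test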

Comment 7 sgott 2019-08-07 20:21:24 UTC
Vladik,

Could you investigate this?

Comment 11 sgott 2020-01-22 13:45:07 UTC
Kedar, is this still an issue?

