Bug 1812970 - virt-launcher pod hit OOMKilled when dedicatedCpuPlacement set to true
Summary: virt-launcher pod hit OOMKilled when dedicatedCpuPlacement set to true
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.3.0
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 2.3.0
Assignee: Vladik Romanovsky
QA Contact: Kedar Bidarkar
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-12 15:36 UTC by zenghui.shi
Modified: 2020-05-04 19:11 UTC (History)

Fixed In Version: hco-bundle-registry-container-v2.3.0-70 virt-operator-container-v2.3.0-36
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-04 19:10:58 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                           Private  Priority  Status  Summary                                             Last Updated
Github                  kubevirt kubevirt pull 3162  0        None      closed  Allow a manual adjustment of guest memory overhead  2020-12-21 05:19:26 UTC
Github                  kubevirt kubevirt pull 3178  0        None      closed  Increase memory overhead for VFIO devices           2020-12-21 05:19:28 UTC
Red Hat Product Errata  RHEA-2020:2011               0        None      None    None                                                2020-05-04 19:11:10 UTC

Description zenghui.shi 2020-03-12 15:36:34 UTC
Description of problem:

When creating a VMI with dedicated CPUs and hugepages on CNV 2.3.0 (installed via subscription), the virt-launcher-* pod is OOMKilled.

Version-Release number of selected component (if applicable):
CNV 2.3.0

How reproducible:
100%

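Steps to Reproduce:
1. Create the VMI from the spec under "Additional info" (item 1).
2. Check the pod and VMI status with oc get pods and oc get vmi.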

Actual results:

# oc get pods
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vmi-sriov-t9xd7   1/2     OOMKilled   0          11m

# oc get vmi
NAME        AGE   PHASE    IP            NODENAME
vmi-sriov   11m   Failed   10.128.2.23   dev-worker-1
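
The OOMKilled status above is reported per container, in the terminated state of the pod status. The following is a minimal sketch of how that state could be picked out, assuming the pod has been dumped with oc get pod <name> -o json and loaded into a dict. The helper name oom_killed_containers is hypothetical, and the sample data is trimmed to the fields used here, with values taken from the pod YAML in this report.

```python
def oom_killed_containers(pod):
    """Return names of containers whose current state is terminated/OOMKilled."""
    names = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("state", {}).get("terminated")
        if terminated and terminated.get("reason") == "OOMKilled":
            names.append(status["name"])
    return names


# Sample trimmed from the pod YAML in this report.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "compute",
                "state": {"terminated": {"exitCode": 137, "reason": "OOMKilled"}},
            },
            {
                "name": "volumecontainerdisk",
                "state": {"running": {"startedAt": "2020-03-12T15:18:44Z"}},
            },
        ]
    }
}

print(oom_killed_containers(sample_pod))  # ['compute']
```

Exit code 137 is 128 + 9, meaning the process was terminated by SIGKILL, which is what the kernel OOM killer sends.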



Expected results:

The virt-launcher pod starts without being OOMKilled and the VMI reaches the Running phase.

Additional info:

Additional info includes 1) VMI spec, 2) Pod YAML, and 3) Node status.


1. VMI spec:

---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-sriov
  name: vmi-sriov
spec:
  domain:
    cpu:
      sockets: 6
      cores: 1
      threads: 1
      dedicatedCpuPlacement: true
    memory:
      guest: 2Gi
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      interfaces:
      - masquerade: {}
        name: default
      - name: sriov-net
        sriov: {}
      rng: {}
    machine:
      type: ""
  networks:
  - name: default
    pod: {}
  - multus:
      networkName: default/sriov-mlx  # change me
    name: sriov-net
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
    name: containerdisk
  - cloudInitNoCloud:
      userData: |
        #!/bin/bash
        echo "fedora" |passwd fedora --stdin
        dhclient eth1
    name: cloudinitdisk
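
The CPU topology in this spec determines how many dedicated CPUs the launcher pod asks for. The following small sketch shows the arithmetic, assuming the vCPU count is the product of sockets, cores, and threads. That assumption is consistent with the cpu: "6" request and limit on the compute container in the pod YAML. The function name vcpu_count is hypothetical.

```python
def vcpu_count(sockets, cores, threads):
    """Number of vCPUs implied by a sockets/cores/threads topology."""
    return sockets * cores * threads


# Values from spec.domain.cpu in the VMI above.
print(vcpu_count(sockets=6, cores=1, threads=1))  # 6
```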


2. Pod yaml:

# oc get pods -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.ovn.org/pod-networks: '{"default":{"ip_address":"10.128.2.23/23","mac_address":"ae:4e:81:80:02:18","gateway_ip":"10.128.2.1"}}'
      k8s.v1.cni.cncf.io/networks: '[{"name":"sriov-mlx","namespace":"default","mac":"02:a2:53:00:00:0c","interface":"net1"}]'
      k8s.v1.cni.cncf.io/networks-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.128.2.23"
            ],
            "mac": "ae:4e:81:80:02:18",
            "dns": {}
        },{
            "name": "sriov-net",
            "interface": "net1",
            "dns": {}
        }]
      kubevirt.io/domain: vmi-sriov
      traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
    creationTimestamp: "2020-03-12T15:19:55Z"
    generateName: virt-launcher-vmi-sriov-
    labels:
      kubevirt.io: virt-launcher
      kubevirt.io/created-by: 2b023657-496f-4b46-9ac1-7968574798be
      special: vmi-sriov
    name: virt-launcher-vmi-sriov-t9xd7
    namespace: default
    ownerReferences:
    - apiVersion: kubevirt.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachineInstance
      name: vmi-sriov
      uid: 2b023657-496f-4b46-9ac1-7968574798be
    resourceVersion: "146263"
    selfLink: /api/v1/namespaces/default/pods/virt-launcher-vmi-sriov-t9xd7
    uid: 6a35a910-8d3c-4987-9bff-e66ec4214b0a
  spec:
    automountServiceAccountToken: false
    containers:
    - command:
      - /usr/bin/virt-launcher
      - --qemu-timeout
      - 5m
      - --name
      - vmi-sriov
      - --uid
      - 2b023657-496f-4b46-9ac1-7968574798be
      - --namespace
      - default
      - --kubevirt-share-dir
      - /var/run/kubevirt
      - --ephemeral-disk-dir
      - /var/run/kubevirt-ephemeral-disks
      - --container-disk-dir
      - /var/run/kubevirt/container-disks
      - --readiness-file
      - /var/run/kubevirt-infra/healthy
      - --grace-period-seconds
      - "15"
      - --hook-sidecars
      - "0"
      - --less-pvc-space-toleration
      - "10"
      env:
      - name: KUBEVIRT_RESOURCE_NAME_sriov-net
        value: openshift.io/mlxnics
      image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      imagePullPolicy: IfNotPresent
      name: compute
      readinessProbe:
        exec:
          command:
          - cat
          - /var/run/kubevirt-infra/healthy
        failureThreshold: 5
        initialDelaySeconds: 4
        periodSeconds: 1
        successThreshold: 1
        timeoutSeconds: 5
      resources:
        limits:
          cpu: "6"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          memory: 2259016Ki
          openshift.io/mlxnics: "1"
        requests:
          cpu: "6"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          memory: 2259016Ki
          openshift.io/mlxnics: "1"
      securityContext:
        capabilities:
          add:
          - NET_ADMIN
          - SYS_NICE
        privileged: false
        runAsUser: 0
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/kubevirt-ephemeral-disks
        name: ephemeral-disks
      - mountPath: /var/run/kubevirt/container-disks
        mountPropagation: HostToContainer
        name: container-disks
      - mountPath: /var/run/kubevirt
        name: virt-share-dir
      - mountPath: /var/run/libvirt
        name: libvirt-runtime
      - mountPath: /sys/devices/
        name: pci-devices
      - mountPath: /var/run/kubevirt-infra
        name: infra-ready-mount
      - mountPath: /etc/podnetinfo
        name: podnetinfo
    - args:
      - --copy-path
      - /var/run/kubevirt-ephemeral-disks/container-disk-data/2b023657-496f-4b46-9ac1-7968574798be/disk_0
      command:
      - /usr/bin/container-disk
      image: kubevirt/fedora-cloud-container-disk-demo:latest
      imagePullPolicy: Always
      name: volumecontainerdisk
      readinessProbe:
        exec:
          command:
          - /usr/bin/container-disk
          - --health-check
        failureThreshold: 5
        initialDelaySeconds: 1
        periodSeconds: 1
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          cpu: 10m
          memory: 40M
        requests:
          cpu: 10m
          memory: 40M
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/kubevirt-ephemeral-disks/container-disk-data/2b023657-496f-4b46-9ac1-7968574798be
        name: container-disks
      - mountPath: /usr/bin
        name: virt-bin-share-dir
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    hostname: vmi-sriov
    imagePullSecrets:
    - name: default-dockercfg-44vbq
    nodeName: dev-worker-1
    nodeSelector:
      cpumanager: "true"
      kubevirt.io/schedulable: "true"
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 107
      runAsUser: 0
      seLinuxOptions:
        type: spc_t
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - hostPath:
        path: /sys/devices/
        type: ""
      name: pci-devices
    - emptyDir: {}
      name: infra-ready-mount
    - hostPath:
        path: /var/run/kubevirt
        type: ""
      name: virt-share-dir
    - hostPath:
        path: /var/lib/kubevirt/init/usr/bin
        type: ""
      name: virt-bin-share-dir
    - emptyDir: {}
      name: libvirt-runtime
    - emptyDir: {}
      name: ephemeral-disks
    - hostPath:
        path: /var/run/kubevirt/container-disks/2b023657-496f-4b46-9ac1-7968574798be
        type: ""
      name: container-disks
    - downwardAPI:
        defaultMode: 420
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.labels
          path: labels
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.annotations
          path: annotations
      name: podnetinfo
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:35Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:53Z"
      message: 'containers with unready status: [compute]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:53Z"
      message: 'containers with unready status: [compute]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:19:54Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://111f69a3e7b584615596a6ca7620c8072c859d53ee11c2634374e6f6f4b2b692
      image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      imageID: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      lastState: {}
      name: compute
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://111f69a3e7b584615596a6ca7620c8072c859d53ee11c2634374e6f6f4b2b692
          exitCode: 137
          finishedAt: "2020-03-12T15:18:52Z"
          reason: OOMKilled
          startedAt: "2020-03-12T15:18:37Z"
    - containerID: cri-o://44a4205485d9afcb16092dbdc45cafc6b9ada75245d4a8a852dd9797c4c0834a
      image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest
      imageID: docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:1d4f6f6d52974db84d2e1a031b6f634254fd97823c05d13d98d124846b001d0a
      lastState: {}
      name: volumecontainerdisk
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2020-03-12T15:18:44Z"
    hostIP: 192.168.111.16
    phase: Running
    podIP: 10.128.2.23
    podIPs:
    - ip: 10.128.2.23
    qosClass: Guaranteed
    startTime: "2020-03-12T15:18:35Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
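
The gap between the guest memory in the VMI spec and the memory limit on the compute container appears to be the overhead budgeted for everything in the launcher besides guest RAM. The following sketch works through that arithmetic using only values from this report. The to_bytes helper is hypothetical and handles only the quantity suffixes that appear here.

```python
# Multipliers for the quantity suffixes used in this report.
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "M": 1000**2}


def to_bytes(quantity):
    """Convert a quantity string such as '2Gi' or '40M' to bytes."""
    # Try longer suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)


guest = to_bytes("2Gi")                 # spec.domain.memory.guest
compute_limit = to_bytes("2259016Ki")   # compute container memory limit
disk_limit = to_bytes("40M")            # volumecontainerdisk memory limit

overhead = compute_limit - guest
print(overhead // 1024)                 # 161864 (Ki)
print(round(overhead / 1024**2, 2))     # 158.07 (MiB)
print(compute_limit + disk_limit)       # 2353232384 (bytes)
```

The last figure matches the 2353232384-byte memory request and limit listed for this pod in the node status. The compute container therefore had about 158 MiB of headroom on top of the 2Gi guest before hitting its limit. The titles of the linked pull requests suggest this budget was too small when a VFIO (SR-IOV) device is attached.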


3. Node status:

# oc describe node dev-worker-1
Name:               dev-worker-1
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    cpumanager=true
                    feature.node.kubernetes.io/cpu-feature-3dnowprefetch=true
                    feature.node.kubernetes.io/cpu-feature-abm=true
                    feature.node.kubernetes.io/cpu-feature-adx=true
                    feature.node.kubernetes.io/cpu-feature-aes=true
                    feature.node.kubernetes.io/cpu-feature-arat=true
                    feature.node.kubernetes.io/cpu-feature-avx=true
                    feature.node.kubernetes.io/cpu-feature-avx2=true
                    feature.node.kubernetes.io/cpu-feature-avx512bw=true
                    feature.node.kubernetes.io/cpu-feature-avx512cd=true
                    feature.node.kubernetes.io/cpu-feature-avx512dq=true
                    feature.node.kubernetes.io/cpu-feature-avx512f=true
                    feature.node.kubernetes.io/cpu-feature-avx512vl=true
                    feature.node.kubernetes.io/cpu-feature-bmi1=true
                    feature.node.kubernetes.io/cpu-feature-bmi2=true
                    feature.node.kubernetes.io/cpu-feature-clwb=true
                    feature.node.kubernetes.io/cpu-feature-erms=true
                    feature.node.kubernetes.io/cpu-feature-f16c=true
                    feature.node.kubernetes.io/cpu-feature-fma=true
                    feature.node.kubernetes.io/cpu-feature-fsgsbase=true
                    feature.node.kubernetes.io/cpu-feature-hle=true
                    feature.node.kubernetes.io/cpu-feature-invpcid=true
                    feature.node.kubernetes.io/cpu-feature-movbe=true
                    feature.node.kubernetes.io/cpu-feature-mpx=true
                    feature.node.kubernetes.io/cpu-feature-pcid=true
                    feature.node.kubernetes.io/cpu-feature-pclmuldq=true
                    feature.node.kubernetes.io/cpu-feature-pdpe1gb=true
                    feature.node.kubernetes.io/cpu-feature-popcnt=true
                    feature.node.kubernetes.io/cpu-feature-rdrand=true
                    feature.node.kubernetes.io/cpu-feature-rdseed=true
                    feature.node.kubernetes.io/cpu-feature-rdtscp=true
                    feature.node.kubernetes.io/cpu-feature-rtm=true
                    feature.node.kubernetes.io/cpu-feature-smap=true
                    feature.node.kubernetes.io/cpu-feature-smep=true
                    feature.node.kubernetes.io/cpu-feature-spec-ctrl=true
                    feature.node.kubernetes.io/cpu-feature-sse4.2=true
                    feature.node.kubernetes.io/cpu-feature-svm=true
                    feature.node.kubernetes.io/cpu-feature-tsc-deadline=true
                    feature.node.kubernetes.io/cpu-feature-vme=true
                    feature.node.kubernetes.io/cpu-feature-x2apic=true
                    feature.node.kubernetes.io/cpu-feature-xgetbv1=true
                    feature.node.kubernetes.io/cpu-feature-xsave=true
                    feature.node.kubernetes.io/cpu-feature-xsavec=true
                    feature.node.kubernetes.io/cpu-feature-xsaveopt=true
                    feature.node.kubernetes.io/cpu-model-Broadwell=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-noTSX=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-noTSX-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Haswell=true
                    feature.node.kubernetes.io/cpu-model-Haswell-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Haswell-noTSX=true
                    feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS=true
                    feature.node.kubernetes.io/cpu-model-IvyBridge=true
                    feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Nehalem=true
                    feature.node.kubernetes.io/cpu-model-Nehalem-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Opteron_G1=true
                    feature.node.kubernetes.io/cpu-model-Opteron_G2=true
                    feature.node.kubernetes.io/cpu-model-Penryn=true
                    feature.node.kubernetes.io/cpu-model-SandyBridge=true
                    feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Client=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Client-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Server=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Server-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Westmere=true
                    feature.node.kubernetes.io/cpu-model-Westmere-IBRS=true
                    feature.node.kubernetes.io/cpu-model-kvm32=true
                    feature.node.kubernetes.io/cpu-model-kvm64=true
                    feature.node.kubernetes.io/cpu-model-qemu32=true
                    feature.node.kubernetes.io/cpu-model-qemu64=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-base=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-reset=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synic=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-time=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex=true
                    feature.node.kubernetes.io/network-sriov.capable=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dev-worker-1
                    kubernetes.io/os=linux
                    kubevirt.io/schedulable=true
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config:
                      {"default":{"interface-id":"br-local_dev-worker-1","ip-address":"169.254.33.2/24","mac-address":"a6:01:58:6f:61:41","mode":"local","next-h...
                    k8s.ovn.org/node-chassis-id: c1664ef8-75be-4d2e-9e13-c28b05f8734a
                    k8s.ovn.org/node-join-subnets: {"default":"100.64.4.0/29"}
                    k8s.ovn.org/node-mgmt-port-mac-address: 32:2c:9c:5b:ec:ef
                    k8s.ovn.org/node-subnets: {"default":"10.128.2.0/23"}
                    kubevirt.io/heartbeat: 2020-03-12T15:32:27Z
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-fd3c33dd88edc8353cd41ad77e41934b
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-fd3c33dd88edc8353cd41ad77e41934b
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    node-labeller-feature.node.kubernetes.io/cpu-feature-3dnowprefetch: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-abm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-adx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-aes: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-arat: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512bw: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512cd: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512dq: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512f: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512vl: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-bmi1: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-bmi2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-clwb: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-erms: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-f16c: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-fma: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-fsgsbase: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-hle: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-invpcid: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-movbe: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-mpx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pcid: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pclmuldq: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pdpe1gb: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-popcnt: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdrand: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdseed: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdtscp: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rtm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-smap: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-smep: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-spec-ctrl: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-sse4.2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-svm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-tsc-deadline: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-vme: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-x2apic: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xgetbv1: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsave: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsavec: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsaveopt: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-noTSX: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-noTSX-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G1: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G2: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Penryn: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Client: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Client-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Server: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Server-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Westmere: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Westmere-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-kvm32: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-kvm64: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-qemu32: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-qemu64: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-base: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reset: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-time: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex: true
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 12 Mar 2020 06:37:57 -0400
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:24 -0400   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.111.16
  Hostname:    dev-worker-1
Capacity:
  cpu:                                   72
  devices.kubevirt.io/kvm:               110
  devices.kubevirt.io/tun:               110
  devices.kubevirt.io/vhost-net:         110
  ephemeral-storage:                     233879108Ki
  hugepages-1Gi:                         16Gi
  hugepages-2Mi:                         0
  memory:                                196797708Ki
  openshift.io/intelnics:                4
  openshift.io/mlxnics:                  6
  ovs-cni.network.kubevirt.io/br-int:    1k
  ovs-cni.network.kubevirt.io/br-local:  1k
  pods:                                  250
Allocatable:
  cpu:                                   71500m
  devices.kubevirt.io/kvm:               110
  devices.kubevirt.io/tun:               110
  devices.kubevirt.io/vhost-net:         110
  ephemeral-storage:                     215542985576
  hugepages-1Gi:                         16Gi
  hugepages-2Mi:                         0
  memory:                                179406092Ki
  openshift.io/intelnics:                4
  openshift.io/mlxnics:                  6
  ovs-cni.network.kubevirt.io/br-int:    1k
  ovs-cni.network.kubevirt.io/br-local:  1k
  pods:                                  250
System Info:
  Machine ID:                             97be96d844e84ccbbc5c8c07c89747b5
  System UUID:                            005b4072-e7dc-e711-906e-00163566263e
  Boot ID:                                4d9ca3f2-b7d9-4a5f-9c6d-5618fb22df8e
  Kernel Version:                         4.18.0-147.5.1.el8_1.x86_64
  OS Image:                               Red Hat Enterprise Linux CoreOS 43.81.202003111633.0 (Ootpa)
  Operating System:                       linux
  Architecture:                           amd64
  Container Runtime Version:              cri-o://1.16.3-28.dev.rhaos4.3.git9aad8e4.el8
  Kubelet Version:                        v1.16.2
  Kube-Proxy Version:                     v1.16.2
Non-terminated Pods:                      (32 in total)
  Namespace                               Name                                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits    AGE
  ---------                               ----                                                  ------------  ----------  ---------------  -------------    ---
  default                                 virt-launcher-vmi-sriov-t9xd7                         6010m (8%)    6010m (8%)  2353232384 (1%)  2353232384 (1%)  14m
  openshift-cluster-node-tuning-operator  tuned-lhmgm                                           10m (0%)      0 (0%)      50Mi (0%)        0 (0%)           4h28m
  openshift-cnv                           bridge-marker-4nhd6                                   100m (0%)     100m (0%)   40Mi (0%)        40Mi (0%)        3h34m
  openshift-cnv                           cdi-apiserver-7b5894bdbb-bnxb5                        0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h34m
  openshift-cnv                           cdi-deployment-b4f97d69f-58jz6                        0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h34m
  openshift-cnv                           cdi-uploadproxy-76c94b65c-x6j24                       0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h34m
  openshift-cnv                           cluster-network-addons-operator-864f9c596d-d86w5      0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h35m
  openshift-cnv                           hco-operator-5495b48ff4-jpk6g                         0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h35m
  openshift-cnv                           kube-cni-linux-bridge-plugin-wsqxd                    60m (0%)      60m (0%)    30Mi (0%)        30Mi (0%)        3h34m
  openshift-cnv                           kubemacpool-mac-controller-manager-9f6fb49dd-j8tw7    100m (0%)     300m (0%)   300Mi (0%)       600Mi (0%)       3h34m
  openshift-cnv                           kubevirt-node-labeller-qgrnt                          0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h33m
  openshift-cnv                           kubevirt-ssp-operator-d885cb85f-x9xrc                 0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h35m
  openshift-cnv                           nmstate-handler-worker-bjnwb                          200m (0%)     200m (0%)   120Mi (0%)       120Mi (0%)       3h34m
  openshift-cnv                           ovs-cni-amd64-fxsht                                   160m (0%)     160m (0%)   70Mi (0%)        70Mi (0%)        3h34m
  openshift-cnv                           virt-api-679d799d99-w95vm                             0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h34m
  openshift-cnv                           virt-controller-75c4595775-t2t8h                      0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h33m
  openshift-cnv                           virt-handler-lfxj2                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h33m
  openshift-cnv                           virt-operator-5df85455dc-lg57h                        0 (0%)        0 (0%)      0 (0%)           0 (0%)           3h35m
  openshift-cnv                           virt-template-validator-5bfb67ff94-fjls8              300m (0%)     300m (0%)   250Mi (0%)       250Mi (0%)       3h33m
  openshift-dns                           dns-default-qcwgj                                     110m (0%)     0 (0%)      70Mi (0%)        512Mi (0%)       4h55m
  openshift-ingress                       router-default-5f9799ff44-b6tjf                       100m (0%)     0 (0%)      256Mi (0%)       0 (0%)           4h15m
  openshift-machine-config-operator       machine-config-daemon-4sdvz                           40m (0%)      0 (0%)      100Mi (0%)       0 (0%)           4h55m
  openshift-marketplace                   certified-operators-7b95fc85b8-4dpfp                  10m (0%)      0 (0%)      100Mi (0%)       0 (0%)           87m
  openshift-marketplace                   community-operators-b4646f5-n75g7                     10m (0%)      0 (0%)      100Mi (0%)       0 (0%)           3h27m
  openshift-marketplace                   hco-catalogsource-config-67c4f7c749-snpxz             10m (0%)      0 (0%)      100Mi (0%)       0 (0%)           27m
  openshift-marketplace                   rh-verified-operators-f69cd59c9-jlnb2                 10m (0%)      0 (0%)      100Mi (0%)       0 (0%)           27m
  openshift-monitoring                    node-exporter-wgnsf                                   112m (0%)     0 (0%)      200Mi (0%)       0 (0%)           4h55m
  openshift-multus                        multus-9vc8h                                          10m (0%)      0 (0%)      150Mi (0%)       0 (0%)           4h56m
  openshift-ovn-kubernetes                ovnkube-node-4q2m9                                    300m (0%)     0 (0%)      900Mi (0%)       0 (0%)           4h56m
  openshift-sriov-network-operator        sriov-cni-bw8dl                                       0 (0%)        0 (0%)      0 (0%)           0 (0%)           4h19m
  openshift-sriov-network-operator        sriov-device-plugin-76979                             0 (0%)        0 (0%)      0 (0%)           0 (0%)           4h8m
  openshift-sriov-network-operator        sriov-network-config-daemon-4p8bm                     0 (0%)        0 (0%)      0 (0%)           0 (0%)           4h21m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                              Requests         Limits
  --------                              --------         ------
  cpu                                   7652m (10%)      7130m (9%)
  memory                                5431851520 (2%)  4054022656 (2%)
  ephemeral-storage                     0 (0%)           0 (0%)
  devices.kubevirt.io/kvm               2                2
  devices.kubevirt.io/tun               1                1
  devices.kubevirt.io/vhost-net         1                1
  openshift.io/intelnics                0                0
  openshift.io/mlxnics                  1                1
  ovs-cni.network.kubevirt.io/br-int    0                0
  ovs-cni.network.kubevirt.io/br-local  0                0
Events:                                 <none>

Comment 3 Fabian Deutsch 2020-03-17 09:41:52 UTC
Daniel, you saw that some priority related annotations were incorrect. Do you think they could have an impact here?

Comment 4 Daniel Belenky 2020-03-17 10:45:28 UTC
I don't think the priority matters here. A VMI is a user-defined workload, which usually means its priority shouldn't be high.

Is it possible to get access to the node to see its journal? Are there any logs from the container?

Comment 7 Petr Horáček 2020-03-18 12:59:22 UTC
@Vladik, I heard there is a workaround for this. Do you know about it? Would you share it? Is there any way we can avoid this issue on 2.2?

Comment 9 Vladik Romanovsky 2020-03-18 13:45:18 UTC
Yes. The problem is that we don't take enough vfio-related memory overhead into account. That's why Guaranteed containers (with hard limits) are getting OOM-killed, while Burstable ones keep running as long as the node has resources.

VMIs with SR-IOV need an additional 1GB of overhead. In short, QEMU potentially locks the entire guest RAM and MMIO memory regions to allow DMA. There is a better explanation in [1].
Libvirt also adds 1GB for any vfio device [2].
However, that may not be enough, and manual adjustment may still be required. To address this, I've posted [3].

[1] https://www.redhat.com/archives/libvir-list/2015-November/msg00329.html
[2] https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L13437
[3] https://github.com/kubevirt/kubevirt/pull/3162
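
As a rough illustration of the Guaranteed vs. Burstable distinction mentioned in this comment, the sketch below classifies a pod using the standard Kubernetes QoS rules. The `qos_class` helper and its input shape (a list of dicts with plain numeric quantities) are assumptions made for illustration, not a KubeVirt or Kubernetes API. A Guaranteed pod always carries a hard memory limit, so unaccounted overhead pushes it over its cgroup limit; a Burstable pod without a memory limit is bounded only by what the node has available.

```python
def qos_class(containers):
    """Classify a pod following the Kubernetes QoS class rules.

    `containers` is a hypothetical, simplified stand-in for a pod spec:
    a list of dicts with optional "requests" and "limits" mappings
    (resource name -> plain number, e.g. millicores or bytes).
    """
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is given, the request defaults to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Totals reported for the virt-launcher pod in the node output: requests == limits.
launcher = [{
    "requests": {"cpu": 6010, "memory": 2353232384},
    "limits": {"cpu": 6010, "memory": 2353232384},
}]
print(qos_class(launcher))
```

The node output lists only pod-level totals, so treating the pod as a single container here is a simplification.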

Comment 10 Petr Horáček 2020-03-18 14:39:43 UTC
Vladik, regarding https://github.com/kubevirt/kubevirt/pull/3162. Would it be possible to split the part about huge pages out of the PR to get it merged more easily? I think that is the only part we need to backport to 2.3.

Comment 11 Vladik Romanovsky 2020-03-18 14:53:07 UTC
(In reply to Petr Horáček from comment #10)
> Vladik, regarding https://github.com/kubevirt/kubevirt/pull/3162. Would it
> be possible to split the part about huge pages out of the PR to get it
> merged more easily? I think that is the only part we need to backport to 2.3.

I don't see how that is possible. If we do that, we will break huge pages allocation.

Comment 12 Vladik Romanovsky 2020-03-19 00:48:26 UTC
I've added a second PR to take the overhead into account for vfio devices:
https://github.com/kubevirt/kubevirt/pull/3178
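
A minimal sketch of what accounting for the vfio overhead could look like, assuming a flat 1 GiB added once whenever at least one VFIO device is attached (the figure quoted in comment 9). The function name, its parameters, and the `extra_overhead_bytes` knob (a stand-in for the manual adjustment idea) are hypothetical; this is not the formula used by either PR, and it does not model hugepage-backed guests, where hugepages are requested as a separate resource.

```python
GIB = 1024 ** 3

# Assumption: flat 1 GiB when any VFIO device is present, per comment 9.
VFIO_OVERHEAD_BYTES = 1 * GIB


def launcher_memory_limit(guest_memory_bytes, base_overhead_bytes,
                          vfio_device_count, extra_overhead_bytes=0):
    """Return a memory limit for the launcher pod, in bytes (hypothetical helper)."""
    overhead = base_overhead_bytes + extra_overhead_bytes
    if vfio_device_count > 0:
        # Added once, not per device, matching the "any vfio device" wording.
        overhead += VFIO_OVERHEAD_BYTES
    return guest_memory_bytes + overhead


# Example: 4 GiB guest, 200 MiB of other overhead, one SR-IOV VF.
limit = launcher_memory_limit(4 * GIB, 200 * 1024 ** 2, vfio_device_count=1)
print(limit)
```

If the 1 GiB turns out to be too little for a given device, the extra parameter shows where a manual adjustment would enter the calculation.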

Comment 13 Nelly Credi 2020-03-23 07:52:18 UTC
Please add 'fixed in version'

Comment 14 Vladik Romanovsky 2020-03-23 13:24:52 UTC
(In reply to Nelly Credi from comment #13)
> Please add 'fixed in version'

Sure, I will once the code is merged upstream.

Comment 15 Nelly Credi 2020-03-23 13:53:23 UTC
sorry :)
bad bulk change

Comment 19 errata-xmlrpc 2020-05-04 19:10:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:2011

