Description of problem:
When creating a VMI with dedicated CPUs and hugepages using CNV 2.3.0 (installed via subscription), the virt-launcher-* pod entered OOMKilled status.

Version-Release number of selected component (if applicable):
CNV 2.3.0

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:
# oc get pods
NAME                            READY   STATUS      RESTARTS   AGE
virt-launcher-vmi-sriov-t9xd7   1/2     OOMKilled   0          11m

# oc get vmi
NAME        AGE   PHASE    IP            NODENAME
vmi-sriov   11m   Failed   10.128.2.23   dev-worker-1

Expected results:

Additional info:
Additional info includes 1) VMI spec, 2) Pod yaml and 3) Node status.

1. VMI spec:
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-sriov
  name: vmi-sriov
spec:
  domain:
    cpu:
      sockets: 6
      cores: 1
      threads: 1
      dedicatedCpuPlacement: true
    memory:
      guest: 2Gi
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      interfaces:
      - masquerade: {}
        name: default
      - name: sriov-net
        sriov: {}
      rng: {}
    machine:
      type: ""
  networks:
  - name: default
    pod: {}
  - multus:
      networkName: default/sriov-mlx # change me
    name: sriov-net
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
    name: containerdisk
  - cloudInitNoCloud:
      userData: |
        #!/bin/bash
        echo "fedora" | passwd fedora --stdin
        dhclient eth1
    name: cloudinitdisk

2.
Pod yaml:
# oc get pods -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.ovn.org/pod-networks: '{"default":{"ip_address":"10.128.2.23/23","mac_address":"ae:4e:81:80:02:18","gateway_ip":"10.128.2.1"}}'
      k8s.v1.cni.cncf.io/networks: '[{"name":"sriov-mlx","namespace":"default","mac":"02:a2:53:00:00:0c","interface":"net1"}]'
      k8s.v1.cni.cncf.io/networks-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.128.2.23"
            ],
            "mac": "ae:4e:81:80:02:18",
            "dns": {}
        },{
            "name": "sriov-net",
            "interface": "net1",
            "dns": {}
        }]
      kubevirt.io/domain: vmi-sriov
      traffic.sidecar.istio.io/kubevirtInterfaces: k6t-eth0
    creationTimestamp: "2020-03-12T15:19:55Z"
    generateName: virt-launcher-vmi-sriov-
    labels:
      kubevirt.io: virt-launcher
      kubevirt.io/created-by: 2b023657-496f-4b46-9ac1-7968574798be
      special: vmi-sriov
    name: virt-launcher-vmi-sriov-t9xd7
    namespace: default
    ownerReferences:
    - apiVersion: kubevirt.io/v1alpha3
      blockOwnerDeletion: true
      controller: true
      kind: VirtualMachineInstance
      name: vmi-sriov
      uid: 2b023657-496f-4b46-9ac1-7968574798be
    resourceVersion: "146263"
    selfLink: /api/v1/namespaces/default/pods/virt-launcher-vmi-sriov-t9xd7
    uid: 6a35a910-8d3c-4987-9bff-e66ec4214b0a
  spec:
    automountServiceAccountToken: false
    containers:
    - command:
      - /usr/bin/virt-launcher
      - --qemu-timeout
      - 5m
      - --name
      - vmi-sriov
      - --uid
      - 2b023657-496f-4b46-9ac1-7968574798be
      - --namespace
      - default
      - --kubevirt-share-dir
      - /var/run/kubevirt
      - --ephemeral-disk-dir
      - /var/run/kubevirt-ephemeral-disks
      - --container-disk-dir
      - /var/run/kubevirt/container-disks
      - --readiness-file
      - /var/run/kubevirt-infra/healthy
      - --grace-period-seconds
      - "15"
      - --hook-sidecars
      - "0"
      - --less-pvc-space-toleration
      - "10"
      env:
      - name: KUBEVIRT_RESOURCE_NAME_sriov-net
        value: openshift.io/mlxnics
      image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      imagePullPolicy: IfNotPresent
      name: compute
      readinessProbe:
        exec:
          command:
          - cat
          - /var/run/kubevirt-infra/healthy
        failureThreshold: 5
        initialDelaySeconds: 4
        periodSeconds: 1
        successThreshold: 1
        timeoutSeconds: 5
      resources:
        limits:
          cpu: "6"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          memory: 2259016Ki
          openshift.io/mlxnics: "1"
        requests:
          cpu: "6"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          memory: 2259016Ki
          openshift.io/mlxnics: "1"
      securityContext:
        capabilities:
          add:
          - NET_ADMIN
          - SYS_NICE
        privileged: false
        runAsUser: 0
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/kubevirt-ephemeral-disks
        name: ephemeral-disks
      - mountPath: /var/run/kubevirt/container-disks
        mountPropagation: HostToContainer
        name: container-disks
      - mountPath: /var/run/kubevirt
        name: virt-share-dir
      - mountPath: /var/run/libvirt
        name: libvirt-runtime
      - mountPath: /sys/devices/
        name: pci-devices
      - mountPath: /var/run/kubevirt-infra
        name: infra-ready-mount
      - mountPath: /etc/podnetinfo
        name: podnetinfo
    - args:
      - --copy-path
      - /var/run/kubevirt-ephemeral-disks/container-disk-data/2b023657-496f-4b46-9ac1-7968574798be/disk_0
      command:
      - /usr/bin/container-disk
      image: kubevirt/fedora-cloud-container-disk-demo:latest
      imagePullPolicy: Always
      name: volumecontainerdisk
      readinessProbe:
        exec:
          command:
          - /usr/bin/container-disk
          - --health-check
        failureThreshold: 5
        initialDelaySeconds: 1
        periodSeconds: 1
        successThreshold: 1
        timeoutSeconds: 1
      resources:
        limits:
          cpu: 10m
          memory: 40M
        requests:
          cpu: 10m
          memory: 40M
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/kubevirt-ephemeral-disks/container-disk-data/2b023657-496f-4b46-9ac1-7968574798be
        name: container-disks
      - mountPath: /usr/bin
        name: virt-bin-share-dir
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    hostname: vmi-sriov
    imagePullSecrets:
    - name: default-dockercfg-44vbq
    nodeName: dev-worker-1
    nodeSelector:
      cpumanager: "true"
      kubevirt.io/schedulable: "true"
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 107
      runAsUser: 0
      seLinuxOptions:
        type: spc_t
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - hostPath:
        path: /sys/devices/
        type: ""
      name: pci-devices
    - emptyDir: {}
      name: infra-ready-mount
    - hostPath:
        path: /var/run/kubevirt
        type: ""
      name: virt-share-dir
    - hostPath:
        path: /var/lib/kubevirt/init/usr/bin
        type: ""
      name: virt-bin-share-dir
    - emptyDir: {}
      name: libvirt-runtime
    - emptyDir: {}
      name: ephemeral-disks
    - hostPath:
        path: /var/run/kubevirt/container-disks/2b023657-496f-4b46-9ac1-7968574798be
        type: ""
      name: container-disks
    - downwardAPI:
        defaultMode: 420
        items:
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.labels
          path: labels
        - fieldRef:
            apiVersion: v1
            fieldPath: metadata.annotations
          path: annotations
      name: podnetinfo
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:35Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:53Z"
      message: 'containers with unready status: [compute]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:18:53Z"
      message: 'containers with unready status: [compute]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2020-03-12T15:19:54Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://111f69a3e7b584615596a6ca7620c8072c859d53ee11c2634374e6f6f4b2b692
      image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      imageID: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-virt-launcher@sha256:8f7d02f68c2cff7d5937a23d829ace40a474517efacd090b3abe25af4b767499
      lastState: {}
      name: compute
      ready: false
      restartCount: 0
      started: false
      state:
        terminated:
          containerID: cri-o://111f69a3e7b584615596a6ca7620c8072c859d53ee11c2634374e6f6f4b2b692
          exitCode: 137
          finishedAt: "2020-03-12T15:18:52Z"
          reason: OOMKilled
          startedAt: "2020-03-12T15:18:37Z"
    - containerID: cri-o://44a4205485d9afcb16092dbdc45cafc6b9ada75245d4a8a852dd9797c4c0834a
      image: docker.io/kubevirt/fedora-cloud-container-disk-demo:latest
      imageID: docker.io/kubevirt/fedora-cloud-container-disk-demo@sha256:1d4f6f6d52974db84d2e1a031b6f634254fd97823c05d13d98d124846b001d0a
      lastState: {}
      name: volumecontainerdisk
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2020-03-12T15:18:44Z"
    hostIP: 192.168.111.16
    phase: Running
    podIP: 10.128.2.23
    podIPs:
    - ip: 10.128.2.23
    qosClass: Guaranteed
    startTime: "2020-03-12T15:18:35Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

3.
Node status:
# oc describe node dev-worker-1
Name:               dev-worker-1
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    cpumanager=true
                    feature.node.kubernetes.io/cpu-feature-3dnowprefetch=true
                    feature.node.kubernetes.io/cpu-feature-abm=true
                    feature.node.kubernetes.io/cpu-feature-adx=true
                    feature.node.kubernetes.io/cpu-feature-aes=true
                    feature.node.kubernetes.io/cpu-feature-arat=true
                    feature.node.kubernetes.io/cpu-feature-avx=true
                    feature.node.kubernetes.io/cpu-feature-avx2=true
                    feature.node.kubernetes.io/cpu-feature-avx512bw=true
                    feature.node.kubernetes.io/cpu-feature-avx512cd=true
                    feature.node.kubernetes.io/cpu-feature-avx512dq=true
                    feature.node.kubernetes.io/cpu-feature-avx512f=true
                    feature.node.kubernetes.io/cpu-feature-avx512vl=true
                    feature.node.kubernetes.io/cpu-feature-bmi1=true
                    feature.node.kubernetes.io/cpu-feature-bmi2=true
                    feature.node.kubernetes.io/cpu-feature-clwb=true
                    feature.node.kubernetes.io/cpu-feature-erms=true
                    feature.node.kubernetes.io/cpu-feature-f16c=true
                    feature.node.kubernetes.io/cpu-feature-fma=true
                    feature.node.kubernetes.io/cpu-feature-fsgsbase=true
                    feature.node.kubernetes.io/cpu-feature-hle=true
                    feature.node.kubernetes.io/cpu-feature-invpcid=true
                    feature.node.kubernetes.io/cpu-feature-movbe=true
                    feature.node.kubernetes.io/cpu-feature-mpx=true
                    feature.node.kubernetes.io/cpu-feature-pcid=true
                    feature.node.kubernetes.io/cpu-feature-pclmuldq=true
                    feature.node.kubernetes.io/cpu-feature-pdpe1gb=true
                    feature.node.kubernetes.io/cpu-feature-popcnt=true
                    feature.node.kubernetes.io/cpu-feature-rdrand=true
                    feature.node.kubernetes.io/cpu-feature-rdseed=true
                    feature.node.kubernetes.io/cpu-feature-rdtscp=true
                    feature.node.kubernetes.io/cpu-feature-rtm=true
                    feature.node.kubernetes.io/cpu-feature-smap=true
                    feature.node.kubernetes.io/cpu-feature-smep=true
                    feature.node.kubernetes.io/cpu-feature-spec-ctrl=true
                    feature.node.kubernetes.io/cpu-feature-sse4.2=true
                    feature.node.kubernetes.io/cpu-feature-svm=true
                    feature.node.kubernetes.io/cpu-feature-tsc-deadline=true
                    feature.node.kubernetes.io/cpu-feature-vme=true
                    feature.node.kubernetes.io/cpu-feature-x2apic=true
                    feature.node.kubernetes.io/cpu-feature-xgetbv1=true
                    feature.node.kubernetes.io/cpu-feature-xsave=true
                    feature.node.kubernetes.io/cpu-feature-xsavec=true
                    feature.node.kubernetes.io/cpu-feature-xsaveopt=true
                    feature.node.kubernetes.io/cpu-model-Broadwell=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-noTSX=true
                    feature.node.kubernetes.io/cpu-model-Broadwell-noTSX-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Haswell=true
                    feature.node.kubernetes.io/cpu-model-Haswell-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Haswell-noTSX=true
                    feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS=true
                    feature.node.kubernetes.io/cpu-model-IvyBridge=true
                    feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Nehalem=true
                    feature.node.kubernetes.io/cpu-model-Nehalem-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Opteron_G1=true
                    feature.node.kubernetes.io/cpu-model-Opteron_G2=true
                    feature.node.kubernetes.io/cpu-model-Penryn=true
                    feature.node.kubernetes.io/cpu-model-SandyBridge=true
                    feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Client=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Client-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Server=true
                    feature.node.kubernetes.io/cpu-model-Skylake-Server-IBRS=true
                    feature.node.kubernetes.io/cpu-model-Westmere=true
                    feature.node.kubernetes.io/cpu-model-Westmere-IBRS=true
                    feature.node.kubernetes.io/cpu-model-kvm32=true
                    feature.node.kubernetes.io/cpu-model-kvm64=true
                    feature.node.kubernetes.io/cpu-model-qemu32=true
                    feature.node.kubernetes.io/cpu-model-qemu64=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-base=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-reset=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synic=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-time=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush=true
                    feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex=true
                    feature.node.kubernetes.io/network-sriov.capable=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=dev-worker-1
                    kubernetes.io/os=linux
                    kubevirt.io/schedulable=true
                    node-role.kubernetes.io/worker=
                    node.openshift.io/os_id=rhcos
Annotations:        k8s.ovn.org/l3-gateway-config:
                      {"default":{"interface-id":"br-local_dev-worker-1","ip-address":"169.254.33.2/24","mac-address":"a6:01:58:6f:61:41","mode":"local","next-h...
                    k8s.ovn.org/node-chassis-id: c1664ef8-75be-4d2e-9e13-c28b05f8734a
                    k8s.ovn.org/node-join-subnets: {"default":"100.64.4.0/29"}
                    k8s.ovn.org/node-mgmt-port-mac-address: 32:2c:9c:5b:ec:ef
                    k8s.ovn.org/node-subnets: {"default":"10.128.2.0/23"}
                    kubevirt.io/heartbeat: 2020-03-12T15:32:27Z
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-fd3c33dd88edc8353cd41ad77e41934b
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-fd3c33dd88edc8353cd41ad77e41934b
                    machineconfiguration.openshift.io/reason:
                    machineconfiguration.openshift.io/state: Done
                    node-labeller-feature.node.kubernetes.io/cpu-feature-3dnowprefetch: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-abm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-adx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-aes: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-arat: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512bw: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512cd: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512dq: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512f: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-avx512vl: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-bmi1: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-bmi2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-clwb: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-erms: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-f16c: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-fma: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-fsgsbase: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-hle: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-invpcid: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-movbe: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-mpx: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pcid: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pclmuldq: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-pdpe1gb: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-popcnt: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdrand: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdseed: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rdtscp: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-rtm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-smap: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-smep: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-spec-ctrl: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-sse4.2: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-svm: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-tsc-deadline: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-vme: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-x2apic: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xgetbv1: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsave: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsavec: true
                    node-labeller-feature.node.kubernetes.io/cpu-feature-xsaveopt: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-noTSX: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Broadwell-noTSX-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Haswell-noTSX-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-IvyBridge-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Nehalem-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G1: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Opteron_G2: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Penryn: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-SandyBridge-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Client: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Client-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Server: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Skylake-Server-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Westmere: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-Westmere-IBRS: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-kvm32: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-kvm64: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-qemu32: true
                    node-labeller-feature.node.kubernetes.io/cpu-model-qemu64: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-base: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-frequencies: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-ipi: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reenlightenment: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-reset: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-runtime: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synic2: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-synictimer: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-time: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-tlbflush: true
                    node-labeller-feature.node.kubernetes.io/kvm-info-cap-hyperv-vpindex: true
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 12 Mar 2020 06:37:57 -0400
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----             ------  -----------------                 ------------------                ------                      -------
  MemoryPressure   False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure     False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasNoDiskPressure    kubelet has no disk pressure
  PIDPressure      False   Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:14 -0400   KubeletHasSufficientPID     kubelet has sufficient PID available
  Ready            True    Thu, 12 Mar 2020 11:32:25 -0400   Thu, 12 Mar 2020 07:16:24 -0400   KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  192.168.111.16
  Hostname:    dev-worker-1
Capacity:
  cpu:                                   72
  devices.kubevirt.io/kvm:               110
  devices.kubevirt.io/tun:               110
  devices.kubevirt.io/vhost-net:         110
  ephemeral-storage:                     233879108Ki
  hugepages-1Gi:                         16Gi
  hugepages-2Mi:                         0
  memory:                                196797708Ki
  openshift.io/intelnics:                4
  openshift.io/mlxnics:                  6
  ovs-cni.network.kubevirt.io/br-int:    1k
  ovs-cni.network.kubevirt.io/br-local:  1k
  pods:                                  250
Allocatable:
  cpu:                                   71500m
  devices.kubevirt.io/kvm:               110
  devices.kubevirt.io/tun:               110
  devices.kubevirt.io/vhost-net:         110
  ephemeral-storage:                     215542985576
  hugepages-1Gi:                         16Gi
  hugepages-2Mi:                         0
  memory:                                179406092Ki
  openshift.io/intelnics:                4
  openshift.io/mlxnics:                  6
  ovs-cni.network.kubevirt.io/br-int:    1k
  ovs-cni.network.kubevirt.io/br-local:  1k
  pods:                                  250
System Info:
  Machine ID:                 97be96d844e84ccbbc5c8c07c89747b5
  System UUID:                005b4072-e7dc-e711-906e-00163566263e
  Boot ID:                    4d9ca3f2-b7d9-4a5f-9c6d-5618fb22df8e
  Kernel Version:             4.18.0-147.5.1.el8_1.x86_64
  OS Image:                   Red Hat Enterprise Linux CoreOS 43.81.202003111633.0 (Ootpa)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  cri-o://1.16.3-28.dev.rhaos4.3.git9aad8e4.el8
  Kubelet Version:            v1.16.2
  Kube-Proxy Version:         v1.16.2
Non-terminated Pods:          (32 in total)
  Namespace                         Name                                                CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                         ----                                                ------------  ----------  ---------------  -------------  ---
  default                           virt-launcher-vmi-sriov-t9xd7                       6010m (8%)    6010m (8%)  2353232384 (1%)  2353232384 (1%)  14m
  openshift-cluster-node-tuning-operator  tuned-lhmgm                                   10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         4h28m
  openshift-cnv                     bridge-marker-4nhd6                                 100m (0%)     100m (0%)   40Mi (0%)        40Mi (0%)      3h34m
  openshift-cnv                     cdi-apiserver-7b5894bdbb-bnxb5                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h34m
  openshift-cnv                     cdi-deployment-b4f97d69f-58jz6                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h34m
  openshift-cnv                     cdi-uploadproxy-76c94b65c-x6j24                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h34m
  openshift-cnv                     cluster-network-addons-operator-864f9c596d-d86w5    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h35m
  openshift-cnv                     hco-operator-5495b48ff4-jpk6g                       0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h35m
  openshift-cnv                     kube-cni-linux-bridge-plugin-wsqxd                  60m (0%)      60m (0%)    30Mi (0%)        30Mi (0%)      3h34m
  openshift-cnv                     kubemacpool-mac-controller-manager-9f6fb49dd-j8tw7  100m (0%)     300m (0%)   300Mi (0%)       600Mi (0%)     3h34m
  openshift-cnv                     kubevirt-node-labeller-qgrnt                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h33m
  openshift-cnv                     kubevirt-ssp-operator-d885cb85f-x9xrc               0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h35m
  openshift-cnv                     nmstate-handler-worker-bjnwb                        200m (0%)     200m (0%)   120Mi (0%)       120Mi (0%)     3h34m
  openshift-cnv                     ovs-cni-amd64-fxsht                                 160m (0%)     160m (0%)   70Mi (0%)        70Mi (0%)      3h34m
  openshift-cnv                     virt-api-679d799d99-w95vm                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h34m
  openshift-cnv                     virt-controller-75c4595775-t2t8h                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h33m
  openshift-cnv                     virt-handler-lfxj2                                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h33m
  openshift-cnv                     virt-operator-5df85455dc-lg57h                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h35m
  openshift-cnv                     virt-template-validator-5bfb67ff94-fjls8            300m (0%)     300m (0%)   250Mi (0%)       250Mi (0%)     3h33m
  openshift-dns                     dns-default-qcwgj                                   110m (0%)     0 (0%)      70Mi (0%)        512Mi (0%)     4h55m
  openshift-ingress                 router-default-5f9799ff44-b6tjf                     100m (0%)     0 (0%)      256Mi (0%)       0 (0%)         4h15m
  openshift-machine-config-operator machine-config-daemon-4sdvz                         40m (0%)      0 (0%)      100Mi (0%)       0 (0%)         4h55m
  openshift-marketplace             certified-operators-7b95fc85b8-4dpfp                10m (0%)      0 (0%)      100Mi (0%)       0 (0%)         87m
  openshift-marketplace             community-operators-b4646f5-n75g7                   10m (0%)      0 (0%)      100Mi (0%)       0 (0%)         3h27m
  openshift-marketplace             hco-catalogsource-config-67c4f7c749-snpxz           10m (0%)      0 (0%)      100Mi (0%)       0 (0%)         27m
  openshift-marketplace             rh-verified-operators-f69cd59c9-jlnb2               10m (0%)      0 (0%)      100Mi (0%)       0 (0%)         27m
  openshift-monitoring              node-exporter-wgnsf                                 112m (0%)     0 (0%)      200Mi (0%)       0 (0%)         4h55m
  openshift-multus                  multus-9vc8h                                        10m (0%)      0 (0%)      150Mi (0%)       0 (0%)         4h56m
  openshift-ovn-kubernetes          ovnkube-node-4q2m9                                  300m (0%)     0 (0%)      900Mi (0%)       0 (0%)         4h56m
  openshift-sriov-network-operator  sriov-cni-bw8dl                                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         4h19m
  openshift-sriov-network-operator  sriov-device-plugin-76979                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         4h8m
  openshift-sriov-network-operator  sriov-network-config-daemon-4p8bm                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         4h21m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                              Requests         Limits
  --------                              --------         ------
  cpu                                   7652m (10%)      7130m (9%)
  memory                                5431851520 (2%)  4054022656 (2%)
  ephemeral-storage                     0 (0%)           0 (0%)
  devices.kubevirt.io/kvm               2                2
  devices.kubevirt.io/tun               1                1
  devices.kubevirt.io/vhost-net         1                1
  openshift.io/intelnics                0                0
  openshift.io/mlxnics                  1                1
  ovs-cni.network.kubevirt.io/br-int    0                0
  ovs-cni.network.kubevirt.io/br-local  0                0
Events:  <none>
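The attachments above already quantify the problem. The compute container's memory limit (2259016Ki) leaves only about 158 MiB of headroom above the 2Gi of guest memory, while a guest with a VFIO/SR-IOV device may need to lock roughly an extra 1 GiB. A quick arithmetic check (the ~1 GiB figure is the libvirt heuristic discussed later in this bug; everything else comes from the pod YAML above):

```python
# Headroom between the compute container's memory limit and the guest RAM,
# using the figures from the pod YAML above (all values in KiB).
guest_memory_ki = 2 * 1024 * 1024   # memory.guest: 2Gi
pod_limit_ki = 2259016              # compute container memory request/limit

overhead_ki = pod_limit_ki - guest_memory_ki
print(f"overhead budgeted for qemu/libvirt: {overhead_ki} KiB"
      f" (~{overhead_ki / 1024:.0f} MiB)")

# Heuristic extra locked memory for one VFIO device (the ~1 GiB figure
# libvirt applies per VFIO device).
vfio_extra_ki = 1024 * 1024
print(f"budget shortfall with one SR-IOV VF: {vfio_extra_ki - overhead_ki} KiB")
```

The Guaranteed QoS class makes this shortfall fatal: as soon as QEMU pins memory past the cgroup limit, the kernel OOM-kills the compute container, which is exactly the exit code 137 / OOMKilled state shown above.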
Daniel, you saw that some priority-related annotations were incorrect. Do you think they could have an impact here?
I don't think priority matters here: a VMI is a user-defined workload, so its priority usually shouldn't be high anyway. Is it possible to get access to the node to look at its journal? Are there any logs from the container?
@Vladik, I heard there is a workaround for this. Do you know about it? Would you share it? Is there any way we can avoid this issue on 2.2?
Yes. The problem is that we don't take enough VFIO-related memory overhead into account. That is why Guaranteed containers (with hard limits) get OOM-killed while Burstable ones keep running until the node runs out of resources. VMIs with SR-IOV will need an additional ~1GB of overhead.

In short, QEMU potentially locks the entire guest RAM and the MMIO memory regions to allow DMA. There is a better explanation at [1]. Libvirt also adds 1GB for any VFIO device [2]. However, that may not be enough, and manual adjustment may still be required.

To address this I've posted [3].

[1] https://www.redhat.com/archives/libvir-list/2015-November/msg00329.html
[2] https://github.com/libvirt/libvirt/blob/master/src/qemu/qemu_domain.c#L13437
[3] https://github.com/kubevirt/kubevirt/pull/3162
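To make the accounting concrete, here is a minimal sketch of the kind of calculation the launcher's memory request needs once VFIO devices are involved. This is purely illustrative: the function name, the flat 1 GiB constant, and the base-overhead parameter are assumptions for this sketch, not the actual KubeVirt code.

```python
# Illustrative sketch: the launcher pod's memory request must cover guest RAM
# plus the usual qemu/libvirt overhead, plus an extra per-VFIO-device
# allowance, since QEMU may pin guest RAM and MMIO regions for DMA.
VFIO_OVERHEAD_KI = 1024 * 1024  # ~1 GiB per VFIO device (libvirt's heuristic)

def launcher_memory_request_ki(guest_memory_ki: int,
                               base_overhead_ki: int,
                               vfio_devices: int) -> int:
    """Return the memory request (KiB) for the compute container."""
    request = guest_memory_ki + base_overhead_ki
    if vfio_devices > 0:
        # Guaranteed-QoS pods are OOM-killed the moment the cgroup limit is
        # crossed, so locked VFIO memory has to be budgeted up front.
        request += vfio_devices * VFIO_OVERHEAD_KI
    return request

# The failing pod: 2Gi guest, ~158 MiB base overhead, one SR-IOV VF.
print(launcher_memory_request_ki(2 * 1024 * 1024, 161864, 1))
```

With the failing pod's numbers, the request would grow from 2259016Ki to 3307592Ki, which matches the observation that the current Guaranteed limit is far too tight for an SR-IOV guest.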
Vladik, regarding https://github.com/kubevirt/kubevirt/pull/3162. Would it be possible to split the part about huge pages out of the PR to get it merged more easily? I think that is the only part we need to backport to 2.3.
(In reply to Petr Horáček from comment #10)
> Vladik, regarding https://github.com/kubevirt/kubevirt/pull/3162. Would it
> be possible to split the part about huge pages out of the PR to get it
> merged more easily? I think that is the only part we need to backport to 2.3.

I don't see how that is possible. If we do that, we will break hugepages allocation.
I've added a second PR to take the overhead into account for vfio devices: https://github.com/kubevirt/kubevirt/pull/3178
Please add 'fixed in version'
(In reply to Nelly Credi from comment #13)
> Please add 'fixed in version'

Sure, I will once the code is merged upstream.
sorry :) bad bulk change
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:2011