Bug 2212590

Summary: 1 vcpu realtime VM hangs with CNV
Product: Container Native Virtualization (CNV) Reporter: Germano Veit Michel <gveitmic>
Component: Networking Assignee: Marcelo Tosatti <mtosatti>
Status: NEW --- QA Contact: Nir Rozen <nrozen>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.13.0 CC: fdeutsch, germano, mtosatti, ngu, vromanso
Target Milestone: ---   
Target Release: 4.15.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Germano Veit Michel 2023-06-06 01:20:07 UTC
Description of problem:

Virtual machines configured with realtime and hugepages cannot stay up for even 1 minute: qemu-kvm is killed by the oom-killer because of the cgroup memory limit.

Version-Release number of selected component (if applicable):
CNV 4.13
OCP 4.13.1

How reproducible:
Always

Steps to Reproduce:
1. Set up a VM for low-latency work (mainly hugepages and realtime are required)

$ oc get vm fedora-1 -o yaml | yq '.spec.template.spec.domain'
cpu:
  cores: 1
  dedicatedCpuPlacement: true
  numa:
    guestMappingPassthrough: {}
  realtime: {}
  sockets: 1
  threads: 1
devices:
  disks:
    - disk:
        bus: virtio
      name: rootdisk
    - disk:
        bus: virtio
      name: cloudinitdisk
  interfaces:
    - macAddress: "02:30:44:00:00:00"
      masquerade: {}
      model: virtio
      name: default
  networkInterfaceMultiqueue: true
  rng: {}
features:
  acpi: {}
  smm:
    enabled: true
firmware:
  bootloader:
    efi: {}
machine:
  type: pc-q35-rhel9.2.0
memory:
  hugepages:
    pageSize: 1Gi
resources:
  limits:
    cpu: "1"
    memory: 22Gi
  requests:
    cpu: "1"
    memory: 22Gi

2. Once the pod starts, because it is using hugepages, it ends up with a memory limit of only around 300M for qemu-kvm

    resources:
      limits:
        cpu: "1"
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        hugepages-1Gi: 22Gi
        memory: "299892737"
      requests:
        cpu: "1"
        devices.kubevirt.io/kvm: "1"
        devices.kubevirt.io/tun: "1"
        devices.kubevirt.io/vhost-net: "1"
        ephemeral-storage: 50M
        hugepages-1Gi: 22Gi
        memory: "299892737"

3. That limit is not enough if realtime: {} is enabled; qemu-kvm and virt-launcher are OOM-killed repeatedly:

Jun 06 01:10:33 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46380 (qemu-kvm) total-vm:24321492kB, anon-rss:243796kB, file-rss:21688kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46841 (qemu-kvm) total-vm:24321492kB, anon-rss:244272kB, file-rss:21524kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997
Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46732 (virt-launcher) total-vm:1395200kB, anon-rss:9840kB, file-rss:39656kB, shmem-rss:0kB, UID:107 pgtables:308kB oom_score_adj:-997
Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47598 (qemu-kvm) total-vm:24321492kB, anon-rss:244544kB, file-rss:21472kB, shmem-rss:4kB, UID:107 pgtables:1312kB oom_score_adj:-997
Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47475 (virt-launcher) total-vm:1395132kB, anon-rss:10012kB, file-rss:39908kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
Jun 06 01:12:36 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48391 (qemu-kvm) total-vm:24321492kB, anon-rss:244052kB, file-rss:21528kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48867 (qemu-kvm) total-vm:24321492kB, anon-rss:244284kB, file-rss:21604kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48744 (virt-launcher) total-vm:1395132kB, anon-rss:9992kB, file-rss:39692kB, shmem-rss:0kB, UID:107 pgtables:320kB oom_score_adj:-997
Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49658 (qemu-kvm) total-vm:24321492kB, anon-rss:244016kB, file-rss:21464kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49496 (virt-launcher) total-vm:1395204kB, anon-rss:10444kB, file-rss:39616kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
Jun 06 01:14:40 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50436 (qemu-kvm) total-vm:24305052kB, anon-rss:244312kB, file-rss:21548kB, shmem-rss:4kB, UID:107 pgtables:1304kB oom_score_adj:-997
Jun 06 01:14:57 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50879 (qemu-kvm) total-vm:24305052kB, anon-rss:244108kB, file-rss:21536kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51374 (qemu-kvm) total-vm:24321492kB, anon-rss:244108kB, file-rss:21568kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51239 (virt-launcher) total-vm:1395204kB, anon-rss:10212kB, file-rss:39648kB, shmem-rss:0kB, UID:107 pgtables:316kB oom_score_adj:-997
Jun 06 01:16:09 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 52120 (qemu-kvm) total-vm:24321492kB, anon-rss:244636kB, file-rss:21428kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997
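
To compare the OOM-killer records above with the ~300M limit from step 2, the per-process RSS can be pulled out of each line. The following is a minimal sketch, not part of the original report: the regular expression and the helper name parse_oom_line are illustrative assumptions, and the only inputs are the log text and the 299892737-byte limit quoted above.

```python
import re

# Memory limit (bytes) shown for the pod in step 2 of this report.
LIMIT_BYTES = 299_892_737

# Hypothetical pattern matching the "Killed process" records shown above.
OOM_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, "
    r"file-rss:(?P<file>\d+)kB, shmem-rss:(?P<shmem>\d+)kB"
)


def parse_oom_line(line):
    """Return pid, name and total RSS in bytes, or None if the line does not match."""
    m = OOM_RE.search(line)
    if m is None:
        return None
    rss_kib = int(m["anon"]) + int(m["file"]) + int(m["shmem"])
    return {"pid": int(m["pid"]), "name": m["name"], "rss_bytes": rss_kib * 1024}


if __name__ == "__main__":
    sample = (
        "Jun 06 01:10:33 worker-4.toca.local kernel: Memory cgroup out of memory: "
        "Killed process 46380 (qemu-kvm) total-vm:24321492kB, anon-rss:243796kB, "
        "file-rss:21688kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997"
    )
    rec = parse_oom_line(sample)
    share = rec["rss_bytes"] / LIMIT_BYTES
    print(f"{rec['name']} pid {rec['pid']}: {rec['rss_bytes']} bytes RSS, "
          f"{share:.1%} of the limit")
```

For the first record this puts qemu-kvm alone at roughly 90% of the limit. The other processes in the pod and kernel memory are charged to the same cgroup on top of that.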
 
4. The memory situation is as follows:

Jun 06 01:16:27 worker-4.toca.local kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod204d7fbc_3a3b_4c8f_befe_0810d6f18231.slice:
Jun 06 01:16:27 worker-4.toca.local kernel: anon 270708736
                                            file 65536
                                            kernel 28831744
                                            kernel_stack 802816
                                            pagetables 2125824
                                            percpu 100800
                                            sock 0
                                            vmalloc 23465984
                                            shmem 61440
                                            zswap 0
                                            zswapped 0
                                            file_mapped 16384
                                            file_dirty 0
                                            file_writeback 4096
                                            swapcached 0
                                            anon_thp 0
                                            file_thp 0
                                            shmem_thp 0
                                            inactive_anon 262545408
                                            active_anon 8224768
                                            inactive_file 4096
                                            active_file 0
                                            unevictable 0
                                            slab_reclaimable 787736
                                            slab_unreclaimable 1200816
                                            slab 1988552
                                            workingset_refault_anon 0
                                            workingset_refault_file 0
                                            workingset_activate_anon 0
                                            workingset_activate_file 0
                                            workingset_restore_anon 0
                                            workingset_restore_file 0
                                            workingset_nodereclaim 0
                                            pgscan 456
                                            pgsteal 454
                                            pgscan_kswapd 0
                                            pgscan_direct 456
                                            pgsteal_kswapd 0
                                            pgsteal_direct 454
                                            pgfault 90828
                                            pgmajfault 0
                                            pgrefill 5
                                            pgactivate 1987
                                            pgdeactivate 5
                                            pglazyfree 0
                                            pglazyfreed 0
                                            zswpin 0
                                            zswpout 0
                                            thp_fault_alloc 0
                                            thp_collapse_alloc 0
Jun 06 01:16:27 worker-4.toca.local kernel: Tasks state (memory values in pages):
Jun 06 01:16:27 worker-4.toca.local kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Jun 06 01:16:27 worker-4.toca.local kernel: [  52443]     0 52443     2076      469    53248        0         -1000 conmon
Jun 06 01:16:27 worker-4.toca.local kernel: [  52455]   107 52455   274764     3386   159744        0          -997 virt-launcher-m
Jun 06 01:16:27 worker-4.toca.local kernel: [  52482]   107 52482   348800    12472   327680        0          -997 virt-launcher
Jun 06 01:16:27 worker-4.toca.local kernel: [  52487]   107 52487   174567     5928   217088        0          -997 virtqemud
Jun 06 01:16:27 worker-4.toca.local kernel: [  52488]   107 52488    26133     3930   106496        0          -997 virtlogd
Jun 06 01:16:27 worker-4.toca.local kernel: [  52588]   107 52588  6080373    66350  1339392        0          -997 qemu-kvm
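
The numbers above can be checked against the limit from step 2 with some quick arithmetic. This is only a rough sketch under stated assumptions: that the 299892737-byte limit is the one applied to this cgroup, that the charged total is approximately anon + file + kernel + sock (the kernel counter already includes kernel_stack, pagetables, percpu, vmalloc and slab), and that the page size is 4 KiB. The helper names are illustrative.

```python
# Memory limit (bytes) from the pod spec in step 2.
LIMIT_BYTES = 299_892_737

# Top-level counters copied from the cgroup stats dump above (bytes).
STATS = {"anon": 270_708_736, "file": 65_536, "kernel": 28_831_744, "sock": 0}

# Assumption: 4 KiB base pages, the usual x86_64 default.
PAGE_SIZE = 4096

# qemu-kvm RSS from the "Tasks state" table, which is reported in pages.
QEMU_RSS_PAGES = 66_350


def charged_bytes(stats):
    """Approximate charged memory as the sum of the top-level counters."""
    return sum(stats[key] for key in ("anon", "file", "kernel", "sock"))


def headroom_bytes(limit, stats):
    """Bytes left before the cgroup limit is reached."""
    return limit - charged_bytes(stats)


if __name__ == "__main__":
    used = charged_bytes(STATS)
    print(f"charged:  {used} bytes ({used / LIMIT_BYTES:.2%} of limit)")
    print(f"headroom: {headroom_bytes(LIMIT_BYTES, STATS)} bytes")
    print(f"qemu-kvm RSS: {QEMU_RSS_PAGES * PAGE_SIZE} bytes")
```

Under these assumptions the cgroup sits within roughly 280 KiB of its limit at the time of the dump, and almost all of the usage is anonymous memory.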

Actual results:
* VMs won't stay up

Expected results:
* VM is up

Additional info:
* it calculates the exact same memory limit (the non-hugepages one) with or without realtime.
* realtime appears to need more memory
* changing the VM limits makes no difference, as they end up in the 1G hugepages limits and qemu still gets the auto-calculated ~300M (depends on VM config)

Comment 3 Kedar Bidarkar 2023-06-07 12:37:52 UTC
Real-time VMs are not productized yet, hence targeting this for 4.15, as no later version is available to select; it could possibly get fixed in 4.16+.

Comment 4 Fabian Deutsch 2023-06-14 12:36:10 UTC
Relates to RT, and therefore moving to Networking, which owns https://issues.redhat.com/browse/CNV-12970