This bug has been migrated to another issue tracking site. It has been closed here and may no longer be being monitored.

If you would like to get updates for this issue, or to participate in it, you may do so at Red Hat Issue Tracker.
Bug 2144850 - Incorrect guest cpu cache topology/size
Summary: Incorrect guest cpu cache topology/size
Keywords:
Status: CLOSED MIGRATED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libvirt
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
: ---
Assignee: Virtualization Maintenance
QA Contact: Luyao Huang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-11-22 14:25 UTC by Lukáš Doktor
Modified: 2023-07-07 20:38 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-07-07 20:38:45 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Guest with pinned CPUs (3.42 KB, text/plain)
2022-11-22 14:25 UTC, Lukáš Doktor


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker   RHEL-742 0 None None None 2023-07-07 20:38:44 UTC
Red Hat Issue Tracker RHELPLAN-140179 0 None None None 2022-11-22 14:59:49 UTC

Description Lukáš Doktor 2022-11-22 14:25:27 UTC
Created attachment 1926406
Guest with pinned CPUs

Description of problem:
When trying to match my CPU NUMA topology, I noticed that I cannot configure the CPU caches properly. With the recommended '<cache mode="passthrough"/>' the cache sizes are correct, but the topology differs (the guest reports an L2 cache shared between 2 CPUs, while the host has a separate L2 cache per CPU). Without '<cache mode="passthrough"/>' the topology is correct, but the sizes are wrong (bigger than on the host).
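
For reference, a minimal Python sketch of switching between the two configurations by editing the domain XML. The domain definition below is a hypothetical, trimmed-down example (only the topology values are taken from this report), the helper name set_cache_mode is made up for illustration, and mode="host-passthrough" is an assumption based on the "-cpu host" seen in the QEMU command line.

```python
import xml.etree.ElementTree as ET


def set_cache_mode(domain_xml, mode):
    """Return the domain XML with <cpu><cache mode=.../></cpu> set.

    Passing mode=None removes any existing <cache> element instead.
    """
    root = ET.fromstring(domain_xml.strip())
    cpu = root.find("cpu")
    if cpu is None:
        raise ValueError("domain XML has no <cpu> element")
    # Drop any previous <cache> child so the result has at most one.
    for old in cpu.findall("cache"):
        cpu.remove(old)
    if mode is not None:
        ET.SubElement(cpu, "cache", {"mode": mode})
    return ET.tostring(root, encoding="unicode")


# Hypothetical, trimmed-down domain definition used only for illustration.
EXAMPLE = """
<domain type="kvm">
  <name>vm1</name>
  <cpu mode="host-passthrough">
    <topology sockets="2" dies="1" cores="5" threads="1"/>
  </cpu>
</domain>
"""

with_passthrough = set_cache_mode(EXAMPLE, "passthrough")
without_cache = set_cache_mode(with_passthrough, None)
```

The resulting strings could then be fed to "virsh define" to reproduce either variant.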

Version-Release number of selected component (if applicable):
libvirt-8.0.0-8.el9_0.x86_64
qemu-kvm-core-6.2.0-11.el9_0.2.x86_64
* also tested with the latest upstream qemu-kvm and on RHEL8 with the same results

How reproducible:
Always

Steps to Reproduce:
1. Create a guest with a fixed and well defined CPU mapping
2. Compare "lstopo" outputs from host and guest
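
As an alternative to comparing "lstopo" outputs by eye, the same information can be read from sysfs and diffed. This is a minimal Python sketch, assuming the kernel exposes the usual cpuN/cache/indexM/{level,type,size,shared_cpu_list} files; the function names are hypothetical.

```python
from pathlib import Path


def read_cache_topology(cpu_root="/sys/devices/system/cpu"):
    """Map (level, type) -> {shared_cpu_list: size} as exposed by the kernel."""
    topology = {}
    # cpu[0-9]* skips sibling directories such as cpufreq and cpuidle.
    for index in sorted(Path(cpu_root).glob("cpu[0-9]*/cache/index[0-9]*")):
        level = (index / "level").read_text().strip()
        kind = (index / "type").read_text().strip()
        size = (index / "size").read_text().strip()
        shared = (index / "shared_cpu_list").read_text().strip()
        # CPUs sharing one cache report the same shared_cpu_list,
        # so each cache instance ends up as a single dictionary key.
        topology.setdefault((level, kind), {})[shared] = size
    return topology


def format_topology(topology):
    """Render one line per cache instance, suitable for diffing."""
    lines = []
    for (level, kind), instances in sorted(topology.items()):
        for shared, size in sorted(instances.items()):
            lines.append(f"L{level} {kind:<12} {size:>8}  CPUs {shared}")
    return "\n".join(lines)
```

Running print(format_topology(read_cache_topology())) on the host and in the guest gives two short listings that can be compared with "diff".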

Actual results (with cache mode=passthrough):
Machine (19GB total)
  L3 L#0 (14MB)
    Package L#0
      NUMANode L#0 (P#0 9721MB) 
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB)
        Core L#0 + PU L#0 (P#0)
        Core L#1 + PU L#1 (P#1) 
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB)
        Core L#2 + PU L#2 (P#2)
        Core L#3 + PU L#3 (P#3) 
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#4 + PU L#4 (P#4)
    Package L#1
      NUMANode L#1 (P#1 10070MB)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB)
        Core L#5 + PU L#5 (P#5)
        Core L#6 + PU L#6 (P#6) 
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB)
        Core L#7 + PU L#7 (P#7)
        Core L#8 + PU L#8 (P#8) 
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#9 + PU L#9 (P#9)

Actual results (without cache mode=passthrough):
Machine (19GB total)
  Package L#0
    NUMANode L#0 (P#0 9758MB)
    L3 L#0 (16MB)
      L2 L#0 (4096KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (4096KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (4096KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (4096KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (4096KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
  Package L#1
    NUMANode L#1 (P#1 10034MB)
    L3 L#1 (16MB)
      L2 L#5 (4096KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (4096KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (4096KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      L2 L#8 (4096KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (4096KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)

Expected results:
Machine (19GB total)
  Package L#0
    NUMANode L#0 (P#0 9758MB)
    L3 L#0 (14MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
  Package L#1
    NUMANode L#1 (P#1 10034MB)
    L3 L#1 (14MB)
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
      L2 L#8 (1024KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (1024KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)

Additional info:
Host's lstopo is:
Machine (31GB total)
  Package L#0
    NUMANode L#0 (P#0 15GB)
    L3 L#0 (14MB)
      L2 L#0 (1024KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (1024KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (1024KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (1024KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#6)
      L2 L#4 (1024KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#8)
      L2 L#5 (1024KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#10)
      L2 L#6 (1024KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#12)
      L2 L#7 (1024KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#14)
      L2 L#8 (1024KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#16)
      L2 L#9 (1024KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#18)
...
  Package L#1
    NUMANode L#1 (P#1 16GB)
    L3 L#1 (14MB)
      L2 L#10 (1024KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#1)
      L2 L#11 (1024KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#3)
      L2 L#12 (1024KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#5)
      L2 L#13 (1024KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#7)
      L2 L#14 (1024KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#9)
      L2 L#15 (1024KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#11)
      L2 L#16 (1024KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#13)
      L2 L#17 (1024KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#15)
      L2 L#18 (1024KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#17)
      L2 L#19 (1024KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)

Comment 1 Michal Privoznik 2022-11-22 16:45:50 UTC
As discussed earlier with Lukas, I think the problem here is that '<cache mode="passthrough"/>' inherits the host cache topology, but the vCPU topology does not necessarily match the host one. It also remains to be shown whether this has a performance impact.
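
One possible explanation for the grouping seen with cache passthrough, sketched in Python. None of the following is stated in this report, so treat it as assumptions: the host CPUID cache leaf reports 2 logical-processor IDs sharing each L2 and 32 sharing the L3, the guest kernel derives cache groups by shifting the APIC ID by the bit width of that count, and guest APIC IDs pad each topology field to a power of two. All function names are hypothetical.

```python
from collections import defaultdict


def bits_for(count):
    """Number of ID bits needed to encode `count` distinct values."""
    return (count - 1).bit_length()


def guest_apic_ids(sockets, cores, threads):
    """APIC IDs for a sockets/cores/threads layout, one per vCPU in order.

    Each field is padded to a power of two (assumed layout).
    """
    smt_bits, core_bits = bits_for(threads), bits_for(cores)
    return [
        (s << (core_bits + smt_bits)) | (c << smt_bits) | t
        for s in range(sockets)
        for c in range(cores)
        for t in range(threads)
    ]


def cache_groups(apic_ids, ids_sharing):
    """Group vCPU indexes whose APIC IDs fall into the same cache domain."""
    shift = bits_for(ids_sharing)
    groups = defaultdict(list)
    for vcpu, apic in enumerate(apic_ids):
        groups[apic >> shift].append(vcpu)
    return list(groups.values())


# Guest topology from this report: -smp 10,sockets=2,dies=1,cores=5,threads=1
ids = guest_apic_ids(sockets=2, cores=5, threads=1)
l2 = cache_groups(ids, ids_sharing=2)    # assumed host value for L2
l3 = cache_groups(ids, ids_sharing=32)   # assumed host value for L3
```

Under these assumptions the guest gets APIC IDs 0-4 and 8-12, so the vCPUs pair up under 6 L2 caches and share a single L3, which is the shape of the lstopo output with cache mode=passthrough.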

Comment 2 Lukáš Doktor 2022-11-23 06:55:43 UTC
On request adding some details...

Host CPU: Intel Xeon 4210R
A similar issue was observed on RHEL8 on an Intel Xeon E5-2640. With cache passthrough on, the L2 caches were also bundled per 2 cores, with the correct sizes. With cache passthrough off, the topology was correct, but the guest reported a 20MB L3 cache (while there is only 16MB on the host) and 4MB L2 caches (instead of 256KB).

Host lscpu:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  40
  On-line CPU(s) list:   0-19
  Off-line CPU(s) list:  20-39
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel
  Model name:            Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
    BIOS Model name:     Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  10
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4800.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe sysc
                         all nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pcl
                         mulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_de
                         adline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssb
                         d mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erm
                         s invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec
                          xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flu
                         sh_l1d arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   640 KiB (20 instances)
  L1i:                   640 KiB (20 instances)
  L2:                    20 MiB (20 instances)
  L3:                    27.5 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled


--------------------
Resulting qemu cmdline without cache passthrough:

2022-11-23 06:24:56.677+0000: starting up libvirt version: 8.0.0, package: 8.el9_0 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2022-03-31-05:40:59, ), qemu version: 6.2.0qemu-kvm-6.2.0-11.el9_0.2, kernel: 5.14.0-70.13.1.el9_0.x86_64, hostname: XXXXXX
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-vm1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-vm1/.config \
/usr/libexec/qemu-kvm \
-name guest=vm1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-vm1/master-key.aes"}' \
-machine pc-q35-rhel9.0.0,usb=off,vmport=off,dump-guest-core=off \
-accel kvm \
-cpu host,migratable=on,kvm-hint-dedicated=on,kvm-poll-control=on \
-m 20480 \
-overcommit mem-lock=off \
-smp 10,sockets=2,dies=1,cores=5,threads=1 \
-object '{"qom-type":"iothread","id":"iothread1"}' \
-object '{"qom-type":"iothread","id":"iothread2"}' \
-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":10737418240,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-4,memdev=ram-node0 \
-object '{"qom-type":"memory-backend-ram","id":"ram-node1","size":10737418240,"host-nodes":[1],"policy":"bind"}' \
-numa node,nodeid=1,cpus=5-9,memdev=ram-node1 \
-numa dist,src=0,dst=0,val=10 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=1,dst=0,val=21 \
-numa dist,src=1,dst=1,val=10 \
-uuid 48ef38be-6a5e-11ed-862b-fa163e034221 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/boot.raw","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,iothread=iothread1,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fds=34:36:37:38:39,id=hostnet0,vhost=on,vhostfds=40:41:42:43:44 \
-device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=on,csum=off,gso=off,host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,guest_csum=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,mq=on,vectors=12,rx_queue_size=512,tx_queue_size=512,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:d2:e0:4a,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=32,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)

Its "lscpu" output:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  10
  On-line CPU(s) list:   0-9
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Red Hat
  Model name:            Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
    BIOS Model name:     RHEL-9.0.0 PC (Q35 + ICH9, 2009)
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  5
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4788.74
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscal
                         l nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_go
                         od nopl xtopology cpuid tsc_known_freq pni pclmulqdq vm
                         x ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe p
                         opcnt tsc_deadline_timer aes xsave avx f16c rdrand hype
                         rvisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_si
                         ngle ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi
                          flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 
                         avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed
                          adx smap clflushopt clwb avx512cd avx512bw avx512vl xs
                         aveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512
                         _vnni md_clear arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   320 KiB (10 instances)
  L1i:                   320 KiB (10 instances)
  L2:                    40 MiB (10 instances)
  L3:                    32 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-4
  NUMA node1 CPU(s):     5-9
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB fillin
                         g
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled


------------------------------

Qemu cmdline with cache passthrough on:

2022-11-23 06:31:31.469+0000: starting up libvirt version: 8.0.0, package: 8.el9_0 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2022-03-31-05:40:59, ), qemu version: 6.2.0qemu-kvm-6.2.0-11.el9_0.2, kernel: 5.14.0-70.13.1.el9_0.x86_64, hostname: XXXXXX
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-2-vm1 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-vm1/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-vm1/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-vm1/.config \
/usr/libexec/qemu-kvm \
-name guest=vm1,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-2-vm1/master-key.aes"}' \
-machine pc-q35-rhel9.0.0,usb=off,vmport=off,dump-guest-core=off \
-accel kvm \
-cpu host,migratable=on,kvm-hint-dedicated=on,kvm-poll-control=on,host-cache-info=on,l3-cache=off \
-m 20480 \
-overcommit mem-lock=off \
-smp 10,sockets=2,dies=1,cores=5,threads=1 \
-object '{"qom-type":"iothread","id":"iothread1"}' \
-object '{"qom-type":"iothread","id":"iothread2"}' \
-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":10737418240,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-4,memdev=ram-node0 \
-object '{"qom-type":"memory-backend-ram","id":"ram-node1","size":10737418240,"host-nodes":[1],"policy":"bind"}' \
-numa node,nodeid=1,cpus=5-9,memdev=ram-node1 \
-numa dist,src=0,dst=0,val=10 \
-numa dist,src=0,dst=1,val=21 \
-numa dist,src=1,dst=0,val=21 \
-numa dist,src=1,dst=1,val=10 \
-uuid 48ef38be-6a5e-11ed-862b-fa163e034221 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=33,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=18,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=19,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=20,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=21,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/boot.raw","aio":"native","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device virtio-blk-pci,iothread=iothread1,bus=pci.4,addr=0x0,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fds=34:36:37:38:39,id=hostnet0,vhost=on,vhostfds=40:41:42:43:44 \
-device virtio-net-pci,tx=bh,ioeventfd=on,event_idx=on,csum=off,gso=off,host_tso4=off,host_tso6=off,host_ecn=off,host_ufo=off,mrg_rxbuf=off,guest_csum=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,mq=on,vectors=12,rx_queue_size=512,tx_queue_size=512,host_mtu=9000,netdev=hostnet0,id=net0,mac=52:54:00:d2:e0:4a,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev socket,id=charchannel0,fd=32,server=on,wait=off \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-audiodev '{"id":"audio1","driver":"none"}' \
-vnc 127.0.0.1:0,audiodev=audio1 \
-device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)

Its "lscpu" output:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  10
  On-line CPU(s) list:   0-9
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Red Hat
  Model name:            Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
    BIOS Model name:     RHEL-9.0.0 PC (Q35 + ICH9, 2009)
    CPU family:          6
    Model:               85
    Thread(s) per core:  1
    Core(s) per socket:  5
    Socket(s):           2
    Stepping:            7
    BogoMIPS:            4788.74
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc
                         a cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscal
                         l nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_go
                         od nopl xtopology cpuid tsc_known_freq pni pclmulqdq vm
                         x ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe p
                         opcnt tsc_deadline_timer aes xsave avx f16c rdrand hype
                         rvisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_si
                         ngle ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi
                          flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 
                         avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed
                          adx smap clflushopt clwb avx512cd avx512bw avx512vl xs
                         aveopt xsavec xgetbv1 xsaves arat umip pku ospke avx512
                         _vnni md_clear arch_capabilities
Virtualization features: 
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
Caches (sum of all):     
  L1d:                   192 KiB (6 instances)
  L1i:                   192 KiB (6 instances)
  L2:                    6 MiB (6 instances)
  L3:                    13.8 MiB (1 instance)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-4
  NUMA node1 CPU(s):     5-9
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer
                          sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB fillin
                         g
  Srbds:                 Not affected
  Tsx async abort:       Mitigation; TSX disabled
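
To relate the "Caches (sum of all)" totals above to the per-cache sizes shown by lstopo, the totals can be divided by the instance counts. A minimal Python sketch; the helper name and the regular expression are illustrative and assume the lscpu line format shown in this comment, while the sample values are copied from the outputs above.

```python
import re

# Matches lines such as "  L2:   40 MiB (10 instances)".
CACHE_LINE = re.compile(
    r"^\s*(L\d[di]?):\s+([\d.]+)\s+(KiB|MiB)\s+\((\d+) instances?\)\s*$"
)
KIB_PER_UNIT = {"KiB": 1, "MiB": 1024}


def per_instance_kib(lscpu_text):
    """Return {cache name: size of one instance in KiB}."""
    result = {}
    for line in lscpu_text.splitlines():
        match = CACHE_LINE.match(line)
        if match:
            name, total, unit, count = match.groups()
            result[name] = float(total) * KIB_PER_UNIT[unit] / int(count)
    return result


HOST = """
  L2:                    20 MiB (20 instances)
  L3:                    27.5 MiB (2 instances)
"""
GUEST_NO_PASSTHROUGH = """
  L2:                    40 MiB (10 instances)
  L3:                    32 MiB (2 instances)
"""
GUEST_PASSTHROUGH = """
  L2:                    6 MiB (6 instances)
  L3:                    13.8 MiB (1 instance)
"""

for label, text in (
    ("host", HOST),
    ("guest, no passthrough", GUEST_NO_PASSTHROUGH),
    ("guest, passthrough", GUEST_PASSTHROUGH),
):
    print(label, per_instance_kib(text))
```

The host and the passthrough guest both come out at 1024 KiB per L2, while the guest without passthrough reports 4096 KiB per L2 and 16 MiB per L3.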

Comment 3 Lukáš Doktor 2022-11-23 16:06:41 UTC
(In reply to Michal Privoznik from comment #1)
> As discussed earlier with Lukas, I think the problem here is that '<cache
> mode="passthrough"/>' inherits cache topology, but the vCPU topology does
> not necessarily match the host one. But also it remains to be shown that
> this has a performance impact.

I tried a few benchmarks. With the default setting there are differences, which is understandable: the guest is trying to utilize the full caches, and the reported sizes are different. When one forces the sizes to match (either way), the results are alike and match the host numbers.

On the other hand, a brief search led me to https://www.phoronix.com/news/Linux-5.16-Sched-Core . I haven't experimented with that, but based on the description it might result in the scheduler picking incorrect CPUs based on the incorrect topology. In this light, https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_virtualization/optimizing-virtual-machine-performance-in-rhel_configuring-and-managing-virtualization , where we promote the use of cache mode passthrough that leads to an incorrect topology, seems less than ideal.

Anyway don't take those words for granted, it's way out of my comfort zone and is based on trials and assumptions.

