Bug 1351160
Summary: | qemu-kvm does not expose expected cpu topology to guest with numa defined | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Timothy Rees <trees> |
Component: | qemu-kvm-rhev | Assignee: | Eduardo Habkost <ehabkost> |
Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
Severity: | urgent | Docs Contact: | |
Priority: | urgent | ||
Version: | 7.2 | CC: | ailan, cgiunta, djdumas, drjones, fdanapfe, knoel, linux, mkoch, peter.engel, rsibley, srao, trees, virt-maint |
Target Milestone: | rc | ||
Target Release: | 7.3 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2016-07-02 00:13:01 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1018952 |
Description
Timothy Rees
2016-06-29 11:27:19 UTC
Comment 2
Andrew Jones

Please provide the qemu command lines ('pgrep -a qemu' while the guest is running).

Also, we should split scenarios A and B into two separate bugs; scenario A deals with the topology change, and B with the unexpected guest CPU model (I think B is likely not-a-bug, but I'll allow others to respond).

Comment 3
Timothy Rees

(In reply to Andrew Jones from comment #2)
> Please provide the qemu command lines ('pgrep -a qemu' while the guest is
> running).
>
> Also, we should split scenarios A and B into two separate bugs; Scenario A
> deals with the topology change, and B with the unexpected guest cpu model (I
> think B is likely not-a-bug, but I'll allow others to respond)

[root@hypervisor ~]# pgrep -a qemu
103617 /usr/libexec/qemu-kvm -name rhel7.0-2 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu host -m 502235 -realtime mlock=off -smp 120,sockets=4,cores=15,threads=2 -numa node,nodeid=0,cpus=0-14,cpus=60-74,mem=128000 -numa node,nodeid=1,cpus=15-29,cpus=75-89,mem=128000 -numa node,nodeid=2,cpus=30-44,cpus=90-104,mem=128000 -numa node,nodeid=3,cpus=45-59,cpus=105-119,mem=118235 -uuid 0174311e-bf55-41c9-be80-d097bc3f73e4 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7.0-2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/vg_sys_r1/lv_vm_rhel7,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1 -drive file=/dev/vg_sys_r1/lv_vm_rhel7_usr_sap,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/dev/vg_sys_r1/lv_hana,if=none,id=drive-virtio-disk3,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk3,id=virtio-disk3 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0e:c7:25,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-rhel7.0-2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

Comment 4
Eduardo Habkost

(In reply to Timothy Rees from comment #0)
[...]
> Test with 2 scenarios:
> A: Guest cpu configured with mode=host-passthrough
> B: Guest cpu configured with mode=custom match=exact
[...]
> SCENARIO A:
[...]
> 3. Boot and login to the guest:
> [root@guestvm ~]# lscpu
[...]
> CPU(s):               120
> On-line CPU(s) list:  0-119
> Thread(s) per core:   1
> Core(s) per socket:   16
> Socket(s):            4
> NUMA node(s):         4
[...]
> Actual results:
> Guest shows Sockets/Cores/Threads as 4/16/1
>
> Expected results:
> Per the host output and xml configuration, guest should show
> Sockets/Cores/Threads as 4/15/2

This is unexpected, I will investigate.

Note, however, that each socket still has 15 cores. The only difference is that the guest assumes the _maximum_ number of cores per socket is 16; the _actual_ number of cores per socket is exactly the one requested. Core ID 15 won't be present in any of the sockets, only core IDs 0-14. This way, the CPU core topology and NUMA topology still match the host as expected.

I would like to understand what real-world problems are caused by this behavior. That way we can look for workarounds if necessary, and will be able to justify changing QEMU behavior upstream.

> SCENARIO B:
> 3. Boot and login to the guest:
> [root@guestvm ~]# lscpu
[...]
> CPU(s):               120
> On-line CPU(s) list:  0-119
> Thread(s) per core:   1
> Core(s) per socket:   16
> Socket(s):            4
> NUMA node(s):         4
[...]
> Actual results:
> Guest shows Sockets/Cores/Threads as 4/16/1

See the comment about scenario A above.

> Guest CPU Model does not match host CPU Model
> Guest CPU Caches do not match host CPU caches

This is expected. The CPU model will match exactly only if using host-passthrough. CPU caches will match the host only if using "-cpu host" (host-passthrough on libvirt) and "host-cache-info=on" (not supported by libvirt yet).

Note: old versions of QEMU (including the current 7.2 qemu-kvm-rhev version) had a bug that made them forward host CPU cache information unconditionally, but this will be fixed in 7.3. See bug 1184125.

As with scenario A, I would like to understand what real-world problems this causes, so that we can look for workarounds if necessary.

Comment 5
Eduardo Habkost

(In reply to Eduardo Habkost from comment #4)
> (In reply to Timothy Rees from comment #0)
[...]
> > Actual results:
> > Guest shows Sockets/Cores/Threads as 4/16/1
> >
> > Expected results:
> > Per the host output and xml configuration, guest should show
> > Sockets/Cores/Threads as 4/15/2
>
> This is unexpected, I will investigate.

I have reproduced the issue, and noticed that the configuration shown in the bug description is confusing lscpu because you are placing CPU threads from the same core in different NUMA nodes.

In other words, this:

  <topology sockets='4' cores='15' threads='2'/>

means that CPUs 0-29 will be in socket 0, 30-59 in socket 1, 60-89 in socket 2, and 90-119 in socket 3. Inside socket 0:

  CPUs 0-1 will be in core 0
  CPUs 2-3 will be in core 1
  [...]
  CPUs 12-13 will be in core 6
  CPUs 14-15 will be in core 7
  [...]
  CPUs 26-27 will be in core 13
  CPUs 28-29 will be in core 14

The same pattern repeats in the other sockets.

(Note that the CPU numbers above won't match the host, because the CPU numbers in the host are arbitrary. We order CPUs in the ACPI tables by socket/core/thread IDs (more exactly, by APIC ID, which encodes the socket/core/thread IDs), and the host seems to order them differently. You can check the exact ordering using 'lscpu -e' in the host.)

However:

  <numa>
    <cell id='0' cpus='0-14,60-74' memory='131072000' unit='KiB'/>
    <cell id='1' cpus='15-29,75-89' memory='131072000' unit='KiB'/>
    <cell id='2' cpus='30-44,90-104' memory='131072000' unit='KiB'/>
    <cell id='3' cpus='45-59,105-119' memory='121072640' unit='KiB'/>
  </numa>

will place one thread of core 7 (CPU 14) in NUMA node 0, and another thread of the same core (CPU 15) in NUMA node 1. This configuration doesn't make sense.

I don't know the exact NUMA topology of the host, but I assume you want to place all cores from the same socket inside the same NUMA node.
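As a sanity check, the per-CPU arithmetic described above can be applied to the NUMA cells from the domain XML to list every core whose two threads land on different nodes. This is a sketch (not part of the report) that assumes the sequential guest CPU numbering Eduardo describes: CPU n sits in socket n // 30, core (n % 30) // 2, thread n % 2.

```python
# Sketch based on the CPU numbering described above (sockets=4, cores=15,
# threads=2).  The NUMA cells below are exactly those from the domain XML.

def topo(cpu, cores=15, threads=2):
    per_socket = cores * threads  # 30 guest CPUs per socket
    return (cpu // per_socket,              # socket ID
            (cpu % per_socket) // threads,  # core ID
            cpu % threads)                  # thread ID

cells = {
    0: list(range(0, 15)) + list(range(60, 75)),
    1: list(range(15, 30)) + list(range(75, 90)),
    2: list(range(30, 45)) + list(range(90, 105)),
    3: list(range(45, 60)) + list(range(105, 120)),
}

# Collect the set of NUMA nodes each (socket, core) pair ends up on:
nodes_of_core = {}
for node, cpus in cells.items():
    for cpu in cpus:
        socket, core, _thread = topo(cpu)
        nodes_of_core.setdefault((socket, core), set()).add(node)

# Cores whose threads sit on different nodes -- the configuration problem:
split = {sc for sc, nodes in nodes_of_core.items() if len(nodes) > 1}
print(sorted(split))  # -> [(0, 7), (1, 7), (2, 7), (3, 7)]
```

Under these assumptions, core 7 of every socket is split across two nodes, matching the CPU 14 / CPU 15 example above.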
Probably something like:

  <numa>
    <cell id='0' cpus='0-29' memory='131072000' unit='KiB'/>
    <cell id='1' cpus='30-59' memory='131072000' unit='KiB'/>
    <cell id='2' cpus='60-89' memory='131072000' unit='KiB'/>
    <cell id='3' cpus='90-119' memory='121072640' unit='KiB'/>
  </numa>

To confirm you are reproducing exactly the same topology as the host, you can compare the data shown by 'lscpu -e' in the host and in the guest (ignoring the first column, which is an arbitrary CPU number).

A discussion was started on qemu-devel to see if we can provide mechanisms that make this less confusing to configure:
http://article.gmane.org/gmane.comp.emulators.qemu/423961
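For comparison, the same arithmetic applied to the suggested cells (one node per socket) keeps both threads of every core on a single node. This sketch is not from the report; the (NODE, SOCKET, CORE, CPU) tuples merely mimic the kind of columns 'lscpu -e' lets you compare between host and guest.

```python
# Sketch of the suggested layout: NUMA cell i covers CPUs 30*i .. 30*i+29,
# i.e. node ID equals socket ID.

CORES, THREADS = 15, 2
PER_SOCKET = CORES * THREADS  # 30 guest CPUs per socket

rows = []
for cpu in range(4 * PER_SOCKET):
    socket = cpu // PER_SOCKET
    core = (cpu % PER_SOCKET) // THREADS
    node = cpu // PER_SOCKET   # suggested cells: one node per socket
    rows.append((node, socket, core, cpu))

# With one node per socket, no core has threads on two different nodes:
nodes_per_core = {}
for node, socket, core, cpu in rows:
    nodes_per_core.setdefault((socket, core), set()).add(node)
assert all(len(nodes) == 1 for nodes in nodes_per_core.values())

print(rows[:4])  # -> [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 2), (0, 0, 1, 3)]
```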