Bug 1351160
| Summary: | qemu-kvm does not expose expected cpu topology to guest with numa defined | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Timothy Rees <trees> |
| Component: | qemu-kvm-rhev | Assignee: | Eduardo Habkost <ehabkost> |
| Status: | CLOSED NOTABUG | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 7.2 | CC: | ailan, cgiunta, djdumas, drjones, fdanapfe, knoel, linux, mkoch, peter.engel, rsibley, srao, trees, virt-maint |
| Target Milestone: | rc | | |
| Target Release: | 7.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2016-07-02 00:13:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1018952 | | |
Please provide the qemu command lines ('pgrep -a qemu' while the guest is running).
Also, we should split scenarios A and B into two separate bugs; Scenario A deals with the topology change, and B with the unexpected guest cpu model (I think B is likely not-a-bug, but I'll allow others to respond).
(In reply to Andrew Jones from comment #2)
> Please provide the qemu command lines ('pgrep -a qemu' while the guest is
> running).
>
> Also, we should split scenarios A and B into two separate bugs; Scenario A
> deals with the topology change, and B with the unexpected guest cpu model (I
> think B is likely not-a-bug, but I'll allow others to respond)

    [root@hypervisor ~]# pgrep -a qemu
    103617 /usr/libexec/qemu-kvm -name rhel7.0-2 -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off,vmport=off -cpu host -m 502235 -realtime mlock=off -smp 120,sockets=4,cores=15,threads=2 -numa node,nodeid=0,cpus=0-14,cpus=60-74,mem=128000 -numa node,nodeid=1,cpus=15-29,cpus=75-89,mem=128000 -numa node,nodeid=2,cpus=30-44,cpus=90-104,mem=128000 -numa node,nodeid=3,cpus=45-59,cpus=105-119,mem=118235 -uuid 0174311e-bf55-41c9-be80-d097bc3f73e4 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7.0-2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/dev/vg_sys_r1/lv_vm_rhel7,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk1,id=virtio-disk1,bootindex=1 -drive file=/dev/vg_sys_r1/lv_vm_rhel7_usr_sap,if=none,id=drive-virtio-disk2,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/dev/vg_sys_r1/lv_hana,if=none,id=drive-virtio-disk3,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk3,id=virtio-disk3 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0e:c7:25,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-rhel7.0-2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on

(In reply to Timothy Rees from comment #0)
[...]
> Test with 2 scenarios:
> A: Guest cpu configured with mode=host-passthrough
> B: Guest cpu configured with mode=custom match=exact
[...]
> SCENARIO A:
[...]
> 3. Boot and login to the guest:
> [root@guestvm ~]# lscpu
[...]
> CPU(s): 120
> On-line CPU(s) list: 0-119
> Thread(s) per core: 1
> Core(s) per socket: 16
> Socket(s): 4
> NUMA node(s): 4
[...]
> Actual results:
> Guest shows Sockets/Cores/Threads as 4/16/1
>
> Expected results:
> Per the host output and xml configuration, guest should show
> Sockets/Cores/Threads as 4/15/2

This is unexpected; I will investigate. But note that each socket still has 15 cores. The only difference is that the guest assumes the _maximum_ number of cores per socket is 16; the _actual_ number of cores per socket is exactly the one requested. Core ID 15 won't be present in any of the sockets, only core IDs 0-14. This way, the CPU core topology and NUMA topology still match the host as expected.

I would like to understand what real-world problems are caused by this behavior. That way, we can look for workarounds if necessary, and will be able to justify changing QEMU behavior upstream.
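A worked illustration of the "maximum 16 cores" point above may help. The sketch below is an editor's approximation, under the assumption that it mirrors QEMU's x86 APIC ID packing, where each topology field is rounded up to a power-of-two bit width (the helper names are hypothetical, not QEMU functions). It shows why cores=15 makes the guest advertise room for 16 core IDs per socket even though only IDs 0-14 are ever used:

```python
# Sketch (assumption): APIC ID field packing with power-of-two widths,
# approximating the encoding for -smp 120,sockets=4,cores=15,threads=2.
import math

def field_bits(count):
    """Bits needed to encode `count` distinct IDs (at least 1 bit)."""
    return max(1, math.ceil(math.log2(count)))

threads, cores = 2, 15
thread_bits = field_bits(threads)   # 1 bit  -> thread IDs 0-1
core_bits = field_bits(cores)       # 4 bits -> room for core IDs 0-15

def apic_id(socket, core, thread):
    return (socket << (core_bits + thread_bits)) | (core << thread_bits) | thread

# Only core IDs 0-14 ever appear, but the 4-bit core field advertises a
# *maximum* of 16 cores per socket -- the 4/16/1 figure lscpu reports.
print(apic_id(0, 14, 1))  # 29: last thread of socket 0
print(apic_id(1, 0, 0))   # 32: first thread of socket 1 (IDs 30-31 unused)
```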
> SCENARIO B:
> 3. Boot and login to the guest:
> [root@guestvm ~]# lscpu
[...]
> CPU(s): 120
> On-line CPU(s) list: 0-119
> Thread(s) per core: 1
> Core(s) per socket: 16
> Socket(s): 4
> NUMA node(s): 4
[...]
> Actual results:
> Guest shows Sockets/Cores/Threads as 4/16/1

See the comment about scenario A above.

> Guest CPU Model does not match host CPU Model
> Guest CPU Caches do not match host CPU caches

This is expected. The CPU model will match the host exactly only when using host-passthrough. CPU caches will match the host only when using "-cpu host" (host-passthrough in libvirt) and "host-cache-info=on" (not supported by libvirt yet). Note: old versions of QEMU (including the current 7.2 qemu-kvm-rhev version) had a bug that made it forward host CPU cache information unconditionally, but this will be fixed in 7.3. See bug 1184125.

As with scenario A, I would like to understand what real-world problems are caused by this, so that we can look for workarounds if necessary.

(In reply to Eduardo Habkost from comment #4)
> (In reply to Timothy Rees from comment #0)
[...]
> > Actual results:
> > Guest shows Sockets/Cores/Threads as 4/16/1
> >
> > Expected results:
> > Per the host output and xml configuration, guest should show
> > Sockets/Cores/Threads as 4/15/2
>
> This is unexpected, I will investigate.

I have reproduced the issue, and noticed that the configuration shown in the bug description confuses lscpu because it places CPU threads from the same core in different NUMA nodes.

In other words, this:

    <topology sockets='4' cores='15' threads='2'/>

means that CPUs 0-29 will be in socket 0, 30-59 in socket 1, 60-89 in socket 2, and 90-119 in socket 3. Inside socket 0:

    CPUs 0-1 will be in core 0
    CPUs 2-3 will be in core 1
    [...]
    CPUs 12-13 will be in core 6
    CPUs 14-15 will be in core 7
    [...]
    CPUs 26-27 will be in core 13
    CPUs 28-29 will be in core 14

The same pattern repeats in the other sockets.

(Note that the CPU numbers above won't match the host, because CPU numbers in the host are arbitrary. We order CPUs in the ACPI tables by socket/core/thread IDs (more exactly, by APIC ID, which encodes the socket/core/thread IDs), and the host seems to order them differently. You can check the exact ordering using 'lscpu -e' on the host.)

However:

    <numa>
      <cell id='0' cpus='0-14,60-74' memory='131072000' unit='KiB'/>
      <cell id='1' cpus='15-29,75-89' memory='131072000' unit='KiB'/>
      <cell id='2' cpus='30-44,90-104' memory='131072000' unit='KiB'/>
      <cell id='3' cpus='45-59,105-119' memory='121072640' unit='KiB'/>
    </numa>

will place one thread of core 7 (CPU 14) in NUMA node 0 and the other thread (CPU 15) in NUMA node 1. This configuration doesn't make sense.
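To make the enumeration above concrete, here is a minimal editor's sketch, assuming the socket-major, core-then-thread vCPU ordering described in the comment (`vcpu_topology` is a hypothetical helper, not a QEMU function). It maps a vCPU index to its (socket, core, thread) triple and shows the split core:

```python
# Sketch of the vCPU ordering described above for
# <topology sockets='4' cores='15' threads='2'/>.
sockets, cores, threads = 4, 15, 2

def vcpu_topology(index):
    """Map a guest vCPU index to its (socket, core, thread) triple,
    assuming socket-major, then core, then thread ordering."""
    return (index // (cores * threads),   # socket
            (index // threads) % cores,   # core within the socket
            index % threads)              # thread within the core

# CPUs 14 and 15 are sibling threads of socket 0, core 7, yet the bug's
# <numa> cells put CPU 14 in node 0 and CPU 15 in node 1:
print(vcpu_topology(14))  # (0, 7, 0)
print(vcpu_topology(15))  # (0, 7, 1)
```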
I don't know what the exact NUMA topology of the host is, but I assume you want to place all cores from the same socket inside the same NUMA node. Probably something like:

    <numa>
      <cell id='0' cpus='0-29' memory='131072000' unit='KiB'/>
      <cell id='1' cpus='30-59' memory='131072000' unit='KiB'/>
      <cell id='2' cpus='60-89' memory='131072000' unit='KiB'/>
      <cell id='3' cpus='90-119' memory='121072640' unit='KiB'/>
    </numa>

To confirm you are reproducing exactly the same topology as the host, you can compare the data shown by 'lscpu -e' on the host and in the guest (ignoring the first column, which is an arbitrary CPU number).

A discussion was started on qemu-devel to see whether we can provide mechanisms that make this less confusing to configure:
http://article.gmane.org/gmane.comp.emulators.qemu/423961
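Under the same ordering assumption as the previous sketch, the suggested per-socket NUMA cells can be generated mechanically. The snippet below (an editor's illustration, not from the bug; the `memory=` attributes are elided) reproduces exactly the cpus= ranges proposed above:

```python
# Generate one <numa> cell per socket so sibling threads always stay
# in the same node (assumes socket-major vCPU numbering as above).
sockets, cores, threads = 4, 15, 2
cpus_per_node = cores * threads        # 30 vCPUs per socket/node

for node in range(sockets):
    lo = node * cpus_per_node
    hi = lo + cpus_per_node - 1
    print(f"<cell id='{node}' cpus='{lo}-{hi}' .../>")
# -> cpus='0-29', '30-59', '60-89', '90-119'
```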
Description of problem:

When creating a single large VM with qemu-kvm to replicate the host CPU and NUMA topology, the CPU topology is not displayed correctly in the guest. We are currently testing SAP HANA in a dedicated large VM on a single RHEV host, and it is vital that qemu shows the correct CPU topology, NUMA topology, and processor sibling information as part of the performance testing.

Tested with 2 scenarios:
A: Guest cpu configured with mode=host-passthrough
B: Guest cpu configured with mode=custom match=exact

Version-Release number of selected component (if applicable):

    libvirt-1.2.17-13.el7_2.4.x86_64
    libvirt-client-1.2.17-13.el7_2.4.x86_64
    qemu-kvm-rhev-2.3.0-31.el7_2.13.x86_64

How reproducible:
100%

SCENARIO A:

Steps to Reproduce:
1. Create a VM on the host using the maximum number of processors available
2. Edit the XML and add the following elements:

    <cpu mode='host-passthrough'>
      <topology sockets='4' cores='15' threads='2'/>
      <model fallback='forbid'/>
      <numa>
        <cell id='0' cpus='0-14,60-74' memory='131072000' unit='KiB'/>
        <cell id='1' cpus='15-29,75-89' memory='131072000' unit='KiB'/>
        <cell id='2' cpus='30-44,90-104' memory='131072000' unit='KiB'/>
        <cell id='3' cpus='45-59,105-119' memory='121072640' unit='KiB'/>
      </numa>
    </cpu>

3. Boot and log in to the guest:

    [root@guestvm ~]# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                120
    On-line CPU(s) list:   0-119
    Thread(s) per core:    1
    Core(s) per socket:    16
    Socket(s):             4
    NUMA node(s):          4
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 62
    Model name:            Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz
    Stepping:              7
    CPU MHz:               2493.988
    BogoMIPS:              4987.97
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              38400K
    NUMA node0 CPU(s):     0-14,60-74
    NUMA node1 CPU(s):     15-29,75-89
    NUMA node2 CPU(s):     30-44,90-104
    NUMA node3 CPU(s):     45-59,105-119

    # numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
    node 0 size: 127999 MB
    node 0 free: 125072 MB
    node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
    node 1 size: 128000 MB
    node 1 free: 125162 MB
    node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
    node 2 size: 128000 MB
    node 2 free: 125220 MB
    node 3 cpus: 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
    node 3 size: 118235 MB
    node 3 free: 115634 MB
    node distances:
    node   0   1   2   3
      0:  10  20  20  20
      1:  20  10  20  20
      2:  20  20  10  20
      3:  20  20  20  10

Actual results:
Guest shows Sockets/Cores/Threads as 4/16/1

Expected results:
Per the host output and xml configuration, the guest should show Sockets/Cores/Threads as 4/15/2

SCENARIO B:

Steps to Reproduce:
1. Create a VM on the host using the maximum number of processors available
2. Edit the XML and add the following elements, using the same number of sockets/cores/threads as defined on the host:

    <cpu mode='custom' match='exact'>
      <topology sockets='4' cores='15' threads='2'/>
      <model fallback='forbid'/>
      <numa>
        <cell id='0' cpus='0-14,60-74' memory='131072000' unit='KiB'/>
        <cell id='1' cpus='15-29,75-89' memory='131072000' unit='KiB'/>
        <cell id='2' cpus='30-44,90-104' memory='131072000' unit='KiB'/>
        <cell id='3' cpus='45-59,105-119' memory='121072640' unit='KiB'/>
      </numa>
    </cpu>
3. Boot and log in to the guest:

    [root@guestvm ~]# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                120
    On-line CPU(s) list:   0-119
    Thread(s) per core:    1
    Core(s) per socket:    16
    Socket(s):             4
    NUMA node(s):          4
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 13
    Model name:            QEMU Virtual CPU version 2.3.0
    Stepping:              3
    CPU MHz:               2493.988
    BogoMIPS:              4987.97
    Hypervisor vendor:     KVM
    Virtualization type:   full
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    NUMA node0 CPU(s):     0-14,60-74
    NUMA node1 CPU(s):     15-29,75-89
    NUMA node2 CPU(s):     30-44,90-104
    NUMA node3 CPU(s):     45-59,105-119

    # numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
    node 0 size: 127999 MB
    node 0 free: 125072 MB
    node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
    node 1 size: 128000 MB
    node 1 free: 125162 MB
    node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
    node 2 size: 128000 MB
    node 2 free: 125220 MB
    node 3 cpus: 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
    node 3 size: 118235 MB
    node 3 free: 115634 MB
    node distances:
    node   0   1   2   3
      0:  10  20  20  20
      1:  20  10  20  20
      2:  20  20  10  20
      3:  20  20  20  10

Actual results:
Guest shows Sockets/Cores/Threads as 4/16/1
Guest CPU Model does not match host CPU Model
Guest CPU Caches do not match host CPU caches

Expected results:
Per the host output and xml configuration, the guest should show Sockets/Cores/Threads as 4/15/2, and the CPU model and caches should match those of the host.

Additional info:

Host lscpu output:

    [root@hypervisor ~]# lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                120
    On-line CPU(s) list:   0-119
    Thread(s) per core:    2
    Core(s) per socket:    15
    Socket(s):             4
    NUMA node(s):          4
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 62
    Model name:            Intel(R) Xeon(R) CPU E7-4880 v2 @ 2.50GHz
    Stepping:              7
    CPU MHz:               1823.925
    BogoMIPS:              4996.38
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              256K
    L3 cache:              38400K
    NUMA node0 CPU(s):     0-14,60-74
    NUMA node1 CPU(s):     15-29,75-89
    NUMA node2 CPU(s):     30-44,90-104
    NUMA node3 CPU(s):     45-59,105-119

    [root@hypervisor ~]# numactl -H
    available: 4 nodes (0-3)
    node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
    node 0 size: 130943 MB
    node 0 free: 116339 MB
    node 1 cpus: 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89
    node 1 size: 131072 MB
    node 1 free: 126032 MB
    node 2 cpus: 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104
    node 2 size: 131072 MB
    node 2 free: 127861 MB
    node 3 cpus: 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
    node 3 size: 131072 MB
    node 3 free: 126456 MB
    node distances:
    node   0   1   2   3
      0:  10  11  11  11
      1:  11  10  11  11
      2:  11  11  10  11
      3:  11  11  11  10

* We have also seen the same behavior on another server; it is also an IvyBridge box, but with 8 CPUs.
* Taking away the NUMA topology from the XML in each of the scenarios results in the correct Sockets/Cores/Threads showing in the virtual machine, but then there is no NUMA topology defined in the guest either.
* This work is taking place with our partner SAP to certify HANA workloads on RHEV.
* I will attach full xml files for each scenario.
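Comment 5 suggests verifying the reproduction by comparing 'lscpu -e' on the host and guest while ignoring the arbitrary CPU-number column. Below is a minimal editor's sketch of that comparison (the helper names and file names are hypothetical; it assumes 'lscpu -e=NODE,SOCKET,CORE,CPU' output has been captured from both machines):

```python
# Sketch: compare host and guest topology from 'lscpu -e=NODE,SOCKET,CORE,CPU'
# output, ignoring the arbitrary CPU column as suggested in comment 5.
from collections import Counter

def topology_multiset(lscpu_output: str) -> Counter:
    """One (node, socket, core) tuple per logical CPU, order-independent."""
    rows = lscpu_output.strip().splitlines()[1:]   # skip the header row
    return Counter(tuple(line.split()[:3]) for line in rows)

def same_topology(host_output: str, guest_output: str) -> bool:
    return topology_multiset(host_output) == topology_multiset(guest_output)

# Usage (hypothetical files saved from each machine):
#   same_topology(open('host-lscpu-e.txt').read(),
#                 open('guest-lscpu-e.txt').read())
```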