Bug 1188205

Summary: hotplugged vcpu is not consistent with guest NUMA topology
Product: Red Hat Enterprise Linux 7 Reporter: Jincheng Miao <jmiao>
Component: qemu-kvm Assignee: Igor Mammedov <imammedo>
Status: CLOSED DEFERRED QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.1 CC: dyuan, ehabkost, hhuang, honzhang, huding, imammedo, juzhang, lmiksik, mrezanin, mzhan, rbalakri, virt-bugs, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1188200 Environment:
Last Closed: 2015-09-18 18:03:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1188200    
Bug Blocks:    
Attachments:
Description Flags
SRAT table for RHEL none

Description Jincheng Miao 2015-02-02 10:40:05 UTC
Created attachment 987056 [details]
SRAT table for RHEL

+++ This bug was initially created as a clone of Bug #1188200 +++

Description of problem:
Prepare a guest that has a NUMA topology, and set the current vcpus to less than maxcpus.
Hotplugged vcpus are not placed consistently with the guest NUMA topology the user specified on the cmdline.

version:
libvirt-1.2.8-15.el7.x86_64
qemu-kvm-1.5.3-85.el7.x86_64
3.10.0-223.el7.x86_64

guest:
kernel-3.10.0-223.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. setup guest NUMA topology: 2 nodes, each node has 2 vcpus
# virsh edit rhel7
...
  <vcpu placement='auto'>4</vcpu>
...
  <cpu>
    <numa>
      <cell id='0' cpus='0-1' memory='1048576'/>
      <cell id='1' cpus='2-3' memory='1048576'/>
    </numa>
  </cpu>

the qemu cmdline is:
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=1024 -numa node,nodeid=1,cpus=2-3,mem=1024 -uuid 1edfafc5-a55a-4396-9595-46e590bfc79a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/jmiao/r71.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ab:7e:68,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
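
For reference, the vcpu-to-node mapping that the guest is expected to see can be read straight from the -numa options above. The following is only a rough sketch: the function name expected_numa_map is made up for illustration, the command line in the example is shortened to the relevant options, and it assumes each cpus= value is a single index or a single N-M range, as in this report.

```python
import shlex


def expected_numa_map(cmdline):
    """Return {nodeid: [cpu, ...]} from the '-numa node,...' options.

    Hypothetical helper for illustration; assumes every cpus= value is
    either a single index ("5") or a single range ("0-1").
    """
    args = shlex.split(cmdline)
    mapping = {}
    for i, arg in enumerate(args[:-1]):
        if arg != "-numa":
            continue
        fields = args[i + 1].split(",")
        if fields[0] != "node":
            continue
        nodeid = None
        cpus = []
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "nodeid":
                nodeid = int(value)
            elif key == "cpus":
                lo, _, hi = value.partition("-")
                cpus.extend(range(int(lo), int(hi or lo) + 1))
        if nodeid is not None:
            mapping[nodeid] = sorted(cpus)
    return mapping


# Shortened form of the command line from this report.
cmdline = ("/usr/libexec/qemu-kvm -name rhel7 -m 2048 "
           "-smp 2,maxcpus=4,sockets=4,cores=1,threads=1 "
           "-numa node,nodeid=0,cpus=0-1,mem=1024 "
           "-numa node,nodeid=1,cpus=2-3,mem=1024")
print(expected_numa_map(cmdline))  # {0: [0, 1], 1: [2, 3]}
```

So vcpu2 and vcpu3 are both expected in node 1 once they are plugged.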


2. check in guest
<guest> # numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 653 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 995 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

3. hotplug vcpu2 and vcpu3
# virsh setvcpus rhel7 4

the QMP commands libvirtd used are:
 53.925 > 0x7fa8202381e0 {"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"}
 53.943 < 0x7fa8202381e0 {"return": {}, "id": "libvirt-10"}
 53.944 > 0x7fa8202381e0 {"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"}
 53.961 < 0x7fa8202381e0 {"return": {}, "id": "libvirt-11"}
 53.962 > 0x7fa8202381e0 {"execute":"query-cpus","id":"libvirt-12"}
 53.971 < 0x7fa8202381e0 {"return": [{"current": true, "CPU": 0, "pc": -2130073539, "halted": false, "thread_id": 23454}, {"current": false, "CPU": 1, "pc": -2127686014, "halted": false, "thread_id": 23456}, {"current": false, "CPU": 2, "pc": 4294967280, "halted": false, "thread_id": 23491}, {"current": false, "CPU": 3, "pc": 4294967280, "halted": false, "thread_id": 23492}], "id": "libvirt-12"}
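
To reproduce the hotplug without libvirt, the same kind of request sequence can be generated by hand. The sketch below only builds the JSON lines; the function name hotplug_requests and the "hotplug-N" request ids are invented for illustration (libvirt uses its own "libvirt-N" ids), and sending the lines to a QMP socket is left out.

```python
import json


def hotplug_requests(online, target):
    """Build QMP request lines for vcpus online..target-1.

    Returns one cpu-add request per vcpu index, followed by a
    query-cpus request, mirroring the sequence in the log above.
    The "hotplug-*" ids are arbitrary labels chosen for this sketch.
    """
    requests = [
        {"execute": "cpu-add",
         "arguments": {"id": cpu},
         "id": "hotplug-%d" % cpu}
        for cpu in range(online, target)
    ]
    requests.append({"execute": "query-cpus", "id": "hotplug-query"})
    return [json.dumps(r) for r in requests]


# Guest booted with -smp 2,maxcpus=4: plug vcpu2 and vcpu3.
for line in hotplug_requests(2, 4):
    print(line)
```

The lines would be written to the QMP socket after the usual qmp_capabilities negotiation.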

4. checking in guest
<guest> # numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 1023 MB
node 0 free: 644 MB
node 1 cpus: 3
node 1 size: 1023 MB
node 1 free: 993 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

As we can see, vcpu2 ends up in guest NUMA node 0 instead of node 1, where the cmdline placed it.
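
The comparison in step 4 can be automated. This is a minimal sketch, not a finished tool: node_cpus and misplaced are hypothetical helper names, the expected mapping is typed in by hand from the -numa options, and the sample text is the relevant part of the numactl output from step 4.

```python
import re


def node_cpus(numactl_output):
    """Parse 'node N cpus: ...' lines into {node: [cpu, ...]}."""
    mapping = {}
    for line in numactl_output.splitlines():
        m = re.match(r"node (\d+) cpus:(.*)", line.strip())
        if m:
            mapping[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return mapping


def misplaced(expected, actual):
    """Return {cpu: (expected_node, actual_node)} for online vcpus
    that the guest reports on a different node than expected."""
    where = {cpu: node for node, cpus in actual.items() for cpu in cpus}
    wrong = {}
    for node, cpus in expected.items():
        for cpu in cpus:
            if cpu in where and where[cpu] != node:
                wrong[cpu] = (node, where[cpu])
    return wrong


output = """\
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 1023 MB
node 1 cpus: 3
node 1 size: 1023 MB
"""
expected = {0: [0, 1], 1: [2, 3]}  # from the -numa options on the cmdline
print(misplaced(expected, node_cpus(output)))  # {2: (1, 0)}
```

An empty result would mean every online vcpu sits in the node the cmdline asked for.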


Additional info:
The SRAT table is attached, and is generated by:
[root@localhost ~]# acpidump > acpidump.bin

[root@localhost ~]# acpixtract ./acpidump.bin 

Intel ACPI Component Architecture
ACPI Binary Table Extraction Utility version 20140926-64 [Sep 29 2014]
Copyright (c) 2000 - 2014 Intel Corporation

Acpi table [DSDT] - 2807 bytes written to dsdt.dat
Acpi table [SSDT] - 3356 bytes written to ssdt.dat

[root@localhost ~]# iasl -e ssdt.dat -d dsdt.dat 

Intel ACPI Component Architecture
ASL Optimizing Compiler version 20140926-64 [Sep 29 2014]
Copyright (c) 2000 - 2014 Intel Corporation

Loading Acpi table from file   dsdt.dat - Length 00002807 (000AF7)
ACPI: DSDT 0x0000000000000000 000AF7 (v01 BOCHS  BXPCDSDT 00000001 BXPC 00000001)
Acpi table [DSDT] successfully installed and loaded
Loading Acpi table from file   ssdt.dat - Length 00003356 (000D1C)
ACPI: SSDT 0x0000000000000000 000D1C (v01 BOCHS  BXPCSSDT 00000001 BXPC 00000001)
Acpi table [SSDT] successfully installed and loaded
Pass 1 parse of [SSDT]
Pass 2 parse of [SSDT]
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)

Parsing completed

Found 3 external control methods, reparsing with new information
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)

Parsing completed
Disassembly completed
ASL Output:    dsdt.dsl - 30129 bytes

Comment 1 Eduardo Habkost 2015-02-03 14:31:00 UTC
Can you please check if the fix for bug 1162080 at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8682975 affects this bug too?

Comment 2 Eduardo Habkost 2015-02-05 17:09:17 UTC
Scratch build was tested for bug 1188200. Clearing needinfo.

Comment 4 Igor Mammedov 2015-09-03 15:25:55 UTC
pls retest with latest qemu-kvm

Comment 7 huiqingding 2015-09-11 03:54:02 UTC
Tested this bug using the latest qemu-kvm, qemu-kvm-1.5.3-103.el7.x86_64, and found that it is not fixed.

1. boot guest with two numa nodes:
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.0.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0-1,mem=1024 -numa node,nodeid=1,cpus=2-3,mem=1024 -uuid 1edfafc5-a55a-4396-9595-46e590bfc79a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/home/RHEL-Server-7.2-64-virtio.qcow2,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -net none -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on -qmp tcp:0:4445,server,nowait -monitor stdio -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=4e:63:28:bc:b1:25

2. check the numa topology inside guest:
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 48 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 965 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

3. hotplug two vcpus:
{"execute":"qmp_capabilities"}
{"return": {}}
{"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"}
{"return": {}, "id": "libvirt-10"}
{"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"}
{"return": {}, "id": "libvirt-11"}

4. check the numa topology inside guest:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 1023 MB
node 0 free: 40 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 942 MB
node distances:
node   0   1 
  0:  10  20 
  1:  20  10 

vcpu2 and vcpu3 are not located in guest NUMA node1.
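
As a cross-check that does not depend on numactl, the per-node CPU lists can also be read inside the guest from /sys/devices/system/node/nodeN/cpulist. A rough sketch follows; the helper names parse_cpulist and sysfs_node_cpus are made up for illustration, and the root parameter exists only so the function can be pointed at a test directory.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-2,4' into [0, 1, 2, 4]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # a node without CPUs has an empty cpulist
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def sysfs_node_cpus(root="/sys/devices/system/node"):
    """Return {node: [cpu, ...]} from <root>/nodeN/cpulist files."""
    mapping = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*", "cpulist")):
        node = int(re.search(r"node(\d+)$", os.path.dirname(path)).group(1))
        with open(path) as f:
            mapping[node] = parse_cpulist(f.read())
    return mapping


# Run inside the guest; prints an empty dict where the sysfs path is absent.
print(sysfs_node_cpus())
```

If the sysfs view agrees with the numactl output above, all four vcpus would be listed under node0 and none under node1.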

Comment 8 huiqingding 2015-09-11 03:57:20 UTC
Based on the result of comment #7, set this bug to be "ASSIGNED".

If I'm wrong, please correct me. Thanks.

Comment 9 juzhang 2015-09-17 08:54:46 UTC
Hi Igor,

Could you have a look at comment 7 and add a comment?

Best Regards,
Junyi

Comment 10 Eduardo Habkost 2015-09-18 18:03:43 UTC
Guest-side NUMA topology is useless without corresponding host-side NUMA binding setup (which is not supported in qemu-kvm-1.5.3), so no real world use cases are affected.

Closing the qemu-kvm bug as DEFERRED (as it is already fixed upstream and on qemu-kvm-rhev).

Comment 11 juzhang 2015-09-22 02:02:38 UTC
Hi Mrezanin,

Could you please remove this bz from the RHEL 7.2 qemu-kvm erratum, according to comment 10?

Best Regards,
Junyi