Bug 1188200
| Summary: | hotplugged vcpu is not consistent with guest NUMA topology | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Jincheng Miao <jmiao> | ||||||
| Component: | qemu-kvm-rhev | Assignee: | Igor Mammedov <imammedo> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 7.1 | CC: | dyuan, ehabkost, hhuang, honzhang, huding, imammedo, juzhang, lhuang, mrezanin, mzhan, virt-maint | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | |||||||||
| : | 1188205 (view as bug list) | Environment: | |||||||
| Last Closed: | 2015-12-04 16:26:39 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1162080 | ||||||||
| Bug Blocks: | 1188205 | ||||||||
| Attachments: |
|
||||||||
Can you please check if the fix for bug 1162080 at http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8682975 affects this bug too?

(In reply to Eduardo Habkost from comment #1)
> Can you please check if the fix for bug 1162080 at
> http://brewweb.devel.redhat.com/brew/taskinfo?taskID=8682975 affects this
> bug too?

Hi Eduardo,
I tested the package, and the problem does not occur with it:
1. check NUMA topology
<in guest># numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 634 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 996 MB
node distances:
node 0 1
0: 10 20
1: 20 10
2. hotplug 2 vcpus
# virsh setvcpus rhel7 4
3. recheck NUMA topology
<in guest># numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 633 MB
node 1 cpus: 2 3
node 1 size: 1023 MB
node 1 free: 993 MB
node distances:
node 0 1
0: 10 20
1: 20 10

One more question: if I specify overlapping cpus, qemu also exports a wrong NUMA topology:
guest NUMA node 0: 0-3,8-11,17
guest NUMA node 1: 4-7,12-17
the qemu cmdline looks like:
# /usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,maxcpus=19,sockets=19,cores=1,threads=1 -object memory-backend-ram,size=1024M,id=ram-node0,host-nodes=0,policy=preferred -numa node,nodeid=0,cpus=0-3,cpus=8-11,cpus=17,memdev=ram-node0 -object memory-backend-ram,size=1024M,id=ram-node1,host-nodes=0,policy=preferred -numa node,nodeid=1,cpus=4-7,cpus=12-17,memdev=ram-node1 -uuid 1edfafc5-a55a-4396-9595-46e590bfc79a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/jmiao/r71.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=24 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ab:7e:68,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
then I hotplugged all 19 vcpus and checked in the guest:
<in guest># numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 8 9 10 11
node 0 size: 1023 MB
node 0 free: 593 MB
node 1 cpus: 4 5 6 7 12 13 14
node 1 size: 1023 MB
node 1 free: 983 MB
node distances:
node 0 1
0: 10 20
1: 20 10
we could see vcpu #15 #16 #17 are gone.

(In reply to Jincheng Miao from comment #3)
> we could see vcpu #15 #16 #17 are gone.

Is this using the latest official build, or the scratch build from comment #1?

(In reply to Eduardo Habkost from comment #4)
> (In reply to Jincheng Miao from comment #3)
> > we could see vcpu #15 #16 #17 are gone.
>
> Is this using the latest official build, or the scratch build from comment
> #1?

I tested it on the scratch build.

Confirming that the fix for bug 1162080 should fix the issue reported here.
In linux kernel:
acpi_numa_processor_affinity_init()
...
if ((pa->flags & ACPI_SRAT_CPU_ENABLED) == 0)
return;
so it will bail out early without initializing NUMA stuff if the CPU is marked as disabled in SRAT.

Also found an issue like this with machine type rhel6.6.0 and rhel6.5.0:
1. prepare a healthy VM with settings like this:
...
<vcpu placement='static' current='2'>4</vcpu>
...
<os>
<type arch='x86_64' machine='rhel6.6.0'>hvm</type>
<boot dev='hd'/>
</os>
...
<cpu>
<numa>
<cell id='0' cpus='0,2' memory='512000'/>
<cell id='1' cpus='1,3' memory='512000'/>
</numa>
</cpu>
2. start it and check the numa structure in vm:
# virsh start test3
Domain test3 started
IN GUEST:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 499 MB
node 0 free: 410 MB
node 1 cpus: 1
node 1 size: 499 MB
node 1 free: 443 MB
node distances:
node 0 1
0: 10 20
1: 20 10
3. hot-plug vcpu to 3:
# virsh setvcpus test3 3
IN GUEST:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 499 MB
node 0 free: 400 MB
node 1 cpus: 1 2
node 1 size: 499 MB
node 1 free: 442 MB
node distances:
node 0 1
0: 10 20
1: 20 10
4. hot-plug vcpu to 4:
# virsh setvcpus test3 4
IN GUEST:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 499 MB
node 0 free: 400 MB
node 1 cpus: 1 2 3
node 1 size: 499 MB
node 1 free: 442 MB
node distances:
node 0 1
0: 10 20
1: 20 10
I am noting this here to track whether it gets fixed once bug 1162080 is fixed.
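The mismatch in the steps above can be checked mechanically. The following is a minimal sketch, not part of the original report: it parses the `numactl --hard` output from step 4 and compares it with the layout requested in the <numa> cells (cell 0 = cpus 0,2; cell 1 = cpus 1,3). The helper names parse_numactl_cpus and misplaced_cpus are hypothetical, chosen only for this illustration.

```python
import re


def parse_numactl_cpus(text):
    """Return {node_id: [cpu, ...]} from `numactl --hardware` output."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"\s*node (\d+) cpus:(.*)$", line)
        if m:
            nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes


def misplaced_cpus(expected, observed):
    """List (cpu, expected_node, observed_node) for online CPUs in the wrong node."""
    where = {cpu: node for node, cpus in observed.items() for cpu in cpus}
    return sorted(
        (cpu, node, where[cpu])
        for node, cpus in expected.items()
        for cpu in cpus
        if cpu in where and where[cpu] != node
    )


# Guest output after `virsh setvcpus test3 4`, copied from step 4 above.
OBSERVED = """\
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 499 MB
node 0 free: 400 MB
node 1 cpus: 1 2 3
node 1 size: 499 MB
node 1 free: 442 MB
"""

# Expected layout taken from the <numa> cells in step 1.
EXPECTED = {0: [0, 2], 1: [1, 3]}

print(misplaced_cpus(EXPECTED, parse_numactl_cpus(OBSERVED)))  # prints [(2, 0, 1)]
```

The single tuple says that vcpu 2 was expected in node 0 but shows up in node 1, which matches what the guest output shows.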
(In reply to Luyao Huang from comment #7)
> and write here to track if it will be fixed after bug 1162080 fixed.

bug 1162080 has been fixed, please retest.

(In reply to Igor Mammedov from comment #8)
> (In reply to Luyao Huang from comment #7)
> > and write here to track if it will be fixed after bug 1162080 fixed.
>
> bug 1162080 has been fixed, please retest.

Retested with qemu-kvm-rhev-2.2.0-5.el7.x86_64 and still get the same result.

(In reply to Luyao Huang from comment #9)
> (In reply to Igor Mammedov from comment #8)
> > (In reply to Luyao Huang from comment #7)
> > > and write here to track if it will be fixed after bug 1162080 fixed.
> >
> > bug 1162080 has been fixed, please retest.
>
> retest with qemu-kvm-rhev-2.2.0-5.el7.x86_64 and still get the same result

Could you provide the qemu command line that was used?

Still waiting for information requested on comment #10.

(In reply to Eduardo Habkost from comment #11)
> Still waiting for information requested on comment #10.

Sorry for the delay, I think I missed this comment while checking my mailbox.
This is my VM cmdline:
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=spice /usr/libexec/qemu-kvm -name test3 -S -machine rhel6.5.0,accel=kvm,usb=off -m 1000 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0,cpus=2,mem=500 -numa node,nodeid=1,cpus=1,cpus=3,mem=500 -uuid 7347d748-f7ce-448f-8d49-3d29c9bcac30 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -device usb-ccid,id=ccid0 -drive file=/var/lib/libvirt/images/r7_latest.img,if=none,id=drive-ide0-0-0,format=raw -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -netdev tap,fd=24,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:1a:cb:3f,bus=pci.0,addr=0x8 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/r6.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=9,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5900,addr=0.0.0.0,disable-ticketing,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=134217728,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on

it works for me with latest 7.2 qemu-kvm-rhev-2.3.0 where we've got fix via rebase and 7.1-z stream qemu-kvm-rhev-2.1.2-23 where it was backported via bug 1191385

So please retest and close bug.
Hi, Igor: Test latest qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-22.el7.x86_64 and the kernel is 3.10.0-313.el7.x86_64. Do test same as comment #0 and comment #3, the result is ok. But test comment #7, the result is failed. Best regards Huiqing The detailed steps of failed as following: 1. boot guest with "-machine rhel6.5.0" /usr/libexec/qemu-kvm -name test3 -S -machine rhel6.5.0,accel=kvm,usb=off -m 1000 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0,cpus=2,mem=500 -numa node,nodeid=1,cpus=1,cpus=3,mem=500 -uuid 7347d748-f7ce-448f-8d49-3d29c9bcac30 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test3.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -device usb-ccid,id=ccid0 -drive file=/home/RHEL-Server-7.2-64-virtio.qcow2,if=none,id=drive-ide0-0-0,format=qcow2 -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/r6.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=9,chardev=charchannel1,id=channel1,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5901,disable-ticketing,seamless-migration=on -k en-us -device qxl-vga,id=video0,ram_size=134217728,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg timestamp=on -qmp tcp:0:4445,server,nowait -monitor stdio -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device 
virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=4e:63:28:bc:b1:25 2. check numa topology inside guest: # numactl --hard available: 2 nodes (0-1) node 0 cpus: 0 node 0 size: 499 MB node 0 free: 99 MB node 1 cpus: 1 node 1 size: 499 MB node 1 free: 97 MB node distances: node 0 1 0: 10 20 3. hotplug two vcpus: {"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"} {"return": {}, "id": "libvirt-10"} {"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"} {"return": {}, "id": "libvirt-11"} 4. check numa topology inside guest: # numactl --hard available: 2 nodes (0-1) node 0 cpus: 0 node 0 size: 499 MB node 0 free: 114 MB node 1 cpus: 1 2 3 node 1 size: 499 MB node 1 free: 81 MB node distances: node 0 1 0: 10 20 1: 20 10 after step4, vcpu 0 and 2 should be in node0 and vcpu 1 and 3 should be node1. (In reply to huiqingding from comment #17) > Hi, Igor: > > Test latest qemu-kvm-rhev: qemu-kvm-rhev-2.3.0-22.el7.x86_64 and the kernel > is 3.10.0-313.el7.x86_64. > > Do test same as comment #0 and comment #3, the result is ok. > But test comment #7, the result is failed. it works for me as expected with qemu-kvm-rhev-2.3.0-22.el7.x86_64 and 7.2 guest: /usr/libexec/qemu-kvm -enable-kvm -m 4G -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa node,nodeid=0,cpus=0,cpus=2 -numa node,nodeid=1,cpus=1,cpus=3 /dev/slow/rhel72 -monitor stdio -machine rhel6.5.0,accel=kvm,usb=off > > Best regards > Huiqing > > The detailed steps of failed as following: > 1. 
boot guest with "-machine rhel6.5.0" > /usr/libexec/qemu-kvm -name test3 -S -machine rhel6.5.0,accel=kvm,usb=off -m > 1000 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -numa > node,nodeid=0,cpus=0,cpus=2,mem=500 -numa > node,nodeid=1,cpus=1,cpus=3,mem=500 -uuid > 7347d748-f7ce-448f-8d49-3d29c9bcac30 -no-user-config -nodefaults -chardev > socket,id=charmonitor,path=/var/lib/libvirt/qemu/test3.monitor,server,nowait > -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown > -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x6 -device > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -device > usb-ccid,id=ccid0 -drive > file=/home/RHEL-Server-7.2-64-virtio.qcow2,if=none,id=drive-ide0-0-0, > format=qcow2 -device > ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 > -chardev pty,id=charserial0 -device > isa-serial,chardev=charserial0,id=serial0 -chardev > socket,id=charchannel0,path=/var/lib/libvirt/qemu/r6.agent,server,nowait > -device > virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel0,id=channel0, > name=org.qemu.guest_agent.0 -chardev spicevmc,id=charchannel1,name=vdagent > -device > virtserialport,bus=virtio-serial0.0,nr=9,chardev=charchannel1,id=channel1, > name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice > port=5901,disable-ticketing,seamless-migration=on -k en-us -device > qxl-vga,id=video0,ram_size=134217728,vram_size=67108864,vgamem_mb=16,bus=pci. > 0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -msg > timestamp=on -qmp tcp:0:4445,server,nowait -monitor stdio > -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device > virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=4e:63:28:bc:b1:25 > > 2. 
check numa topology inside guest: > # numactl --hard > available: 2 nodes (0-1) > node 0 cpus: 0 > node 0 size: 499 MB > node 0 free: 99 MB > node 1 cpus: 1 > node 1 size: 499 MB > node 1 free: 97 MB > node distances: > node 0 1 > 0: 10 20 > > 3. hotplug two vcpus: > {"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"} > {"return": {}, "id": "libvirt-10"} > {"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"} > {"return": {}, "id": "libvirt-11"} > > 4. check numa topology inside guest: > # numactl --hard > available: 2 nodes (0-1) > node 0 cpus: 0 > node 0 size: 499 MB > node 0 free: 114 MB > node 1 cpus: 1 2 3 > node 1 size: 499 MB > node 1 free: 81 MB > node distances: > node 0 1 > 0: 10 20 > 1: 20 10 > > after step4, vcpu 0 and 2 should be in node0 and vcpu 1 and 3 should be > node1. huiqingding, Could you dump SRAT ACPI table from guest for the case where it doesn't work for you and attach complete guest's log. I have test it on again and still can reproduce it with qemu-kvm-rhev-2.3.0-21.el7.x86_64 libvirt-1.2.17-8.el7.x86_64 and guest kernel is 3.10.0-314.el7.x86_64:
1.
# virsh dumpxml rhel7.0-rhel
...
<vcpu placement='auto' current='3'>4</vcpu>
...
<os>
<type arch='x86_64' machine='rhel6.6.0'>hvm</type>
<boot dev='hd'/>
</os>
...
<cpu mode='custom' match='exact'>
<model fallback='allow'>Opteron_G5</model>
<numa>
<cell id='0' cpus='0,2' memory='2024448' unit='KiB'/>
<cell id='1' cpus='1,3' memory='2024448' unit='KiB'/>
</numa>
</cpu>
...
2. start guest:
# virsh start rhel7.0-rhel
Domain rhel7.0-rhel started
3. hot-plug cpu:
# virsh setvcpus rhel7.0-rhel 4
4. check in guest:
IN GUEST:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 2 3
node 0 size: 1976 MB
node 0 free: 1420 MB
node 1 cpus: 1
node 1 size: 1976 MB
node 1 free: 1691 MB
node distances:
node 0 1
0: 10 20
1: 20 10
5. I also noticed this in the guest dmesg:
[ 20.508940] smpboot: Booting Node 1 Processor 3 APIC 0x3
6. I will attach the srat.dsl.
7. guest cmdline:
/usr/libexec/qemu-kvm -name rhel7.0-rhel -S -machine rhel6.6.0,accel=kvm,usb=off -cpu Opteron_G5 -m 3954 -realtime mlock=off -smp 3,maxcpus=4,sockets=4,cores=1,threads=1 -object iothread,id=iothread1 -numa node,nodeid=0,cpus=0,cpus=2,mem=1977 -numa node,nodeid=1,cpus=1,cpus=3,mem=1977 -uuid 67c7a123-5415-4136-af62-a2ee098ba6cd -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-rhel7.0-rhel/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/fs/r7_ext4.raw,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:af:19:fb,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -chardev socket,id=charchannel1,path=/var/lib/libvirt/qemu/r6.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 -device usb-tablet,id=input0 -spice port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=pci.0,addr=0x2 -device 
intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev pty,id=charredir0 -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on
Created attachment 1073470 [details]
srat dump 1
(In reply to Igor Mammedov from comment #19) > huiqingding, > > Could you dump SRAT ACPI table from guest for the case where it doesn't work > for you and attach complete guest's log. After hotplug two vcpus as comment #17, inside guest: # acpidump > acpidump.bin [root@dhcp-10-17 ~]# acpixtract ./acpidump.bin Intel ACPI Component Architecture ACPI Binary Table Extraction Utility version 20150619-64 Copyright (c) 2000 - 2015 Intel Corporation Acpi table [DSDT] - 4405 bytes written to dsdt.dat Acpi table [SSDT] - 2364 bytes written to ssdt.dat [root@dhcp-10-17 ~]# iasl -e ssdt.dat -d dsdt.dat Intel ACPI Component Architecture ASL+ Optimizing Compiler version 20150619-64 Copyright (c) 2000 - 2015 Intel Corporation Reading ACPI table from file dsdt.dat - Length 00004405 (0x001135) ACPI: DSDT 0x0000000000000000 001135 (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001) Acpi table [DSDT] successfully installed and loaded Reading ACPI table from file ssdt.dat - Length 00002364 (0x00093C) ACPI: SSDT 0x0000000000000000 00093C (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001) Acpi table [SSDT] successfully installed and loaded Pass 1 parse of [SSDT] Pass 2 parse of [SSDT] Pass 1 parse of [DSDT] Pass 2 parse of [DSDT] Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions) Parsing completed Found 2 external control methods, reparsing with new information Pass 1 parse of [DSDT] Pass 2 parse of [DSDT] Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions) Parsing completed Disassembly completed ASL Output: dsdt.dsl - 48923 bytes [root@dhcp-10-17 ~]# PS: the info of the host is as following: # lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 32 On-line CPU(s) list: 0-31 Thread(s) per core: 2 Core(s) per socket: 8 Socket(s): 2 NUMA node(s): 4 Vendor ID: AuthenticAMD CPU family: 21 Model: 1 Model name: AMD Opteron(TM) Processor 6272 Stepping: 2 CPU MHz: 2100.093 BogoMIPS: 4199.78 Virtualization: AMD-V L1d cache: 16K L1i cache: 64K L2 cache: 
2048K L3 cache: 6144K NUMA node0 CPU(s): 0,2,4,6,8,10,12,14 NUMA node1 CPU(s): 16,18,20,22,24,26,28,30 NUMA node2 CPU(s): 1,3,5,7,9,11,13,15 NUMA node3 CPU(s): 17,19,21,23,25,27,29,31
Reproduced this bug using:
kernel-3.10.0-316.el7.x86_64
qemu-kvm-rhev-2.1.2-23.el7.x86_64
The detailed steps of the failure are as follows:
1. boot guest with two numa nodes
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-file,prealloc=yes,mem-path=/home/kvm_hugepage,size=1024M,id=ram-node0,host-nodes=1,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,prealloc=yes,mem-path=/home/kvm_hugepage,size=1024M,id=ram-node1,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid 1edfafc5-a55a-4396-9595-46e590bfc79a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/rhel7.2.raw,if=none,id=drive-ide0-0-0,format=raw -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc :0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup -device virtio-net-pci,netdev=hostnet0,id=virtio-net-pci0,mac=4e:63:28:bc:b1:25 -monitor stdio -qmp tcp:0:4445,server,nowait
2. check numa topology inside guest:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 605 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 996 MB
node distances:
node 0 1
0: 10 20
1: 20 10
3. hotplug two vcpus:
{"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"}
{"return": {}, "id": "libvirt-10"}
{"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"}
{"return": {}, "id": "libvirt-11"}
4. check numa topology inside guest:
# numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3
node 0 size: 1023 MB
node 0 free: 605 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 996 MB
node distances:
node 0 1
0: 10 20
1: 20 10
After step 4, vcpus 2 and 3 should be in node 1.
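The expected placement used in these checks comes straight from the -numa options on the qemu command line. As a rough sketch (not something posted in this bug), the mapping can be derived as follows. It only handles the option forms that appear in this report, namely repeated cpus=N and cpus=A-B inside one "-numa node,..." argument; numa_map and overlapping_cpus are hypothetical helper names.

```python
def expand_cpus(spec):
    """Expand '0-3' or '17' into a list of CPU indexes."""
    lo, _, hi = spec.partition("-")
    return list(range(int(lo), int(hi or lo) + 1))


def numa_map(cmdline):
    """Return {nodeid: sorted CPU list} from the -numa options of a QEMU command line."""
    nodes = {}
    tokens = cmdline.split()
    for flag, value in zip(tokens, tokens[1:]):
        if flag != "-numa" or not value.startswith("node,"):
            continue
        nodeid, cpus = None, set()
        for opt in value.split(",")[1:]:
            key, _, val = opt.partition("=")
            if key == "nodeid":
                nodeid = int(val)
            elif key == "cpus":
                cpus.update(expand_cpus(val))
        nodes[nodeid] = sorted(cpus)
    return nodes


def overlapping_cpus(nodes):
    """Return CPUs that are assigned to more than one node."""
    seen, dup = set(), set()
    for cpus in nodes.values():
        dup |= seen & set(cpus)
        seen |= set(cpus)
    return sorted(dup)


# -numa options copied from two command lines in this report.
REPORTED = ("-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 "
            "-numa node,nodeid=1,cpus=2-3,memdev=ram-node1")
OVERLAPPED = ("-numa node,nodeid=0,cpus=0-3,cpus=8-11,cpus=17,memdev=ram-node0 "
              "-numa node,nodeid=1,cpus=4-7,cpus=12-17,memdev=ram-node1")

print(numa_map(REPORTED))                      # prints {0: [0, 1], 1: [2, 3]}
print(overlapping_cpus(numa_map(OVERLAPPED)))  # prints [17]
```

The second call flags cpu 17, the one listed in both nodes in the overlapping configuration from the earlier comment.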
Test this bug using:
kernel-3.10.0-316.el7.x86_64
qemu-kvm-rhev-2.3.0-23.el7.x86_64
Tested machine types "-rhel7.1.0" and "-rhel7.2.0". The test steps are the same as in comment #24, and the results pass. After step 4, vcpus 2 and 3 are in node 1:
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 604 MB
node 1 cpus: 2 3
node 1 size: 1023 MB
node 1 free: 993 MB
node distances:
node 0 1
0: 10 20
1: 20 10

Based on comment #22 and comment #25, I think this bug has been fixed. Thanks.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://rhn.redhat.com/errata/RHBA-2015-2546.html |
Created attachment 987053 [details]
SRAT table

Description of problem:
Prepare a guest which has a NUMA topology, and set current vcpus less than maxcpus. The hotplugged vcpu is not consistent with the guest NUMA topology the user specified in the cmdline.

version:
libvirt-1.2.8-15.el7.x86_64
qemu-kvm-rhev-2.1.2-21.el7.x86_64
3.10.0-223.el7.x86_64

How reproducible:
100%

Steps to reproduce:
1. setup guest NUMA topology: 2 nodes, each node has 2 vcpus
# virsh edit rhel7
...
<vcpu placement='auto'>4</vcpu>
...
<cpu>
<numa>
<cell id='0' cpus='0-1' memory='1048576'/>
<cell id='1' cpus='2-3' memory='1048576'/>
</numa>
</cpu>
the qemu cmdline is:
/usr/libexec/qemu-kvm -name rhel7 -S -machine pc-i440fx-rhel7.1.0,accel=kvm,usb=off -m 2048 -realtime mlock=off -smp 2,maxcpus=4,sockets=4,cores=1,threads=1 -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages1G/libvirt/qemu,size=1024M,id=ram-node0,host-nodes=1,policy=bind -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 -object memory-backend-file,prealloc=yes,mem-path=/dev/hugepages2M/libvirt/qemu,size=1024M,id=ram-node1,host-nodes=1,policy=bind -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 -uuid 1edfafc5-a55a-4396-9595-46e590bfc79a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive file=/mnt/jmiao/r71.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:ab:7e:68,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
2. check in guest
<guest> # numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1
node 0 size: 1023 MB
node 0 free: 653 MB
node 1 cpus:
node 1 size: 1023 MB
node 1 free: 995 MB
node distances:
node 0 1
0: 10 20
1: 20 10
3. hotplug vcpu2 and vcpu3
# virsh setvcpus rhel7 4
the QMP commands libvirtd used are:
53.925 > 0x7fa8202381e0 {"execute":"cpu-add","arguments":{"id":2},"id":"libvirt-10"}
53.943 < 0x7fa8202381e0 {"return": {}, "id": "libvirt-10"}
53.944 > 0x7fa8202381e0 {"execute":"cpu-add","arguments":{"id":3},"id":"libvirt-11"}
53.961 < 0x7fa8202381e0 {"return": {}, "id": "libvirt-11"}
53.962 > 0x7fa8202381e0 {"execute":"query-cpus","id":"libvirt-12"}
53.971 < 0x7fa8202381e0 {"return": [{"current": true, "CPU": 0, "pc": -2130073539, "halted": false, "thread_id": 23454}, {"current": false, "CPU": 1, "pc": -2127686014, "halted": false, "thread_id": 23456}, {"current": false, "CPU": 2, "pc": 4294967280, "halted": false, "thread_id": 23491}, {"current": false, "CPU": 3, "pc": 4294967280, "halted": false, "thread_id": 23492}], "id": "libvirt-12"}
4. checking in guest
<guest> # numactl --hard
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 1023 MB
node 0 free: 644 MB
node 1 cpus: 3
node 1 size: 1023 MB
node 1 free: 993 MB
node distances:
node 0 1
0: 10 20
1: 20 10
As we can see, vcpu2 is not located in guest NUMA node1.
Additional info:
The SRAT table is attached, and is generated by:
[root@localhost ~]# acpidump > acpidump.bin
[root@localhost ~]# acpixtract ./acpidump.bin
Intel ACPI Component Architecture
ACPI Binary Table Extraction Utility version 20140926-64 [Sep 29 2014]
Copyright (c) 2000 - 2014 Intel Corporation
Acpi table [DSDT] - 2807 bytes written to dsdt.dat
Acpi table [SSDT] - 3356 bytes written to ssdt.dat
[root@localhost ~]# iasl -e ssdt.dat -d dsdt.dat
Intel ACPI Component Architecture
ASL Optimizing Compiler version 20140926-64 [Sep 29 2014]
Copyright (c) 2000 - 2014 Intel Corporation
Loading Acpi table from file dsdt.dat - Length 00002807 (000AF7)
ACPI: DSDT 0x0000000000000000 000AF7 (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001)
Acpi table [DSDT] successfully installed and loaded
Loading Acpi table from file ssdt.dat - Length 00003356 (000D1C)
ACPI: SSDT 0x0000000000000000 000D1C (v01 BOCHS BXPCSSDT 00000001 BXPC 00000001)
Acpi table [SSDT] successfully installed and loaded
Pass 1 parse of [SSDT]
Pass 2 parse of [SSDT]
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
Parsing completed
Found 3 external control methods, reparsing with new information
Pass 1 parse of [DSDT]
Pass 2 parse of [DSDT]
Parsing Deferred Opcodes (Methods/Buffers/Packages/Regions)
Parsing completed
Disassembly completed
ASL Output: dsdt.dsl - 30129 bytes
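
For anyone inspecting a raw SRAT dump rather than the disassembly, the sketch below walks the Processor Local APIC/SAPIC Affinity entries and reports each entry's enabled flag, which is the bit the kernel check quoted in this bug (ACPI_SRAT_CPU_ENABLED) tests. It is only an illustration under stated assumptions: the field layout follows the ACPI specification's 16-byte type 0 structure, the table bytes are synthetic and built by the hypothetical fake_srat helper, and the flag values are made up rather than taken from the attached table.

```python
import struct

SRAT_HEADER_LEN = 48        # 36-byte ACPI table header + 12 reserved bytes
CPU_AFFINITY_TYPE = 0       # Processor Local APIC/SAPIC Affinity Structure
ACPI_SRAT_CPU_ENABLED = 1   # bit 0 of the 32-bit flags field


def cpu_affinity(table):
    """Yield (apic_id, proximity_domain, enabled) for each type 0 SRAT entry."""
    (total_len,) = struct.unpack_from("<I", table, 4)
    off = SRAT_HEADER_LEN
    while off + 2 <= total_len:
        etype, elen = struct.unpack_from("<BB", table, off)
        if elen == 0:
            break  # malformed entry; stop instead of looping forever
        if etype == CPU_AFFINITY_TYPE and elen == 16:
            pxm_lo, apic_id, flags, _eid, pxm_hi = struct.unpack_from(
                "<BBIB3s", table, off + 2)
            pxm = pxm_lo | (int.from_bytes(pxm_hi, "little") << 8)
            yield apic_id, pxm, bool(flags & ACPI_SRAT_CPU_ENABLED)
        off += elen


def fake_srat(entries):
    """Build a synthetic SRAT (hypothetical helper) from (apic_id, node, flags) tuples."""
    body = b"".join(
        struct.pack("<BBBBIB3sI", CPU_AFFINITY_TYPE, 16, node, apic, flags,
                    0, b"\0\0\0", 0)
        for apic, node, flags in entries
    )
    header = (b"SRAT" + struct.pack("<I", SRAT_HEADER_LEN + len(body))
              + bytes(SRAT_HEADER_LEN - 8))
    return header + body


# Made-up example: vcpus 0-1 enabled at boot, vcpus 2-3 present but not enabled.
table = fake_srat([(0, 0, 1), (1, 0, 1), (2, 1, 0), (3, 1, 0)])

# Mimic the early return in the quoted kernel code: skip entries that are not enabled.
node_of_cpu = {apic: pxm for apic, pxm, enabled in cpu_affinity(table) if enabled}
print(node_of_cpu)
```

With entries 2 and 3 flagged as not enabled, the resulting map has no node for them, which is consistent with the explanation given earlier in this bug for why hotplugged vcpus can end up without the intended node assignment.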