Description of problem:

The current -numa CLI interface is quite limited in how it maps CPUs to NUMA nodes, as it requires providing cpu_index values that are non-obvious and depend on the machine type and architecture. As a result, libvirt has to assume/re-implement QEMU's cpu_index allocation logic in order to provide valid values for the -numa cpus=... QEMU CLI option.

QEMU now has a generic CPU hotplug interface in place and the ability to query the possible CPU layout (with the QMP command query-hotpluggable-cpus); however, using it would require running QEMU once per machine type and topology configuration (-M & -smp combination), which would be too taxing for the management layer to do.

The currently proposed idea to solve the issue is to do the NUMA mapping at runtime:
1. start QEMU in stopped mode with the needed -M & -smp configuration, but leave out the "-numa cpus" options
2. query the possible CPU layout (query-hotpluggable-cpus)
3. use a new QMP command to map CPUs to NUMA nodes in terms of the generic CPU hotplug interface (socket/core/thread)
3.1 potentially translate the mapping into new CLI options so the same configuration could be started without the runtime configuration step
4. continue VM execution
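For illustration, a minimal sketch of that flow, using the interfaces that were eventually merged and are exercised in the comments below (--preconfig, query-hotpluggable-cpus, the set-numa-node QMP command, and the exit_preconfig HMP command); the topology values here are made up:

Step 1 - start QEMU in preconfig mode, leaving out the "-numa cpu" mappings:

# /usr/libexec/qemu-kvm -m 2G -smp 2,sockets=2,cores=1,threads=1 \
-numa node,nodeid=0 \
-numa node,nodeid=1 \
--preconfig -monitor stdio -qmp tcp:0:4444,server,nowait

Step 2 - query the possible CPU layout over QMP:

{'execute': 'query-hotpluggable-cpus'}

Step 3 - map each CPU to a node in socket/core/thread terms:

{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 0, 'socket-id': 0, 'core-id': 0, 'thread-id': 0 }}
{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 1, 'socket-id': 1, 'core-id': 0, 'thread-id': 0 }}

Step 4 - continue VM execution (via HMP):

(qemu) exit_preconfig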
Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
The CLI part is merged in qemu-2.10:

commit 419fcdec "numa: add '-numa cpu,...' option for property based node mapping"
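For reference, the merged option expresses the mapping in socket/core/thread terms instead of cpu_index values. An illustrative fragment (the verification runs below use the same syntax):

-smp 4,sockets=2,cores=2,threads=1 \
-numa node,nodeid=0 \
-numa node,nodeid=1 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=1,socket-id=1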
We should get this in 7.5 with the rebase to 2.10.
Hi Igor,

Could you help confirm that only the CLI part is done, please? I didn't see the commit or code implementing the QMP mapping part:

> 3. use a new QMP command to map CPUs to NUMA nodes in terms of the generic
> CPU hotplug interface (socket/core/thread)

For the CLI part, verified with qemu-kvm-rhev-2.12.0-2.el7.

Boot guest with the following cmdline:

# /usr/libexec/qemu-kvm -m 8G /home/kvm_autotest_root/images/rhel76-64-virtio-scsi.qcow2 -netdev tap,id=tap0 -device virtio-net-pci,id=net0,netdev=tap0 \
-smp 12,sockets=3,cores=2,threads=2 \
-numa node,nodeid=0 \
-numa node,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa node,nodeid=4 \
-numa cpu,node-id=0,socket-id=0 \
-numa cpu,node-id=1,socket-id=1,core-id=0 \
-numa cpu,node-id=2,socket-id=1,core-id=1 \
-numa cpu,node-id=3,socket-id=2,core-id=0,thread-id=0 \
-numa cpu,node-id=3,socket-id=2,core-id=0,thread-id=1 \
-numa cpu,node-id=4,socket-id=2,core-id=1,thread-id=0 \
-numa cpu,node-id=4,socket-id=2,core-id=1,thread-id=1 \
-monitor stdio -vnc :0 -qmp tcp:0:4444,server,nowait

Check NUMA in HMP:

(qemu) info numa
5 nodes
node 0 cpus: 0 1 2 3
node 0 size: 1632 MB
node 0 plugged: 0 MB
node 1 cpus: 4 5
node 1 size: 1640 MB
node 1 plugged: 0 MB
node 2 cpus: 6 7
node 2 size: 1640 MB
node 2 plugged: 0 MB
node 3 cpus: 8 9
node 3 size: 1640 MB
node 3 plugged: 0 MB
node 4 cpus: 10 11
node 4 size: 1640 MB
node 4 plugged: 0 MB

Check NUMA in the guest:

# numactl -H
available: 5 nodes (0-4)
node 0 cpus: 0 1 2 3
node 0 size: 1631 MB
node 0 free: 1113 MB
node 1 cpus: 4 5
node 1 size: 1639 MB
node 1 free: 1101 MB
node 2 cpus: 6 7
node 2 size: 1640 MB
node 2 free: 1390 MB
node 3 cpus: 8 9
node 3 size: 1640 MB
node 3 free: 1337 MB
node 4 cpus: 10 11
node 4 size: 1640 MB
node 4 free: 1429 MB
node distances:
node   0   1   2   3   4
  0:  10  20  20  20  20
  1:  20  10  20  20  20
  2:  20  20  10  20  20
  3:  20  20  20  10  20
  4:  20  20  20  20  10

Both the NUMA topology in HMP and in the guest are consistent with the qemu cmdline.

I also tried the QMP hotplug command, but it didn't work well:

# /usr/libexec/qemu-kvm -smp 4,maxcpus=24,sockets=6,cores=2,threads=2 -numa node,nodeid=0 -numa node,nodeid=1 -numa node,nodeid=2 -monitor stdio -cpu host -qmp tcp:0:4444,server,nowait -S

Hotplugging one CPU to node 2 hits an error:

{ "execute": "device_add","arguments":{"driver":"host-x86_64-cpu","core-id": 0,"thread-id": 0, "socket-id":4, "node-id": 2,"id":"core1"}}
{"error": {"class": "GenericError", "desc": "node-id=2 must match numa node specified with -numa option"}}
Only the CLI part is completed; the QMP part was merged upstream just a couple of days ago and hasn't been backported to qemu-kvm-rhev yet.
(In reply to Igor Mammedov from comment #12)
> Only the CLI part is completed; the QMP part was merged upstream just a
> couple of days ago and hasn't been backported to qemu-kvm-rhev yet.

Then should we move this bz back to POST to wait for the backport?
(In reply to Yumei Huang from comment #13)
> (In reply to Igor Mammedov from comment #12)
> > Only the CLI part is completed; the QMP part was merged upstream just a
> > couple of days ago and hasn't been backported to qemu-kvm-rhev yet.
>
> Then should we move this bz back to POST to wait for the backport?

Yep, please do so.
Moving to POST to wait for the QMP part backport.
Hi Igor,

I did the following tests to cover the QMP part. Would you please help check whether they are sufficient? Thanks!

1) Boot guest with --preconfig

# /usr/libexec/qemu-kvm -m 4G -smp 8,sockets=4,cores=2,threads=1 \
-numa node,nodeid=0 \
-numa node,nodeid=1 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-numa cpu,node-id=3,socket-id=0,core-id=0,thread-id=0 \
-numa cpu,node-id=2,socket-id=0,core-id=1,thread-id=0 \
-numa cpu,node-id=1,socket-id=1,core-id=0,thread-id=0 \
-numa cpu,node-id=0,socket-id=1,core-id=1,thread-id=0 \
--preconfig -monitor stdio \
-qmp tcp:0:4444,server,nowait

2) Check CPUs layout

{'execute': 'query-hotpluggable-cpus' }
{"return": [
{"props": {"core-id": 1, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 3}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "socket-id": 2}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 0, "socket-id": 1}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 1, "socket-id": 1}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 2, "socket-id": 0}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 3, "socket-id": 0}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"}
]}

3) Config CPUs NUMA mapping

{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 0, 'core-id': 0, 'thread-id': 0, 'socket-id': 2 }}
{"return": {}}

{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 1, 'core-id': 1, 'thread-id': 0, 'socket-id': 2 }}
{"return": {}}

{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 2, 'core-id': 0, 'thread-id': 0, 'socket-id': 3 }}
{"return": {}}

{'execute': 'set-numa-node', 'arguments': { 'type': 'cpu', 'node-id': 3, 'core-id': 1, 'thread-id': 0, 'socket-id': 3 }}
{"return": {}}

4) Check CPUs layout again

{'execute': 'query-hotpluggable-cpus' }
{"return": [
{"props": {"core-id": 1, "thread-id": 0, "node-id": 3, "socket-id": 3}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 2, "socket-id": 3}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 1, "socket-id": 2}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 0, "socket-id": 2}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 0, "socket-id": 1}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 1, "socket-id": 1}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 2, "socket-id": 0}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 3, "socket-id": 0}, "vcpus-count": 1, "type": "qemu64-x86_64-cpu"}
]}

5) Exit preconfig; the guest boots up successfully, and the NUMA topology in HMP is as expected.
(qemu) exit_preconfig
(qemu) info numa
4 nodes
node 0 cpus: 3 4
node 0 size: 1024 MB
node 0 plugged: 0 MB
node 1 cpus: 2 5
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus: 1 6
node 2 size: 1024 MB
node 2 plugged: 0 MB
node 3 cpus: 0 7
node 3 size: 1024 MB
node 3 plugged: 0 MB

Besides, I noticed we can specify a node id when hotplugging CPUs, but it seems the node for each CPU is fixed once the guest launches. Hotplugging a CPU to a node other than the initial one hits an error.

E.g.:

# /usr/libexec/qemu-kvm -m 4G -smp 4,sockets=4,cores=2,threads=1,maxcpus=8 \
-numa node,nodeid=0,cpus=0-1 \
-numa node,nodeid=1,cpus=2-3 \
-numa node,nodeid=2 \
-numa node,nodeid=3 \
-monitor stdio \
-cpu Haswell-noTSX-IBRS \
-qmp tcp:0:4444,server,nowait

The first 4 hotpluggable CPUs are mapped to node 0 once the guest boots.

{'execute': 'query-hotpluggable-cpus' }
{"return": [
{"props": {"core-id": 1, "thread-id": 0, "node-id": 0, "socket-id": 3}, "vcpus-count": 1, "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 0, "socket-id": 3}, "vcpus-count": 1, "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 0, "socket-id": 2}, "vcpus-count": 1, "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 0, "socket-id": 2}, "vcpus-count": 1, "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 1, "socket-id": 1}, "vcpus-count": 1, "qom-path": "/machine/unattached/device[4]", "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 1, "socket-id": 1}, "vcpus-count": 1, "qom-path": "/machine/unattached/device[3]", "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 1, "thread-id": 0, "node-id": 0, "socket-id": 0}, "vcpus-count": 1, "qom-path": "/machine/unattached/device[2]", "type": "Haswell-noTSX-IBRS-x86_64-cpu"},
{"props": {"core-id": 0, "thread-id": 0, "node-id": 0, "socket-id": 0}, "vcpus-count": 1, "qom-path": "/machine/unattached/device[0]", "type": "Haswell-noTSX-IBRS-x86_64-cpu"}]}

A CPU can only be hotplugged to node 0; otherwise it hits an error:

{ "execute": "device_add","arguments":{"driver":"Haswell-noTSX-IBRS-x86_64-cpu","core-id": 0,"thread-id": 0, "socket-id":2, "node-id": 0,"id":"core0"}}
{"return": {}}

{ "execute": "device_add","arguments":{"driver":"Haswell-noTSX-IBRS-x86_64-cpu","core-id": 1,"thread-id": 0, "socket-id":2, "node-id": 1,"id":"core1"}}
{"error": {"class": "GenericError", "desc": "node-id=1 must match numa node specified with -numa option"}}

Is this expected? If yes, it does not make sense to me to have a node-id option for hotplug. What do you think?
(In reply to Yumei Huang from comment #21)
> Hi Igor,
> I did the following tests to cover the QMP part. Would you please help
> check whether they are sufficient? Thanks!
[..]
> 5) Exit preconfig; the guest boots up successfully, and the NUMA topology
> in HMP is as expected.
> (qemu) exit_preconfig
> (qemu) info numa
> 4 nodes
> node 0 cpus: 3 4
> node 0 size: 1024 MB
> node 0 plugged: 0 MB
> node 1 cpus: 2 5
> node 1 size: 1024 MB
> node 1 plugged: 0 MB
> node 2 cpus: 1 6
> node 2 size: 1024 MB
> node 2 plugged: 0 MB
> node 3 cpus: 0 7
> node 3 size: 1024 MB
> node 3 plugged: 0 MB

Above looks fine to me.

> Besides, I noticed we can specify a node id when hotplugging CPUs, but it
> seems the node for each CPU is fixed once the guest launches. Hotplugging
> a CPU to a node other than the initial one hits an error.
>
> E.g.:
> # /usr/libexec/qemu-kvm -m 4G -smp 4,sockets=4,cores=2,threads=1,maxcpus=8 \
> -numa node,nodeid=0,cpus=0-1 \
> -numa node,nodeid=1,cpus=2-3 \
> -numa node,nodeid=2 \
> -numa node,nodeid=3 \
> -monitor stdio \
> -cpu Haswell-noTSX-IBRS \
> -qmp tcp:0:4444,server,nowait
>
> The first 4 hotpluggable CPUs are mapped to node 0 once the guest boots.

Well, they weren't specified on the CLI explicitly, so they defaulted to node 0.

> {'execute': 'query-hotpluggable-cpus' }
[..]
> A CPU can only be hotplugged to node 0; otherwise it hits an error:
>
> { "execute": "device_add","arguments":{"driver":"Haswell-noTSX-IBRS-x86_64-cpu","core-id": 0,"thread-id": 0, "socket-id":2, "node-id": 0,"id":"core0"}}
> {"return": {}}
>
> { "execute": "device_add","arguments":{"driver":"Haswell-noTSX-IBRS-x86_64-cpu","core-id": 1,"thread-id": 0, "socket-id":2, "node-id": 1,"id":"core1"}}
> {"error": {"class": "GenericError", "desc": "node-id=1 must match numa node specified with -numa option"}}
>
> Is this expected? If yes, it does not make sense to me to have a node-id
> option for hotplug. What do you think?

It is expected. If it were up to me, node-id would be mandatory, but that ship sailed long ago. Because some libvirt versions do not provide node-id on hotplug, the node-id property is optional (QEMU will fetch it from the -numa ... configuration), but if it is set, it must match the -numa options.
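For completeness: given the behavior above, a device_add for the same configuration can omit node-id entirely and let QEMU fill it in from the -numa mapping. A hypothetical exchange illustrating this (the id "core2" is made up; not taken from an actual log):

{ "execute": "device_add","arguments":{"driver":"Haswell-noTSX-IBRS-x86_64-cpu","core-id": 1,"thread-id": 0, "socket-id":3,"id":"core2"}}
{"return": {}}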
(In reply to Igor Mammedov from comment #22)
> (In reply to Yumei Huang from comment #21)
> [..]
> It is expected. If it were up to me, node-id would be mandatory, but that
> ship sailed long ago. Because some libvirt versions do not provide node-id
> on hotplug, the node-id property is optional (QEMU will fetch it from the
> -numa ... configuration), but if it is set, it must match the -numa
> options.

Thanks for your confirmation, it's very helpful. Would you please move this bz to ON_QA? I tested with qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8 and it works as expected. Thanks.
Verify: qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8

Tested the scenarios in comment 11 and comment 21; they work as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3723