Bug 2026893
Summary: VM with a passthrough GPU and 20 vCPUs fails to start when configured as a Q35 machine; i440FX works.

| Field | Value | Field | Value |
|---|---|---|---|
| Product | [oVirt] ovirt-engine | Reporter | Gilboa Davara <gilboad> |
| Component | BLL.Virt | Assignee | Milan Zamazal <mzamazal> |
| Status | CLOSED DUPLICATE | QA Contact | meital avital <mavital> |
| Severity | unspecified | Priority | unspecified |
| Version | 4.4.9.4 | CC | ahadas, bugs, pkubica |
| Target Milestone | --- | Target Release | --- |
| Hardware | Unspecified | OS | Unspecified |
| oVirt Team | Virt | Type | Bug |
| Last Closed | 2021-11-29 15:18:51 UTC | Regression | --- |
Description
Gilboa Davara, 2021-11-26 11:23:58 UTC

The documentation text flag should only be set after the 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Your maxcpu is 320, so this is most likely a duplicate of bug 2023313.

Gilboa Davara: I assumed as much. That said, if I remove a couple of PCI-E pass-through devices (leaving only the USB or HDMI audio), I can trigger https://bugzilla.redhat.com/show_bug.cgi?id=2023313, getting the caching-mode=on error. However, if I try to pass all the PCI-E devices, I get a memory allocation error instead of the caching-mode=on error. Can I somehow reduce the maxcpu value to test whether this bug is indeed related to 2023313?

Try to reconfigure the VM to make it smaller, just for a test; perhaps set it to a few sockets with 1 core per socket.

Arik (comment 5): Alternatively, you can disable CPU hot-plug with (assuming you're using x86):

    engine-config -s HotPlugCpuSupported='{"x86":"false","ppc":"true","s390x":"true"}'

(In reply to Arik from comment #5)
> Alternatively, you can disable CPU hot-plug with (assuming you're using x86):
> engine-config -s HotPlugCpuSupported='{"x86":"false","ppc":"true","s390x":"true"}'

And then restart ovirt-engine.

Gilboa Davara: Thanks all. The bug can be closed as a duplicate. Reducing the number of vCPUs and/or disabling hot-plug solves the problem. BTW, is there any reason why maxcpus is hard-wired to be 8 times the vCPU count (max 320 vs. 40 vCPUs in my case)? Can I change the ratio somehow?

Milan Zamazal (comment 9): It is not hard-wired to be 8 times; setting the limit is more complex (you can look into https://github.com/oVirt/ovirt-engine/blob/master/backend/manager/modules/common/src/main/java/org/ovirt/engine/core/common/utils/VmCpuCountHelper.java if interested). Somewhat simplified, with UEFI, it's usually the highest power of 2 that fits within the cluster-level limit on the maximum number of vCPUs and doesn't cause exceeding the maximum number of vCPU sockets.

Milan Zamazal: (In reply to Milan Zamazal from comment #9)
> it's usually the highest power of 2

Sorry, not a power of 2, but a multiple of the total number of threads in a single socket. And you can influence it by changing the limits in Engine configuration or by using different vCPU core/thread configurations. We can continue the discussion on devel in case further clarifications are needed.
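To make that rounding rule concrete, here is a minimal sketch, not the actual oVirt implementation (the real logic lives in the VmCpuCountHelper.java linked above); the socket limit (16) and cluster vCPU limit (384) are assumed values chosen purely for illustration:

    // Hedged sketch only -- not the oVirt implementation. The real logic is in
    // VmCpuCountHelper.java (linked above); the limits used here are assumptions.
    public class MaxVCpuSketch {

        // Largest multiple of threads-per-socket that stays within both the
        // cluster-level vCPU limit and the socket-count limit.
        static int maxVCpus(int coresPerSocket, int threadsPerCore,
                            int maxSockets, int clusterMaxCpus) {
            int threadsPerSocket = coresPerSocket * threadsPerCore;
            int socketBound = maxSockets * threadsPerSocket;
            int cap = Math.min(socketBound, clusterMaxCpus);
            return (cap / threadsPerSocket) * threadsPerSocket; // round down
        }

        public static void main(String[] args) {
            // Topology from the QEMU command line in comment 13 below:
            // 16 sockets x 8 cores x 2 threads => 16 threads per socket,
            // so the sketch yields 16 * 16 = 256, matching "-smp 16,maxcpus=256,...".
            System.out.println(maxVCpus(8, 2, 16, 384)); // prints 256
        }
    }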
*** This bug has been marked as a duplicate of bug 2023313 ***

Gilboa Davara: After reading the supplied code, I managed to find an odd combination (more sockets, fewer cores, no threads) that got the maxvcpu well under 256 (only 4*smp_count). Many thanks, again, for the help.
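That result is consistent with the hedged sketch above: a flatter topology lowers both the rounding granularity and the socket-bound cap. Plugging hypothetical topologies (not Gilboa's actual configuration) into the same maxVCpus() sketch, with the same assumed limits:

    // Usage of the maxVCpus() sketch above; topologies and limits are
    // hypothetical, chosen only to illustrate the mechanism.
    maxVCpus(10, 2, 16, 384); // 20 threads/socket -> 320; for a 40-vCPU VM this
                              // would coincidentally look like the "8 times" ratio
    maxVCpus(4, 1, 16, 384);  //  4 threads/socket -> 64, well under 256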
Petr Kubica (comment 13): I hit this again on 4.4.10:

qemu-kvm-6.1.0-5.module_el8.6.0+1040+0ae94936.x86_64
libvirt-7.10.0-1.module_el8.6.0+1046+bd8eec5e.x86_64
vdsm-4.40.100.2-1.el8.x86_64

Similar HW (2 NUMA nodes), passing through a GPU together with multiple other devices (the GPU is causing this issue). Exactly the same reproduction steps. Same behavior with respect to the number of CPUs and the Q35/i440FX choice (lowering the CPU count or switching to i440FX resolves the issue). Tried QEMU 6.0 without any luck.

    LC_ALL=C \
    PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
    HOME=/var/lib/libvirt/qemu/domain-1-Windows-01-GPU0 \
    XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-Windows-01-GPU0/.local/share \
    XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-Windows-01-GPU0/.cache \
    XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-Windows-01-GPU0/.config \
    /usr/libexec/qemu-kvm \
    -name guest=Windows-01-GPU0,debug-threads=on \
    -S \
    -object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-Windows-01-GPU0/master-key.aes"}' \
    -blockdev '{"driver":"file","filename":"/usr/share/OVMF/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
    -blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
    -blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/d5ba5e0e-339e-40bb-90d8-e34d3d158261.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
    -blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
    -machine pc-q35-rhel8.4.0,usb=off,dump-guest-core=off,kernel_irqchip=split,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format \
    -accel kvm \
    -cpu host,migratable=on,vmx=on,kvm=off \
    -m size=33554432k,slots=16,maxmem=134217728k \
    -overcommit mem-lock=off \
    -smp 16,maxcpus=256,sockets=16,dies=1,cores=8,threads=2 \
    -object '{"qom-type":"iothread","id":"iothread1"}' \
    -object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":34359738368}' \
    -numa node,nodeid=0,cpus=0-255,memdev=ram-node0 \
    -uuid d5ba5e0e-339e-40bb-90d8-e34d3d158261 \
    -smbios type=1,manufacturer=oVirt,product=RHEL,version=8.6-1.el8,serial=d7cb7a89-958c-2af6-c6b0-fc349767d7db,uuid=d5ba5e0e-339e-40bb-90d8-e34d3d158261,family=oVirt \
    -display none \
    -no-user-config \
    -nodefaults \
    -chardev socket,id=charmonitor,fd=39,server=on,wait=off \
    -mon chardev=charmonitor,id=monitor,mode=control \
    -rtc base=2022-01-28T09:03:00,driftfix=slew \
    -global kvm-pit.lost_tick_policy=delay \
    -no-hpet \
    -no-shutdown \
    -global ICH9-LPC.disable_s3=1 \
    -global ICH9-LPC.disable_s4=1 \
    -boot strict=on \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on \
    -device pcie-root-port,port=8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
    -device pcie-root-port,port=9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
    -device pcie-root-port,port=10,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
    -device pcie-root-port,port=11,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
    -device pcie-root-port,port=12,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
    -device pcie-root-port,port=13,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
    -device pcie-root-port,port=14,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
    -device pcie-root-port,port=15,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x7 \
    -device pcie-root-port,port=16,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x2 \
    -device pcie-root-port,port=17,chassis=10,id=pci.10,bus=pcie.0,addr=0x2.0x1 \
    -device pcie-root-port,port=18,chassis=11,id=pci.11,bus=pcie.0,addr=0x2.0x2 \
    -device pcie-root-port,port=19,chassis=12,id=pci.12,bus=pcie.0,addr=0x2.0x3 \
    -device pcie-root-port,port=20,chassis=13,id=pci.13,bus=pcie.0,addr=0x2.0x4 \
    -device pcie-root-port,port=21,chassis=14,id=pci.14,bus=pcie.0,addr=0x2.0x5 \
    -device pcie-root-port,port=22,chassis=15,id=pci.15,bus=pcie.0,addr=0x2.0x6 \
    -device pcie-root-port,port=23,chassis=16,id=pci.16,bus=pcie.0,addr=0x2.0x7 \
    -device pcie-root-port,port=24,chassis=17,id=pci.17,bus=pcie.0,addr=0x3 \
    -device pcie-pci-bridge,id=pci.18,bus=pci.4,addr=0x0 \
    -device qemu-xhci,p2=8,p3=8,id=ua-2d5407e4-258d-4858-a667-7c8d68e6e079,bus=pci.2,addr=0x0 \
    -device virtio-scsi-pci,iothread=iothread1,id=ua-c8f1426f-0f5c-4acb-8c65-f90ae8781eff,bus=pci.7,addr=0x0 \
    -device virtio-serial-pci,id=ua-c43f87f4-49fd-409a-abb4-e243eaf3ac8e,max_ports=16,bus=pci.3,addr=0x0 \
    -device ide-cd,bus=ide.2,id=ua-87b896ca-ac92-4645-bd7d-ef89927a6989,werror=report,rerror=report \
    -blockdev '{"driver":"file","filename":"/rhev/data-center/mnt/localhost:_srv_rhv/bd9ac7ab-b2cc-4588-8fb9-da497907378c/images/adcac7da-7c0a-46a9-815a-8455c3327af0/0346ab51-49c8-450e-86e6-7bbc5f86ae99","aio":"threads","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
    -blockdev '{"node-name":"libvirt-2-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
    -device scsi-hd,bus=ua-c8f1426f-0f5c-4acb-8c65-f90ae8781eff.0,channel=0,scsi-id=0,lun=0,device_id=adcac7da-7c0a-46a9-815a-8455c3327af0,drive=libvirt-2-format,id=ua-adcac7da-7c0a-46a9-815a-8455c3327af0,bootindex=1,write-cache=on,serial=adcac7da-7c0a-46a9-815a-8455c3327af0,werror=stop,rerror=stop \
    -blockdev '{"driver":"file","filename":"/rhev/data-center/mnt/localhost:_srv_store_rhv/8de83773-d704-424c-b5b3-101a671c8954/images/8e146613-a4b7-41e8-b2f2-e2fbdf5246d7/e3a5b4e8-7781-4212-b98a-9f2c99e17337","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
    -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
    -device scsi-hd,bus=ua-c8f1426f-0f5c-4acb-8c65-f90ae8781eff.0,channel=0,scsi-id=0,lun=1,device_id=8e146613-a4b7-41e8-b2f2-e2fbdf5246d7,drive=libvirt-1-format,id=ua-8e146613-a4b7-41e8-b2f2-e2fbdf5246d7,write-cache=on,serial=8e146613-a4b7-41e8-b2f2-e2fbdf5246d7,werror=stop,rerror=stop \
    -netdev tap,fds=41:43:44:45,id=hostua-daac2d8b-38cf-4ac0-b48c-b6fedebe2944,vhost=on,vhostfds=46:47:48:49 \
    -device virtio-net-pci,mq=on,vectors=10,host_mtu=1500,netdev=hostua-daac2d8b-38cf-4ac0-b48c-b6fedebe2944,id=ua-daac2d8b-38cf-4ac0-b48c-b6fedebe2944,mac=56:6f:ca:5e:00:00,bus=pci.1,addr=0x0 \
    -chardev socket,id=charchannel0,fd=50,server=on,wait=off \
    -device virtserialport,bus=ua-c43f87f4-49fd-409a-abb4-e243eaf3ac8e.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 \
    -chardev socket,id=charchannel1,fd=51,server=on,wait=off \
    -device virtserialport,bus=ua-c43f87f4-49fd-409a-abb4-e243eaf3ac8e.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 \
    -audiodev '{"id":"audio1","driver":"none"}' \
    -device vfio-pci,host=0000:04:00.0,id=ua-3f3c1a1c-4b88-47b5-9119-f4056a96e93c,bus=pci.5,addr=0x0 \
    -device vfio-pci,host=0000:af:00.1,id=ua-582527a5-a6ea-475a-9c86-816f973ea027,bus=pci.6,addr=0x0 \
    -device vfio-pci,host=0000:3c:00.0,id=ua-5c49a9c0-8557-447a-9189-97bb3952a062,bus=pci.10,addr=0x0 \
    -device vfio-pci,host=0000:af:00.0,id=ua-d3af7b38-0840-425b-8fcd-773e0d7dd03c,bus=pci.8,addr=0x0 \
    -device virtio-balloon-pci,id=ua-71e3afb3-6c29-4201-87a3-5ed93fd873b1,bus=pci.9,addr=0x0 \
    -object '{"qom-type":"rng-random","id":"objua-e8be8bc4-167f-4223-8628-b218a38c9ead","filename":"/dev/urandom"}' \
    -device virtio-rng-pci,rng=objua-e8be8bc4-167f-4223-8628-b218a38c9ead,id=ua-e8be8bc4-167f-4223-8628-b218a38c9ead,bus=pci.11,addr=0x0 \
    -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
    -msg timestamp=on
    2022-01-28T09:03:08.866199Z qemu-kvm: -device vfio-pci,host=0000:af:00.1,id=ua-582527a5-a6ea-475a-9c86-816f973ea027,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
    2022-01-28T09:03:08.912870Z qemu-kvm: -device vfio-pci,host=0000:af:00.1,id=ua-582527a5-a6ea-475a-9c86-816f973ea027,bus=pci.6,addr=0x0: VFIO_MAP_DMA failed: Cannot allocate memory
    2022-01-28T09:03:08.912990Z qemu-kvm: -device vfio-pci,host=0000:af:00.1,id=ua-582527a5-a6ea-475a-9c86-816f973ea027,bus=pci.6,addr=0x0: vfio 0000:af:00.1: failed to setup container for group 150: memory listener initialization failed: Region ram-node0: vfio_dma_map(0x5630dd4b9af0, 0x0, 0x80000000, 0x7f5a73e00000) = -12 (Cannot allocate memory)
    2022-01-28 09:03:09.005+0000: shutting down, reason=failed

Arik: Note that this time caching-mode=on is set in the QEMU command line in comment 13, so we are apparently facing a different, although possibly related, issue here than in the supposed duplicate. Petr, can you please file a separate issue?

Petr Kubica: Hi Arik, sure, filed a new bug here: https://bugzilla.redhat.com/show_bug.cgi?id=2048429