Red Hat Bugzilla – Bug 1441550
[ovmf] It's very slow to boot guest with multiple virtio-net-pci with q35+ovmf [TestOnly]
Last modified: 2018-04-10 12:30:14 EDT
Description of problem:
When booting with 28 virtio-net-pci devices on q35+ovmf, the guest boots very slowly (about 30 min), and if it does boot to the OS with mq and vhost, it hangs in the OS.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.8.0-6.el7.x86_64
kernel-3.10.0-643.el7.x86_64
win2016
virtio-win-prewhql-135

How reproducible:
100%

Steps to Reproduce:
1. Boot with 28 virtio-net-pci devices:

/usr/libexec/qemu-kvm -name 135NICW10S64CQD -enable-kvm -m 12G -smp 4,cores=4 \
-uuid 83db3d0b-5bb0-429b-94c1-bc9befdf45b1 -nodefconfig -nodefaults \
-chardev socket,id=charmonitor,path=/tmp/135NICW10S64CQD,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
-drive file=135NICW10S64CHW,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
-device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
-vnc 0.0.0.0:9 -vga std -M q35 \
-drive file=135NICW10S64CHW_ovmf/OVMF_CODE.secboot.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=135NICW10S64CHW_ovmf/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-qmp tcp:0:4449,server,nowait -monitor stdio \
-device ioh3420,bus=pcie.0,id=root1.0,slot=1 -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet1,bus=root1.0,id=virtio-net-pci1,mac=4e:63:28:bc:b1:01 \
-device ioh3420,bus=pcie.0,id=root2.0,slot=2 -netdev tap,id=hostnet2,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet2,bus=root2.0,id=virtio-net-pci2,mac=4e:63:28:bc:b1:02 \
-device ioh3420,bus=pcie.0,id=root3.0,slot=3 -netdev tap,id=hostnet3,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet3,bus=root3.0,id=virtio-net-pci3,mac=4e:63:28:bc:b1:03 \
-device ioh3420,bus=pcie.0,id=root4.0,slot=4 -netdev tap,id=hostnet4,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet4,bus=root4.0,id=virtio-net-pci4,mac=4e:63:28:bc:b1:04 \
-device ioh3420,bus=pcie.0,id=root5.0,slot=5 -netdev tap,id=hostnet5,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet5,bus=root5.0,id=virtio-net-pci5,mac=4e:63:28:bc:b1:05 \
-device ioh3420,bus=pcie.0,id=root6.0,slot=6 -netdev tap,id=hostnet6,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet6,bus=root6.0,id=virtio-net-pci6,mac=4e:63:28:bc:b1:06 \
-device ioh3420,bus=pcie.0,id=root7.0,slot=7 -netdev tap,id=hostnet7,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet7,bus=root7.0,id=virtio-net-pci7,mac=4e:63:28:bc:b1:07 \
-device ioh3420,bus=pcie.0,id=root8.0,slot=8 -netdev tap,id=hostnet8,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet8,bus=root8.0,id=virtio-net-pci8,mac=4e:63:28:bc:b1:08 \
-device ioh3420,bus=pcie.0,id=root9.0,slot=9 -netdev tap,id=hostnet9,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet9,bus=root9.0,id=virtio-net-pci9,mac=4e:63:28:bc:b1:09 \
-device ioh3420,bus=pcie.0,id=root10.0,slot=10 -netdev tap,id=hostnet10,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet10,bus=root10.0,id=virtio-net-pci10,mac=4e:63:28:bc:b1:0A \
-device ioh3420,bus=pcie.0,id=root11.0,slot=11 -netdev tap,id=hostnet11,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet11,bus=root11.0,id=virtio-net-pci11,mac=4e:63:28:bc:b1:0B \
-device ioh3420,bus=pcie.0,id=root12.0,slot=12 -netdev tap,id=hostnet12,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet12,bus=root12.0,id=virtio-net-pci12,mac=4e:63:28:bc:b1:0C \
-device ioh3420,bus=pcie.0,id=root13.0,slot=13 -netdev tap,id=hostnet13,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet13,bus=root13.0,id=virtio-net-pci13,mac=4e:63:28:bc:b1:0D \
-device ioh3420,bus=pcie.0,id=root14.0,slot=14 -netdev tap,id=hostnet14,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet14,bus=root14.0,id=virtio-net-pci14,mac=4e:63:28:bc:b1:0E \
-device ioh3420,bus=pcie.0,id=root15.0,slot=15 -netdev tap,id=hostnet15,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet15,bus=root15.0,id=virtio-net-pci15,mac=4e:63:28:bc:b1:0F \
-device ioh3420,bus=pcie.0,id=root16.0,slot=16 -netdev tap,id=hostnet16,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet16,bus=root16.0,id=virtio-net-pci16,mac=4e:63:28:bc:b1:10 \
-device ioh3420,bus=pcie.0,id=root17.0,slot=17 -netdev tap,id=hostnet17,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet17,bus=root17.0,id=virtio-net-pci17,mac=4e:63:28:bc:b1:11 \
-device ioh3420,bus=pcie.0,id=root18.0,slot=18 -netdev tap,id=hostnet18,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet18,bus=root18.0,id=virtio-net-pci18,mac=4e:63:28:bc:b1:12 \
-device ioh3420,bus=pcie.0,id=root19.0,slot=19 -netdev tap,id=hostnet19,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet19,bus=root19.0,id=virtio-net-pci19,mac=4e:63:28:bc:b1:13 \
-device ioh3420,bus=pcie.0,id=root20.0,slot=20 -netdev tap,id=hostnet20,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet20,bus=root20.0,id=virtio-net-pci20,mac=4e:63:28:bc:b1:14 \
-device ioh3420,bus=pcie.0,id=root21.0,slot=21 -netdev tap,id=hostnet21,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet21,bus=root21.0,id=virtio-net-pci21,mac=4e:63:28:bc:b1:15 \
-device ioh3420,bus=pcie.0,id=root22.0,slot=22 -netdev tap,id=hostnet22,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet22,bus=root22.0,id=virtio-net-pci22,mac=4e:63:28:bc:b1:16 \
-device ioh3420,bus=pcie.0,id=root23.0,slot=23 -netdev tap,id=hostnet23,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet23,bus=root23.0,id=virtio-net-pci23,mac=4e:63:28:bc:b1:17 \
-device ioh3420,bus=pcie.0,id=root24.0,slot=24 -netdev tap,id=hostnet24,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet24,bus=root24.0,id=virtio-net-pci24,mac=4e:63:28:bc:b1:18 \
-device ioh3420,bus=pcie.0,id=root25.0,slot=25 -netdev tap,id=hostnet25,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet25,bus=root25.0,id=virtio-net-pci25,mac=4e:63:28:bc:b1:19 \
-device ioh3420,bus=pcie.0,id=root26.0,slot=26 -netdev tap,id=hostnet26,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet26,bus=root26.0,id=virtio-net-pci26,mac=4e:63:28:bc:b1:1A \
-device ioh3420,bus=pcie.0,id=root27.0,slot=27 -netdev tap,id=hostnet27,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet27,bus=root27.0,id=virtio-net-pci27,mac=4e:63:28:bc:b1:1B \
-device ioh3420,bus=pcie.0,id=root28.0,slot=28 -netdev tap,id=hostnet28,script=/etc/qemu-ifup,vhost=on,queues=4 -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet28,bus=root28.0,id=virtio-net-pci28,mac=4e:63:28:bc:b1:1C

Actual results:
The guest boots slowly (about 20 min).

Expected results:
Boot time is the same as with "-M pc".

Additional info:
1. Booting with pc+seabios is normal.
2. If booting with 10 virtio-net-pci devices with mq and vhost, the guest boots normally, not too slowly.
Tried with pc+seabios: boot time is 1-2 min.
Tried with q35+seabios: boot time is 2 min, not very slow.
Tried with q35+ovmf: it takes more than 10 min to boot to the system, so I am changing the component to "ovmf".

Booting a RHEL guest (q35 + ovmf) is very slow too.

Thanks
Yu Wang
I see several problems with this bug report.

(1) When reporting OVMF bugs, please always capture the OVMF debug log, and attach it to the report.

(2) On the QEMU command line, you are using the -boot order option with OVMF. That will not work. You must use the ",bootindex=..." device property, for selecting boot devices. Please refer to bug 1323085 comment 10 for background.

This is relevant because, without any bootindex properties, OVMF might very well be attempting to PXE-boot all of the virtio-net devices in sequence. (I can't say for sure because you didn't attach the OVMF debug log.) If you have no UEFI PXE boot set up on your ethernet subnet, then each PXE boot attempt will time out in turn.

So please replace

  -boot order=cd,menu=on \
  ...
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \

with

  -boot menu=on \
  ...
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \

and repeat the test.

(3) You are using 28 PCI Express Root Ports (ioh3420 device). That's too many. In addition to the above issue, you are likely running out of IO space during PCI resource assignment. (I can't say for sure because you didn't attach the OVMF debug log.) At present, you can't use more than 9 root ports. Please refer to bug 1348798 comment 8, to see where the current limit of 9 comes from, and to bug 1434740 (which in turn depends on bug 1344299), to see how we are going to address this issue.

Summary:
- please repeat the test after addressing both (2) and (3) -- use bootindex device properties rather than "-boot order" and "-boot once" switches, plus use no more than 9 PCI Express ports,
- if the issue persists, please attach the OVMF debug log (for the updated test run).
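[Editorial sketch, not part of the original comment: a trimmed command line that applies both (2) and (3), reusing the reporter's device IDs, might look like the fragment below.]

  -boot menu=on \
  -drive file=135NICW10S64CHW,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 \
  -device ioh3420,bus=pcie.0,id=root1.0,slot=1 \
  -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on,queues=4 \
  -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet1,bus=root1.0,id=virtio-net-pci1,mac=4e:63:28:bc:b1:01 \

The ioh3420/netdev/virtio-net-pci group would then be repeated, for no more than 9 root ports in total.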
Additionally, the ovmf N/V/R is not identified in the bug report.
Hi Laszlo,

Thanks for your detailed explanation :)

I re-tested with 9 PCI Express ports and modified the boot order according to your comment; it could boot normally and faster than with 28 PCIe ports. I also attached the ovmf logs for 9 PCIe ports and 28 PCIe ports; please refer to the attachment.

Many thanks
Yu Wang
Created attachment 1273795 [details] ovmf-log-9-28
Thank you for the logs.

$ egrep 'Out of Resources|was rejected' ovmf-28.log
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|02|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|03|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|04|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|05|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|06|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|07|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|08|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|09|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0A|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0B|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0C|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0D|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0E|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|0F|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|10|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|11|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|12|00] was rejected due to resource confliction.
> PciBus: HostBridge->NotifyPhase(AllocateResources) - Out of Resources
> PciBus: [00|13|00] was rejected due to resource confliction.

$ egrep 'Out of Resources|was rejected' ovmf-9.log
> [nothing]

So this confirms point (3) in comment 3.

(In reply to Yu Wang from comment #6)
> I re-tested with 9 PCI Express ports and modified the boot order according
> to your comment; it could boot normally and faster than with 28 PCIe ports.

And this confirms point (2) in comment 3.

Hence, this is a duplicate of bug 1434740. When that bug is fixed (and the bootindex properties are still set as described in comment 3 point (2)), your use case should start working.

I'm marking this BZ as TestOnly, and as dependent on bug 1434740. If you think it is unnecessary to retest this particular scenario after bug 1434740 is fixed, we can close this one immediately as a duplicate.
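[Editorial note, a rough calculation rather than anything taken from the logs: a PCI-to-PCI bridge IO window has 4 KiB granularity, so 28 root ports that each reserve IO space would demand at least 28 x 4 KiB = 112 KiB, which already exceeds the entire 64 KiB x86 IO port range; 9 x 4 KiB = 36 KiB still fits in the aperture the firmware can assign, consistent with the limit of 9 referenced in comment 3.]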
(In reply to Laszlo Ersek from comment #8)
> your use case should start working.

(By that I mean of course, "after you replace the ioh3420 root ports with such generic root ports that don't want IO space".)
Created attachment 1276530 [details]
ovmf log for virtio-blk-pci

Hit the same issue when booting a guest with multiple virtio-blk-pci devices on q35+ovmf: boot time is 7 minutes, sometimes 4 minutes, and the log shows the same debug messages as in comment 8.
I've come across another (unrelated) use case where many NICs are used. In this specific example, 31 virtio-net-pci devices are attached to functions 0 of all >=1 slots of a legacy PCI bridge (so there are no IO space exhaustion problems).

Interestingly, the boot isn't exactly fast in this case either -- it takes approx. 1 min 50 sec on my laptop to reach the UEFI shell on the very first boot (i.e. starting on a pristine varstore), and approx. 1 min 25 sec on subsequent boots.

I dug into it, and one significant boot time contributor is the writing of non-volatile UEFI variables. There are two call sites influenced by the number of NICs:

(1) The Ip4Config2InitInstance() function in "MdeModulePkg/Universal/Network/Ip4Dxe/Ip4Config2Impl.c" looks for persistent IPv4 settings (such as static IP vs. DHCP bringup) whenever each NIC is bound. First Ip4Config2ReadConfigData() is called to read any existent settings for the NIC, and if none are found, Ip4Config2WriteConfigData() is called to write out the initial settings.

For each NIC, the binary settings block is saved in the gEfiIp4Config2ProtocolGuid variable namespace, using the hex-rendered MAC address of the NIC as variable name. Writing out such a new variable takes approx. 0.36 seconds on my laptop.

These variable writes have an effect only on the first boot, because after these variables are created, reading them back on subsequent boots is very fast.

(Note that the 31*0.36 = 11.16 seconds don't add up to the ~25 seconds difference I described in the second paragraph above. That's because the first boot (on a pristine varstore) sets other (unrelated) variables as well that are not rewritten on subsequent boots.)

(2) The second contributor is the EfiBootManagerRefreshAllBootOption() function call in "OvmfPkg/Library/PlatformBootManagerLib/BdsPlatform.c". For each NIC, a new Boot#### variable is set (costing approx. 0.38 seconds on my laptop) and the BootOrder variable is updated (costing approx. 0.56 seconds on my laptop). This adds 0.94 seconds per NIC to the boot time. (Pflash writes are costly, just like on real hardware.)

Eliminating the EfiBootManagerRefreshAllBootOption() call is not feasible (it is required for the correct operation of OVMF's fw_cfg boot order processing); the call has to happen on every boot. Hooking some platform-specific filtering into it might be theoretically possible, but it looks like a hard upstream sell (EfiBootManagerRefreshAllBootOption() is part of UefiBootManagerLib, which was written in the near past for strict UEFI spec conformance).

So this is just to say that, even after no IO space issues remain (which is the root cause of this BZ), the boot with very many NICs won't be lightning fast.
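[Editorial note, derived only from the figures quoted above: for the second contributor, 31 NICs x 0.94 s per NIC comes to roughly 29 seconds added to every boot just from the Boot#### and BootOrder variable updates.]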
Upstream patches:

1 8844f15d33c7 MdePkg/IndustryStandard/Pci23: add vendor-specific capability header
2 bdf73b57f283 OvmfPkg/IndustryStandard: define PCI Capabilities for QEMU's PCI Bridges
3 91231fc2ff2b OvmfPkg/PciHotPlugInitDxe: clean up protocol usage comment
4 c18ac9fbcc71 OvmfPkg/PciHotPlugInitDxe: clean up addr. range for non-prefetchable MMIO
5 a980324709b1 OvmfPkg/PciHotPlugInitDxe: generalize RESOURCE_PADDING composition
6 4776d5cb3abf OvmfPkg/PciHotPlugInitDxe: add helper functions for setting up paddings
7 fe4049471bdf OvmfPkg/PciHotPlugInitDxe: translate QEMU's resource reservation hints
Note to virt-QE: please see <https://bugzilla.redhat.com/show_bug.cgi?id=1434747#c20>.
Created attachment 1369293 [details] ovmf log of verified
Hello Jing, sorry about the delay, I was on PTO.

(1) The command line in comment 14 looks good, and the OVMF log in comment 15 is fine as well.

(2) The issue captured in this bugzilla entry was that IO space was exhausted during PCI enumeration / resource allocation, due to the IO space demand (reservation) by the many "ioh3420" PCI Express root ports. The issue is solved by:

(2a) using "pcie-root-port" devices instead of "ioh3420";

(2b) passing the "io-reserve=0" property to each "pcie-root-port" device (from bug 1344299 and bug 1434740) -- this way each PCI Express root port will require (reserve) no IO space at all;

(2c) placing a PCI Express device that has no IO BARs (= needs no IO ports) on each root port. The "virtio-net-pci" device behaves like this (as do the other virtio devices too).

So this BZ can be set to VERIFIED. Thank you!
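[Editorial sketch of (2a)-(2c), not the exact command line used for verification; the chassis/slot numbering below is made up for illustration:]

  -device pcie-root-port,bus=pcie.0,id=root1.0,chassis=1,slot=1,io-reserve=0 \
  -netdev tap,id=hostnet1,script=/etc/qemu-ifup,vhost=on,queues=4 \
  -device virtio-net-pci,mq=on,vectors=10,netdev=hostnet1,bus=root1.0,id=virtio-net-pci1,mac=4e:63:28:bc:b1:01 \

With io-reserve=0 and a virtio device (no IO BARs) behind each port, the firmware does not have to assign any IO window to the root ports, so the number of such groups is no longer constrained by the 64 KiB IO port space.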
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:0902