Bug 2002283 - Make NumOfPciExpressPorts configurable via engine-config
Summary: Make NumOfPciExpressPorts configurable via engine-config
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.5.0
Target Release: 4.5.0
Assignee: Arik
QA Contact: Tamir
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-09-08 12:39 UTC by Raul Aldaz
Modified: 2022-08-04 14:19 UTC
CC List: 12 users

Fixed In Version: ovirt-engine-4.5.0
Doc Type: Enhancement
Doc Text:
With this release, it is now possible to set the number of PCI Express ports for virtual machines by setting the NumOfPciExpressPorts configuration using engine-config.
Clone Of:
Environment:
Last Closed: 2022-05-26 16:23:07 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-43408 0 None None None 2021-09-08 12:40:04 UTC
Red Hat Knowledge Base (Solution) 6317151 0 None None None 2021-09-08 12:42:53 UTC
Red Hat Product Errata RHSA-2022:4711 0 None None None 2022-05-26 16:23:16 UTC
oVirt gerrit 118004 0 master MERGED core: add NumOfPciExpressPorts to engine-config 2021-12-12 20:36:15 UTC

Description Raul Aldaz 2021-09-08 12:39:00 UTC
Description of problem:

Hot-plugging more than 10 VirtIO disks into a VM with the Q35 chipset fails with "internal error: No more available PCI slots"

Version-Release number of selected component (if applicable):

ovirt-host-4.4.7-1.el8ev.x86_64
vdsm-4.40.70.6-1.el8ev.x86_64
libvirt-client-7.0.0-14.1.module+el8.4.0+11095+d46acebf.x86_64
libvirt-7.0.0-14.1.module+el8.4.0+11095+d46acebf.x86_64



How reproducible:

Always

Steps to Reproduce:
1. Create a VM using Q35 chipset
2. Create, attach and activate more than ~10 Virtio disks

Actual results:

The last hot plug fails, and VDSM logs the following error on the host running the VM:

2021-09-07 12:06:04,040+0200 INFO  (jsonrpc/4) [api.virt] FINISH hotplugDisk return={'status': {'code': 45, 'message': 'internal error: No more available PCI slots'}} from=::ffff:<Manager IP>,55406, flow_id=<Correlation UUID>, vmId=<Affected VM UUID> (api:54)



Expected results:

The disk is attached and activated, like the previous ones.

Additional info:

If the disk is attached and activated while the VM is powered off, it is correctly available after boot.

Comment 2 Michal Skrivanek 2021-09-08 17:07:50 UTC
Please attach the output of lspci from inside the guest (right before or after the failure)
and of virsh -r dumpxml <vm> from the host

Thanks

Comment 5 Milan Zamazal 2021-09-10 18:45:33 UTC
There is a NumOfPciExpressPorts config value, set to 16 by default. According to BZ 1527882 (and the Engine source code), it is the total number of PCIe slots for all PCIe devices, whether present at start or hot plugged later. It is not documented why 16 was selected or whether it is safe to increase the value. There is also a separate check on the number of PCI devices, which counts them differently and limits them to the devices.maxPciDevices osinfo value (26 by default on x86_64).
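
For reference, a minimal sketch of how these two limits could be inspected and raised. The osinfo override path and property key below are assumptions based on the usual /etc/ovirt-engine/osinfo.conf.d override mechanism and are not taken from this bug; engine-config -g/-s is the documented knob:

# Query and raise the engine-wide PCIe root port count (default 16):
engine-config -g NumOfPciExpressPorts
engine-config -s NumOfPciExpressPorts=20

# Assumed osinfo override for the per-OS PCI device cap (26 by default on x86_64);
# the exact property key may differ, check osinfo-defaults.properties on the engine:
echo "os.other.devices.maxPciDevices.value = 32" > /etc/ovirt-engine/osinfo.conf.d/99-max-pci-devices.properties

# Both values are read at engine startup, so restart the engine afterwards:
systemctl restart ovirt-engine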

Comment 6 Arik 2021-10-04 12:00:36 UTC
Shmuel, can you please run a quick check to see whether increasing NumOfPciExpressPorts and devices.maxPciDevices enables plugging more VirtIO disks?

Comment 7 Arik 2021-10-10 12:48:52 UTC
Nisim, could you please check whether increasing the config values NumOfPciExpressPorts and devices.maxPciDevices lets us hot plug more disks into the VM?

Comment 8 Nisim Simsolo 2021-10-11 14:33:02 UTC
(In reply to Arik from comment #7)
- This issue is reproducible when attaching more than 10 VirtIO disks.
- After increasing the NumOfPciExpressPorts value, it is possible to attach more disks to the VM.

Comment 9 Arik 2021-10-11 14:58:24 UTC
Thanks Nisim

No matter what limit we set, there will probably be users who argue that it doesn't fit their use case.
As long as it is configurable and the default works for common flows (one to, let's say, five disks), I think we should keep it as is.

Comment 15 Paolo Bonzini 2021-11-17 14:16:40 UTC
Arik,

can you provide a sample QEMU command line with NumOfPciExpressPorts=20? This can help me find the maximum valid value.

Comment 16 Arik 2021-11-17 15:23:55 UTC
Sure:

2021-11-17 15:15:46.882+0000: starting up libvirt version: 7.6.0, package: 6.module+el8.5.0+13051+7ddbe958 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2021-10-25-07:42:10, ), qemu version: 6.0.0qemu-kvm-6.0.0-33.module+el8.5.0+13041+05be2dc6, kernel: 4.18.0-348.el8.x86_64, hostname: ocelot03.qa.lab.tlv.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-5-pci20 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-pci20/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-pci20/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-pci20/.config \
/usr/libexec/qemu-kvm \
-name guest=pci20,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-pci20/master-key.aes"}' \
-machine pc-q35-rhel8.4.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu Skylake-Server,hle=off,rtm=off,mpx=off \
-m size=1048576k,slots=16,maxmem=4194304k \
-overcommit mem-lock=off \
-smp 1,maxcpus=16,sockets=16,dies=1,cores=1,threads=1 \
-object '{"qom-type":"iothread","id":"iothread1"}' \
-object '{"qom-type":"memory-backend-ram","id":"ram-node0","size":1073741824}' \
-numa node,nodeid=0,cpus=0-15,memdev=ram-node0 \
-uuid 680cc593-e1e9-4822-8678-1eacf810aa07 \
-smbios 'type=1,manufacturer=Red Hat,product=RHEL,version=8.5-0.8.el8,serial=00000000-0000-0000-0000-ac1f6b57af52,uuid=680cc593-e1e9-4822-8678-1eacf810aa07,sku=8.4.0,family=RHV' \
-smbios 'type=2,manufacturer=Red Hat,product=RHEL-AV' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=41,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=2021-11-17T15:15:46,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot strict=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device pcie-root-port,port=0x18,chassis=9,id=pci.9,bus=pcie.0,multifunction=on,addr=0x3 \
-device pcie-root-port,port=0x19,chassis=10,id=pci.10,bus=pcie.0,addr=0x3.0x1 \
-device pcie-root-port,port=0x1a,chassis=11,id=pci.11,bus=pcie.0,addr=0x3.0x2 \
-device pcie-root-port,port=0x1b,chassis=12,id=pci.12,bus=pcie.0,addr=0x3.0x3 \
-device pcie-root-port,port=0x1c,chassis=13,id=pci.13,bus=pcie.0,addr=0x3.0x4 \
-device pcie-root-port,port=0x1d,chassis=14,id=pci.14,bus=pcie.0,addr=0x3.0x5 \
-device pcie-root-port,port=0x1e,chassis=15,id=pci.15,bus=pcie.0,addr=0x3.0x6 \
-device pcie-root-port,port=0x1f,chassis=16,id=pci.16,bus=pcie.0,addr=0x3.0x7 \
-device pcie-root-port,port=0x20,chassis=17,id=pci.17,bus=pcie.0,multifunction=on,addr=0x4 \
-device pcie-root-port,port=0x21,chassis=18,id=pci.18,bus=pcie.0,addr=0x4.0x1 \
-device pcie-root-port,port=0x22,chassis=19,id=pci.19,bus=pcie.0,addr=0x4.0x2 \
-device pcie-root-port,port=0x23,chassis=20,id=pci.20,bus=pcie.0,addr=0x4.0x3 \
-device qemu-xhci,p2=8,p3=8,id=ua-c00c051e-9145-4f45-94d3-7e78d4a26334,bus=pci.4,addr=0x0 \
-device virtio-scsi-pci,iothread=iothread1,id=ua-3fe54696-6708-4d09-993c-4e50dd234016,bus=pci.3,addr=0x0 \
-device virtio-serial-pci,id=ua-1da13c35-6d72-470a-968c-175bb8a34c79,max_ports=16,bus=pci.2,addr=0x0 \
-device ide-cd,bus=ide.2,id=ua-a2744b63-c826-40f0-b1f6-934f5779a7ec,werror=report,rerror=report \
-blockdev '{"driver":"file","filename":"/rhev/data-center/mnt/3par-nfs-vfs1.scl.lab.tlv.redhat.com:_vfs1_vfs1_rhv_compute_compute-he-5_nfs__0/3532b1ca-f351-4034-9626-118b15426d0e/images/07f0e505-c56a-4402-b329-e39d7057cb62/3d6d58f1-54bb-4cf9-b023-81b3cca6396f","aio":"threads","node-name":"libvirt-3-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-3-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-3-storage","backing":null}' \
-blockdev '{"driver":"file","filename":"/rhev/data-center/mnt/3par-nfs-vfs1.scl.lab.tlv.redhat.com:_vfs1_vfs1_rhv_compute_compute-he-5_nfs__0/3532b1ca-f351-4034-9626-118b15426d0e/images/07f0e505-c56a-4402-b329-e39d7057cb62/e408ef92-40b4-4c82-9322-a9d2109aeca0","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-3-format"}' \
-device virtio-blk-pci,iothread=iothread1,bus=pci.5,addr=0x0,drive=libvirt-1-format,id=ua-07f0e505-c56a-4402-b329-e39d7057cb62,bootindex=1,write-cache=on,serial=07f0e505-c56a-4402-b329-e39d7057cb62,werror=stop,rerror=stop \
-netdev tap,fd=46,id=hostua-1578061c-f5cf-4d3c-a654-39ace2450ffd,vhost=on,vhostfd=47 \
-device virtio-net-pci,host_mtu=1500,netdev=hostua-1578061c-f5cf-4d3c-a654-39ace2450ffd,id=ua-1578061c-f5cf-4d3c-a654-39ace2450ffd,mac=00:1a:4a:16:10:3c,bus=pci.1,addr=0x0 \
-chardev socket,id=charchannel0,fd=48,server=on,wait=off \
-device virtserialport,bus=ua-1da13c35-6d72-470a-968c-175bb8a34c79.0,nr=1,chardev=charchannel0,id=channel0,name=ovirt-guest-agent.0 \
-chardev socket,id=charchannel1,fd=49,server=on,wait=off \
-device virtserialport,bus=ua-1da13c35-6d72-470a-968c-175bb8a34c79.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0 \
-chardev spicevmc,id=charchannel2,name=vdagent \
-device virtserialport,bus=ua-1da13c35-6d72-470a-968c-175bb8a34c79.0,nr=3,chardev=charchannel2,id=channel2,name=com.redhat.spice.0 \
-device usb-tablet,id=input0,bus=ua-c00c051e-9145-4f45-94d3-7e78d4a26334.0,port=1 \
-audiodev id=audio1,driver=spice \
-object '{"qom-type":"tls-creds-x509","id":"vnc-tls-creds0","dir":"/etc/pki/vdsm/libvirt-vnc","endpoint":"server","verify-peer":false}' \
-vnc 10.35.30.3:0,password=on,tls-creds=vnc-tls-creds0,audiodev=audio1 \
-k en-us \
-spice port=5901,tls-port=5902,addr=10.35.30.3,x509-dir=/etc/pki/vdsm/libvirt-spice,tls-channel=main,tls-channel=display,tls-channel=inputs,tls-channel=cursor,tls-channel=playback,tls-channel=record,tls-channel=smartcard,tls-channel=usbredir,seamless-migration=on \
-device qxl-vga,id=ua-e18787df-16f6-4996-8874-1176a78fe33b,ram_size=67108864,vram_size=8388608,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pcie.0,addr=0x1 \
-device ich9-intel-hda,id=ua-566ee7fe-a3fa-4f01-b5bb-02659f38a31c,bus=pcie.0,addr=0x1b \
-device hda-duplex,id=ua-566ee7fe-a3fa-4f01-b5bb-02659f38a31c-codec0,bus=ua-566ee7fe-a3fa-4f01-b5bb-02659f38a31c.0,cad=0,audiodev=audio1 \
-device virtio-balloon-pci,id=ua-60292f11-4ce1-432f-9981-f476558b729c,bus=pci.6,addr=0x0 \
-object '{"qom-type":"rng-random","id":"objua-08c9c100-a943-43ea-92f3-971c7975d876","filename":"/dev/urandom"}' \
-device virtio-rng-pci,rng=objua-08c9c100-a943-43ea-92f3-971c7975d876,id=ua-08c9c100-a943-43ea-92f3-971c7975d876,bus=pci.7,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Comment 17 Paolo Bonzini 2021-11-17 16:43:03 UTC
Based on this, I don't think there's any issue with adding more PCIe root ports. In fact, it can be used to bypass the limit of 26 devices (because even though _the ports_ are packed as functions of multifunction PCI slots, the devices behind them are not multifunction and can be hot plugged individually).
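
To make the slot arithmetic concrete with the command line from comment 16: the 20 pcie-root-port controllers are packed eight functions per slot on pcie.0, for example (excerpted from the command line above):

-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1
...
-device pcie-root-port,port=0x17,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7

So pci.1-pci.8 occupy the eight functions of slot 0x2, pci.9-pci.16 occupy slot 0x3, and pci.17-pci.20 occupy slot 0x4; twenty root ports therefore consume only three slots on the root bus, while each hot-plugged device still gets a root port of its own (e.g. the virtio-blk-pci disk on bus=pci.5).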

Comment 23 Tamir 2022-04-19 13:50:25 UTC
Verified on RHV 4.5.0-6.

Env:
    - Engine instance with RHV 4.5.0-6 (ovirt-engine-4.5.0.2-0.7.el8ev) and RHEL 8.6 installed.
    - 3 hosts with RHV 4.5.0-6, RHEL 8.6, and vdsm-4.50.0.12-1.el8ev.

Steps:

In Admin Portal:
1. Create a 4.7 data center and a 4.7 cluster.
2. Install the hosts and create a new NFS storage domain.
3. Create an RHEL VM with a 10GB disk.
4. Run the VM.
5. Create 9 1GB virtio disks and attach them to the VM.
6. Create and attach another 1GB virtio disk.
7. Run the command "virsh -r dumpxml Hotplug_disks | grep pci" in the host the VM is running on.
8. Shut down the VM.
9. Run the command "engine-config -g NumOfPciExpressPorts" in the engine.
10. Run the command "engine-config -s NumOfPciExpressPorts=20" in the engine.
11. Restart the engine.
12. Run the VM.
13. Add and attach 4 more 1GB virtio disks to the VM.
14. Create and attach another 1GB virtio disk.
15. Run the command "engine-config -g NumOfPciExpressPorts" in the engine. 

Results (As Expected):
1. The 4.7 data center and the 4.7 cluster were created.
2. The hosts were installed and the NFS storage domain was created.
3. The VM was created.
4. The VM is running.
5. The disks were created and attached to the VM.
6. The disk did not activate, and the error "libvirt.libvirtError: internal error: No more available PCI slots" was logged in vdsm.log.
7. The max controller index is 16.
8. The VM is down.
9. The command's result is: "NumOfPciExpressPorts: 16 version: general"
10. The NumOfPciExpressPorts property in the engine is set to 20.
11. The engine has restarted.
12. The VM is up.
13. The disks were hotplugged to the VM.
14. The disk did not activate, and the error "libvirt.libvirtError: internal error: No more available PCI slots" was logged in vdsm.log.
15. The max controller index is 20.
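
For reference, the host-side check in steps 7 and 15 amounts to counting the pcie-root-port controllers in the live domain XML; a minimal sketch (the grep -c pattern is illustrative, while the virsh command and VM name are the ones quoted in the steps):

virsh -r dumpxml Hotplug_disks | grep -c pcie-root-port
# Expected: 16 with the default NumOfPciExpressPorts, 20 after engine-config -s NumOfPciExpressPorts=20,
# an engine restart, and a VM restart (the port count is fixed when the VM starts).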

Comment 28 errata-xmlrpc 2022-05-26 16:23:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: RHV Manager (ovirt-engine) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4711

Comment 29 meital avital 2022-08-04 14:19:00 UTC
Due to QE capacity constraints, we are not going to cover this issue in our automation.

