Bug 1716356 - [RHOS 15] Hot-plugging more than a single network interface with 'q35' machine type fails with "libvirt.libvirtError: internal error: No more available PCI slots"
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Kashyap Chamarthy
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks: 1581414 1717290 1821559 1905198 1944607
 
Reported: 2019-06-03 10:15 UTC by Bernard Cafarelli
Modified: 2023-03-21 19:20 UTC
CC: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1717290
Environment:
Last Closed: 2020-11-04 19:53:03 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Launchpad 1831701 0 None None None 2019-06-05 06:58:19 UTC
OpenStack gerrit 663261 0 None MERGED Add new role parameter NovaLibvirtNumPciePorts 2021-01-29 10:02:18 UTC
Red Hat Issue Tracker OSP-1550 0 None None None 2023-03-21 19:20:32 UTC

Description Bernard Cafarelli 2019-06-03 10:15:08 UTC
tempest.api.compute.servers.test_attach_interfaces.AttachInterfacesTestJSON.test_create_list_show_delete_interfaces_by_network_port fails with:
tempest.lib.exceptions.ServerFault: Got server fault
Details: Failed to attach network adapter device to 483cfa3d-2af5-4a4e-9296-e4204b59fbd7

Seen in a deployment with 1 controller and 1 compute node, using ML2/OVS with VXLAN tunnels.
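
Outside of Tempest, the failure can presumably be reproduced by hot-plugging more than one network interface into a running q35 guest, e.g. (the server and network names below are illustrative):

    # Sketch: attach several networks to a running server. On q35 with the
    # default num_pcie_ports, the second or third attach would be expected
    # to fail with "No more available PCI slots".
    openstack server add network test-vm net1
    openstack server add network test-vm net2
    openstack server add network test-vm net3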

Digging into the nova logs shows:
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] attaching network adapter failed.: libvirt.libvirtError: internal error: No more available PCI slots
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] Traceback (most recent call last):
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1761, in attach_interface
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     guest.attach_device(cfg, persistent=True, live=live)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     self._domain.attachDeviceFlags(device_xml, flags=flags)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     rv = execute(f, *args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     six.reraise(c, e, tb)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     raise value
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     rv = meth(*args, **kwargs)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 605, in attachDeviceFlags
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2019-05-31 21:18:22.391 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] libvirt.libvirtError: internal error: No more available PCI slots

Comment 2 Matthew Booth 2019-06-04 13:36:40 UTC
2019-05-31 21:18:09.638 [controller-0/N-API] 20 DEBUG nova.api.openstack.wsgi [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] Action: 'create', calling method: <bound method InterfaceAttachmentController.create of <nova.api.openstack.compute.attach_interfaces.InterfaceAttachmentController object at 0x7f286c940c18>>, body: {"interfaceAttachment": {"net_id": "5605723a-fa70-401b-b4b8-5724bc6a350c"}} _process_stack /usr/lib/python3.6/site-packages/nova/api/openstack/wsgi.py:520

2019-05-31 21:18:09.640 [controller-0/N-API] 20 DEBUG nova.compute.api [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] Fetching instance by UUID get /usr/lib/python3.6/site-packages/nova/compute/api.py:2633

2019-05-31 21:18:22.368 [compute-0/N-CPU] 7 DEBUG nova.virt.libvirt.guest [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] attach device xml: <interface type="bridge">
                                          <mac address="fa:16:3e:54:10:e5"/>
                                          <model type="virtio"/>
                                          <driver name="vhost" rx_queue_size="512"/>
                                          <source bridge="qbre4872d18-3f"/>
                                          <mtu size="1450"/>
                                          <target dev="tape4872d18-3f"/>
                                        </interface>
                                         attach_device /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:305

2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [req-d8c533c7-81ea-4fcd-8713-8c97b8e6738d 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] attaching network adapter failed.: libvirt.libvirtError: internal error: No more available PCI slots
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] Traceback (most recent call last):
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1761, in attach_interface
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     guest.attach_device(cfg, persistent=True, live=live)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 306, in attach_device
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     self._domain.attachDeviceFlags(device_xml, flags=flags)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     rv = execute(f, *args, **kwargs)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     six.reraise(c, e, tb)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/six.py", line 693, in reraise
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     raise value
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     rv = meth(*args, **kwargs)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]   File "/usr/lib64/python3.6/site-packages/libvirt.py", line 605, in attachDeviceFlags
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7]     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2019-05-31 21:18:22.391 [compute-0/N-CPU] 7 ERROR nova.virt.libvirt.driver [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] libvirt.libvirtError: internal error: No more available PCI slots

Comment 3 Matthew Booth 2019-06-04 13:40:31 UTC
2019-05-31 21:17:30.807 [compute-0/N-CPU] 7 DEBUG nova.virt.libvirt.driver [req-dc72abc6-052a-49d0-9811-18950532d2a3 806027625a2d4a54b47f8c6e522aa6bd aa381084f7e14421abef5e4378404b36 - default default] [instance: 483cfa3d-2af5-4a4e-9296-e4204b59fbd7] End _get_guest_xml xml=<domain type="kvm">
                                          <uuid>483cfa3d-2af5-4a4e-9296-e4204b59fbd7</uuid>
                                          <name>instance-0000002a</name>
                                          <memory>65536</memory>
                                          <vcpu>1</vcpu>
                                          <metadata>
                                            <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
                                              <nova:package version="19.0.1-0.20190528131506.498608c.el8ost"/>
                                              <nova:name>tempest-AttachInterfacesTestJSON-server-1268307078</nova:name>
                                              <nova:creationTime>2019-05-31 21:17:30</nova:creationTime>
                                              <nova:flavor name="m1.nano">
                                                <nova:memory>64</nova:memory>
                                                <nova:disk>1</nova:disk>
                                                <nova:swap>0</nova:swap>
                                                <nova:ephemeral>0</nova:ephemeral>
                                                <nova:vcpus>1</nova:vcpus>
                                              </nova:flavor>
                                              <nova:owner>
                                                <nova:user uuid="806027625a2d4a54b47f8c6e522aa6bd">tempest-AttachInterfacesTestJSON-848335770</nova:user>
                                                <nova:project uuid="aa381084f7e14421abef5e4378404b36">tempest-AttachInterfacesTestJSON-848335770</nova:project>
                                              </nova:owner>
                                              <nova:root type="image" uuid="10d693f8-c441-45c4-ae89-46b5b7922a41"/>
                                            </nova:instance>
                                          </metadata>
                                          <sysinfo type="smbios">
                                            <system>
                                              <entry name="manufacturer">Red Hat</entry>
                                              <entry name="product">OpenStack Compute</entry>
                                              <entry name="version">19.0.1-0.20190528131506.498608c.el8ost</entry>
                                              <entry name="serial">483cfa3d-2af5-4a4e-9296-e4204b59fbd7</entry>
                                              <entry name="uuid">483cfa3d-2af5-4a4e-9296-e4204b59fbd7</entry>
                                              <entry name="family">Virtual Machine</entry>
                                            </system>
                                          </sysinfo>
                                          <os>
                                            <type machine="pc-q35-rhel8.0.0">hvm</type>
                                            <boot dev="hd"/>
                                            <smbios mode="sysinfo"/>
                                          </os>
                                          <features>
                                            <acpi/>
                                            <apic/>
                                          </features>
                                          <cputune>
                                            <shares>1024</shares>
                                          </cputune>
                                          <clock offset="utc">
                                            <timer name="pit" tickpolicy="delay"/>
                                            <timer name="rtc" tickpolicy="catchup"/>
                                            <timer name="hpet" present="no"/>
                                          </clock>
                                          <cpu mode="host-model" match="exact">
                                            <topology sockets="1" cores="1" threads="1"/>
                                          </cpu>
                                          <devices>
                                            <disk type="file" device="disk">
                                              <driver name="qemu" type="qcow2" cache="none"/>
                                              <source file="/var/lib/nova/instances/483cfa3d-2af5-4a4e-9296-e4204b59fbd7/disk"/>
                                              <target bus="virtio" dev="vda"/>
                                            </disk>
                                            <interface type="bridge">
                                              <mac address="fa:16:3e:ac:9b:ac"/>
                                              <model type="virtio"/>
                                              <driver name="vhost" rx_queue_size="512"/>
                                              <source bridge="qbrd170f8c3-c0"/>
                                              <mtu size="1450"/>
                                              <target dev="tapd170f8c3-c0"/>
                                            </interface>
                                            <serial type="pty">
                                              <log file="/var/lib/nova/instances/483cfa3d-2af5-4a4e-9296-e4204b59fbd7/console.log" append="off"/>
                                            </serial>
                                            <input type="tablet" bus="usb"/>
                                            <graphics type="vnc" autoport="yes" listen="172.17.1.51"/>
                                            <video>
                                              <model type="cirrus"/>
                                            </video>
                                            <rng model="virtio">
                                              <backend model="random">/dev/urandom</backend>
                                            </rng>
                                            <memballoon model="virtio">
                                              <stats period="10"/>
                                            </memballoon>
                                          </devices>
                                        </domain>
                                         _get_guest_xml /usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py:5516

Comment 4 Matthew Booth 2019-06-04 13:49:39 UTC
Guessing this is another q35 bug. Perhaps we need to manually create a pcie-root-port controller? However, it reads to me like libvirt should be creating one for us: https://libvirt.org/pci-hotplug.html#x86_64-q35
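
For illustration, the setup that document describes would look roughly like the sketch below: spare hotplug-capable pcie-root-port controllers defined up front in the guest XML (the index values here are illustrative, not taken from this bug):

    <!-- Sketch: a q35 guest with spare pcie-root-port controllers so that
         PCIe devices can be hot-plugged later. libvirt/QEMU pick the actual
         index/chassis/slot values; the ones below are illustrative. -->
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'/>
    <controller type='pci' index='2' model='pcie-root-port'/>
    <controller type='pci' index='3' model='pcie-root-port'/>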

Comment 6 Kashyap Chamarthy 2019-06-04 15:21:35 UTC
tl;dr — The immediate "fix" is to make TripleO set 'num_pcie_ports' to
        12 (or 16), because the 'q35' machine type by default allows
        hot-plugging only _one_ PCIe device.

Long
----

(*) Firstly, the Tempest test[1],
    test_create_list_show_delete_interfaces_by_network_port(), is trying
    to hot-plug *three* network interfaces:

        [...]
        try:
            iface = self._test_create_interface(server)
        [...]
        iface = self._test_create_interface_by_network_id(server, ifs)
        ifs.append(iface)

        iface = self._test_create_interface_by_port_id(server, ifs)
        ifs.append(iface)
        [...]

(*) We're using the 'q35' machine type here, which by default allows
    only a *single* PCIe device to be hot-plugged.  Nova currently sets
    'num_pcie_ports' to "0" (which means it defaults to libvirt's "1"),
    but as the previous point showed, the test hot-plugs _3_ interfaces.
    (See the nova.conf sketch after the references below.)

    And as the libvirt document[2] states: "If you plan to hotplug more
    than a single PCI Express device, you should add a suitable number
    of pcie-root-port controllers when defining the guest".

(*) But the next question is: "Why does the test work with 'pc'
    machine type, then?"  It works because, with 'pc' (or 'i440fx'),
    "each of the 31 slots (from 0x01 to 0x1f) on the pci-root controller
    is hotplug capable and can accept a legacy PCI device"[3].

[1] https://github.com/openstack/tempest/blob/25f5d28f3c2c79d7d0abfaa48db5d53a41f5e40d/tempest/api/compute/servers/test_attach_interfaces.py#L219
[2] https://libvirt.org/pci-hotplug.html#x86_64-q35
[3] https://libvirt.org/pci-hotplug.html#x86_64-i440fx
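
For concreteness, the knob lives in the [libvirt] section of nova.conf on the compute nodes; a minimal sketch with the value proposed under "Next Steps" below (the option name is real, and it presumably only takes effect for guests defined after the change):

    # /etc/nova/nova.conf on the compute node (minimal sketch).
    # 16 is the value proposed below; existing domains presumably keep
    # the number of ports they were defined with until rebuilt.
    [libvirt]
    num_pcie_ports = 16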

Next Steps
----------

- Immediately, make TripleO increase 'num_pcie_ports' to 16 (a sketch
  of a possible environment file follows this list).

- Long-term, write up a spec-less Blueprint to allow setting this via a
  flavor or image metadata property (e.g. "hw_num_pcie_ports").
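
As a sketch of the TripleO side, once the linked gerrit change (663261, "Add new role parameter NovaLibvirtNumPciePorts") merges, an operator environment file might look like this (the usage shown is an assumption, not confirmed syntax):

    # custom-num-pcie-ports.yaml (hypothetical environment file; the
    # parameter name is from gerrit 663261, the usage is a guess).
    parameter_defaults:
      NovaLibvirtNumPciePorts: 16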

Comment 7 Artom Lifshitz 2019-06-05 01:59:03 UTC
This has been reproduced upstream with my DNM q35 job [1] (once the IDE CDROM issue was at least partially sorted out by Lee's patch).

[1] http://logs.openstack.org/87/662887/6/check/tempest-full-py3/65e3798/controller/logs/screen-n-cpu.txt.gz?level=ERROR#_Jun_04_23_59_40_494675

Comment 8 Martin Schuppert 2019-06-05 06:45:51 UTC
Cloned https://bugzilla.redhat.com/show_bug.cgi?id=1717290 to track incrementing 'num_pcie_ports' to 16 via TripleO.

Comment 9 Matthew Booth 2019-06-05 11:52:28 UTC
We're addressing the immediate RHOS 15 issue in bug 1717290. This bug remains open to track less urgent work in Nova to improve this out of the box, e.g.:

* Increase the default to something closer to parity with 'pc'
* Expose num_pcie_ports via image properties and/or flavors.
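
Purely to illustrate the second bullet: "hw_num_pcie_ports" is only the property name floated in comment 6 and is not implemented in Nova, so the following is a hypothetical sketch of the eventual end-user workflow:

    # Hypothetical: this image property is proposed, not implemented.
    openstack image set --property hw_num_pcie_ports=8 my-q35-image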

Comment 10 Daniel Berrangé 2019-06-06 10:14:06 UTC
(In reply to Kashyap Chamarthy from comment #6)
> - Immediately, make TripleO increment the no. of 'num_pcie_ports' to 16.

If you want parity with i440fx machine type you should set num_pcie_ports to 32 instead.

Comment 11 Kashyap Chamarthy 2019-06-07 09:17:21 UTC
(In reply to Daniel Berrange from comment #10)
> (In reply to Kashyap Chamarthy from comment #6)
> > - Immediately, make TripleO increment the no. of 'num_pcie_ports' to 16.
> 
> If you want parity with i440fx machine type you should set num_pcie_ports to
> 32 instead.

Noted; as we saw yesterday on IRC, the answer seems to be complex, depending on which architecture and machine type one is using.

From my current understanding (please correct if something is off here):

(*) For x86_64:

    - pc|i440fx : 32 PCIe root ports [with SeaBIOS]
    - q35       : 32 (?) PCIe root ports [with SeaBIOS]
    - q35       : ? PCIe root ports [with UEFI]


(*) For AArch64:

   - 'virt' machine type: 24 PCIe root ports [with UEFI], with unpatched libvirt
   - 'virt' machine type: 32 PCIe root ports [with UEFI], based on Andrea 
     Bolognani's tests with a patched libvirt (using "io-reserve=0")
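
One quick way to check how many hotplug-capable root ports a given guest actually ended up with (a diagnostic sketch using stock virsh and grep; the domain name is taken from the logs above):

    # Count pcie-root-port controllers in the live domain XML.
    virsh dumpxml instance-0000002a | grep -c 'pcie-root-port'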


I'll update this bug with a summary once the dust settles on this.

Comment 13 stchen 2020-11-04 19:53:03 UTC
Closing EOL; OSP 16.0 was retired on Oct 27, 2020.

