Bug 1770697 - VM can't start after was shut down with - XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0
Summary: VM can't start after was shut down with - XML error: Invalid PCI address 0000...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.4.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.4.0
Target Release: ---
Assignee: Shmuel Melamud
QA Contact: Beni Pelled
URL:
Whiteboard:
Duplicates: 1782741 (view as bug list)
Depends On:
Blocks: 1776317 1813694
 
Reported: 2019-11-11 08:14 UTC by Michael Burman
Modified: 2020-05-20 19:59 UTC (History)
19 users (show)

Fixed In Version: ovirt-engine-4.4.0 gitb5b5c99ca2f
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-20 19:59:32 UTC
oVirt Team: Virt
pm-rhel: ovirt-4.4+
pm-rhel: blocker?
pm-rhel: devel_ack+


Attachments (Terms of Use)
vdsm.log of comment 40 (46.09 KB, text/plain)
2020-03-27 05:14 UTC, Meina Li
no flags Details


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 105362 0 master MERGED core: Adjust VM devices coming from i440fx template for Q35 VM 2021-02-02 10:25:28 UTC
oVirt gerrit 106138 0 None MERGED ui: Correctly show cluster BIOS type when editing 2021-02-02 10:25:28 UTC
oVirt gerrit 106545 0 master MERGED core: Adjust VM devices coming from Q35 template for i440fx VM 2021-02-02 10:25:28 UTC
oVirt gerrit 106643 0 master MERGED core: check if the VM cd device is null 2021-02-02 10:25:28 UTC
oVirt gerrit 107785 0 master MERGED core: Correctly match VM chipset to Q35 cluster emulated machine 2021-02-02 10:26:13 UTC

Description Michael Burman 2019-11-11 08:14:20 UTC
Description of problem:
VM can't start after it was shut down; it fails with: XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0

In RHV Manager, after a VM has been running and is then shut down, it cannot be started again; libvirt reports:

2019-11-11 02:57:47,384-0500 ERROR (vm/8c6f4d16) [virt.vm] (vmId='8c6f4d16-ed8f-4074-a4ad-fd63f4139794') The vm start process failed (vm:841)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 775, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2597, in _run
    dom = self._connection.defineXML(self._domain.xml)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 3752, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirt.libvirtError: XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0
2019-11-11 02:57:47,384-0500 INFO  (vm/8c6f4d16) [virt.vm] (vmId='8c6f4d16-ed8f-4074-a4ad-fd63f4139794') Changed state to Down: XML error: Invalid PCI address 0000:03:01.0
. slot must be <= 0 (code=1) (vm:1610)

Searching the web, I found two similar bugs: BZ 1382079 and BZ 1366497, which were fixed in RHEL 7. I would expect the same bugs to be fixed in 8.1 as well, but it seems that I managed to reproduce the same error.


Version-Release number of selected component (if applicable):
libvirt-client-5.0.0-12.module+el8.0.1+3755+6782b0ed.x86_64 (virt8.0)
rhel8.1

How reproducible:
100%

Steps to Reproduce:
1. Start VM in RHV using rhel8.1 host
2. Shut down the VM
3. Try to run the same VM again

Actual results:
Fails with: libvirt.libvirtError: XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0

Expected results:
The VM should start successfully.

Additional info:
Similar to https://bugzilla.redhat.com/show_bug.cgi?id=1382079 and https://bugzilla.redhat.com/show_bug.cgi?id=1366497

Comment 2 Han Han 2019-11-11 09:20:38 UTC
The failure is caused by:
        <sound model="ich6">
            <alias name="ua-0bc34fb5-5c3e-4376-8bdd-d50bfb45ed29" />
            <address bus="0x03" domain="0x0000" function="0x0" slot="0x01" type="pci" />
        </sound>

If multifunction is not enabled on the pcie-root-port, the slot must not be > 0. It seems that vdsm
assigned the wrong address.
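The constraint described above can be sketched as a small validation pass over the domain XML. This is an illustrative Python sketch only, not vdsm or libvirt code; the element layout mirrors the snippet above, and the rule checked is the simple case (single-slot bus, no multifunction).

```python
# Hypothetical sketch (not vdsm/libvirt code): flag devices whose PCI
# address uses slot > 0 on a bus backed by a pcie-root-port, which
# exposes only slot 0.
import xml.etree.ElementTree as ET

DOMAIN_XML = """
<devices>
    <controller index="3" model="pcie-root-port" type="pci"/>
    <sound model="ich6">
        <address bus="0x03" domain="0x0000" function="0x0" slot="0x01" type="pci"/>
    </sound>
</devices>
"""

def invalid_slots(devices_xml):
    """Return (tag, bus, slot) tuples where slot > 0 on a pcie-root-port bus."""
    root = ET.fromstring(devices_xml)
    # PCI bus N is provided by the controller with index="N".
    buses = {c.get("index"): c.get("model")
             for c in root.findall("controller[@type='pci']")}
    bad = []
    for dev in root:
        addr = dev.find("address[@type='pci']")
        if addr is None:
            continue
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        if buses.get(str(bus)) == "pcie-root-port" and slot > 0:
            bad.append((dev.tag, bus, slot))
    return bad

print(invalid_slots(DOMAIN_XML))  # → [('sound', 3, 1)]
```

Run against the snippet above, the ich6 device is exactly the element that trips the check.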

Comment 3 Han Han 2019-11-11 09:46:32 UTC
(In reply to Han Han from comment #2)
> The failure is caused by:
>         <sound model="ich6">
>             <alias name="ua-0bc34fb5-5c3e-4376-8bdd-d50bfb45ed29" />
>             <address bus="0x03" domain="0x0000" function="0x0" slot="0x01"
> type="pci" />
>         </sound>
> 
> If pcie-root-port multifunction is not on, the slot should not > 0. It seems
> that vdsm
> assigned the wrong address.

A pcie-root-port does not accept slot > 0, regardless of whether multifunction is on.

Comment 4 Han Han 2019-11-11 09:59:09 UTC
For libvirt, if you define a Q35 VM with a specific PCI address (slot > 0) but
without a pcie-to-pci-bridge or pci-bridge, that problem can be reproduced (libvirt versions 4.5 to 5.6).

Comment 5 Dan Kenigsberg 2019-11-11 12:09:08 UTC
Han Han, would you provide the proper XML which you believe ovirt-engine should have passed to libvirt?
ovirt never invents PCI addresses; it only keeps formerly-used ones and uses them again to maintain PCI address stability across VM restarts.

Comment 6 Laine Stump 2019-11-12 03:28:23 UTC
In the XML contained in the logs, the device is on bus 3, and bus 3 is a pcie-to-pci-bridge, which should be fine with slot > 0.

I'll take a look at this in the morning.

Comment 7 Han Han 2019-11-12 05:41:41 UTC
(In reply to Dan Kenigsberg from comment #5)
> Han Han, would you provide the proper XML which you believe ovirt-engine
> should have passed to libvirt?
> ovirt never invents pci addresses; it only keeps formerly-used ones, and use
> that again to maintain pci address stability across VM restarts.

In my opinion, oVirt should not pass an address in the domain XML when creating the VM, so the device XML should look like:
        <sound model="ich6">
            <alias name="ua-0bc34fb5-5c3e-4376-8bdd-d50bfb45ed29" />
        </sound>

Since you mentioned that the address is kept to maintain PCI address stability across VM restarts, I think this case
may happen when changing the machine type to Q35.

Comment 8 Laine Stump 2019-11-12 17:22:20 UTC
I *think* that's what Dan is saying they do - they allow libvirt to assign the PCI addresses when they initially define the domain, and then save those assignments for the next time they start the guest.

A side note - For Q35 guests, it is better to use the ich9 sound device rather than ich6 - ich6 requires an actual conventional PCI slot, which in turn requires adding the pcie-to-pci-bridge device to the config, but ich9 will be automatically placed directly on the pcie-root bus (mimicking real Q35 hardware), thus eliminating the need for a pcie-to-pci-bridge device.

Comment 9 Laine Stump 2019-11-12 19:01:02 UTC
Ah, wait - when I wrote Comment 6, I was looking at the libvirt XML at the timestamp "2019-11-11 02:52:18,432-0500" in vdsm.logs. It assigns bus='0x3' slot='0x1' to the ich6 device, and also preserves the controller for bus 3 (i.e. the controller that has "index='3'"):

        <controller index="3" model="pcie-to-pci-bridge" type="pci">
            <address bus="0x01" domain="0x0000" function="0x0" slot="0x00" type="pci" />
        </controller>

But the XML that is sent to libvirt just prior to the error, at timestamp "2019-11-11 02:54:31,202-0500", preserved the address of the ich6 device, but has thrown away all the PCI controllers from the config. It's just not possible (or at least much too complicated to even try) for libvirt to imply and reconstruct the PCI bus topology based only on addresses that have been assigned to endpoint devices - either you need to remove *all* PCI topology information from the config (including the PCI addresses assigned to endpoint devices) and let libvirt reconstruct it (I don't think you want to do this, since you want ABI stability), or you need to preserve all of the PCI topology information, including all of the pci controllers that libvirt creates underneath the addresses it assigns to endpoint devices.

Did vdsm previously do that and it stopped for some reason? Or is this something that wasn't necessary in the days of i440fx (because it's fairly simple to imply and recreate the topology of a conventional PCI system), and the problem is now showing up because it's the first time you've restarted a Q35-based guest? (That seems unlikely at this late date, but those are the only two possibilities I can think of).
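The second option above amounts to round-tripping every <controller> element that libvirt generated. A minimal sketch of that bookkeeping, assuming the management layer simply keeps the serialized elements verbatim for the next defineXML (illustrative only, not ovirt-engine code):

```python
# Illustrative sketch (not ovirt-engine code): preserve every PCI
# <controller> element libvirt added, so the same bus topology can be
# fed back on the next VM start for ABI stability.
import xml.etree.ElementTree as ET

libvirt_xml = """
<domain>
  <devices>
    <controller index="0" model="pcie-root" type="pci"/>
    <controller index="1" model="pcie-root-port" type="pci"/>
    <controller index="3" model="pcie-to-pci-bridge" type="pci">
      <address bus="0x01" domain="0x0000" function="0x0" slot="0x00" type="pci"/>
    </controller>
    <sound model="ich6"/>
  </devices>
</domain>
"""

def saved_controllers(domain_xml):
    """Return serialized <controller> elements to persist across restarts."""
    root = ET.fromstring(domain_xml)
    return [ET.tostring(c, encoding="unicode").strip()
            for c in root.findall("./devices/controller")]

controllers = saved_controllers(libvirt_xml)
print(len(controllers))  # 3 controllers kept; endpoint devices are not
```

The point is that the saved set includes the pcie-to-pci-bridge and its parent root port, not only the endpoint devices that reference them by bus number.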

Comment 10 Dan Kenigsberg 2019-11-13 05:20:58 UTC
> the problem is now showing up because it's the first time you've restarted a Q35-based guest

I am guessing this is it. I am less certain whether ovirt-engine(*) should render more of the PCI bus, or libvirt should be smarter about reconstruction.

(*) Yes, this logic moved out of vdsm a few years ago.

Comment 11 Michal Skrivanek 2019-11-13 07:46:22 UTC
it's not supposed to use ich6 at all. I would go back and start with:

> Steps to Reproduce:
> 1. Start VM in RHV using rhel8.1 host

Which VM? Where did you get that VM from? Can we get logs from the initial run?
which RHV version/build, is that a fresh install?

Comment 15 Michal Skrivanek 2019-11-13 09:19:46 UTC
(In reply to Dan Kenigsberg from comment #10)
> > the problem is now showing up because it's the first time you've restarted a Q35-based guest
> 
> I am guessing this is it. I am less certain if ovirt-engine(*) should render
> more of the pci bus, or that libvirt should be smarter about reconstruction.

well, there are (at least) two issues
- ich9 should have been used. It could be that it gets in from the template though. We do not expose it anywhere other than osinfo definition. Worth checking for new VMs vs imported VMs.
- bridge controllers are indeed dropped. They are treated as unmanaged devices, as RHV didn't create them

Comment 17 Michal Skrivanek 2019-11-13 09:30:37 UTC
(In reply to Laine Stump from comment #9)
> [snip]
> you need to preserve all of the
> PCI topology information, including all of the pci controllers that libvirt
> creates underneath the addresses it assigns to endpoint devices.

right, that's possible to do on our side

> Did vdsm previously do that and it stopped for some reason? Or is this
> something that wasn't necessary in the days of i440fx (because it's fairly
> simple to imply and recreate the topology of a conventional PCI system), and

it wasn't necessary. It's not that the topology would be any different; the assumption was that libvirt always creates the base topology the same way, and then the addresses could remain the same. It does look like a libvirt regression; on i440fx it was able to accept those addresses. Or maybe it's just because we didn't use any bridge on i440fx?


Either way, can you please clarify what exactly needs to be preserved? All pcie-to-pci-bridges? Or also pcie-root-ports? For root ports we always send/specify just the (same) number of them for libvirt to create.

Hm... now that I look again, that's missing from the first run. There seems to be another problem in the engine's logic: on the first run the VM doesn't seem to be sent as a Q35 VM. Worth fixing that first and checking again.

Comment 18 Michal Skrivanek 2019-11-13 09:40:00 UTC
reassigning to oVirt for the time being; we first need to see why Q35 is not even being considered during VM XML creation (no PCIe ports written)
Probably the same reason why ich6 is used instead of ich9

Comment 19 RHEL Program Management 2019-11-13 09:40:11 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 20 Michal Skrivanek 2019-11-13 09:52:13 UTC
(In reply to Michal Skrivanek from comment #18)

> Probably the same reason why ich6 is used instead of ich9

Eh, no - I can see in the provided logs that Vm4 was created from scratch and defines the topology all right (so it's recognized as a Q35), yet the sound card is ich6, which is not what osinfo says:
packaging/conf/osinfo-defaults.properties:os.other.devices.audio.value = ich6,q35/ich9

xml:

    <controller type="pci" model="pcie-root"/>
    <controller type="pci" model="pcie-root-port"/>
    <controller type="pci" model="pcie-root-port"/>
    [...] 
16 root ports
    [...]
    <sound model="ich6">
      <alias name="ua-c3066907-ff50-4fac-9182-a6aa3b1e35a8"/>
    </sound>

The good news is that the root port controllers are then preserved all right.
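The osinfo value quoted above ("ich6,q35/ich9") appears to use a comma-separated list where a bare entry is the default and a "chipset/model" entry overrides it for that chipset. A hedged sketch of how such a value could be resolved - the exact syntax and parser are ovirt-engine internals, so this is an assumption, not the real implementation:

```python
def resolve_sound_model(osinfo_value, chipset):
    """Pick the audio device for a chipset from a value like 'ich6,q35/ich9'.

    Assumed syntax (not verified against ovirt-engine's parser): bare
    entries are the default model; 'chipset/model' entries override it
    for that chipset.
    """
    default = None
    overrides = {}
    for entry in osinfo_value.split(","):
        if "/" in entry:
            chip, model = entry.split("/", 1)
            overrides[chip.strip()] = model.strip()
        elif default is None:
            default = entry.strip()
    return overrides.get(chipset, default)

print(resolve_sound_model("ich6,q35/ich9", "q35"))     # → ich9
print(resolve_sound_model("ich6,q35/ich9", "i440fx"))  # → ich6
```

Under that reading, a VM recognized as Q35 should have received ich9; getting ich6 means the chipset-specific branch was not taken.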

Comment 21 Laine Stump 2019-11-13 17:25:42 UTC
(In reply to Michal Skrivanek from comment #17)

> it wasn't necessary. It's not that the topology would be any different, the
> assumption was that libvirt creates the base topology always the same way
> and then the addresses could remain the same. It does look like a libvirt
> regression, on i440fx it was able to accept those addresses. Or maybe it's
> just because we didn't use any bridge on i440fx?

Definitely you didn't use any pci-bridge on i440fx - all the devices you need would fit directly on the pci-root bus. If you had, though, libvirt would be able to auto-recreate the correct topology, since there is only one type of PCI controller that is valid for i440fx (aside from pci-root) - pci-bridge.

In the case of Q35, though, there are several different types/models of PCI controllers, and many/most of them can't just be added directly, but must be added in a "chain" (e.g. a pcie-to-pci-bridge can't be plugged directly into pcie-root, but must be plugged into a pcie-root-port, which must, in turn, be plugged into pcie-root). But there are two different models of pcie-root-port, and anyway perhaps the user actually wanted a pci-bridge or dmi-to-pci-bridge rather than a pcie-to-pci-bridge. We actually contemplated doing a "cascaded" creation of controllers, but it turned already complicated code into a tower of Babel.
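The chain described above can be captured as a toy attachment table. This is a simplified model for illustration (the parent sets are my reading of the discussion, not libvirt source, and real libvirt has more controller models and rules):

```python
# Toy model (assumption, not libvirt source): which controller models a
# Q35 PCI controller can plug into, per the chain described above.
VALID_PARENTS = {
    "pcie-root-port": {"pcie-root"},
    "pcie-to-pci-bridge": {"pcie-root-port"},
    "dmi-to-pci-bridge": {"pcie-root"},
    "pci-bridge": {"pcie-to-pci-bridge", "dmi-to-pci-bridge", "pci-bridge"},
}

def chain_to_root(model):
    """Return one valid controller chain from `model` down to pcie-root."""
    chain = [model]
    while model != "pcie-root":
        model = sorted(VALID_PARENTS[model])[0]  # pick any valid parent
        chain.append(model)
    return chain

print(chain_to_root("pcie-to-pci-bridge"))
# → ['pcie-to-pci-bridge', 'pcie-root-port', 'pcie-root']
```

This also illustrates why auto-reconstruction is ambiguous: pci-bridge has three valid parents, so an endpoint address alone doesn't determine which chain the user originally had.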

> 
> 
> Either way, can you please clarify what exactly needs to be preserved? all
> pcie-to-pci-bridges ? OR also pcie-root-ports? for root ports we always
> send/specify just the (same) number of them for libvirt to create.

I would just save all <controller> elements. The whole point of us spitting them back out in the XML is to assure that guest ABI is stable.

> 
> Hm...now that I look again..that's missing from the first run. There seems
> to be another problem in engine's logic that on the first run the VM doesn't
> seem to be sent as a Q35 VM. Worth fixing that first and checking again

That would explain why the ich6 soundcard was used...

Comment 22 Ilan Zuckerman 2019-11-26 11:28:35 UTC
Just encountered a similar issue.
This is an automation blocker for us since many tests rely on starting and stopping VMs.

2019-11-26 09:12:16,648+0000 ERROR (vm/8059fc2f) [virt.vm] (vmId='8059fc2f-f88d-4a5a-8c30-65aebabc8a6a') The vm start process failed (
vm:841)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 775, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2597, in _run
    dom = self._connection.defineXML(self._domain.xml)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 3752, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirt.libvirtError: internal error: Bus 0 must be PCI for integrated PIIX3 USB or IDE controllers

Comment 23 Shmuel Melamud 2019-11-26 11:46:56 UTC
(In reply to Ilan Zuckerman from comment #22)
> Just encountered the similar issue.

Can you please attach the VM XML?

Comment 24 Ilan Zuckerman 2019-11-26 12:15:39 UTC
(In reply to Shmuel Melamud from comment #23)
> (In reply to Ilan Zuckerman from comment #22)
> > Just encountered the similar issue.
> 
> Can you please attach the VM XML?

Sure, here you go Shmuel.

2019-11-26 09:12:16,645+0000 INFO  (vm/8059fc2f) [virt.vm] (vmId='8059fc2f-f88d-4a5a-8c30-65aebabc8a6a') <?xml version='1.0' encoding='utf-8'?>
<domain xmlns:ns0="http://ovirt.org/vm/tune/1.0" xmlns:ovirt-vm="http://ovirt.org/vm/1.0" type="kvm">
    <name>ansible_regression_vm</name>
    <uuid>8059fc2f-f88d-4a5a-8c30-65aebabc8a6a</uuid>
    <memory>1048576</memory>
    <currentMemory>1048576</currentMemory>
    <iothreads>1</iothreads>
    <maxMemory slots="16">4194304</maxMemory>
    <vcpu current="1">16</vcpu>
    <sysinfo type="smbios">
        <system>
            <entry name="manufacturer">Red Hat</entry>
            <entry name="product">RHEL</entry>
            <entry name="version">8.1-3.3.el8</entry>
            <entry name="serial">4c4c4544-004d-5a10-804c-b3c04f515731</entry>
            <entry name="uuid">8059fc2f-f88d-4a5a-8c30-65aebabc8a6a</entry>
        </system>
    </sysinfo>
    <clock adjustment="0" offset="variable">
        <timer name="rtc" tickpolicy="catchup" />
        <timer name="pit" tickpolicy="delay" />
        <timer name="hpet" present="no" />
    </clock>
    <features>
        <acpi />
        <vmcoreinfo />
    </features>
    <cpu match="exact">
        <model>SandyBridge</model>
        <topology cores="1" sockets="16" threads="1" />
        <numa>
            <cell cpus="0-15" id="0" memory="1048576" />
        </numa>
    </cpu>
    <cputune />
    <devices>
        <input bus="usb" type="tablet" />
        <channel type="unix">
            <target name="ovirt-guest-agent.0" type="virtio" />
            <source mode="bind" path="/var/lib/libvirt/qemu/channels/8059fc2f-f88d-4a5a-8c30-65aebabc8a6a.ovirt-guest-agent.0" />
        </channel>
        <channel type="unix">
            <target name="org.qemu.guest_agent.0" type="virtio" />
            <source mode="bind" path="/var/lib/libvirt/qemu/channels/8059fc2f-f88d-4a5a-8c30-65aebabc8a6a.org.qemu.guest_agent.0" />
        </channel>
        <video>
            <model heads="1" ram="65536" type="qxl" vgamem="16384" vram="32768" />
            <alias name="ua-08656a6b-ffd0-40f4-94bd-fccdc83ec7c0" />
        </video>
        <controller index="0" model="virtio-scsi" type="scsi">
            <driver iothread="1" />
            <alias name="ua-3adfa8dc-42dc-4282-8b3d-a0b11a3fc719" />
        </controller>
        <controller index="0" ports="16" type="virtio-serial">
            <alias name="ua-520ba73e-4485-44a7-91c2-c958078d7edb" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x05" type="pci" />
        </controller>
        <graphics autoport="yes" keymap="en-us" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" type="vnc">
            <listen network="vdsm-display" type="network" />
        </graphics>
        <controller type="ide">
            <address bus="0x00" domain="0x0000" function="0x1" slot="0x01" type="pci" />
        </controller>
        <memballoon model="virtio">
            <stats period="5" />
            <alias name="ua-a340e86b-dc64-415a-94d4-71dfe59c0ed3" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x09" type="pci" />
        </memballoon>
        <controller index="0" model="qemu-xhci" ports="8" type="usb" />
        <rng model="virtio">
            <backend model="random">/dev/urandom</backend>
            <alias name="ua-bb5d6fdb-7cc1-43c2-9db8-fcc1cf6d639f" />
        </rng>
        <graphics autoport="yes" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" tlsPort="-1" type="spice">
            <channel mode="secure" name="main" />
            <channel mode="secure" name="inputs" />
            <channel mode="secure" name="cursor" />
            <channel mode="secure" name="playback" />
            <channel mode="secure" name="record" />
            <channel mode="secure" name="display" />
            <channel mode="secure" name="smartcard" />
            <channel mode="secure" name="usbredir" />
            <listen network="vdsm-display" type="network" />
        </graphics>
        <sound model="ich6">
            <alias name="ua-fecd520d-52be-4902-847f-2a90d88bd1b6" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x04" type="pci" />
        </sound>
        <channel type="spicevmc">
            <target name="com.redhat.spice.0" type="virtio" />
        </channel>
        <disk device="cdrom" snapshot="no" type="file">
            <driver error_policy="report" name="qemu" type="raw" />
            <source file="" startupPolicy="optional">
                <seclabel model="dac" relabel="no" type="none" />
            </source>
            <target bus="sata" dev="sdc" />
            <readonly />
            <alias name="ua-7c99a867-7dd5-4863-8078-746dd12eca14" />
            <address bus="1" controller="0" target="0" type="drive" unit="0" />
        </disk>
        <disk device="disk" snapshot="no" type="block">
            <target bus="virtio" dev="vda" />
            <source dev="/rhev/data-center/mnt/blockSD/66c682f4-d084-4144-afdf-c15f25c6e3ce/images/3835a0b1-9a83-4984-9b81-cb6f606554f9/349789f6-181a-4977-9654-edd40e552b5f">
                <seclabel model="dac" relabel="no" type="none" />
            </source>
            <driver cache="none" error_policy="stop" io="native" iothread="1" name="qemu" type="qcow2" />
            <alias name="ua-3835a0b1-9a83-4984-9b81-cb6f606554f9" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x07" type="pci" />
            <boot order="1" />
            <serial>3835a0b1-9a83-4984-9b81-cb6f606554f9</serial>
        </disk>
        <disk device="disk" snapshot="no" type="block">
            <target bus="virtio" dev="vdb" />
            <source dev="/rhev/data-center/mnt/blockSD/66c682f4-d084-4144-afdf-c15f25c6e3ce/images/f0c1dbdd-5197-498d-98c1-ee3e22091bc2/757236d8-0f88-4d4e-b787-f27831f5b635">
                <seclabel model="dac" relabel="no" type="none" />
            </source>
            <driver cache="none" error_policy="stop" io="native" iothread="1" name="qemu" type="raw" />
            <alias name="ua-f0c1dbdd-5197-498d-98c1-ee3e22091bc2" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x08" type="pci" />
            <serial>f0c1dbdd-5197-498d-98c1-ee3e22091bc2</serial>
        </disk>
        <interface type="bridge">
            <model type="virtio" />
            <link state="up" />
            <source bridge="vm" />
            <alias name="ua-31edfbff-357d-4d4e-9577-14a6b202fb97" />
            <address bus="0x00" domain="0x0000" function="0x0" slot="0x03" type="pci" />
            <mac address="00:1a:4a:16:29:15" />
            <mtu size="1500" />
            <filterref filter="vdsm-no-mac-spoofing" />
            <bandwidth />
        </interface>
    </devices>
    <pm>
        <suspend-to-disk enabled="no" />
        <suspend-to-mem enabled="no" />
    </pm>
    <os>
        <type arch="x86_64" machine="pc-q35-rhel8.0.0">hvm</type>
        <smbios mode="sysinfo" />
    </os>
    <metadata>
        <ns0:qos />
        <ovirt-vm:vm>
            <ovirt-vm:minGuaranteedMemoryMb type="int">128</ovirt-vm:minGuaranteedMemoryMb>
            <ovirt-vm:clusterVersion>4.4</ovirt-vm:clusterVersion>
            <ovirt-vm:custom />
            <ovirt-vm:device mac_address="00:1a:4a:16:29:15">
                <ovirt-vm:custom />
            </ovirt-vm:device>
            <ovirt-vm:device devtype="disk" name="vda">
                <ovirt-vm:poolID>480b76cf-3546-465d-87af-5e8c3956c1c5</ovirt-vm:poolID>
                <ovirt-vm:volumeID>349789f6-181a-4977-9654-edd40e552b5f</ovirt-vm:volumeID>
                <ovirt-vm:imageID>3835a0b1-9a83-4984-9b81-cb6f606554f9</ovirt-vm:imageID>
                <ovirt-vm:domainID>66c682f4-d084-4144-afdf-c15f25c6e3ce</ovirt-vm:domainID>
            </ovirt-vm:device>
            <ovirt-vm:device devtype="disk" name="vdb">
                <ovirt-vm:poolID>480b76cf-3546-465d-87af-5e8c3956c1c5</ovirt-vm:poolID>
                <ovirt-vm:volumeID>757236d8-0f88-4d4e-b787-f27831f5b635</ovirt-vm:volumeID>
                <ovirt-vm:imageID>f0c1dbdd-5197-498d-98c1-ee3e22091bc2</ovirt-vm:imageID>
                <ovirt-vm:domainID>66c682f4-d084-4144-afdf-c15f25c6e3ce</ovirt-vm:domainID>
            </ovirt-vm:device>
            <ovirt-vm:launchPaused>false</ovirt-vm:launchPaused>
            <ovirt-vm:resumeBehavior>auto_resume</ovirt-vm:resumeBehavior>
        </ovirt-vm:vm>
    </metadata>
</domain>
 (vm:2595)
2019-11-26 09:12:16,648+0000 ERROR (vm/8059fc2f) [virt.vm] (vmId='8059fc2f-f88d-4a5a-8c30-65aebabc8a6a') The vm start process failed (vm:841)
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 775, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python3.6/site-packages/vdsm/virt/vm.py", line 2597, in _run
    dom = self._connection.defineXML(self._domain.xml)
  File "/usr/lib/python3.6/site-packages/vdsm/common/libvirtconnection.py", line 131, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/vdsm/common/function.py", line 94, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 3752, in defineXML
    if ret is None:raise libvirtError('virDomainDefineXML() failed', conn=self)
libvirt.libvirtError: internal error: Bus 0 must be PCI for integrated PIIX3 USB or IDE controllers

Comment 25 Avihai 2019-11-26 14:13:22 UTC
Hi Shmuel,
There is another bug with a similar issue[1] which is bug 1771454.

We have 2 different libvirt errors:

1) Libvirt from this bug's initial description error which is:
libvirt.libvirtError: XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0

2) From bug 1771454:
libvirt.libvirtError: internal error: Bus 0 must be PCI for integrated PIIX3 USB or IDE controllers

If this is indeed the same root cause, you can mark one as a duplicate; if not, I think it would be better if you handle the 2nd error in bug 1771454.

Comment 26 Shmuel Melamud 2019-11-26 14:31:17 UTC
The issue that Ilan found happens with a VM created from a template imported from a previous RHV release. The template contains IDE controllers that require bus 0 to be PCI, but it is PCI-E in Q35.

Comment 27 Ryan Barry 2019-11-26 14:41:03 UTC
There are a few bugs which look similar to this. All from templates?

Comment 28 Michael Burman 2019-11-26 14:47:49 UTC
(In reply to Ryan Barry from comment #27)
> There are a few bugs which look similar to this. All from templates?

There is at least one bug that is for new VMs, 100% for sure: BZ 1776317

Comment 29 Shmuel Melamud 2019-11-26 15:11:01 UTC
(In reply to Michael Burman from comment #28)
> There is one bug at least that is for new VMs for sure 100% , BZ 1776317

It doesn't seem to be the same issue.

Comment 30 Shmuel Melamud 2019-11-26 15:17:57 UTC
(In reply to Ryan Barry from comment #27)
> There are a few bugs which look similar to this. All from templates?

All bugs mentioned here that return the "libvirt.libvirtError: internal error: Bus 0 must be PCI for integrated PIIX3 USB or IDE controllers" error are related to templates. As I checked recently, this is also reproducible for templates created in 4.4, if they are created from an i440fx VM.

Comment 31 Michal Skrivanek 2019-11-26 15:21:06 UTC
I believe Burman mentioned it's happening to him with Blank as well

Comment 32 Shmuel Melamud 2019-11-26 15:40:29 UTC
(In reply to Michal Skrivanek from comment #31)
> I believe Burman mentioned it's happening to him with Blank as well

Yes, this issue is different; I'm investigating it currently. One thing is reproducible: an ich6 sound card is created for Q35 VMs.

Comment 33 Michal Skrivanek 2019-12-05 15:00:47 UTC
I believe your issues are resolved well enough by a merged fix (linked in bug 1776317).
For non-Blank usage, your Templates should actually be rebuilt and (re)created using Q35 VMs.

We still want to fix/allow deploying Q35 VMs from i440fx VMs, but it's going to be an involved process (confirmation dialog) anyway, and not really a proper way to use Templates. As such, not an automation blocker.

Comment 34 Michal Skrivanek 2019-12-12 10:38:34 UTC
(In reply to Michal Skrivanek from comment #31)
> I believe Burman mentioned it's happening to him with Blank as well

I tested build with https://gerrit.ovirt.org/#/c/105322/ and it still doesn't work properly.
Apparently the autodetection setting is not working correctly, so I was able to reproduce it by
- create new cluster, don't touch anything, just add a name and create
- add a host(or reassign from elsewhere)
- create a Blank VM 
- Run VM

it's the same hybrid i440fx/q35 VM. Apparently because:
select bios_type from cluster;
 bios_type
-----------
         1

which is wrong. If I open UI then it shows "Q35 with Legacy BIOS" but it's _wrong_. The db gets updated with 2 once I hit the Save button.

Comment 35 Ryan Barry 2019-12-12 13:16:19 UTC
(In reply to Michal Skrivanek from comment #34)
> (In reply to Michal Skrivanek from comment #31)
> > I believe Burman mentioned it's happening to him with Blank as well
> 
> I tested build with https://gerrit.ovirt.org/#/c/105322/ and it still
> doesn't work properly.
> Apparently the autodetection setting is not working correctly, so I was able
> to reproduce it by
> - create new cluster, don't touch anything, just add a name and create
> - add a host(or reassign from elsewhere)
> - create a Blank VM 
> - Run VM
> 
> it's the same hybrid i440fx/q35 VM. Apparently because:
> select bios_type from cluster;
>  bios_type
> -----------
>          1
> 
> which is wrong. If I open UI then it shows "Q35 with Legacy BIOS" but it's
> _wrong_. The db gets updated with 2 once I hit the Save button.

Which field is saved?

Comment 36 Liran Rotenberg 2020-01-29 15:36:29 UTC
*** Bug 1782741 has been marked as a duplicate of this bug. ***

Comment 37 Michal Skrivanek 2020-03-18 08:30:56 UTC
(In reply to Ryan Barry from comment #35)
> Which field is saved?

the field I listed above, bios_type ;-)

Any update on this? I suppose it was fixed, but the patches linked to this bug don't look relevant.

Comment 38 Shmuel Melamud 2020-03-18 13:44:01 UTC
There is another patch https://gerrit.ovirt.org/#/c/106138/ that fixes the issue in comment 34.

Comment 39 Ryan Barry 2020-03-26 03:11:43 UTC
*** Bug 1817053 has been marked as a duplicate of this bug. ***

Comment 40 Meina Li 2020-03-27 05:13:38 UTC
I also encountered this bug in another scenario:

Test Version:
ovirt-engine-4.4.0-0.25.master.el8ev.noarch
vdsm-4.40.5-1.el8ev.x86_64
libvirt-client-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64

Test Steps:
1. Create and install a VM with gluster storage.
2. Create 2 snapshots for this VM.
3. Clone a VM from the second snapshot.
4. Start the VM; it fails to start.

Test error info in vdsm.log:
libvirt.libvirtError: XML error: Invalid PCI address 0000:03:01.0. slot must be <= 0

...
<controller type="usb" model="piix3-uhci" index="0">
    <alias name="ua-3c926f89-23f5-4315-889e-9d63deedd6f8" />
    <address bus="0x03" domain="0x0000" function="0x0" slot="0x01" type="pci" />
</controller>
...

Comment 41 Meina Li 2020-03-27 05:14:59 UTC
Created attachment 1673954 [details]
vdsm.log of comment 40

Comment 42 Beni Pelled 2020-03-29 10:41:21 UTC
Verified with:
- ovirt-engine-4.4.0-0.26.master.el8ev.noarch
- vdsm-4.40.7-1.el8ev.x86_64 (host)
- libvirt-daemon-6.0.0-15.module+el8.2.0+6106+b6345808.x86_64 (host)

Verification steps #1:

- Create a VM with RHEL8.1
- Start the VM on a Host with RHEL8.1
- Shutdown the VM
- Start the VM again


Verification steps #2:
- Create a VM with a disk on gluster storage domain
- Start the VM on Host with RHEL8.1
- Create a first Snapshot
- Create a second Snapshot
- Clone a VM from the second snapshot and start it on a host with RHEL8.1

Result:
- The VM is up and running with no errors (both cases)

Comment 43 Sandro Bonazzola 2020-05-20 19:59:32 UTC
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020.

Since the problem described in this bug report should be
resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

