Bug 2034160 - v2v conversion is failed with PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-net-pci [code=1 int1=-1]
Summary: v2v conversion is failed with PCI: slot 2 function 0 not available for virtio...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: libguestfs
Version: 9.0
Hardware: x86_64
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Laszlo Ersek
QA Contact: YongkuiGuo
URL:
Whiteboard:
Depends On:
Blocks: 2035177
Reported: 2021-12-20 09:45 UTC by mxie@redhat.com
Modified: 2022-05-17 12:39 UTC

Fixed In Version: libguestfs-1.46.1-2.el9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2035177 (view as bug list)
Environment:
Last Closed: 2022-05-17 12:28:38 UTC
Type: Bug
Target Upstream Version:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-106211 0 None None None 2021-12-20 09:51:07 UTC
Red Hat Product Errata RHBA-2022:2317 0 None Closed VMs in paused state 2022-06-06 11:39:55 UTC

Description mxie@redhat.com 2021-12-20 09:45:35 UTC
Description of problem:
virt-v2v conversion fails with the error: PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-net-pci [code=1 int1=-1]


Version-Release number of selected component (if applicable):
qemu-img-6.2.0-1.el9.x86_64
virt-v2v-1.45.95-1.el9.x86_64
libvirt-libs-7.10.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Convert a guest from VMware with virt-v2v
# virt-v2v -ic vpx://root@10.73.73.141/data/10.73.196.89/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk6.7 -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA   -ip /home/passwd  -o rhv-upload -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api  -op /home/rhvpasswd  -os nfs_data -b ovirtmgmt esx6.5-win11-x86_64 -v -x ^C
[root@dell-per740-53 ~]# virt-v2v -ic vpx://root@10.73.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.2 -io  vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78  -ip /home/passwd  -o rhv-upload -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api  -op /home/rhvpasswd  -os nfs_data -b ovirtmgmt esx7.0-win11-x86_64  
[  22.4] Opening the source
virt-v2v: error: libguestfs error: could not create appliance through 
libvirt.

Try running qemu directly without libvirt using this environment variable:
export LIBGUESTFS_BACKEND=direct

Original error from libvirt: internal error: qemu unexpectedly closed the 
monitor: 2021-12-20T03:46:39.698837Z qemu-kvm: -device 
{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x2"}: PCI: 
slot 2 function 0 not available for virtio-scsi-pci, in use by 
virtio-net-pci [code=1 int1=-1]

If reporting bugs, run virt-v2v with debugging enabled and include the 
complete output:

  virt-v2v -v -x [...]

Actual results:
As described above: the conversion fails while opening the source.

Expected results:
The conversion completes without this error.

Additional info:
1. The bug cannot be reproduced with qemu-img-6.1.0-8.el9.x86_64, so this is a regression.
# rpm -q qemu-img
qemu-img-6.1.0-8.el9.x86_64

# virt-v2v -ic vpx://root@10.73.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.2 -io  vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78  -ip /home/passwd  -o rhv-upload -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api  -op /home/rhvpasswd  -os nfs_data -b ovirtmgmt esx7.0-win11-x86_64  
[  22.3] Opening the source
[  29.4] Inspecting the source
[  35.1] Checking for sufficient free disk space in the guest
[  35.1] Converting Windows 10 Enterprise to run on KVM
virt-v2v: This guest has virtio drivers installed.
[  39.9] Mapping filesystem data to avoid copying unused and blank areas
[  41.4] Closing the overlay
[  41.7] Assigning disks to buses
[  41.7] Checking if the guest needs BIOS or UEFI to boot
[  41.7] Copying disk 1/1

Comment 5 Laszlo Ersek 2021-12-21 08:39:45 UTC
From comment#1, this is the libvirt domain XML:

<?xml version="1.0"?>
<domain xmlns:qemu="http://libvirt.org/schemas/domain/qemu/1.0" type="kvm">
  <name>guestfs-o8v8wa77m4o10bq3</name>
  <memory unit="MiB">2560</memory>
  <currentMemory unit="MiB">2560</currentMemory>
  <cpu mode="maximum"/>
  <vcpu>8</vcpu>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <os>
    <type>hvm</type>
    <kernel>/var/tmp/.guestfs-0/appliance.d/kernel</kernel>
    <initrd>/var/tmp/.guestfs-0/appliance.d/initrd</initrd>
    <cmdline>panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=UUID=ffc7cd5a-d69a-4b72-a60d-1d453ccfd85a selinux=0 guestfs_verbose=1 guestfs_network=1 TERM=xterm-256color guestfs_identifier=v2v</cmdline>
    <bios useserial="yes"/>
  </os>
  <on_reboot>destroy</on_reboot>
  <devices>
    <rng model="virtio">
      <backend model="random">/dev/urandom</backend>
    </rng>
    <controller type="scsi" index="0" model="virtio-scsi"/>
    <disk device="disk" type="network">
      <source protocol="nbd">
        <host transport="unix" socket="/tmp/v2v.xLInTr/in0"/>
      </source>
      <target dev="sda" bus="scsi"/>
      <driver name="qemu" type="raw" cache="unsafe" discard="unmap"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="disk">
      <source file="/tmp/libguestfsbVZPh5/overlay1.qcow2"/>
      <target dev="sdb" bus="scsi"/>
      <driver name="qemu" type="qcow2" cache="unsafe"/>
      <address type="drive" controller="0" bus="0" target="1" unit="0"/>
    </disk>
    <serial type="unix">
      <source mode="connect" path="/tmp/libguestfshAVrzF/console.sock"/>
      <target port="0"/>
    </serial>
    <channel type="unix">
      <source mode="connect" path="/tmp/libguestfshAVrzF/guestfsd.sock"/>
      <target type="virtio" name="org.libguestfs.channel.0"/>
    </channel>
    <controller type="usb" model="none"/>
    <memballoon model="none"/>
  </devices>
  <qemu:commandline>
    <qemu:env name="TMPDIR" value="/var/tmp"/>
    <qemu:arg value="-netdev"/>
    <qemu:arg value="user,id=usernet,net=169.254.0.0/16"/>
    <qemu:arg value="-device"/>
    <qemu:arg value="virtio-net-pci,netdev=usernet"/>
  </qemu:commandline>
</domain>

Note the use of <qemu:commandline>. I don't yet know what justifies it, but it's no surprise that QEMU detects an address conflict in this case. Libvirtd generates the PCI addresses for all devices it knows about, and makes sure those addresses do not conflict. But the virtio-net NIC added in <qemu:commandline> is something that libvirtd cannot manage. So, what likely happens is that QEMU first handles the virtio-net NIC, and automatically assigns a PCI address to it. Then QEMU encounters some other device, whose PCI address is fully specified by libvirtd. That's when the conflict is detected -- as QEMU happened to auto-assign that same address to the virtio-net NIC previously.
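
The suspected sequence can be illustrated with a toy model in Python. This is only a sketch, not QEMU's or libvirt's actual code: the PciBus class, the two reserved chipset slots and the processing order (NIC first) are assumptions made for illustration, following the hypothesis in the paragraph above.

```python
class PciBus:
    """Toy model of a PCI root bus: maps slot number -> device name."""

    def __init__(self, reserved):
        # Slots taken by built-in devices (assumed values, illustration only).
        self.slots = dict(reserved)

    def add(self, name, slot=None):
        """Plug in a device; slot=None means "use the lowest free slot"."""
        if slot is None:
            slot = next(s for s in range(32) if s not in self.slots)
        elif slot in self.slots:
            raise ValueError(
                f"PCI: slot {slot} function 0 not available for {name}, "
                f"in use by {self.slots[slot]}"
            )
        self.slots[slot] = name
        return slot


bus = PciBus({0: "host-bridge", 1: "isa-bridge"})

# The NIC from <qemu:commandline> carries no address, so it is auto-assigned.
nic_slot = bus.add("virtio-net-pci")

# The controller arrives with the address that libvirtd picked (slot 2).
try:
    bus.add("virtio-scsi-pci", slot=2)
    conflict = None
except ValueError as err:
    conflict = str(err)

print(nic_slot)
print(conflict)
```

In this model the unmanaged NIC lands in the first free slot, which is the slot that the explicitly addressed controller asks for afterwards.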

What seems fishy here is the use of <qemu:commandline> to begin with. This is the first time I see it. Once we figure out where <qemu:commandline> originates from, we can set the proper Component for this bugzilla ticket (qemu-kvm seems innocent).

Comment 6 Laszlo Ersek 2021-12-21 08:41:11 UTC
The statement from comment#0 that the issue vanishes by downgrading *qemu-img* of all things, is very strange!

Comment 7 Richard W.M. Jones 2021-12-21 08:42:59 UTC
That's from the appliance and it comes from:

https://github.com/libguestfs/libguestfs/blob/4af6d68e2d8b856d91fa5527216ea3db04556086/lib/launch-libvirt.c#L1824

Both elements are necessary.

Is this a regression/change in libvirt or qemu??

Comment 8 Laszlo Ersek 2021-12-21 08:51:30 UTC
The source code of the stated virt-v2v version does not contain the string "commandline" at all, therefore the <qemu:commandline> element in the domain XML I highlighted in comment 5 *cannot* come from virt-v2v. Thus far it seems like either a libvirtd bug (very unlikely), or some manual modification of the domain XML (which I don't know how is possible, as virt-v2v generates the domain XML).

Comment 9 Laszlo Ersek 2021-12-21 08:52:03 UTC
(ugh, sorry, out of order comment)

Comment 10 Laszlo Ersek 2021-12-21 09:06:01 UTC
(In reply to Richard W.M. Jones from comment #7)
> That's from the appliance and it comes from:
> 
> https://github.com/libguestfs/libguestfs/blob/
> 4af6d68e2d8b856d91fa5527216ea3db04556086/lib/launch-libvirt.c#L1824
> 
> Both elements are necessary.
> 
> Is this a regression/change in libvirt or qemu??

I don't know, but IMO this <qemu:commandline> addition -- restored in commit 492a945791b4 ("Revert "launch: libvirt: Use qemu-bridge-helper to implement a full network (RHBZ#1148012)."", 2019-01-07) -- has never had any stability guarantees. Libvirt owns the PCI address allocation, and we're interfering with that.

I see that commit 492a945791b4 is actually a revert of commit 224de20b9a8d ("launch: libvirt: Use qemu-bridge-helper to implement a full network (RHBZ#1148012).", 2014-10-02). Why was that commit reverted? The revert commit (492a945791b4) names only one reason, namely "SLIRP is going to be a separate project and to get better support". It does not look convincing; SLIRP should be avoided at all costs IMO.

"/etc/qemu-kvm/bridge.conf" contains "allow virbr0" by default even on RHEL7, so we should just use the virbr0 bridge via the standard domain XML.

... Is the reason that we want the appliance to have a fixed IP address, per <https://bugzilla.redhat.com/show_bug.cgi?id=1148012#c2>? I think we could do that just by using a static IP assignment inside the appliance, without using DHCP at all. DHCP being available on a network does not require all hosts on that network to use DHCP; static assignment (in particular outside of the DHCP pool) works fine alongside DHCP.

Comment 11 Laszlo Ersek 2021-12-21 09:09:40 UTC
I'm even more confused now: parent commit 67e6f32a240c ("appliance: Use dhclient or dhcpcd instead of hard-coding IP address of appliance.", 2014-10-02) actually implemented the DHCP request, replacing the in-guest fixed IP address assignment. I don't know what the problem was with those patches, the feature seems complete and good to me.

Comment 12 Richard W.M. Jones 2021-12-21 09:12:53 UTC
libguestfs networking has to work as non-root, and it has to work on a wide range
of distros without needing any non-standard net configuration or editing of config
files at install time.  I'm pretty sure qemu-bridge-helper violates one or both of
these conditions, although I don't recall exact details right now.  SLIRP works well
for our needs (security concerns aside).  Anyhow I'm not really going to be able to
look in detail at this until January.

Comment 13 Laszlo Ersek 2021-12-21 09:25:33 UTC
The auto-assignment of PCI addresses is in "hw/pci/pci.c" in QEMU; I've tried to review the changes for that file in the v6.1.0..v6.2.0 commit set. There are a few commits, but nothing seems related.

The issue could be in libvirtd too; however (per comment#0) downgrading just QEMU makes the issue go away. :/

We'd have to get a before-after comparison for the completed libvirt domain XML, *and* the "before" output of the "info qtree" monitor command (to see the libvirt-specified virtio-scsi-pci PCI address and the auto-assigned virtio-net-pci PCI address, in the working case).

Comment 14 Laszlo Ersek 2021-12-21 09:34:05 UTC
Hi Ming Xie,

can you please collect & attach the following info:

(1) in the functional (pre-regression) case, please start "guestfish", launch the appliance (using the libvirt backend), and then use "virsh dumpxml" to collect the complete domain XML for the running appliance.

(2) In the same state, please run the following command from a different terminal (as the same user):

virsh -c LIBVIRTD_URI qemu-monitor-command APPLIANCE_NAME --hmp info qtree > qtree.txt

Customize LIBVIRTD_URI and APPLIANCE_NAME as necessary. Please attach the "qtree.txt" file.

(3) This is a more difficult step:

(3a) With the broken (post-regression) setup, please take the domain XML from the virt-v2v log (similar to the one I highlighted in comment 5).

(3b) Once you have that domain XML -- with a domain name like "guestfs-o8v8wa77m4o10bq3" in it --, please run an explicit "virsh define" command on it.

(3c) With the domain permanently defined, please run "virsh dumpxml" on the domain, so that we can see the PCI addresses assigned by libvirtd. We'll compare it with the output of step (1).

Thanks!

Comment 15 Laszlo Ersek 2021-12-21 09:41:38 UTC
Either way, we can likely hack this around by specifying the following PCI B/D/F for our NIC:


  virtio-net-pci,netdev=usernet,addr=1e.0

This would use function#0 in the penultimate slot on the root bus -- that slot should not contain any device with a hard-coded address (such as the SATA controller on Q35 in slot 0x1f), and would likely not be auto-assigned by either libvirtd or qemu to any other device.
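
For readers unfamiliar with the "addr=1e.0" notation, here is a minimal Python sketch of how such a value splits into slot and function. The helper name parse_pci_addr is hypothetical; the only thing taken from QEMU is the convention that both parts are hexadecimal.

```python
def parse_pci_addr(addr):
    """Split a 'slot.function' string (hex digits) into two integers."""
    slot_text, _, func_text = addr.partition(".")
    slot = int(slot_text, 16)
    function = int(func_text, 16) if func_text else 0
    # A conventional PCI bus has slots 0x00..0x1f and functions 0..7.
    if not (0 <= slot <= 0x1F and 0 <= function <= 7):
        raise ValueError(f"invalid PCI address: {addr}")
    return slot, function


slot, function = parse_pci_addr("1e.0")
last_slot = 0x1F

print(slot, function)          # slot 30 (0x1e), function 0
print(slot == last_slot - 1)   # the penultimate slot on the bus
```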

Comment 16 Laszlo Ersek 2021-12-21 09:56:01 UTC
The address auto-assignment in qemu (hw/pci/pci.c) seems to scan slots
in increasing order.

The libvirtd logic (src/qemu/qemu_domain_address.c) is very complex. It
seems to assign slot 0x1E explicitly only to the DMI-to-PCI bridge; and
that device model is not added to the domain, as far as I can tell. The
libvirt docs confirm it:

https://libvirt.org/formatdomain.html#controllers

> For machine types which provide an implicit PCI Express (PCIe) bus
> (for example, the machine types based on the Q35 chipset), the
> pcie-root controller with index=0 is auto-added to the domain's
> configuration. pcie-root has also no address, provides 31 slots
> (numbered 1-31) that can be used to attach PCIe or PCI devices
> (although libvirt will never auto-assign a PCI device to a PCIe slot,
> it will allow manual specification of such an assignment). Devices
> connected to pcie-root cannot be hotplugged. If traditional PCI
> devices are present in the guest configuration, a pcie-to-pci-bridge
> controller will automatically be added: this controller, which plugs
> into a pcie-root-port, provides 31 usable PCI slots (1-31) with
> hotplug support ( since 4.3.0 ). If the QEMU binary doesn't support
> the corresponding device, then a dmi-to-pci-bridge controller will be
                                   ^^^^^^^^^^^^^^^^^
> added instead, usually at the defacto standard location of slot=0x1e.
> A dmi-to-pci-bridge controller plugs into a PCIe slot (as provided by
> pcie-root), and itself provides 31 standard PCI slots (which also do
> not support device hotplug). In order to have hot-pluggable PCI slots
> in the guest system, a pci-bridge controller will also be
> automatically created and connected to one of the slots of the
> auto-created dmi-to-pci-bridge controller; all guest PCI devices with
> addresses that are auto-determined by libvirt will be placed on this
> pci-bridge device. ( since 1.1.2 ).

In our case, a dmi-to-pci-bridge should never be added; the
pcie-to-pci-bridge device should take preference (which is not tied to
slot 0x1e).

Comment 17 Richard W.M. Jones 2021-12-21 10:36:52 UTC
(In reply to Laszlo Ersek from comment #14)
> Hi Ming Xie,
> 
> can you please collect & attach the following info:
> 
> (1) in the functional (pre-regression) case, please start "guestfish",
> launch the appliance (using the libvirt backend), and then use "virsh
> dumpxml" to collect the complete domain XML for the running appliance.

You'll need to use:

  guestfish --network

or:

  guestfish
  ><fs> set-network true

The addition of SLIRP isn't the default.  Virt-v2v sets it here:
https://github.com/libguestfs/virt-v2v/blob/702a511b7f3379102ec5d267a7a43bdd47f3e594/convert/convert.ml#L61
(note also the comment - I wonder if that is still true?)

> Either way, we can likely hack this around by specifying the following PCI B/D/F for our NIC:
>  virtio-net-pci,netdev=usernet,addr=1e.0

Hack sounds good for now.  I wonder however if libvirt does in fact
support setting the net= option now?  According to this, it seems as if
it does?
https://libvirt.org/formatdomain.html#userspace-slirp-stack

That would solve the problem by letting libvirt deal with PCI assignment.
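
A rough Python sketch of what such a libvirt-managed NIC definition could look like follows. It is an illustration under assumptions, not the libguestfs implementation: the function name user_interface is hypothetical, and the element layout (<interface type="user"> with <model> and <ip> children) is modelled on the libvirt documentation linked above, reusing the 169.254.0.0/16 network from the existing -netdev argument.

```python
import xml.etree.ElementTree as ET


def user_interface(network="169.254.0.0", prefix=16):
    """Build an <interface type="user"> element for a SLIRP-backed virtio NIC."""
    iface = ET.Element("interface", {"type": "user"})
    ET.SubElement(iface, "model", {"type": "virtio"})
    ET.SubElement(
        iface,
        "ip",
        {"family": "ipv4", "address": network, "prefix": str(prefix)},
    )
    return iface


xml_text = ET.tostring(user_interface(), encoding="unicode")
print(xml_text)
```

Because the element carries no <address> child, libvirt would be free to choose the PCI slot itself, alongside the other devices it manages.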

Also I wonder if this same problem happens with LIBGUESTFS_BACKEND=direct.

Also I wonder why we didn't see this in Fedora.  Maybe because Fedora has
only just been upgraded to qemu 6.2.

Comment 19 Richard W.M. Jones 2021-12-21 11:11:19 UTC
FWIW I am able to reproduce this bug so if you send the patch to me I can
test it.

Broken with:
libvirt-libs-7.10.0-1.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64
libguestfs-1.46.0-4.el9.x86_64

One line reproducer:

$ guestfish --network run 
libguestfs: error: could not create appliance through libvirt.

Try running qemu directly without libvirt using this environment variable:
export LIBGUESTFS_BACKEND=direct

Original error from libvirt: internal error: qemu unexpectedly closed the monitor: 2021-12-21T11:11:39.218136Z qemu-kvm: -device {"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x2"}: PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-net-pci [code=1 int1=-1]

Comment 20 mxie@redhat.com 2021-12-21 11:38:48 UTC
(In reply to Laszlo Ersek from comment #14)
> Hi Ming Xie,
> 
> can you please collect & attach the following info:
> 
> (1) in the functional (pre-regression) case, please start "guestfish",
> launch the appliance (using the libvirt backend), and then use "virsh
> dumpxml" to collect the complete domain XML for the running appliance.

Please check 'reply-for-1.log'

> (2) In the same state, please run the following command from a different
> terminal (as the same user):
> 
> virsh -c LIBVIRTD_URI qemu-monitor-command APPLIANCE_NAME --hmp info qtree >
> qtree.txt
> 
> Customize LIBVIRTD_URI and APPLIANCE_NAME as necessary. Please attach the
> "qtree.txt" file.

Please check 'reply-for-2.log'

> (3) This is a more difficult step:
> 
> (3a) With the broken (post-regression) setup, please take the domain XML
> from the virt-v2v log (similar to the one I highlighted in comment 5).
> 
> (3b) Once you have that domain XML -- with a domain name like
> "guestfs-o8v8wa77m4o10bq3" in it --, please run an explicit "virsh define"
> command on it.
> 
> (3c) With the domain permanently defined, please run "virsh dumpxml" on the
> domain, so that we can see the PCI addresses assigned by libvirtd. We'll
> compare it with the output of step (1).
> 
> Thanks!

Please check 'reply-for-3.log'

Comment 25 Laszlo Ersek 2021-12-22 14:33:15 UTC
(In reply to Richard W.M. Jones from comment #17)

> I wonder however if libvirt does in fact now
> support setting the net= option now?  According to this, it seems as if
> it does?
> https://libvirt.org/formatdomain.html#userspace-slirp-stack
> 
> That would solve the problem by letting libvirt deal with PCI assignment.

This seems like the best option, by far. I'll try to cook up something this week.

Comment 26 Laszlo Ersek 2021-12-22 14:58:18 UTC
Regarding Ming Xie's logs from comment 20 -- and thanks for those --, they are interesting:

(1) In the working case, the completed domain XML contains the following PCI addresses:

virtio-scsi-pci:   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
virtio-serial-pci: <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
virtio-rng-pci:    <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>

This is consistent with the qtree dumped for my question (2):

      dev: virtio-rng-pci, id "rng0"
        addr = 04.0
      dev: virtio-serial-pci, id "virtio-serial0"
        addr = 03.0
      dev: virtio-scsi-pci, id "scsi0"
        addr = 02.0
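
As an aside, the device/address pairs can be pulled out of such a dump with a few lines of Python. This is a simplified sketch: the function name qtree_addresses is hypothetical, and the parser only understands the 'dev:' and 'addr =' lines shown in the excerpt above, not the full "info qtree" format.

```python
import re

QTREE = '''\
      dev: virtio-rng-pci, id "rng0"
        addr = 04.0
      dev: virtio-serial-pci, id "virtio-serial0"
        addr = 03.0
      dev: virtio-scsi-pci, id "scsi0"
        addr = 02.0
'''


def qtree_addresses(text):
    """Return {device model: 'slot.function'} from qtree-style text."""
    result = {}
    current = None
    for line in text.splitlines():
        dev = re.match(r'\s*dev: ([\w-]+), id "([^"]*)"', line)
        if dev:
            current = dev.group(1)
            continue
        addr = re.match(r"\s*addr = ([0-9a-fA-F]{2}\.[0-7])\s*$", line)
        if addr and current:
            # Attribute the address to the most recent 'dev:' line.
            result[current] = addr.group(1)
            current = None
    return result


addresses = qtree_addresses(QTREE)
for model, addr in sorted(addresses.items(), key=lambda item: item[1]):
    print(addr, model)
```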

Importantly, the domain XML (1) does *not* contain the virtio-net-pci device added via <qemu:commandline>, and this is confirmed by the qtree dump (2) as well. In other words, I must say that the symptom could be absent in the working case because we simply don't trigger the address conflict, due to *not* adding the virtio-net-pci device manually. Note that <qemu:commandline> still exists in (1), but only as follows:

  <qemu:commandline>
    <qemu:env name='TMPDIR' value='/var/tmp'/>
  </qemu:commandline>

This comes from "lib/launch-libvirt.c" just the same -- which seems consistent with my impression that when we avoid the conflict, we do so because we don't even try to add a virtio-net-pci device manually.

(3) In the broken case, the completed domain XML assigns the following addresses:

virtio-scsi-pci:   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
virtio-serial-pci: <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
virtio-rng-pci:    <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>

Which are *identical* to case (1). However, in case (3), we also have:
  <qemu:commandline>
    <qemu:arg value='-netdev'/>
    <qemu:arg value='user,id=usernet,net=169.254.0.0/16'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='virtio-net-pci,netdev=usernet'/>
    <qemu:env name='TMPDIR' value='/var/tmp'/>
  </qemu:commandline>

The conflict is that the manually provided virtio-net-pci takes address 02.0, per QEMU's automatic address assignment, and then the libvirt-provided virtio-scsi-pci cannot get the same address ("PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-net-pci").
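
The comparison between cases (1) and (3) can be sketched in Python as follows. The DOMAIN_XML fragment is a hypothetical, heavily abbreviated stand-in for "virsh dumpxml" output, built only from the three addresses quoted above, and the helper name pci_slots is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hypothetical domain XML containing only the quoted addresses.
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <rng model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </rng>
  </devices>
</domain>
"""


def pci_slots(domain_xml):
    """Map each PCI slot number to the device element that libvirt put there."""
    root = ET.fromstring(domain_xml)
    slots = {}
    for dev in root.find("devices"):
        addr = dev.find("address")
        if addr is not None and addr.get("type") == "pci":
            label = dev.get("model") or dev.get("type") or dev.tag
            slots[int(addr.get("slot"), 16)] = f"{dev.tag}:{label}"
    return slots


slots = pci_slots(DOMAIN_XML)
for slot in sorted(slots):
    print(f"slot 0x{slot:02x}: {slots[slot]}")
```

Slot 2 is the lowest slot libvirt hands out here, so a device that QEMU auto-places in slot 2 before the libvirt-addressed devices are created collides with the SCSI controller.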


My verdict (which I find hard to believe, myself!) is that the address assignment in libvirt has not changed; the regression is that the "enable_network" flag in the daemon now defaults to "true", whereas it used to default to "false". I have no other explanation.

(I note that in case (1), the "--network" option was not passed to guestfish; just check the attachment in comment 21 -- but my understanding is that it's *also* not passed in case (3)!)

Comment 27 Laszlo Ersek 2021-12-22 15:03:46 UTC
Ming Xie, can you please repeat step (1) from comment 14, but pass "--network" to "guestfish"? I'm worried that we didn't test the actual case that regressed. Thanks.

Comment 28 Laszlo Ersek 2021-12-22 15:07:46 UTC
Inexplicable: virt-v2v has unconditionally set

  g#set_network true

ever since commit 5d22d60a7ca4 ("New tool: virt-v2v.", 2014-05-15). :/

Comment 29 mxie@redhat.com 2021-12-23 07:48:45 UTC
(In reply to Laszlo Ersek from comment #27)
> Ming Xie, can you please repeat step (1) from comment 14, but pass
> "--network" to "guestfish"? I'm worried that we didn't test the actual case
> that regressed. Thanks.

The same error as in this bug occurs when --network is added to guestfish:
# LIBGUESTFS_BACKEND=libvirt guestfish --ro -a /home/esx7.0-win2019-x86_64/esx7.0-win2019-x86_64-1.vmdk --network

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
libguestfs: error: could not create appliance through libvirt.

Try running qemu directly without libvirt using this environment variable:
export LIBGUESTFS_BACKEND=direct

Original error from libvirt: internal error: qemu unexpectedly closed the monitor: 2021-12-23T07:47:23.615897Z qemu-kvm: -device {"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x2"}: PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-net-pci [code=1 int1=-1]
><fs>

Comment 30 Laszlo Ersek 2021-12-23 08:20:13 UTC
Hi Ming Xie,

thanks for the repeated test.

However, given that the issue reproduces with "guestfish --network", using the *old* components, I don't understand how this qualifies as a regression.

Comment 31 Richard W.M. Jones 2021-12-23 08:26:17 UTC
> My verdict (which I find hard to believe, myself!) is that the address assignment
> in libvirt has not changed; the regression is that the "enable_network" flag in
> the daemon now defaults to "true", whereas it used to default to "false". I have
> no other explanation.

I'm not sure I follow?  The flag defaults to false:

$ guestfish get-network
false
$ rpm -qf /usr/bin/guestfish 
libguestfs-1.46.0-4.el9.x86_64

It all works without the network:

$ guestfish run
$ echo $?
0

> However, given that the issue reproduces with "guestfish --network", using the
> *old* components, I don't understand how this qualifies as a regression.

It's a regression in something (qemu?)  We might end up having to fix it
in libguestfs though.

Anyhow if you have a patch I can (still!) test it.

Comment 32 Laszlo Ersek 2021-12-23 10:43:38 UTC
Let me rephrase.

I don't understand how this is a regression. According to comment#0, the *sole* factor for triggering the symptom is a qemu upgrade. Downgrading qemu, while keeping everything else unchanged, hides the symptom.

In this use case, libguestfs is launched by virt-v2v, and virt-v2v has always enabled networking. QEMU has not changed the PCI address assignment. Libvirt has not changed the PCI address assignment. So I don't understand *what* changed, so that we see this bug only now. We should have seen it *for ages* when launching libguestfs through virt-v2v. It is a bug alright, I just don't understand "why now", and I'm not satisfied with the proposed trigger being a QEMU update. In order to call this a regression, we'd need to narrow down the symptom to a *single-package* "yum upgrade" command.

Anyway, I've posted the upstream series here:

[PATCH 0/3] resolve conflict between manual and libvirt-assigned PCI addresses
Message-Id: <20211223103701.12702-1-lersek@redhat.com>
https://listman.redhat.com/archives/libguestfs/2021-December/msg00228.html

I'll go ahead and build a new scratch package for Virt-QE too (assuming the RHEL9 Brew root is again functional).

Thanks!
Laszlo

Comment 35 Richard W.M. Jones 2021-12-23 11:33:13 UTC
I ACKed the patch.  Do you want to do the build?  If it's urgent and you can't
do it then I could do it today, but would prefer to wait until Jan.  There is
a workaround (downgrading to qemu 6.1) but that's not too nice because we
ought to be testing if there are any other problems with 6.2.

Comment 36 YongkuiGuo 2021-12-23 12:22:22 UTC
I tested the scratch build (https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=42065775) and it works.

# rpm -q libguestfs qemu-kvm
libguestfs-1.46.1-1.bz2034160.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64

# guestfish --network run
# echo $?
0

Comment 37 Laszlo Ersek 2021-12-23 12:26:27 UTC
(In reply to Laszlo Ersek from comment #32)

> [PATCH 0/3] resolve conflict between manual and libvirt-assigned PCI addresses
> Message-Id: <20211223103701.12702-1-lersek@redhat.com>
> https://listman.redhat.com/archives/libguestfs/2021-December/msg00228.html

Merged as upstream commit range 4af6d68e2d8b..5858c2cf6c24.

Comment 38 Laszlo Ersek 2021-12-23 12:42:47 UTC
(In reply to Richard W.M. Jones from comment #35)
> I ACKed the patch.  Do you want to do the build?  If it's urgent and
> you can't do it then I could do it today, but would prefer to wait
> until Jan.  There is a workaround (downgrading to qemu 6.1) but that's
> not too nice because we ought to be testing if there are any other
> problems with 6.2.

I'll get on it now.

At the moment, the rhel-9.0.0 and master branches diverge after commit
e2f8db27d0af ("Go bindings: fix "C array of strings" -- char** --
allocation", 2021-09-27). Comparing these branches:

$ git range-diff --color e2f8db27d0af..master e2f8db27d0af..rhel-9.0.0

We get:

 2:  63c9cd933af7 =  1:  aa78dc8ece1a m4/guestfs-ocaml.m4: Fix deprecated warning format
 3:  7915938b8e06 =  2:  fc921c6515aa build: fix typo in "--enable-werror" help string
 4:  98ed0243e773 =  3:  421ab66c8c55 lib/proto: suppress "may be used uninitialized" in send_file_complete()
 5:  e597fc5317e0 =  4:  4113eca69ce3 daemon/yara: fix undefined behavior due to Yara 4.0 API changes
 6:  54187b7f9877 =  5:  12414ef7058d build: fix the pkg-config identifier of the (optional) Yara library
 7:  4daec34a01b8 =  6:  56172a193c0f build: eliminate the AC_CHECK_LIB / AC_CHECK_HEADER tests for Yara
 8:  f34bd6b12f85 =  7:  ecf444976221 build, docs: spell out minimum version (4.0.0) for the (optional) Yara lib
10:  1834f19d2067 =  8:  637f193189f4 rust: Wire up make clean so it runs cargo clean
11:  760d11ecfad1 =  9:  c2c7dfd66b19 rust: Use distclean to clean cache rather than make clean
12:  b536c61a6df3 = 10:  9a5c5ac5bd9f m4: Remove test for OCaml Bytes module
13:  a69cde79ca42 = 11:  2ecccfce384e daemon: Replace "noalloc" with [@@noalloc]
14:  60e9232f4e33 = 12:  3c41324ef53f Move minimum OCaml version to 4.04.
15:  5da2b9f1306c = 13:  768658033877 tests/gdisk/test-expand-gpt.pl: Fix some warnings
16:  30a3c72d512f = 14:  b0c69b6ebbb4 tests/gdisk/test-expand-gpt.pl: Don't hide error message from qemu-img resize
17:  4fe8df48a723 = 15:  c6082b97ee14 tests/gdisk/test-expand-gpt.pl: Don't race with other tests
18:  9fda9110e6a7 = 16:  9f7695e53962 Update common submodule
19:  e7f72ab146b9 = 17:  07049b2a1fda xfs: Document lazy-counters setting cannot be changed in XFS version 5
20:  0ab930505542 = 18:  e8e2f417c96a daemon/mkfs: disable creation of fake MBR partition table with "mkfs.fat"
21:  e3671362afc6 = 19:  e589faf92ae8 daemon/9p: fix wrong pathname in error message
22:  c33c2a1d1310 = 20:  08856ccda21f daemon/parted: simplify print_partition_table() prototype
23:  edfebee4046a = 21:  0efccea84456 daemon/parted: work around part table type misreporting by "parted"
24:  d829f9ff9ae0 = 22:  7b5e00f03638 daemon/listfs: don't call "sgdisk -i" on bogus MBR partition table entry
25:  90eb3a418446 = 23:  6c05d666b804 lib, lua: Fix usage of strerror_r
26:  d6773c102d45 ! 24:  e8c7b5f4ce21 Version 1.47.1.
    @@ -1,17 +1,15 @@
     Author: Richard W.M. Jones <rjones@redhat.com>
     
    -    Version 1.47.1.
    +    Version 1.46.1.
     
     diff --git a/configure.ac b/configure.ac
     --- a/configure.ac
     +++ b/configure.ac
     @@
    - # add extra information using --with-extra="..." which may be any
      # freeform string.
      m4_define([libguestfs_major],   [1])
    --m4_define([libguestfs_minor],   [46])
    + m4_define([libguestfs_minor],   [46])
     -m4_define([libguestfs_release], [0])
    -+m4_define([libguestfs_minor],   [47])
     +m4_define([libguestfs_release], [1])
      
      AC_INIT([libguestfs],libguestfs_major.libguestfs_minor.libguestfs_release)
    @@ -24,7 +22,7 @@
      # in the form <version> <date>.  If you update the version field (in
      # configure.ac) you must also add the current date to this file.
      
    -+1.47.1    2021-12-09
    ++1.46.1    2021-12-09
      1.46.0    2021-09-23
      1.45.7    2021-08-31
      1.45.6    2021-05-27
    @@ -37,7 +35,7 @@
      msgid ""
      msgstr ""
     -"Project-Id-Version: libguestfs 1.46.0\n"
    -+"Project-Id-Version: libguestfs 1.47.1\n"
    ++"Project-Id-Version: libguestfs 1.46.1\n"
      "Report-Msgid-Bugs-To: https://bugzilla.redhat.com/enter_bug.cgi?"
      "component=libguestfs&product=Virtualization+Tools\n"
     -"POT-Creation-Date: 2021-09-23 14:41+0100\n"
 1:  3f6f2fb8f699 = 25:  336ecfab3bb1 daemon/inspect_fs_unix: recognize modern Pardus GNU/Linux releases
 9:  305b02e7e74a ! 26:  3db4dd1804b7 daemon: inspection: Add support for Kylin (RHBZ#1995391).
    @@ -6,6 +6,7 @@
         Signed-off-by: Laszlo Ersek <lersek@redhat.com>
         Message-Id: <20211013163023.21786-1-lersek@redhat.com>
         Acked-by: Richard W.M. Jones <rjones@redhat.com>
    +    (cherry picked from commit 305b02e7e74afc3777b2291783cd7634fb76ecaf)
     
     diff --git a/daemon/inspect_types.mli b/daemon/inspect_types.mli
     --- a/daemon/inspect_types.mli
27:  631962c0e88a <  -:  ------------ Add detection support for Rocky Linux (CentOS/RHEL-like)
28:  1e60550c2a57 <  -:  ------------ Update common submodule
29:  b64e9bffc1b8 <  -:  ------------ generator: Replace more "noalloc" with [@@noalloc]
30:  0d05a229f3f4 <  -:  ------------ customize: Suppress OCaml warning
31:  194159358557 <  -:  ------------ Disable OCaml warning 6 completely
32:  c6b25262ee25 <  -:  ------------ valgrind: Update suppressions list
33:  4af6d68e2d8b <  -:  ------------ fish: Avoid valgrind test from creating fish/.cache
34:  5ce5ef6a97a5 <  -:  ------------ launch-libvirt: place our virtio-net-pci device in slot 0x1e
35:  216de164e091 <  -:  ------------ lib: extract NETWORK_ADDRESS and NETWORK_PREFIX as macros
36:  5858c2cf6c24 <  -:  ------------ launch-libvirt: add virtio-net via the standard <interface> element
 -:  ------------ > 27:  4ac1f94cd17b RHEL: Remove libguestfs live (RHBZ#798980).
 -:  ------------ > 28:  f0dce483f0d4 RHEL: Remove 9p APIs from RHEL (RHBZ#921710).
 -:  ------------ > 29:  bf090c70aae8 RHEL: Disable unsupported remote drive protocols (RHBZ#962113).
 -:  ------------ > 30:  fca6730bb7f4 RHEL: Remove User-Mode Linux (RHBZ#1144197).
 -:  ------------ > 31:  ce2e49114bde RHEL: Reject use of libguestfs-winsupport features except for virt-* tools (RHBZ#1240276).
 -:  ------------ > 32:  bf0d0053627d RHEL: Create /etc/crypto-policies/back-ends/opensslcnf.config

It looks like, if I did a rebase now, we'd get these "extra" patches:

27:  631962c0e88a <  -:  ------------ Add detection support for Rocky Linux (CentOS/RHEL-like)
28:  1e60550c2a57 <  -:  ------------ Update common submodule
29:  b64e9bffc1b8 <  -:  ------------ generator: Replace more "noalloc" with [@@noalloc]
30:  0d05a229f3f4 <  -:  ------------ customize: Suppress OCaml warning
31:  194159358557 <  -:  ------------ Disable OCaml warning 6 completely
32:  c6b25262ee25 <  -:  ------------ valgrind: Update suppressions list
33:  4af6d68e2d8b <  -:  ------------ fish: Avoid valgrind test from creating fish/.cache

*plus* I'm unsure about the version difference in:

26:  d6773c102d45 ! 24:  e8c7b5f4ce21 Version 1.47.1.

I think I'll go with a cherry-pick, for now.

Comment 41 Richard W.M. Jones 2021-12-24 08:26:10 UTC
I wish I understood why this *doesn't* break in Fedora even though
it also has qemu 6.2.

Comment 43 Richard W.M. Jones 2021-12-24 09:37:56 UTC
Bisection points to:

5dacda5167560b3af8eadbce5814f60ba44b467e is the first bad commit
commit 5dacda5167560b3af8eadbce5814f60ba44b467e
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Fri Oct 8 15:34:42 2021 +0200

    vl: Enable JSON syntax for -device
    
    Like we already do for -object, introduce support for JSON syntax in
    -device, which can be kept stable in the long term and guarantees that a
    single code path with identical behaviour is used for both QMP and the
    command line. Compared to the QemuOpts based code, the parser contains
    less surprises and has support for non-scalar options (lists and
    structs). Switching management tools to JSON means that we can more
    easily change the "human" CLI syntax from QemuOpts to the keyval parser
    later.
    
    In the QAPI schema, a feature flag is added to the device-add command to
    allow management tools to detect support for this.
    
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Message-Id: <20211008133442.141332-16-kwolf@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Tested-by: Peter Krempa <pkrempa@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>

 qapi/qdev.json | 15 ++++++++++----
 softmmu/vl.c   | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 67 insertions(+), 11 deletions(-)

Of course it's not actually this commit which breaks anything.  This commit
interacts with the following libvirt commit:

commit c9b13e05570d07addb4bfb86c5baf373064842e0
Author: Peter Krempa <pkrempa@redhat.com>
Date:   Fri Oct 15 11:42:19 2021 +0200

    qemu: Use JSON directly for '-device'
    
    Starting with QEMU-6.2 started accepting a JSON object as argument for
    '-device' which will also become the only syntax considered stable by
    qemu in the future.
[...]

which explains why it didn't break on Fedora.  I was using libvirt 7.7 which
doesn't have this change.  Upgrading to libvirt-libs-7.10.0-1.fc36.x86_64
reproduces the same behaviour as RHEL 9.

Comment 44 Laszlo Ersek 2021-12-24 10:08:17 UTC
Great!

I've just reproduced it on RHEL9; the QEMU command line is

TMPDIR=/var/tmp \
/usr/libexec/qemu-kvm \
-name guest=test,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-8-test/master-key.aes"}' \
-machine pc-i440fx-rhel7.6.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,graphics=off \
-accel tcg \
-cpu max \
-m 1280 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":1342177280}' \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 0f6ba0b7-fd80-4d90-a57b-86849b0ba57c \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-no-acpi \
-boot strict=on \
-kernel /var/tmp/.guestfs-0/appliance.d/kernel \
-initrd /var/tmp/.guestfs-0/appliance.d/initrd \
-append 'panic=1 console=ttyS0 edd=off udevtimeout=6000 udev.event-timeout=6000 no_timer_check lpj=2793542 printk.time=1 cgroup_disable=memory usbcore.nousb cryptomgr.notests tsc=reliable 8250.nr_uarts=1 root=UUID=fdacb1db-d352-48f7-b554-a5e4e3fe333f selinux=0 guestfs_verbose=1 guestfs_network=1 TERM=xterm' \
-device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x2"}' \
-device '{"driver":"virtio-serial-pci","id":"virtio-serial0","bus":"pci.0","addr":"0x3"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/test1.img","node-name":"libvirt-1-storage","cache":{"direct":false,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":false,"no-flush":false},"driver":"raw","file":"libvirt-1-storage"}' \
-device '{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-1-format","id":"scsi0-0-0-0","bootindex":1,"write-cache":"on"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0"}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \
-device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","bus":"pci.0","addr":"0x4"}' \
-netdev user,id=usernet,net=169.254.0.0/16 \
-device virtio-net-pci,netdev=usernet \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on

Here's my suspicion: the libvirt and QEMU commits noted by Rich in comment 43 change the *order* in which QEMU parses the -device options. Before, the "-device virtio-net-pci,netdev=usernet" option from libguestfs was processed *last*, so QEMU's auto-assignment simply picked the first free slot. Now, the same option from libguestfs is processed *first* (getting slot 2 auto-assigned), and then the elaborate JSON-formatted option:

  -device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x2"}' \

conflicts.
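
The suspected ordering effect can be illustrated with a toy model. This is only a sketch, not QEMU code: the function `assign_slots`, the exception `SlotConflict`, and the assumption that slots 0 and 1 are already occupied by built-in i440fx devices are all hypothetical and exist only to illustrate the hypothesis above.

```python
# Toy model of PCI slot assignment on one bus; NOT QEMU's implementation.
# Assumption: slots 0 and 1 are taken by built-in devices, so the first
# auto-assigned slot is 2 (consistent with the error message in this bug).

class SlotConflict(Exception):
    pass


def assign_slots(devices, reserved=(0, 1), nslots=32):
    """devices: list of (name, addr) in *processing* order.

    addr=None means "auto-assign the lowest free slot".
    Returns a dict mapping device name to slot number.
    """
    used = {slot: "built-in" for slot in reserved}
    for name, addr in devices:
        if addr is None:
            addr = next(s for s in range(nslots) if s not in used)
        elif addr in used:
            raise SlotConflict(
                f"slot {addr} not available for {name}, in use by {used[addr]}"
            )
        used[addr] = name
    return {name: slot for slot, name in used.items() if name != "built-in"}


# Order in which the -device options appear on the command line above.
command_line_order = [
    ("virtio-scsi-pci", 2),
    ("virtio-serial-pci", 3),
    ("virtio-rng-pci", 4),
    ("virtio-net-pci", None),   # the non-JSON option added by libguestfs
]

# Suspected new processing order: the non-JSON option is handled first.
suspected_order = [command_line_order[-1]] + command_line_order[:-1]

print(assign_slots(command_line_order))
try:
    assign_slots(suspected_order)
except SlotConflict as exc:
    print("conflict:", exc)
```

In this model, processing in command-line order gives virtio-net-pci the first free slot after the explicitly addressed devices. Processing the non-JSON option first hands it slot 2, and the explicit request for slot 2 then fails.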

Comment 45 Laszlo Ersek 2021-12-24 10:17:41 UTC
BTW I do think this is a QEMU bug. A user could create such a command line manually. Previously, if you had three -device options with addr=... spelled out, followed by a fourth (last on the command line) -device option without addr=..., a free slot would be assigned to the fourth. Now, if you keep the same order of -device options and merely reformat the first three as JSON, QEMU will not start.

Simple reproducer (should even work with upstream 6.2):

/usr/libexec/qemu-kvm -nodefaults \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=02.0 \
  -device virtio-scsi-pci,id=scsi1,bus=pci.0

(this runs!)

versus:

/usr/libexec/qemu-kvm -nodefaults \
  -device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x02.0"}' \
  -device virtio-scsi-pci,id=scsi1,bus=pci.0

this breaks with:

qemu-kvm: -device {"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x02.0"}: PCI: slot 2 function 0 not available for virtio-scsi-pci, in use by virtio-scsi-pci

and all I did was reformulate the "scsi0" device as JSON.
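
A rough way to spot command lines of this shape is sketched below. The helper `find_risky_mix` is hypothetical and purely a heuristic: it does not know which drivers are PCI devices, so a legacy option such as an ISA device would also be listed. It only separates JSON-style -device arguments that carry an explicit "addr" from legacy-style ones that carry none.

```python
import json


def find_risky_mix(argv):
    """Return (json_with_addr, legacy_without_addr) -device arguments.

    Heuristic only: a JSON argument is recognised by a leading '{', and a
    legacy argument is 'driver,key=value,...' without an addr= property.
    """
    json_with_addr, legacy_without_addr = [], []
    for flag, value in zip(argv, argv[1:]):
        if flag != "-device":
            continue
        if value.lstrip().startswith("{"):
            if "addr" in json.loads(value):
                json_with_addr.append(value)
        else:
            props = value.split(",")[1:]   # skip the driver name
            if not any(p.startswith("addr=") for p in props):
                legacy_without_addr.append(value)
    return json_with_addr, legacy_without_addr


# The failing reproducer from this comment, as an argv list.
failing = [
    "/usr/libexec/qemu-kvm", "-nodefaults",
    "-device",
    '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.0","addr":"0x02.0"}',
    "-device", "virtio-scsi-pci,id=scsi1,bus=pci.0",
]

with_addr, without_addr = find_risky_mix(failing)
if with_addr and without_addr:
    print("mixed -device styles; auto-assigned slots may collide:")
    for arg in without_addr:
        print("  legacy, no addr:", arg)
```

When both lists are non-empty, the command line has the same shape as the failing reproducer. The working reproducer, where both options use the legacy syntax, produces an empty first list.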

Comment 47 YongkuiGuo 2021-12-24 11:46:42 UTC
Test with packages:
libguestfs-1.46.1-2.el9.x86_64
qemu-kvm-6.2.0-1.el9.x86_64


Steps:

1. On rhel9 host
$ guestfish --network run
$ echo $?
0

2.
# virt-v2v -ic vpx://root@10.73.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.2 -io  vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78 esx7.0-win2022-preview-x86_64 -ip /home/passwd
[   2.0] Opening the source
[   7.5] Inspecting the source
[  12.4] Checking for sufficient free disk space in the guest
[  12.4] Converting Windows Server 2022 Datacenter to run on KVM
virt-v2v: This guest has virtio drivers installed.
[  23.2] Mapping filesystem data to avoid copying unused and blank areas
[  24.7] Closing the overlay
[  25.0] Assigning disks to buses
[  25.0] Checking if the guest needs BIOS or UEFI to boot
[  26.9] Copying disk 1/1
█ 100% [****************************************]
[ 161.3] Creating output metadata
[ 161.3] Finishing off

It all works.

Comment 51 YongkuiGuo 2022-01-20 03:22:04 UTC
Verified this bug: the issue did not occur when running the libguestfs autotest with the latest RHEL9 compose.

Comment 54 errata-xmlrpc 2022-05-17 12:28:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (new packages: libguestfs), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2317

