Bug 1004593

Summary: libvirt should provide a more useful error message when a PCI controller is configured to plug into itself (bus = index)
Product: Red Hat Enterprise Linux 7 Reporter: EricLee <bili>
Component: libvirtAssignee: Laine Stump <laine>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0CC: dyuan, eskultet, jtomko, laine, mzhan, rbalakri, yanyang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-1.3.4-1.el7 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-11-03 18:06:36 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description EricLee 2013-09-05 02:21:49 UTC
Description of problem:
libvirt should keep the "bus" of the pci-bridge matches the "index" of the dmi-to-pci-bridge, and should deny the operation to change them to not match.

Version-Release number of selected component (if applicable):
qemu-kvm-1.5.3-2.el7.x86_64
libvirt-1.1.1-3.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. define a guest will machine='q35'.
2. edit guest xml:
from 
"    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
"
to 
"    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>                       --------- the bus id not match dmi-to-pci-bridge index id.
    </controller>
"

and save.

3. start guest.

Actual results:
2. save successfully
3. start failed with error:
# virsh start test1
error: Failed to start domain test1
error: Unable to read from monitor: Connection reset by peer
and there is log in /var/log/libvirt/qemu/test1.log:
qemu-kvm: -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.2,addr=0x2: Bus 'pci.2' not found

Expected results:
2. save failed and promote error like:
error: XML error: the bus of the pci-bridge should matches the index of the dmi-to-pci-bridge
Failed. Try again? [y,n,f,?]:

Additional info:

Comment 2 Laine Stump 2013-09-06 09:40:11 UTC
You misunderstood my comment in https://bugzilla.redhat.com/show_bug.cgi?id=819968#c17 slightly.

In the case that the indexes and bus addresses of these items are *automatically assigned*, libvirt should make the assignment such that the index of the auto-added dmi-to-pci-bridge matches the bus of the auto-added pci-bridge. Your test case is manually setting these to not match, and in that case the best libvirt can do is log an error.

The real problem that needs to be reported in this case is that the controller is being connected to itself, which obviously can't work.

I *think* what should be done is to check that the bus in the pci address of each controller is <= the index of that same controller. This would be a simple way to assure that there would never be any loops in the graph of PCI controller connections.

Does anyone see any problems with this restriction? (theoretically speaking it would be okay to have a controller with  index "n" specify an address with bus < "n" as long as it didn't result in any loops, but that would be more difficult to verify).

Comment 3 EricLee 2013-09-06 11:22:32 UTC
(In reply to Laine Stump from comment #2)
> You misunderstood my comment in
> https://bugzilla.redhat.com/show_bug.cgi?id=819968#c17 slightly.
> 
> In the case that the indexes and bus addresses of these items are
> *automatically assigned*, libvirt should make the assignment such that the
> index of the auto-added dmi-to-pci-bridge matches the bus of the auto-added
> pci-bridge. Your test case is manually setting these to not match, and in
> that case the best libvirt can do is log an error.
> 
> The real problem that needs to be reported in this case is that the
> controller is being connected to itself, which obviously can't work.

You mean the bug summary should be updated? And what libvirt do is expected in my test case(from the description)?

> 
> I *think* what should be done is to check that the bus in the pci address of
> each controller is <= the index of that same controller. This would be a
> simple way to assure that there would never be any loops in the graph of PCI
> controller connections.

We only need to check the bus in the pci address of each controller(which automatically generated) is <= the index of that same controller, and should not modify the address manually? 
For example, add 100 pci-bridge controller to check that:
   <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='100' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x02' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x03' function='0x0'/>
    </controller>
....
<snip>
.....
    <controller type='pci' index='97' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='98' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x06' function='0x0'/>
    </controller>
    <controller type='pci' index='99' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x07' function='0x0'/>


> 
> Does anyone see any problems with this restriction? (theoretically speaking
> it would be okay to have a controller with  index "n" specify an address
> with bus < "n" as long as it didn't result in any loops, but that would be
> more difficult to verify).

No, for now.

Comment 4 Laine Stump 2013-09-06 13:48:00 UTC
Sorry, in my reply when I talked about what should be done, I meant what changes should be made in the code. As for testing, when the PCI addresses of pci controllers are automatically assigned, just make sure that bus != index. When the PCI addresses are manually set in the config, libvirt can't change them, but the error reporting should be better.

Yes, the summary should be changed to request a better error message when a pci controller is manually configured to have index == bus (which by definition can't work). I just did that.

Comment 5 EricLee 2013-09-11 06:32:00 UTC
(In reply to Laine Stump from comment #4)
> Sorry, in my reply when I talked about what should be done, I meant what
> changes should be made in the code. As for testing, when the PCI addresses
> of pci controllers are automatically assigned, just make sure that bus !=
> index. When the PCI addresses are manually set in the config, libvirt can't
> change them, but the error reporting should be better.
> 
> Yes, the summary should be changed to request a better error message when a
> pci controller is manually configured to have index == bus (which by
> definition can't work). I just did that.

Thanks Laine.

Comment 10 Ján Tomko 2015-05-15 11:27:16 UTC
The following commit added an error message if the manually specified PCI addresses of the domain would result in a bridge plugging to itself:
error: unsupported configuration: failed to create PCI bridge on bus 1: too many devices with fixed addresses

commit b7e6f2fc805c389e82c635ce10dde379224df92d
Author:     Erik Skultety <eskultet>
CommitDate: 2015-01-23 14:35:03 +0100

    qemu: Add check for PCI bridge placement if there are too many PCI devices
    
    Previous patch of this series fixed the issue with adding a new PCI bridge
    when all the slots were reserved by devices with user specified addresses.
    In case there are still some PCI devices waiting to get a slot reserved
    by qemuAssignDevicePCISlots, this means a new bus needs to be
    created along with a corresponding bridge controller. By adding an
    additional check, this scenario now results in a reasonable error
    instead of generating wrong qemu command line.

git describe: v1.2.12-rc1-22-gb7e6f2f contains: v1.2.12-rc2~1

Comment 11 Hu Jianwei 2015-05-18 02:00:54 UTC
I found last mentioned patch has been merged into below version,please double check it.

[root@localhost ~]# rpm -q libvirt 
libvirt-1.2.15-1.el7.x86_64

1. add below wrong pci bridge controller, the index is equal to bus of itself
    <controller type='pci' index='2' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </controller>
[root@localhost ~]# virsh edit r71
error: unsupported configuration: failed to create PCI bridge on bus 2: too many devices with fixed addresses
Failed. Try again? [y,n,i,f,?]:

2. add below wrong bridge, the index is less than its bus 
    <controller type='pci' index='3' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x01' function='0x0'/>
    </controller>

[root@localhost ~]# virsh edit r71
error: unsupported configuration: failed to create PCI bridge on bus 4: too many devices with fixed addresses
Failed. Try again? [y,n,i,f,?]: 

Thanks.

Comment 12 Laine Stump 2015-07-22 18:17:15 UTC
> error: unsupported configuration: failed to create PCI bridge on bus 4: too
> many devices with fixed addresses
> Failed. Try again? [y,n,i,f,?]: 

I think that check is in the wrong place (or rather, the error message logged in response to that check is incorrect)

The problem is that at that point (qemuDomainAssignPCIAddresses()) it's not possible to tell if the "bad" address was one manually assigned by the user or one automatically assigned by libvirt. We should instead be carrying the controller's index all the way down into virDomainPCIAddressGetNextSlot(), and only allowing auto-assignment of a slot in a bus < index in that function - if it's unable to find one, that is when it should log the "too many fixed addresses" error.

The error message that is logged in qemuDomainAssignPCIAddresses() should instead be something like "PCI controller bus (%d) must be <= its own index (%d)".

I unfortunately don't have time to do that right now.

Comment 15 Laine Stump 2016-04-21 18:49:41 UTC
I just posted a patch upstream to correct the error message:

https://www.redhat.com/archives/libvir-list/2016-April/msg01528.html

Comment 16 Laine Stump 2016-04-25 14:41:38 UTC
Pushed upstream:

commit ff2126225df019566e4e580d92e69e9df3ca96c0
Author: Laine Stump <laine>
Date:   Tue Apr 19 12:53:02 2016 -0400

    qemu: fix error log in qemuAssignPCIAddresses()

Comment 18 Yang Yang 2016-08-09 09:31:20 UTC
Verified with libvirt-2.0.0-4.el7.x86_64


# virsh edit vm1

<controller type='pci' index='11' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='0x10' slot='0x0a' function='0x0'/>
    </controller>

error: unsupported configuration: PCI controller at index 11 (0x0b) has bus='0x10', but index must be larger than bus
Failed. Try again? [y,n,i,f,?]:


# virsh edit vm1

<controller type='pci' index='11' model='pci-bridge'>
      <address type='pci' domain='0x0000' bus='11' slot='0x0a' function='0x0'/>
    </controller>

error: unsupported configuration: PCI controller at index 11 (0x0b) has bus='0x0b', but index must be larger than bus
Failed. Try again? [y,n,i,f,?]:

Comment 20 errata-xmlrpc 2016-11-03 18:06:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2016-2577.html