Bug 1479674 - Start vm after remove some vPHBs will fail at first try
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.4-Alt
Hardware: ppc64le
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.4-Alt
Assigned To: Andrea Bolognani
QA Contact: Virtualization Bugs
Depends On:
Blocks: 1440030
Reported: 2017-08-09 03:20 EDT by Wayne Sun
Modified: 2017-09-21 05:28 EDT
CC List: 10 users

See Also:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-09-21 05:28:26 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---



External Trackers
Tracker                      ID      Priority  Status  Summary  Last Updated
IBM Linux Technology Center  157817  None      None    None     2017-08-18 12:02 EDT

Description Wayne Sun 2017-08-09 03:20:05 EDT
Description of problem:
A VM with multiple vPHBs fails to start on the first try after its XML has been edited to remove some of the vPHBs.

Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm kernel
libvirt-3.2.0-18.el7a.ppc64le
qemu-kvm-2.9.0-20.el7a.ppc64le
kernel-4.11.0-19.el7a.ppc64le

How reproducible:
always

Steps to Reproduce:
1. Define a guest and add a pci-bridge controller on a non-zero PCI bus (bus 0x02 here):
...
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

2. Start the VM.
The VM starts, and libvirt automatically adds vPHBs 1 and 2 so that the pci-bridge can be plugged into bus 2 (the vPHB with index 2):
... 
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

3. Destroy the VM and edit it to remove the vPHBs with index 1 and 2:
# virsh edit vm2
Domain vm2 XML configuration edited.

# virsh dumpxml vm2  
...
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
...

libvirt automatically adds vPHBs 1 and 2 back to the XML; the difference is that the pci-bridge now appears before vPHBs 1 and 2.
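The two QEMU command lines in this report suggest why the order matters: the PCI controllers are passed to QEMU as `-device` arguments in the order they appear in the XML, and the first start fails with "Bus 'pci.2.0' not found" because the pci-bridge is created before the vPHB that provides bus 2. Assuming exactly that behaviour, a minimal Python sketch can flag such controllers in a domain XML before the guest is started. `find_forward_bus_refs` is a hypothetical helper, not part of libvirt or virsh; it only looks at `<controller type='pci'>` elements and treats each `bus` attribute as the `index` of the controller providing that bus, as in the XML above.

```python
import xml.etree.ElementTree as ET


def find_forward_bus_refs(domain_xml: str) -> list[tuple[int, int]]:
    """Hypothetical check: return (controller_index, bus) pairs for PCI
    controllers plugged into a bus whose controller has not appeared
    earlier in the XML (assumes XML order == -device order)."""
    root = ET.fromstring(domain_xml)
    seen: set[int] = set()  # PCI controller indexes defined so far, in document order
    problems: list[tuple[int, int]] = []
    for ctrl in root.iter("controller"):
        if ctrl.get("type") != "pci":
            continue
        index = int(ctrl.get("index"))
        addr = ctrl.find("address")
        if addr is not None and addr.get("type") == "pci":
            # The addresses in this report are hex, e.g. bus='0x02' -> 2
            bus = int(addr.get("bus"), 16)
            if bus not in seen:
                problems.append((index, bus))
        seen.add(index)
    return problems


# Controller order from step 3 of the description, after the edit.
EDITED = """<domain>
  <devices>
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
  </devices>
</domain>"""

print(find_forward_bus_refs(EDITED))  # [(3, 2)]
```

For the XML from step 3 the check reports the pci-bridge (index 3) sitting on bus 2 before vPHB 2 is defined, which matches the QEMU error; for the order libvirt writes after the second start it would report nothing.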

4. Start vm2:
# virsh start vm2
error: Failed to start domain vm2
error: internal error: qemu unexpectedly closed the monitor: 2017-08-09T10:52:30.963370Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
2017-08-09T10:52:30.994513Z qemu-kvm: -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7: Bus 'pci.2.0' not found

The QEMU log for vm2 shows:
2017-08-09 10:52:30.520+0000: starting up libvirt version: 3.2.0, package: 18.el7a (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-03-07:52:53, ppc-059.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-2.9.0-20.el7a), hostname: c155f1-u31
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=vm2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-vm2/master-key.aes -machine pseries-rhel7.4.0alt,accel=kvm,usb=off,dump-guest-core=off -m size=1048576k,slots=16,maxmem=2621440k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 38c3f179-fa77-473e-98fd-04e1f17c2ad7 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 -device spapr-pci-host-bridge,index=1,id=pci.1 -device spapr-pci-host-bridge,index=2,id=pci.2 -device qemu-xhci,id=usb,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-25-64-clone.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e5:e5:ef,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-vm2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-kbd,id=input0,bus=usb.0,port=1 -device usb-mouse,id=input1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-08-09T10:52:30.963370Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
2017-08-09T10:52:30.994513Z qemu-kvm: -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7: Bus 'pci.2.0' not found
2017-08-09 10:52:31.262+0000: shutting down, reason=failed

5. Start vm2 again:
# virsh start vm2
Domain vm2 started

# ps aux|grep qemu
qemu     12405  4.7  2.0 1332544 648832 ?      SLl  06:52   0:41 /usr/libexec/qemu-kvm -name guest=vm2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-vm2/master-key.aes -machine pseries-rhel7.4.0alt,accel=kvm,usb=off,dump-guest-core=off -m size=1048576k,slots=16,maxmem=2621440k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 38c3f179-fa77-473e-98fd-04e1f17c2ad7 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device spapr-pci-host-bridge,index=1,id=pci.1 -device spapr-pci-host-bridge,index=2,id=pci.2 -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 -device qemu-xhci,id=usb,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-25-64-clone.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e5:e5:ef,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-7-vm2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-kbd,id=input0,bus=usb.0,port=1 -device usb-mouse,id=input1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on


# virsh dumpxml vm2
...
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
      <alias name='pci.0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
      <alias name='pci.1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
      <alias name='pci.2'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

The XML is automatically updated with the controllers in the correct order.

Actual results:
The VM fails to start on the first try after being edited to remove the vPHBs, and starts successfully on the second try.

Expected results:
The VM starts successfully on the first try.

Additional info:
Comment 2 David Gibson 2017-08-21 00:16:46 EDT
AFAICT what's happening here is that when you run the VM with the extra vPHBs, libvirt is assigning some devices on the second vPHB, and updating the XML to give those devices explicit addresses on the second vPHB.  When it goes away, libvirt obviously can't place those devices on the second vPHB any more, hence the failure.
Comment 3 Andrea Bolognani 2017-08-21 04:12:19 EDT
(In reply to David Gibson from comment #2)
> AFAICT what's happening here is that when you run the VM with the extra
> vPHBs, libvirt is assigning some devices on the second vPHB, and updating
> the XML to give those devices explicit addresses on the second vPHB.  When
> it goes away, libvirt obviously can't place those devices on the second vPHB
> any more, hence the failure.

That's not quite what happens: as you can see (step 3 in the
description) the PHBs get re-added automatically, but for
some reason they end up after the devices rather than before
them, and QEMU can't handle having device and controller
specified in that order.

I'll look into making it so the PHBs get re-added before the
devices using them.
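The patches posted upstream change libvirt itself. Purely as a rough, hypothetical illustration of the same ordering rule, the Python sketch below reorders the PCI controllers in a domain XML from outside libvirt, for example to tidy a shut-off guest's XML before redefining it with `virsh define`. `reorder_pci_controllers` is an invented name, it treats each `bus` attribute as the `index` of the controller providing that bus, and whether reordering is safe for a given guest (in particular regarding the migration concern raised later in this bug) is not established here.

```python
import xml.etree.ElementTree as ET


def reorder_pci_controllers(domain_xml: str) -> str:
    """Hypothetical helper: reorder the PCI controllers inside <devices> so that
    each one comes after the controller providing the bus it is plugged into.
    Other elements keep their positions; PCI controllers are only shuffled
    among the slots they already occupied."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    children = list(devices)
    slots = [i for i, el in enumerate(children)
             if el.tag == "controller" and el.get("type") == "pci"]
    ctrls = [children[i] for i in slots]
    by_index = {int(c.get("index")): c for c in ctrls}

    ordered: list[ET.Element] = []
    placed: set[int] = set()

    def place(ctrl: ET.Element, visiting: frozenset[int] = frozenset()) -> None:
        index = int(ctrl.get("index"))
        if index in placed:
            return
        if index in visiting:
            raise ValueError("cyclic PCI bus references")
        addr = ctrl.find("address")
        if addr is not None and addr.get("type") == "pci":
            parent = by_index.get(int(addr.get("bus"), 16))
            if parent is not None:
                place(parent, visiting | {index})  # place the bus provider first
        placed.add(index)
        ordered.append(ctrl)

    for ctrl in ctrls:  # keeps the original relative order where dependencies allow
        place(ctrl)
    for slot, ctrl in zip(slots, ordered):
        devices[slot] = ctrl
    return ET.tostring(root, encoding="unicode")


# Controller order from step 3 of the description, after the edit.
EDITED = """<domain>
  <devices>
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
  </devices>
</domain>"""

new_xml = reorder_pci_controllers(EDITED)
print([c.get("index") for c in ET.fromstring(new_xml).iter("controller")])
# ['0', '2', '3', '1']
```

The sketch only enforces the dependency (vPHB 2 before the bridge on bus 2), so its result differs from the plain index order libvirt wrote on the second start; sorting by index would match that output but relies on an extra assumption about how controller indexes relate to their buses.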
Comment 6 Andrea Bolognani 2017-09-05 09:26:13 EDT
Fix posted upstream.

  https://www.redhat.com/archives/libvir-list/2017-September/msg00084.html
Comment 7 Andrea Bolognani 2017-09-07 13:00:51 EDT
v2 patches posted upstream.

  https://www.redhat.com/archives/libvir-list/2017-September/msg00168.html
Comment 8 David Gibson 2017-09-13 20:55:50 EDT
Andrea, any update on getting this upstream and downstream?
Comment 9 Andrea Bolognani 2017-09-21 05:28:26 EDT
(In reply to David Gibson from comment #8)
> Andrea, any update on getting this upstream and downstream?

Sorry for taking so long to reply.

It turns out that reordering controllers can break migration[1],
so given that the problem scenario described above was kinda
convoluted to begin with, I think it's better to avoid further
breakage and close the bug as CANTFIX. Doing so now.


[1] https://www.redhat.com/archives/libvir-list/2017-September/msg00734.html
