Bug 1479674

Summary: Starting a VM after removing some vPHBs fails on the first try
Product: Red Hat Enterprise Linux 7 Reporter: Wayne Sun <gsun>
Component: libvirt Assignee: Andrea Bolognani <abologna>
Status: CLOSED CANTFIX QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.4-Alt CC: abologna, bugproxy, dgibson, dzheng, haizhao, hannsj_uhl, hhan, jsuchane, junli, rbalakri
Target Milestone: rc   
Target Release: 7.4-Alt   
Hardware: ppc64le   
OS: Linux   
Last Closed: 2017-09-21 09:28:26 UTC Type: Bug
Bug Blocks: 1440030    

Description Wayne Sun 2017-08-09 07:20:05 UTC
Description of problem:
Starting a VM with multiple vPHBs fails on the first try after its configuration has been edited.

Version-Release number of selected component (if applicable):
# rpm -q libvirt qemu-kvm kernel
libvirt-3.2.0-18.el7a.ppc64le
qemu-kvm-2.9.0-20.el7a.ppc64le
kernel-4.11.0-19.el7a.ppc64le

How reproducible:
always

Steps to Reproduce:
1. Define a guest and add a pci-bridge on a non-zero PCI bus:
...
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

2. Start the VM.
The VM starts with vPHBs 1 and 2 added automatically and the pci-bridge placed on bus 2:
... 
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

3. Destroy the VM and edit its configuration to remove the vPHBs at index 1 and 2:
# virsh edit vm2
Domain vm2 XML configuration edited.

# virsh dumpxml vm2  
...
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
    </controller>
...

The XML is automatically updated with vPHBs 1 and 2 added back. The difference is that the pci-bridge is now listed ahead of vPHBs 1 and 2.

3. start vm2
# virsh start vm2                                                               
error: Failed to start domain vm2
error: internal error: qemu unexpectedly closed the monitor: 2017-08-09T10:52:30.963370Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
2017-08-09T10:52:30.994513Z qemu-kvm: -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7: Bus 'pci.2.0' not found

In qemu vm log:
2017-08-09 10:52:30.520+0000: starting up libvirt version: 3.2.0, package: 18.el7a (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2017-08-03-07:52:53, ppc-059.build.eng.bos.redhat.com), qemu version: 2.9.0(qemu-kvm-2.9.0-20.el7a), hostname: c155f1-u31
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/libexec/qemu-kvm -name guest=vm2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-vm2/master-key.aes -machine pseries-rhel7.4.0alt,accel=kvm,usb=off,dump-guest-core=off -m size=1048576k,slots=16,maxmem=2621440k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 38c3f179-fa77-473e-98fd-04e1f17c2ad7 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 -device spapr-pci-host-bridge,index=1,id=pci.1 -device spapr-pci-host-bridge,index=2,id=pci.2 -device qemu-xhci,id=usb,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-25-64-clone.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e5:e5:ef,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-vm2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-kbd,id=input0,bus=usb.0,port=1 -device usb-mouse,id=input1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2017-08-09T10:52:30.963370Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/0 (label charserial0)
2017-08-09T10:52:30.994513Z qemu-kvm: -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7: Bus 'pci.2.0' not found
2017-08-09 10:52:31.262+0000: shutting down, reason=failed

4. start vm again:
# virsh start vm2
Domain vm2 started

# ps aux|grep qemu
qemu     12405  4.7  2.0 1332544 648832 ?      SLl  06:52   0:41 /usr/libexec/qemu-kvm -name guest=vm2,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-vm2/master-key.aes -machine pseries-rhel7.4.0alt,accel=kvm,usb=off,dump-guest-core=off -m size=1048576k,slots=16,maxmem=2621440k -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 38c3f179-fa77-473e-98fd-04e1f17c2ad7 -display none -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-7-vm2/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device spapr-pci-host-bridge,index=1,id=pci.1 -device spapr-pci-host-bridge,index=2,id=pci.2 -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 -device qemu-xhci,id=usb,bus=pci.0,addr=0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x3 -drive file=/var/lib/avocado/data/avocado-vt/images/jeos-25-64-clone.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:e5:e5:ef,bus=pci.0,addr=0x1 -chardev pty,id=charserial0 -device spapr-vty,chardev=charserial0,reg=0x30000000 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-7-vm2/org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -device usb-kbd,id=input0,bus=usb.0,port=1 -device usb-mouse,id=input1,bus=usb.0,port=2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on


# virsh dumpxml vm2
...
    <controller type='pci' index='0' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='0'/>
      <alias name='pci.0'/>
    </controller>
    <controller type='pci' index='1' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='1'/>
      <alias name='pci.1'/>
    </controller>
    <controller type='pci' index='2' model='pci-root'>
      <model name='spapr-pci-host-bridge'/>
      <target index='2'/>
      <alias name='pci.2'/>
    </controller>
    <controller type='pci' index='3' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='3'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
    </controller>
...

The XML is automatically updated with the controllers in the correct order.
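
The ordering difference between the dump taken after the edit and the dump of the running guest can be checked mechanically. The following is a minimal Python sketch, not part of libvirt. It assumes that the bus attribute of a controller's PCI address equals the index of the PCI controller providing that bus; the function name and the trimmed sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

def misordered_pci_controllers(devices_xml):
    """Return (index, bus) pairs for PCI controllers that appear in the XML
    before the controller assumed to provide their bus."""
    seen = set()      # indexes of PCI controllers encountered so far
    problems = []
    for ctrl in ET.fromstring(devices_xml.strip()).iter("controller"):
        if ctrl.get("type") != "pci":
            continue
        index = int(ctrl.get("index"))
        addr = ctrl.find("address")
        if addr is not None and addr.get("type") == "pci":
            bus = int(addr.get("bus"), 16)   # e.g. '0x02' -> 2
            # Assumption: bus N is provided by the PCI controller with index N.
            if bus not in seen:
                problems.append((index, bus))
        seen.add(index)
    return problems

# Controller order as dumped right after "virsh edit" (trimmed).
AFTER_EDIT = """
<devices>
  <controller type='pci' index='0' model='pci-root'/>
  <controller type='pci' index='3' model='pci-bridge'>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
  </controller>
  <controller type='pci' index='1' model='pci-root'/>
  <controller type='pci' index='2' model='pci-root'/>
</devices>
"""

# Controller order as dumped from the running guest (trimmed).
RUNNING = """
<devices>
  <controller type='pci' index='0' model='pci-root'/>
  <controller type='pci' index='1' model='pci-root'/>
  <controller type='pci' index='2' model='pci-root'/>
  <controller type='pci' index='3' model='pci-bridge'>
    <address type='pci' domain='0x0000' bus='0x02' slot='0x07' function='0x0'/>
  </controller>
</devices>
"""

print(misordered_pci_controllers(AFTER_EDIT))  # [(3, 2)]
print(misordered_pci_controllers(RUNNING))     # []
```

Only the first dump reports the pci-bridge (index 3) as listed before the controller for bus 2.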

Actual results:
After the vPHBs are removed via edit, the VM fails to start on the first try and succeeds on the second.

Expected results:
The VM starts on the first try.

Additional info:
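
The failing and the working command lines in the logs above differ only in where the pci-bridge appears relative to the spapr-pci-host-bridge devices. The sketch below scans a command line for -device arguments whose bus= refers to a device id that has not been defined yet. It is a simplification under stated assumptions: a bus named X or X.N is treated as available once a device with id=X has been seen, and pci.0 is treated as built into the machine. The helper name and the shortened command lines are illustrative only.

```python
import shlex

# Assumption: the default PHB bus "pci.0" is created by the machine type and
# never appears as a -device argument.
BUILTIN_BUSES = {"pci.0"}

def devices_with_missing_bus(cmdline):
    """Return (device id, bus) pairs for -device arguments that reference a
    bus before any device that could provide it has been listed."""
    args = shlex.split(cmdline)
    defined = set()   # device ids seen so far, in command-line order
    missing = []
    for flag, value in zip(args, args[1:]):
        if flag != "-device":
            continue
        # The first comma-separated field is the driver name, the rest are props.
        props = dict(p.split("=", 1) for p in value.split(",")[1:] if "=" in p)
        bus = props.get("bus")
        if bus is not None and bus not in BUILTIN_BUSES:
            parent = bus.rsplit(".", 1)[0]   # 'pci.2.0' -> 'pci.2'
            if bus not in defined and parent not in defined:
                missing.append((props.get("id"), bus))
        if "id" in props:
            defined.add(props["id"])
    return missing

# Shortened versions of the two command lines from the logs.
FIRST_TRY = (
    "qemu-kvm "
    "-device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 "
    "-device spapr-pci-host-bridge,index=1,id=pci.1 "
    "-device spapr-pci-host-bridge,index=2,id=pci.2 "
    "-device qemu-xhci,id=usb,bus=pci.0,addr=0x2 "
    "-device usb-kbd,id=input0,bus=usb.0,port=1"
)
SECOND_TRY = (
    "qemu-kvm "
    "-device spapr-pci-host-bridge,index=1,id=pci.1 "
    "-device spapr-pci-host-bridge,index=2,id=pci.2 "
    "-device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2.0,addr=0x7 "
    "-device qemu-xhci,id=usb,bus=pci.0,addr=0x2 "
    "-device usb-kbd,id=input0,bus=usb.0,port=1"
)

print(devices_with_missing_bus(FIRST_TRY))   # [('pci.3', 'pci.2.0')]
print(devices_with_missing_bus(SECOND_TRY))  # []
```

The first result matches the "Bus 'pci.2.0' not found" error reported by qemu-kvm.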

Comment 2 David Gibson 2017-08-21 04:16:46 UTC
AFAICT what's happening here is that when you run the VM with the extra vPHBs, libvirt is assigning some devices on the second vPHB, and updating the XML to give those devices explicit addresses on the second vPHB.  When it goes away, libvirt obviously can't place those devices on the second vPHB any more, hence the failure.

Comment 3 Andrea Bolognani 2017-08-21 08:12:19 UTC
(In reply to David Gibson from comment #2)
> AFAICT what's happening here is that when you run the VM with the extra
> vPHBs, libvirt is assigning some devices on the second vPHB, and updating
> the XML to give those devices explicit addresses on the second vPHB.  When
> it goes away, libvirt obviously can't place those devices on the second vPHB
> any more, hence the failure.

That's not quite what happens: as you can see (step 3 in the
description) the PHBs get re-added automatically, but for
some reason they end up after the devices rather than before
them, and QEMU can't handle having device and controller
specified in that order.

I'll look into making it so the PHBs get re-added before the
devices using them.
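
The idea of placing each controller after the one it depends on can be illustrated with a small Python sketch. This is not the patch posted upstream and not libvirt code; the function name and the (index, parent bus) tuple representation are hypothetical, and the sketch only shows a stable "parents first" ordering.

```python
def order_parents_first(controllers):
    """controllers: list of (index, parent_bus_or_None) in XML order.
    Return a new list in which every controller follows the controller
    providing its bus, otherwise keeping the original relative order."""
    known = {index for index, _ in controllers}
    placed = set()
    result = []
    pending = list(controllers)
    while pending:
        progressed = False
        remaining = []
        for index, parent in pending:
            # Emit a controller once it has no parent, its parent is already
            # emitted, or its parent is not part of the list at all.
            if parent is None or parent in placed or parent not in known:
                result.append((index, parent))
                placed.add(index)
                progressed = True
            else:
                remaining.append((index, parent))
        if not progressed:
            raise ValueError("cyclic controller dependency")
        pending = remaining
    return result

# Order seen after the edit: pci-bridge (index 3, on bus 2) precedes PHBs 1 and 2.
after_edit = [(0, None), (3, 2), (1, None), (2, None)]
print(order_parents_first(after_edit))
# [(0, None), (1, None), (2, None), (3, 2)]
```

As comment 9 notes, reordering controllers can break migration, so a reordering like this is shown only to make the dependency explicit.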

Comment 6 Andrea Bolognani 2017-09-05 13:26:13 UTC
Fix posted upstream.

  https://www.redhat.com/archives/libvir-list/2017-September/msg00084.html

Comment 7 Andrea Bolognani 2017-09-07 17:00:51 UTC
v2 patches posted upstream.

  https://www.redhat.com/archives/libvir-list/2017-September/msg00168.html

Comment 8 David Gibson 2017-09-14 00:55:50 UTC
Andrea, any update on getting this upstream and downstream?

Comment 9 Andrea Bolognani 2017-09-21 09:28:26 UTC
(In reply to David Gibson from comment #8)
> Andrea, any update on getting this upstream and downstream?

Sorry for taking so long to reply.

It turns out that reordering controllers can break migration[1],
so given that the problem scenario described above was kinda
convoluted to begin with, I think it's better to avoid further
breakage and close the bug as CANTFIX. Doing so now.


[1] https://www.redhat.com/archives/libvir-list/2017-September/msg00734.html