Description of problem:
We recently upgraded libvirt from 0.9.12 to 1.1.4 while several guests were running on the hypervisor. After the upgrade, the "old" running guests (created under 0.9.12) can't attach a virtio disk device. The attempt fails with:

    internal error: Device alias was not set for PCI controller with index 0 required for device at address 0000:00:1e.0

I dumped the guest's XML under 0.9.12 and under 1.1.4 and found a difference in the PCI controllers: there is no pci controller in the 0.9.12 XML, but one exists in the 1.1.4 XML.

0.9.12 xml:

    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>

1.1.4 xml:

    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>

I also checked the libvirt 1.1.4 code and found a commit that adds a check for the PCI controller alias: if the alias does not exist, the attachment fails. The commit is 01b8812765f17d1a2592bcec2708315f136fb611.

Version-Release number of selected component (if applicable):
upgrade libvirt from 0.9.12 to 1.1.4

How reproducible:

Steps to Reproduce:
1. create a guest under libvirt 0.9.12
2. upgrade libvirt to 1.1.4
3. attach a virtio disk device

Actual results:
internal error: Device alias was not set for PCI controller with index 0 required for device at address 0000:00:1e.0

Expected results:
disk attached successfully

Additional info:
If we reboot the guest (destroy it and start it again with virsh), the attachment succeeds.
The XML used for attaching the disk:

    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/opt/zero.raw'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
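The difference between the two dumps in the report can also be checked programmatically. The following is only a minimal sketch, not part of libvirt: the function name pci_controller_alias is hypothetical, and wrapping the controller fragments in <domain><devices> is an assumption about where they sit in the full dump.

```python
import xml.etree.ElementTree as ET

# Controller fragments copied from the report. The <domain><devices> wrapper
# is an assumption about the surrounding structure of the full dump.
XML_0_9_12 = """
<domain><devices>
  <controller type='usb' index='0'>
    <alias name='usb0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
  </controller>
</devices></domain>
"""

XML_1_1_4 = """
<domain><devices>
  <controller type='pci' index='0' model='pci-root'>
    <alias name='pci.0'/>
  </controller>
  <controller type='usb' index='0'>
    <alias name='usb0'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
  </controller>
</devices></domain>
"""


def pci_controller_alias(domain_xml, index=0):
    """Return the alias name of the PCI controller with the given index.

    Returns None when there is no such controller, or when the controller
    exists but carries no <alias> element.
    """
    root = ET.fromstring(domain_xml.strip())
    for ctrl in root.iter("controller"):
        if ctrl.get("type") == "pci" and int(ctrl.get("index", "0")) == index:
            alias = ctrl.find("alias")
            return alias.get("name") if alias is not None else None
    return None
```

Feeding it the output of "virsh dumpxml" for a running guest would show whether PCI controller 0 has an alias; a result of None corresponds to the situation the error message describes.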
Interesting problem. Reproducing it requires:

1) You have a guest that is running.
2) *While the guest is running*, you upgrade libvirt from something pre-1.0.5 (when we began explicitly describing a pci-root controller in the domain XML) to 1.1.2 or beyond (when we switched from hard-coding the PCI controller alias on the command line to using the one that was generated and stored in the XML).
3) You attempt to hotplug a new PCI device after the upgrade of libvirt in (2), but prior to shutting down/restarting the guest.

I'm fairly certain (but not 100%, as I don't have a proper setup handy to try it) that this problem is avoided when saving a domain to disk and restoring it after the upgrade, or when migrating the guest from a host with older libvirt to one with newer libvirt. This is because the alias names are recreated any time the qemu process is restarted, which happens during any save/restore cycle or migration.

A reasonable way to fix this is to auto-add a "pci.0" alias when one isn't found, rather than generating an error. I'll work on a patch and post it upstream.
Ugh. It's not that simple. There is also the case where someone is upgrading from a libvirt somewhere between 1.0.2 and 1.1.2, so the XML says that the alias for the PCI controller is "pciN", but libvirt actually used "pci.N" (it is this bug which commit 01b8812 fixed). In this case we would attempt to attach the new device to the bus named "pciN", which wouldn't exist, and fail at a later stage.

In order to avoid that, we would need to look not only for missing aliases, but also for aliases which don't fit the pattern. I don't really like the idea of hard-coding a version-specific fixup like that, which could one day come back to haunt us (what if we later decided to use "pciN" for some odd reason?).

So, in the name of keeping the code clean, should we consider this a transient bug that can be worked around by saving and then restoring (or migrating) the guest, and that will eventually no longer be encountered, as soon as everybody has upgraded to at least 1.1.2? Or do we need to permanently clutter the code with hard-coded checks for missing / "pciN" aliases and correct them to "pci.N"?

Wangpan - can you verify that doing a save/restore of the guest after the upgrade also eliminates the error? If that's the case, then I'm inclined to just leave the code as-is.
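To make the two cases concrete, the fixup being weighed here could look roughly like the sketch below. It is purely illustrative: libvirt itself is written in C, this is not the patch that was posted, and the name effective_pci_alias is hypothetical.

```python
import re

# Alias as stored in the XML by the affected versions: "pci" followed
# directly by the index, with no dot ("pci0", "pci1", ...).
_OLD_STYLE_ALIAS = re.compile(r"pci(\d+)")


def effective_pci_alias(index, alias=None):
    """Return the bus name to use for the PCI controller at this index.

    - missing alias (guest started before the controller was described
      in the XML): fall back to "pci.<index>"
    - "pciN" alias (XML disagreed with what was actually used): "pci.N"
    - anything else is returned unchanged
    """
    if alias is None:
        return f"pci.{index}"
    match = _OLD_STYLE_ALIAS.fullmatch(alias)
    if match:
        return f"pci.{match.group(1)}"
    return alias
```

The second branch is exactly the hard-coded, version-specific pattern match that the comment above is reluctant to carry in the code permanently.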
I sent this patch, which should solve the problem, upstream:

https://www.redhat.com/archives/libvir-list/2014-January/msg00265.html

Due to the presence of a reasonable workaround, and the transient nature of the problem, I'm not convinced that we should apply that fix, but I thought it would be proper to at least send the patch to allow for discussion.
There was one agreement upstream to not push the patch, and no disagreement, so I'm closing as WONTFIX. In the meantime, if anyone encounters the problem and does a Google search for the error message, they will be led both to this bugzilla report and to the patch, both of which point out the workaround of doing a save/restore of the guest.
I have tested Laine's patch and my issue is resolved: the disk attaches successfully. Thanks! The issue can also be resolved by first doing a save/restore of the guest with virsh after the upgrade.
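For anyone landing here from a search, the save/restore workaround followed by the attach can be scripted roughly as below. This is a sketch under assumptions: the helper names, the domain name, and the file paths in the usage note are placeholders, and the runner parameter exists only so the command sequence can be checked without touching a real hypervisor.

```python
import subprocess


def workaround_commands(domain, state_file, disk_xml):
    """Build the virsh command sequence for the workaround.

    Saving and restoring restarts the qemu process, so the aliases are
    recreated before the attach is attempted.
    """
    return [
        ["virsh", "save", domain, state_file],          # save guest state to disk
        ["virsh", "restore", state_file],               # restore it under the new libvirt
        ["virsh", "attach-device", domain, disk_xml],   # now hotplug the disk
    ]


def run_workaround(domain, state_file, disk_xml, runner=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for argv in workaround_commands(domain, state_file, disk_xml):
        runner(argv, check=True)
```

A call such as run_workaround("guest1", "/var/tmp/guest1.save", "disk.xml") would run the three commands in order, where all three arguments are placeholder values and disk.xml holds the <disk> element shown earlier in this report.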