Description of problem:
When a PCI device is assigned to a PV guest and you then try to save that guest, the PV guest crashes.

Version-Release number of selected component (if applicable):
xen-3.0.3-105.el5

How reproducible:
5/5

Steps to Reproduce:
1. Enable VT-d in the BIOS of the Intel host.
2. In grub.conf, add iommu=1 to the kernel command line.
3. Load the "pciback" module via modprobe in domain 0 and check that it is loaded.
4. Unbind the PCI device from its native driver:
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/{native_driver}/unbind
5. Bind the PCI device to the pciback driver:
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/pciback/new_slot
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/pciback/bind
6. In domain0, check that the PCI device has been hidden from domain0 successfully:
   # xm pci-list-assignable-devices
   0000:19:00.0
7. Add a pci entry to the config file of the PV guest:
   pci = [ '{pci_identifier}' ]
8. Start the PV guest with the config file:
   # xm create {config_file}
9. In the PV guest, check that the PCI device has been assigned to it successfully:
   # lspci
   Check that the PCI device works fine within the guest, e.g. ping for a PCI network card.
10. In domain0, check the PCI device that is assigned to the guest:
   # xm pci-list $domain_id
11. Try to save this PV guest and check the status of the guest.

Actual results:
After step 11,
1. "xm save" prevents saving the PV guest.
2. The PV guest crashes after the save attempt.

Expected results:
1. xm save should prevent you from saving the PV guest:
   Error: Migration not permitted with assigned PCI device.
   Usage: xm save <Domain> <CheckpointFile>

   Save a domain state to restore later.
2. The PV guest and the PCI device should work fine after xm save.

Additional info:
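Steps 4 and 5 above can also be scripted. The following is only a minimal sketch of those three sysfs writes, not part of the original report: the function name rebind_to_pciback and its sysfs_root parameter are hypothetical, and the sysfs_root argument exists only so the sketch can be tried against a scratch directory instead of the real /sys.

```python
import os


def rebind_to_pciback(pci_identifier, native_driver, sysfs_root="/sys"):
    """Detach a PCI device from its native driver and hand it to pciback.

    Mirrors the three 'echo -n' commands from steps 4 and 5. The function
    name and the sysfs_root parameter are illustrative assumptions.
    """
    drivers = os.path.join(sysfs_root, "bus", "pci", "drivers")
    writes = [
        # Step 4: unbind from the native driver.
        (os.path.join(drivers, native_driver, "unbind"), pci_identifier),
        # Step 5: register the slot with pciback, then bind the device.
        (os.path.join(drivers, "pciback", "new_slot"), pci_identifier),
        (os.path.join(drivers, "pciback", "bind"), pci_identifier),
    ]
    for path, value in writes:
        # Like 'echo -n', write the identifier without a trailing newline.
        with open(path, "w") as handle:
            handle.write(value)
    return [path for path, _ in writes]
```

On a real host this needs root in domain 0 and a loaded pciback module (step 3); the result can then be checked with xm pci-list-assignable-devices as in step 6.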
Created attachment 396038 [details] pv guest crash

For details, refer to the attached .png file.
The issue here is that something is done to the domain at save/migration start, before it is restored by recoverMigrateDevices(). It is therefore connected to a buggy resumeDomain() operation on PV guests, since this is the code path taken when the domain is being saved or migrated.

This does not seem to be an issue in upstream Xen 4.1, because upstream uses suspend cancellation in the Xen kernel itself to resume the guest. For us it is better to implement a checkForMigrate() function and call it at save/migrate start, so that the operation fails immediately.

I'm currently working on this patch. When the patch and testing are done, I'll post it as a new comment.

Michal
Created attachment 441441 [details] Implement check for save/migration of guest devices before started

The issue was caused by a possibly buggy implementation of the resumeDomain() function for PV guests in some cases. We could fix it either in kernel-xen/the hypervisor or in the user-space tools; the latter makes sure this code path is never reached, by raising an exception before anything is done to the guest itself.

A new method, checkMigrateDevices(), has been added to the XendDomainInfo class to check whether the guest can be saved/migrated successfully, before the action starts. The DevController class also had to be patched to provide a checkForMigrate() method. This method is overridden for the PCI device interface to raise an exception stating that save/migration is not possible for a guest with assigned PCI devices. Since the network variable is set when doing a migration and not set when doing a save, the message has been altered to be precise.

Michal
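The shape of the check described above can be sketched as follows. This is not the attached patch, only a simplified stand-alone illustration: the method names checkMigrateDevices() and checkForMigrate() come from the comment, while the MigrationNotPermitted exception, the stand-in classes, the save_domain() helper and the "Save not permitted" wording are assumptions made for the example.

```python
class MigrationNotPermitted(Exception):
    """Hypothetical exception type; the real patch may raise a different one."""


class DevController:
    """Simplified stand-in for a device controller base class."""

    def __init__(self, devices=None):
        self.devices = list(devices or [])

    def checkForMigrate(self, network):
        # By default a device class does not block save/migration.
        return None


class PciController(DevController):
    """Stand-in for the PCI device interface, which overrides the check."""

    def checkForMigrate(self, network):
        if self.devices:
            # 'network' is set for migration and unset for save, so the
            # message names the operation (exact wording is assumed here).
            action = "Migration" if network else "Save"
            raise MigrationNotPermitted(
                "%s not permitted with assigned PCI device." % action)


class DomainInfo:
    """Simplified stand-in for XendDomainInfo."""

    def __init__(self, controllers):
        self.controllers = controllers

    def checkMigrateDevices(self, network):
        # Ask every device controller before touching the guest.
        for controller in self.controllers:
            controller.checkForMigrate(network)


def save_domain(dominfo, do_save):
    # Fail before the suspend/resume path is entered at all.
    dominfo.checkMigrateDevices(network=False)
    do_save()
```

The point of the design is ordering: because the check runs first, a guest with an assigned PCI device is never suspended, so the problematic resumeDomain() path is not reached.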
*** Bug 627095 has been marked as a duplicate of this bug. ***
Verified on xen-3.0.3-116.el5, kernel-xen-2.6.18-216.el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html