Bug 567916 - RHEL5.4 x86_64 PV guest with an assigned PCI device crashes after being saved.
Summary: RHEL5.4 x86_64 PV guest with an assigned PCI device crashes after being saved.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: xen
Version: 5.4
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Michal Novotny
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Duplicates: 627095
Depends On:
Blocks: 514500
 
Reported: 2010-02-24 10:27 UTC by Lu Li
Modified: 2014-02-02 22:37 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 22:20:17 UTC
Target Upstream Version:
Embargoed:


Attachments
pv guest crash (211.64 KB, image/png)
2010-02-24 11:03 UTC, Lu Li
Implement check for save/migration of guest devices before started (5.87 KB, patch)
2010-08-27 09:07 UTC, Michal Novotny


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2011:0031 0 normal SHIPPED_LIVE xen bug fix and enhancement update 2011-01-12 15:59:24 UTC

Description Lu Li 2010-02-24 10:27:36 UTC
Description of problem:
Assign a PCI device to a PV guest, then try to save the guest while the device is still assigned to it: the PV guest crashes.

Version-Release number of selected component (if applicable):
xen-3.0.3-105.el5

How reproducible:
5/5

Steps to Reproduce:
1. Enable VT-d in BIOS for Intel host.
2. In grub.conf, add iommu=1 to kernel command line.
3. Load module "pciback" via modprobe in domain 0 and check.
4. Unbind the PCI device from its native driver:
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/{native_driver}/unbind
5. Bind the PCI device to the pciback driver:
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/pciback/new_slot
   # echo -n {pci_identifier} > /sys/bus/pci/drivers/pciback/bind
6. Check that pci device has been hidden from domain0 successfully, in domain0
   # xm pci-list-assignable-devices
   0000:19:00.0
7. Add a pci entry in the config file of the PV guest:
   pci = [ '{pci_identifier}' ]
8. Start the PV guest with the config file:
   # xm create {config_file}
9. In the PV guest, check if the pci device has been assigned to it successfully:
   # lspci
   Check that the PCI device works within the guest,
   e.g. ping through a PCI network card.
10. In domain0, check the PCI device that is assigned to the guest:
   # xm pci-list $domain_id
11. Try to save this PV guest and check the status of the guest.
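
Steps 4 and 5 can be sketched in Python as follows. This is only an illustration: the function name rebind_to_pciback and the sysfs_root parameter are hypothetical (the latter exists so the sketch can be tried against a scratch directory), while the sysfs paths are the ones given in the steps above.

```python
import os


def rebind_to_pciback(pci_identifier, native_driver, sysfs_root="/sys"):
    """Move a PCI device from its native driver to pciback (steps 4-5).

    Hypothetical helper; it performs the same writes as the echo commands.
    """
    drivers = os.path.join(sysfs_root, "bus", "pci", "drivers")
    writes = [
        (os.path.join(drivers, native_driver, "unbind"), pci_identifier),
        (os.path.join(drivers, "pciback", "new_slot"), pci_identifier),
        (os.path.join(drivers, "pciback", "bind"), pci_identifier),
    ]
    for path, value in writes:
        # Equivalent of: echo -n {pci_identifier} > {path}
        with open(path, "w") as handle:
            handle.write(value)
    # Return the paths that were written, in order, for logging or checks.
    return [path for path, _ in writes]
```

On a real host this needs root privileges in domain 0 and a loaded pciback module (step 3); the order of the three writes matters, as in the steps above.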
  
Actual results:
After step 11:
1. "xm save" refuses to save the PV guest.
2. The PV guest crashes after the save attempt.

Expected results:
1. xm save should refuse to save the PV guest:
   Error: Migration not permitted with assigned PCI device.
   Usage: xm save <Domain> <CheckpointFile>
   Save a domain state to restore later.
2. The PV guest and the PCI device should keep working after xm save.

Additional info:

Comment 1 Lu Li 2010-02-24 11:03:44 UTC
Created attachment 396038 [details]
pv guest crash

For details, refer to the attached .png file.

Comment 4 Michal Novotny 2010-08-27 08:39:10 UTC
The issue here is that something is done to the domain on save/migration start, before it is restored by recoverMigrateDevices(). The crash is therefore connected to a buggy resumeDomain() operation on PV guests, since this is the code path taken when the domain is being saved or migrated.

Moreover, this does not seem to be an issue on upstream Xen 4.1, because upstream uses suspend cancellation in the Xen kernel itself to resume the guest. For us it is therefore better to implement a checkForMigrate() function and call it at save/migrate start, so that the operation fails immediately.

I'm currently working on this patch. When the patch and testing are done, I'll post it as a new comment.

Michal

Comment 5 Michal Novotny 2010-08-27 09:07:28 UTC
Created attachment 441441 [details]
Implement check for save/migration of guest devices before started

The issue was caused by a possibly buggy implementation of the resumeDomain() function for PV guests in some cases. We could fix it either in kernel-xen/the hypervisor or in the user-space tools. This patch takes the user-space route and makes sure this code path is never reached, by raising an exception before anything is done to the guest itself.

A new method, checkMigrateDevices(), has been added to the XendDomainInfo class to check whether the guest can be saved/migrated successfully before the action starts. The DevController class also had to be patched to provide a checkForMigrate() method, and this method is overridden for the PCI device interface to raise an exception stating that save/migration of a guest with assigned PCI devices is not possible. Since the network variable is set when migrating and not set when saving, the message has been altered to be precise.
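
The approach described above can be illustrated with a minimal, self-contained sketch. It is not the attached patch: the names checkMigrateDevices(), checkForMigrate(), DevController and XendDomainInfo come from this comment, but every signature, the PciController and VmError names as used here, the constructor arguments, the save() method and the wording of the save message are assumptions made for illustration.

```python
class VmError(Exception):
    """Stand-in for the error type raised by the tools (assumed name)."""


class DevController(object):
    """Base device controller: by default nothing blocks save/migration."""

    def __init__(self, devices=None):
        self.devices = list(devices or [])

    def checkForMigrate(self, network):
        # Default: devices of this class do not prevent the operation.
        return None


class PciController(DevController):
    """PCI passthrough controller (assumed name): refuses save/migration."""

    def checkForMigrate(self, network):
        if not self.devices:
            return None
        # 'network' is set for migration and unset for a local save.
        action = "Migration" if network else "Save"  # "Save" wording is assumed
        raise VmError("%s not permitted with assigned PCI device." % action)


class XendDomainInfo(object):
    """Tiny model of a domain holding its device controllers."""

    def __init__(self, controllers):
        self.controllers = controllers
        self.suspended = False

    def checkMigrateDevices(self, network):
        # Ask every controller before anything is done to the guest.
        for controller in self.controllers:
            controller.checkForMigrate(network)

    def save(self, network=False):
        self.checkMigrateDevices(network)  # raises before any state change
        self.suspended = True
```

The point of the design is ordering: because the check runs before the guest is touched, a refused save never reaches the resume path at all.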

Michal

Comment 6 Miroslav Rezanina 2010-09-01 05:06:07 UTC
*** Bug 627095 has been marked as a duplicate of this bug. ***

Comment 11 Linqing Lu 2010-09-08 06:33:04 UTC
Verified on xen-3.0.3-116.el5, kernel-xen-2.6.18-216.el5

Comment 13 errata-xmlrpc 2011-01-13 22:20:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html

