Bug 544950
Summary: | Trying to save a PV guest with a pci device assigned makes the PV guest hang there
---|---
Product: | Red Hat Enterprise Linux 5
Reporter: | Yufang Zhang <yuzhang>
Component: | xen
Assignee: | Linqing Lu <lilu>
Status: | CLOSED ERRATA
QA Contact: | Virtualization Bugs <virt-bugs>
Severity: | medium
Priority: | low
Version: | 5.4
CC: | clalance, ddutile, drjones, gshipley, leiwang, minovotn, mshao, xen-maint
Target Milestone: | rc
Target Release: | ---
Hardware: | All
OS: | Linux
Fixed In Version: | xen-3.0.3-115.el5
Doc Type: | Bug Fix
Last Closed: | 2011-01-13 22:19:47 UTC
Bug Blocks: | 514500
(In reply to comment #0)

> Created an attachment (id=376587) [details]
> xend.log
>
> [full problem description quoted; see the Description at the end of this page]

Well, could you please try with the latest version of the xen package, Yufang? There was some code to reconnect the backend devices, and obviously the PCI backend device was not reconnected correctly. You can try the latest virttest version of the xen package at http://people.redhat.com/mrezanin/xen . Please let us know when testing is done; unfortunately I don't have hardware to do PCI passthrough with.
Thanks,
Michal

(In reply to comment #2)

> Well, could you please try with the latest version of the xen package, Yufang? [...] You can try the latest virttest version of the xen package at http://people.redhat.com/mrezanin/xen .

Hi Michal,

I tested this bug with the latest xen package (xen-3.0.3-114.el5); the problem still exists, but with different behaviour:

(1) Create a RHEL5.4 PV guest with a PCI device assigned
(2) Try to save the PV guest

Xend does give error output saying that saving a PV guest with a PCI device assigned is not permitted, but the PV guest disappeared after that, and the xend logs show that the VM was destroyed. You can find detailed information in the xend.log in the next comment.

Created attachment 433307 [details]
xend.log for above comment
(In reply to comment #4)

> Created an attachment (id=433307) [details]
> xend.log for above comment

Well, it looks like it's disabled according to the following lines:

...
[2010-07-21 14:01:26 xend 3684] INFO (pciquirk:91) NO quirks found for PCI device [8086:10b9:8086:1093]
[2010-07-21 14:01:26 xend 3684] DEBUG (pciquirk:131) Permissive mode NOT enabled for PCI device [8086:10b9:8086:1093]
...

This is most probably a configuration issue, since according to this, permissive mode is not permitted for the PCI device with id 8086:10b9:8086:1093. According to the code, the configuration should be in the /etc/xen/xend-pci-permissive.sxp file, so I *think* there should be a definition like:

(unconstrained_dev_ids
    ('8086:10b9:8086:1093')
)

According to the /usr/lib64/python2.4/site-packages/xen/xend/server/pciif.py file, there's a call to the xc.domain_ioport_permission() function, but I guess this is not right, and there may be a bug where it goes there even if it's not permitted, which may be the reason why the guest disappears after that. Looking at the code, it should also write to the /sys/bus/pci/drivers/pciback/permissive node; however, that has no effect here. It seems there's a 3 (No such process) error in the xc.domain_ioport_permission() function which makes it fail. xc_domain_ioport_permission() is a hypercall to XEN_DOMCTL_ioport_permission, but I'm none the wiser from that. Could you please attach the full xend.log?

Thanks,
Michal

Things got worse when testing with the latest virttest version of the xen package: trying to start a PV guest with a PCI device assigned simply failed.

# rpm -qa | grep xen
kernel-xen-devel-2.6.18-206.el5
xen-3.0.3-113.el5virttest30
xen-debuginfo-3.0.3-113.el5virttest30
kernel-xen-2.6.18-206.el5
xen-devel-3.0.3-113.el5virttest30
xen-libs-3.0.3-113.el5virttest30

# xm pci-list-assignable-device
0000:03:00.0

# xm cr /tmp/rhel5.4-64-pv.cfg
Using config file "/tmp/rhel5.4-64-pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: (22, 'Invalid argument')

Some interesting information can be found in the xend.log in the next comment.

Created attachment 433310 [details]
xend.log for above comment
Created attachment 433311 [details]
The whole xend.log
(In reply to comment #5)

> [...] Could you please attach the full xend.log?

Michal, the full xend.log is in comment #8.

Well, I don't know what caused the invalid-argument issue, but I'm building a new version of the xen package with some debugging messages added for testing purposes, since I can't test this one myself. I doubt the invalid-argument message is PCI-related, but I *think* that for PCI device assignment you should have the device ID in the pci-list-assignable-devices output.
Michal

(In reply to comment #10)

> [...] I'm building the new version of xen package with some debugging message added for testing purposes [...]

Could you please try the http://people.redhat.com/minovotn/xen version of the xen package and provide me the xend.log from testing?

Thanks,
Michal

(In reply to comment #10)

> Well, I don't know what caused the invalid argument issue [...]

The problem is due to my restarting xend before creating the PV guest. I can reproduce this scenario even with xen-3.0.3-114:

(1) Boot the host
(2) Unbind the pci device from its original device driver and bind it to pciback
(3) # xm pci-list-assignable-device
    0000:03:00.0
(4) Restart xend
(5) Try to create the machine with the PCI device assigned

At step (5), you can see the error output:

# xm cr /tmp/rhel5.4-64-pv.cfg
Using config file "/tmp/rhel5.4-64-pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Error: (22, 'Invalid argument')

Without restarting xend, I can create the PV guest with the PCI device assigned successfully.

Created attachment 433322 [details]
full xend.log of above comment
(In reply to comment #11)

> Could you please try using the http://people.redhat.com/minovotn/xen version of the xen package and provide me the xend.log from testing?

Reproduced this bug with your latest xen package:

# xm pci-list-assignable-device
0000:03:00.0

# xm cr /tmp/rhel5.4-64-pv.cfg
Using config file "/tmp/rhel5.4-64-pv.cfg".
Using <class 'grub.GrubConf.GrubConfigFile'> to parse /grub/menu.lst
Started domain rhel5-pv-x84_64

# xm li
Name              ID Mem(MiB) VCPUs State  Time(s)
Domain-0           0     3409     4 r-----    90.4
rhel5-pv-x84_64    1      512     1 r-----     9.2

# xm pci-list 1
domain bus slot func
     0   3    0    0

# xm save 1 1.save
Error: Migration not permitted with assigned PCI device.
Usage: xm save <Domain> <CheckpointFile>

Save a domain state to restore later.

# xm li
Name              ID Mem(MiB) VCPUs State  Time(s)
Domain-0           0     3409     4 r-----    94.0
rhel5-pv-x84_64    2      512     1 --p---     0.0

The vm disappears after a while:

# xm li
Name              ID Mem(MiB) VCPUs State  Time(s)
Domain-0           0     3409     4 r-----    95.2

Created attachment 433354 [details]
full xend.log for comment #14

(In reply to comment #15)

> Created an attachment (id=433354) [details]
> full xend.log for comment #14

This is strange; this shouldn't be happening, since it's already been fixed by something. Obviously not for all the cases:

[2010-07-21 17:45:47 xend.XendDomainInfo 4277] ERROR (XendDomainInfo:2811) Failed to restart domain 1.
Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2797, in restart
    new_dom.waitForDevices()
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 2489, in waitForDevices
    self.waitForDevices_(c)
  File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 1484, in waitForDevices_
    return self.getDeviceController(deviceClass).waitForDevices()
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 162, in waitForDevices
    return map(self.waitForDevice, self.deviceIDs())
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/DevController.py", line 172, in waitForDevice
    raise VmError("Device %s (%s) could not be connected. "
VmError: Device 0 (vkbd) could not be connected. Hotplug scripts not working.

I need to have a closer look. So, is creating the guest working fine? Honestly I don't know whether saving a PV guest with a PCI device shouldn't be treated the same as migration, i.e. whether it shouldn't be possible at all.

Michal

Now I can see the issue is in this code:

rc = xc.physdev_map_pirq(domid = fe_domid,
                         index = dev.irq,
                         pirq = dev.irq)

where fe_domid is the ID of the domain the device is being attached to (8 in my case), and dev.irq equals 255, but I don't know whether this is OK. xc_physdev_map_pirq() is the libxc function which calls the hypervisor with the PHYSDEVOP_map_pirq operation, and return code 22 (-EINVAL) is what xc_physdev_map_pirq() returns when the pirq is unset. The pirq seems to be set to 255, so it shouldn't be returning -EINVAL from there, but there's also a call to the xc_physdev_op() function. According to the definition in xen/arch/x86/x86_64/compat.c, there's a define substituting do_physdev_op with compat_physdev_op, which resides in the hypervisor code too. Isn't it possible there's something not enabled on the hypervisor command line, or something like that? I don't understand the PCI passthrough stuff, so it's just my guess.

Michal

Hi, Michal

1. We assume that Xen preventing us from saving a PV guest with a PCI device assigned is right, as said in the bug Description/Expected results.

2. The problem now is that after giving the error message below:

> Error: Migration not permitted with assigned PCI device.
> Usage: xm save <Domain> <CheckpointFile>
> Save a domain state to restore later.

the VM should still be on and working properly, not destroyed/disappeared.

3. There seems to be no other special configuration that we should enable to support PCI passthrough.

(In reply to comment #18)

> [points 1-3 above]

Hi Lei, I see what you mean. If the expected behaviour is to make it fail but resume the guest, then this is the problem I've been coping with. Nevertheless, could you please give me access to some machine with a PCI passthrough device connected to the guest for further testing? I was using some remote machine, but I don't remember it and I can't find it in the history now :(

Michal

(In reply to comment #17)

> [...] Isn't it possible there's something not enabled on the hypervisor command line, or something like that? [...]

Yeah. It seems that we forgot to add "iommu=1" to the kernel command line to enable the IOMMU. But I thought we don't have to do that if we only want to do PCI passthrough with a PV guest. I don't know whether this has an impact on the scenario in this bug.

(In reply to comment #19)

> [...] could you please give me access to some machine with a PCI passthrough device connected to the guest for further testing? [...]

Well, you have to have 'iommu=1' (IOMMU enabled) on the hypervisor command line; on the testing machine you don't have it enabled, and since it's a Dell 760 there's a hardware bug there, as described in bug 541788. The machine used for testing has two major issues: first, iommu is disabled (i.e. no 'iommu=1' on the HV command line), and second, you can't enable it because of the hardware bug. Some better machine where iommu can be enabled is necessary for testing this bug.

Michal

(In reply to comment #20)

> Yeah. It seems that we forgot to add "iommu=1" in kernel command line to enable IOMMU. [...]

Yufang, I *think* this is necessary for PV guests too. Also, the test case leiwang gave me a link to has IOMMU in its summary ([IOMMU]Try to save a PV guest when a pci device assigned to it (PV)), so I guess it's really necessary to have it enabled for both HVM and PV guests to be able to do PCI passthru.

Michal

(In reply to comment #22)

> I *think* this is necessary for PV guests too.

I don't think this is correct. PCI passthrough has been supported for PV guests since RHEL 5.0, but that parameter was introduced in RHEL 5.4 when support for passthrough to HVM guests was implemented. You would need it, as well as another parameter (but on dom0's command line), if you wanted to pass through a VF using VT-d, but I don't think that currently works for PV guests anyway. Rather than making assumptions about what's needed and what isn't, I suggest you do some research and/or an experiment or two to figure out the current requirements and what's supported. Adding Don to this as he should be able to help clear things up.

(In reply to comment #23)

> [...] Rather than making assumptions about what's needed and what isn't, I suggest you do some research and/or an experiment or two [...]

Well, the problem on the test machine (the Dell 760) was that I was getting an -EINVAL error every time I tried to boot the PV guest with a PCI device attached, so I guess this may be either iommu-specific or related to the BIOS bug on this machine. Currently I'm working on an "Intel(R) Xeon(R) CPU X5550 @ 2.67GHz" with no such issues; it boots the guest, and the guest is finally able to see the PCI device. On the Dell machine with iommu disabled, I was unable to even start the guest because of -EINVAL coming from the do_physdev_op() call in libxc, which accesses the HV AFAIK. I can do the experiment, but only after I finish work on this save issue, since I'm finally able to reproduce the behaviour described in this bugzilla.

Michal

Well, it seems the problem there is a crash, according to the log file. The guest is dying because it crashed, and adding an option to start a new instance of the guest/reboot is not the right fix, since the root cause is most likely the crash itself. There are some repetitions of the "Dev %s still active, looping ..." message, called from the XendDomainInfo.py:testDeviceComplete() function, and this is where the crash occurs.

...
  File "/usr/lib64/python2.4/site-packages/xen/xend/server/pciif.py", line 419, in migrate
    raise XendError('Migration not permitted with assigned PCI device.')
XendError: Migration not permitted with assigned PCI device.
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] DEBUG (XendDomainInfo:2330) XendDomainInfo.resumeDomain(6)
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] INFO (XendDomainInfo:2454) Dev 51712 still active, looping...
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] INFO (XendDomainInfo:2454) Dev 51712 still active, looping...
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] INFO (XendDomainInfo:2454) Dev 51712 still active, looping...
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] WARNING (XendDomainInfo:1222) Domain has crashed: name=migrating-rhel5-pv-x84_64 id=6.
[2010-07-23 19:50:56 xend.XendDomainInfo 10121] INFO (XendDomainInfo:1229) Starting automatic crash dump
[2010-07-23 19:51:01 xend.XendDomainInfo 10121] DEBUG (XendDomainInfo:2341) XendDomainInfo.resumeDomain: devices released
...

The issue may be caused by reconnecting the backend while it's not yet disconnected, which could result in a guest crash, since we disconnect a disk from a currently running guest. Obviously PV guests don't wait, and kernel panic (or something) immediately, resulting in the guest crash.

Michal

Andrew is correct here; for doing PV passthrough, iommu=1 is not needed (and indeed is ignored; PV passthrough doesn't go through the IOMMU, which is why it is unsafe to use in general). So the bug has to be elsewhere.

Chris Lalancette

Created attachment 433961 [details]
Bash script to unbind Intel 82541PI and bind to PCI back
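The attached script itself is not reproduced on this page, but the standard sysfs procedure it presumably follows (unbind the device from its host driver, then hand the slot to pciback) can be sketched in Python. This is a hedged illustration, not the attachment's contents; the device address and the helper names `pciback_steps`/`rebind_to_pciback` are made up for this example.

```python
import os

SYSFS_PCI = "/sys/bus/pci"

def pciback_steps(bdf):
    """Return the (path, value) write steps that move a PCI device
    (e.g. '0000:03:00.0') from its current driver to pciback."""
    dev_dir = os.path.join(SYSFS_PCI, "devices", bdf)
    return [
        # 1. detach the device from whatever driver currently owns it
        (os.path.join(dev_dir, "driver", "unbind"), bdf),
        # 2. tell pciback to create a slot for this device
        (os.path.join(SYSFS_PCI, "drivers", "pciback", "new_slot"), bdf),
        # 3. bind the device to pciback
        (os.path.join(SYSFS_PCI, "drivers", "pciback", "bind"), bdf),
    ]

def rebind_to_pciback(bdf, write_fn=None):
    """Perform the writes; write_fn is injectable so the logic can be
    exercised without touching a real sysfs tree."""
    if write_fn is None:
        def write_fn(path, value):
            with open(path, "w") as f:
                f.write(value)
    for path, value in pciback_steps(bdf):
        write_fn(path, value)
```

On a real dom0 this requires the pciback module to be loaded (`modprobe pciback`) and root privileges; the equivalent shell form is three `echo` commands into the same sysfs nodes.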
Well, I found out that the issue was with the resume being called before the save was actually performed. At that point it was unable to resume the guest, whose memory was still allocated, so the resume failed when mapping start_info, which resulted in the PV guest crash. I already have the patch, but I need to test it further with HVM guests now.
Michal
Created attachment 433968 [details]
Patch to implement correct resume handling on failed save
Hi,

this is the patch to call domain_resume only when appropriate, i.e. not before the domain has been saved and its memory deallocated. The patch has been tested on an Intel(R) Xeon(R) CPU X5550 and a RHEL-5 PV guest, both with and without a PCI device assigned. When saving the PV guest without a PCI device assigned to a location with enough space, everything went fine for both save and restore; when I tried to save the guest to a location with insufficient disk space, it failed to save but resumed fine. Finally, for a PV guest with a PCI device assigned, it failed with the "Migration not permitted with assigned PCI device" reason, but the guest was still working fine. When testing with HVM guests I saw no regressive behaviour, i.e. they worked as before the patch was applied.

Before my patch, it failed to resume the guest because memory was unable to be allocated, returning "Couldn't map start_info" from the domain_resume function of libxc.

The device attached to the guest was the "37:04.0" device, which was dependent on "37:09.0". The script I used to add the devices to the pciback driver was added in the previous comment.

Michal
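The corrected control flow described above can be modeled roughly as follows. This is a sketch only; the real change lives in xend's save/resume path, and the class, exception, and function names here (`SaveNotPermitted`, `save_domain`, `xm_save`) are illustrative, not the actual patch.

```python
class SaveNotPermitted(Exception):
    """Raised before any guest state has been torn down."""

def save_domain(dom, write_checkpoint):
    """Sketch of the fixed flow: if the save is refused up front (e.g. an
    assigned PCI device), the guest's memory is still intact, so resuming
    it is safe.  Once write_checkpoint() has started releasing guest
    memory, a failure can no longer be recovered by a plain resume;
    mapping start_info would fail, crashing the guest."""
    if dom.has_assigned_pci_device:
        # Refuse early; nothing has been torn down yet.
        raise SaveNotPermitted(
            "Migration not permitted with assigned PCI device.")
    try:
        write_checkpoint(dom)   # may release pages as they are written out
    except Exception:
        dom.destroyed = True    # past the point of safe resume
        raise

def xm_save(dom, write_checkpoint):
    """Driver mirroring 'xm save': report the error but keep the guest
    alive when the save was refused before touching its memory."""
    try:
        save_domain(dom, write_checkpoint)
    except SaveNotPermitted as err:
        dom.resume()            # safe: memory untouched
        return str(err)
    return None
```

The design point is simply ordering: the resume happens only on the early-refusal path, never after checkpointing has begun deallocating guest memory.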
Oh, just one more update: I was wrong. For PV guests, only modprobing pciback is required; neither iommu=1 nor pci_pt_e820_access=on is required on the hypervisor command line.

Michal

Created attachment 440612 [details]
log info for the "xm save" operation

Host: xen-3.0.3-115.el5, kernel-xen-2.6.18-212.el5
Guest: RHEL-Server-5.4-64-pv

Tested with a pci device (intel 82576 VF) attached. When doing "xm save" on the guest domain, the operation failed with the following output:

>> root@dhcp-65-129 ~]# xm save 1 save_file
>> Error: Migration not permitted with assigned PCI device.
>> Usage: xm save <Domain> <CheckpointFile>
>>
>> Save a domain state to restore later.

The target guest ran well after the save operation failed, and so did the pci device. Bug verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0031.html
Created attachment 376587 [details]
xend.log

Description of problem:
When trying to save a PV guest with a pci device assigned to it, the PV guest hangs there without any response.

Version-Release number of selected component (if applicable):
xen-3.0.3-94.el5

How reproducible:
Always

Steps to Reproduce:

In Domain0:

# xm pci-list-assignable-devices
0000:03:00.0

# xm cr /etc/xen/test_pv pci="0000:03:00.0"
Using config file "/etc/xen/test_pv".
file /root/pv.img
Started domain PvDomain

# xm save PvDomain PvDomain.save
Error: Migration not permitted with assigned PCI device.
Usage: xm save <Domain> <CheckpointFile>

Save a domain state to restore later.

# xm li
Name                 ID Mem(MiB) VCPUs State  Time(s)
Domain-0              0     3409     4 r-----   1646.1
migrating-PvDomain    6      511     4 -b----     17.0

We also got error output from within the PV guest after trying to save it:

pcifront pci-0: pciback not responding!!!
get no response from backend for disable MSI
pcifront pci-0: pciback not responding!!!
pcifront pci-0: pciback not responding!!!
pcifront pci-0: pciback not responding!!!
pcifront pci-0: 22 freeing event channel 15

Actual results:
Xen prevents you from saving the PV guest, but the PV guest hangs there with no response after you try to save it. Also, the name of the PV guest changes from 'PvDomain' to 'migrating-PvDomain'.

Expected results:
Xen prevents you from saving the PV guest, and the guest works fine even if xm save failed.

Additional info: