Description of problem:
Hiding two RTL8139-compatible NICs from the paravirtual Dom0 with the pciback module and then passing them to a DomU works on the first boot, but the devices disappear upon reboot. The PCI devices load in the DomU on every cold start of the DomU but are not re-loaded when the DomU is rebooted.

Version-Release number of selected component (if applicable):
### Dom0:
[root@zeus ~]# rpm -qa kernel-xen xen
xen-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.1.el5 <-- Running Kernel
### DomU:
[root@gateway ~]# rpm -qa kernel-xen xen
kernel-xen-2.6.18-8.1.1.el5

How reproducible:
Reboot a paravirtual DomU that has PCI devices passed through; the devices will not be present after the reboot, as seen by 'lspci'.

Steps to Reproduce:
1. Add "options pciback hide=(XX:XX.X)(YY:YY.Y)..." to Dom0 modprobe.conf
2. Add "install 8139too /sbin/modprobe pciback ; /sbin/modprobe --first-time --ignore-install 8139too" to Dom0 modprobe.conf
3. Rebuild initrd on Dom0 with "mkinitrd /boot/initrd-2.6.18-8.1.1.el5xen.img `uname -r`"
4. Reboot Dom0
5. Build DomU with virt-install
6. Finish DomU install upon bootup from VNC session
7. Boot DomU after install with "xm create [DomU]"
8. Load latest updates for DomU using "yum update", reboot into new kernel
9. Shutdown DomU
10. Add "pci = ['XX:XX.X', 'YY:YY.Y']" to DomU.conf on Dom0
11. Create DomU with "xm create [DomU]"
12. Verify PCI devices are present with "lspci"
    *NOTE: PCI devices are present and configured
13. Reboot DomU
14. Verify PCI devices are present with "lspci"
    *NOTE: PCI devices are no longer present

Actual results:
Upon reboot of DomU, PCI devices are no longer visible, as verified by "lspci".

Expected results:
PCI devices on DomU should be present after all reboots.

Additional info:
Created attachment 150841 [details] Configs and Before/After logs of Dom0 and DomU
"The PCI devices will load in DomU upon all cold starts of the DomU but will not re-load upon reboot of DomU." So it works if you do an xm shutdown of the domU, and then start it again from fresh, without rebooting the dom0?
Stephen, Correct. If you completely shut down the DomU and then start it cold, the PCI devices will be present. But if you do an xm reboot, or reboot from within the DomU, the PCI devices will not be present after the reboot.
Hi, A customer of ours would like to know about the status of this issue? Thanks! This event sent from IssueTracker by mnapolis issue 122374
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Problem reproduced on RHEL-5.1, although it needs the fix from bug 339421 to enable pci passthrough.
OK, looks like we are simply not setting up any of the pci xenstore entries when a domain reboots. xend.log also indicates that the domain info being used to recreate the domain after the reboot lacks the pci entries from the config file. Will attach logs.
Created attachment 285821 [details] xenstore listing showing working PCI config before reboot Output of "xenstore -ls" with one domain running using PCI passthrough. Config entry used was pci = [ "0000:00:09.0" ] to pass a single forcedeth NIC to domU. The entire pci config (front and back end) is missing from the xenstore listing after reboot.
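For readers unfamiliar with the address format in the pci = [ ... ] config entry, here is a minimal sketch of how such a string breaks down into domain, bus, slot and function. The parse_bdf() helper is hypothetical and written only for illustration; it is not xend's parser. It assumes the usual hexadecimal "DDDD:BB:SS.F" form, with the domain part optional as in the 'XX:XX.X' form used in the original report.

```python
import re

# Hypothetical helper for illustration only; this is not xend's own parser.
# Accepts "DDDD:BB:SS.F" or the short "BB:SS.F" form (domain defaults to 0).
_BDF = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<slot>[0-9a-fA-F]{2})\."
    r"(?P<func>[0-7])$"
)


def parse_bdf(text):
    """Split a PCI address string into (domain, bus, slot, function) integers."""
    match = _BDF.match(text.strip())
    if match is None:
        raise ValueError("not a PCI address: %r" % (text,))
    return (
        int(match.group("domain") or "0", 16),
        int(match.group("bus"), 16),
        int(match.group("slot"), 16),
        int(match.group("func"), 16),
    )


if __name__ == "__main__":
    # The address used in the attachment above.
    print(parse_bdf("0000:00:09.0"))
```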
Created attachment 285831 [details] xend.log output for domain create and reboot xend.log log output showing the full logs from the initial creation of the domain, through reboot. Reboot is marked clearly within the logs; the "XendDomainInfo.create" entry immediately after the reboot can be seen not to have the pci config set.
Upstream xen-unstable 9968 may be related:

"The PciController class lacks a configuration method to re-generate the configuration of an existing domain. This is needed for a domain to be able to reboot and retain its PCI device configuration. This patch adds such support."

The symptoms appear to be the same here: we completely lose the pci config when we reboot. But the code from this cset is definitely present in our xend.
Reassign to Dan, set flags.
I can't reproduce this on the pending 5.2 xen RPMs, using xen-3.0.3-50.el5 + the patch from bug 339421. Host kernel is 2.6.18-58.el5xen and guest is 2.6.18-20.el5xen. I can reboot with 'xm reboot' (from dom0) and 'shutdown -r now' (from domU). In both cases the PCI device is present upon completion of the reboot. I notice this bug was originally opened against RHEL-5.0. I suspect we got the necessary fixes during the updates for 5.1.
*** Bug 431442 has been marked as a duplicate of this bug. ***
I am still unable to reproduce this problem on RHEL-5.2 beta packages. Unless someone can provide a reliable reproducer, I'm going to close this ticket WORKSFORME.
Finally figured out what's going on here...

When restarting a guest, XenD calls into server/pciif.py to get details of the configured PCI devices. This reads the data about PCI devices out of XenStore.

Meanwhile, the guest which just shut down has its hotplug scripts being triggered. The /etc/xen/scripts/xen-hotplug-cleanup script is run by the PCI backend and blows away all the XenStore entries for PCI devices.

These two things run in parallel and in fact race with each other. On some machines the hotplug scripts always win (and thus PCI devices disappear on reboot); on others XenD always wins (and PCI devices stay around on reboot).

Two questions remain:

 - What is triggering the hotplug scripts to run, and can this be delayed to a safer time?
 - Why are the hotplug scripts removing device data from XenStore when XenD already does this too?
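The race described above can be illustrated with a toy model. This is only a sketch and not xend code: a plain dict stands in for XenStore, the "pci" key layout is made up, and the two delays are artificial knobs that decide which side gets there first.

```python
import threading
import time


def simulate_reboot(xend_delay, cleanup_delay):
    """Toy model: return the PCI devices the restart path sees in the store."""
    # Stand-in for the guest's PCI entries in XenStore (hypothetical layout).
    store = {"pci": ["0000:00:09.0"]}
    lock = threading.Lock()
    seen = []

    def hotplug_cleanup():
        # Models the cleanup script removing the backend entries.
        time.sleep(cleanup_delay)
        with lock:
            store.pop("pci", None)

    def xend_restart():
        # Models the restart path reading the PCI config back from the store.
        time.sleep(xend_delay)
        with lock:
            seen.extend(store.get("pci", []))

    threads = [
        threading.Thread(target=hotplug_cleanup),
        threading.Thread(target=xend_restart),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen


if __name__ == "__main__":
    print("restart reads first:", simulate_reboot(xend_delay=0.0, cleanup_delay=0.3))
    print("cleanup runs first: ", simulate_reboot(xend_delay=0.3, cleanup_delay=0.0))
```

Whichever side is given the shorter delay wins, which matches the observation that the outcome is stable on a given machine but differs between machines.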
This is a kernel issue. The difference between block/net devices, which work well, and pci devices, which disappear after rebooting a domain, is that pciback calls device_unregister() when the frontend's state changes to XenbusStateClosed, while blkback/netback do not. When a domain reboots, information about pci devices may therefore be removed from xenstore too early, before xend reads it back to create a configuration for the rebooted domain. The code in question is pciback_frontend_changed() and its counterparts in drivers/{blkback,netback,pciback}/xenbus.c

Just a note for reproducing the bug: to reproduce it reliably I had to put time.sleep(5) as the first line of the XendDomainInfo::restart() method in xend/XendDomainInfo.py, to delay xend a bit and let the xen-hotplug-cleanup script always win the race.
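The difference in backend behaviour can be sketched as a small state-change handler. This is a toy model in Python, not the kernel code: the ToyBackend class, the dict used as a store, and the unregister_on_closed flag are all assumptions made for illustration, and the step where unregistering a device leads to its xenstore entries being cleaned up is collapsed into a single dict removal.

```python
from enum import Enum


class XenbusState(Enum):
    # Only the states needed for this sketch.
    CONNECTED = "Connected"
    CLOSING = "Closing"
    CLOSED = "Closed"


class ToyBackend:
    """Hypothetical backend model; unregister_on_closed mimics pciback."""

    def __init__(self, key, store, unregister_on_closed):
        self.key = key
        self.store = store
        self.unregister_on_closed = unregister_on_closed

    def frontend_changed(self, state):
        # A pciback-like backend drops the device as soon as the frontend
        # reports Closed; a blkback/netback-like backend leaves it alone here.
        if state is XenbusState.CLOSED and self.unregister_on_closed:
            self.store.pop(self.key, None)


def devices_after_reboot(unregister_on_closed):
    """Return what a later config read would still find in the store."""
    store = {"dev": ["0000:00:09.0"]}
    backend = ToyBackend("dev", store, unregister_on_closed)
    # Guest shuts down as part of the reboot.
    backend.frontend_changed(XenbusState.CLOSING)
    backend.frontend_changed(XenbusState.CLOSED)
    # Delayed read, as with the time.sleep(5) reproducer.
    return store.get("dev", [])


if __name__ == "__main__":
    print("pciback-like:", devices_after_reboot(unregister_on_closed=True))
    print("blkback-like:", devices_after_reboot(unregister_on_closed=False))
```

With the read delayed until after the Closed transition, only the backend that unregisters on Closed loses its entries, which is the asymmetry described above.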
Created attachment 328394 [details] A works-for-me patch
Located upstream patch set that addresses the issue. It has the change Jiri provided. See http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/5644f68a7912
Created attachment 328690 [details] Proposed patch Port of upstream patch.
Created attachment 328735 [details] Posted patch. Actual posted patch.
I've uploaded a test kernel that contains this fix (along with several others) to this location: http://people.redhat.com/clalance/virttest Could the original reporter try out the test kernels there, and report back if it fixes the problem? Thanks, Chris Lalancette
in kernel-2.6.18-132.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5 Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html
Clearing a needinfo request. Chris Lalancette