Bug 233801 - PCI devices disappear in Xen Paravirtual DomU on reboot/reset
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Bill Burns
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 431442
Depends On: 339421
Blocks: 448899
Reported: 2007-03-25 05:39 UTC by Adam Vance
Modified: 2018-10-20 00:39 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 08:17:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Configs and Before/After logs of Dom0 and DomU (267.42 KB, application/x-gzip)
2007-03-25 05:39 UTC, Adam Vance
xenstore listing showing working PCI config before reboot (8.95 KB, text/plain)
2007-12-12 16:02 UTC, Stephen Tweedie
xend.log output for domain create and reboot (19.57 KB, text/plain)
2007-12-12 16:04 UTC, Stephen Tweedie
A works-for-me patch (569 bytes, patch)
2009-01-07 15:30 UTC, Jiri Denemark
Proposed patch (3.31 KB, patch)
2009-01-12 02:18 UTC, Bill Burns
Posted patch. (3.31 KB, patch)
2009-01-12 13:51 UTC, Bill Burns


Links
System: Red Hat Product Errata
ID: RHSA-2009:1243
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update
Last Updated: 2009-09-01 08:53:34 UTC

Description Adam Vance 2007-03-25 05:39:40 UTC
Description of problem:

Hiding two RTL8139-compatible NICs from the Dom0 with the pciback module and
then passing them to a paravirtual DomU works on first boot, but the devices
disappear when the DomU reboots. The PCI devices load in the DomU on every cold
start of the DomU but do not reload after a reboot of the DomU.

Version-Release number of selected component (if applicable):
### Dom0:
[root@zeus ~]# rpm -qa kernel-xen xen
xen-3.0.3-25.0.3.el5
kernel-xen-2.6.18-8.1.1.el5 <-- Running Kernel

### DomU:
[root@gateway ~]# rpm -qa kernel-xen xen
kernel-xen-2.6.18-8.1.1.el5

How reproducible:

Reboot a paravirtual DomU that has PCI devices passed through to it. The
devices are no longer present after the reboot, as seen with 'lspci'.


Steps to Reproduce:
1. Add "options pciback hide=(XX:XX.X)(YY:YY.Y)..." to Dom0 modprobe.conf
2. Add "install 8139too /sbin/modprobe pciback ; /sbin/modprobe --first-time --ignore-install 8139too" to Dom0 modprobe.conf
3. Rebuild initrd on Dom0 with "mkinitrd /boot/initrd-2.6.18-8.1.1.el5xen.img `uname -r`"
4. Reboot Dom0
5. Build DomU with virt-install
6. Finish DomU install upon bootup from VNC session
7. Boot DomU after install with "xm create [DomU]"
8. Load latest updates for DomU using "yum update", reboot into new kernel
9. Shutdown DomU
10. Add "pci = ['XX:XX.X', 'YY:YY.Y']" to DomU.conf on Dom0
11. Create DomU with "xm create [DomU]"
12. Verify PCI devices are present with "lspci" *NOTE: PCI devices are present
and configured
13. Reboot DomU
14. Verify PCI devices are present with "lspci" *NOTE: PCI devices are no longer
present
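
Steps 12 and 14 can be checked mechanically by saving the "lspci" output inside the DomU before and after the reboot and comparing the two. The following is a minimal Python sketch, assuming each "lspci" line starts with the device's slot address. The function name and the sample listings are illustrative and are not taken from the attached logs.

```python
from typing import List, Set


def missing_pci_devices(before: str, after: str) -> List[str]:
    """Return slot addresses listed in `before` but absent from `after`."""

    def slots(listing: str) -> Set[str]:
        # The first whitespace-separated token of each non-empty line is
        # assumed to be the slot address (for example "00:00.0").
        return {line.split()[0] for line in listing.splitlines() if line.strip()}

    return sorted(slots(before) - slots(after))


# Illustrative listings only; real output depends on the hardware.
before_reboot = """\
00:00.0 Ethernet controller: Realtek RTL-8139
00:01.0 Ethernet controller: Realtek RTL-8139
"""
after_reboot = ""

print(missing_pci_devices(before_reboot, after_reboot))
```

An empty result after a reboot would mean every device seen before the reboot is still visible.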
  
Actual results:

Upon reboot of DomU, PCI devices are no longer visible as verified by "lspci"

Expected results:

PCI devices on DomU should be present after all reboots

Additional info:

Comment 1 Adam Vance 2007-03-25 05:39:40 UTC
Created attachment 150841 [details]
Configs and Before/After logs of Dom0 and DomU

Comment 2 Stephen Tweedie 2007-03-26 16:38:35 UTC
"The PCI devices will load in DomU upon all cold starts of the DomU but will not
re-load upon reboot of DomU."  

So it works if you do an xm shutdown of the domU, and then start it again from
fresh, without rebooting the dom0?


Comment 3 Adam Vance 2007-03-26 16:43:50 UTC
Stephen,

Correct. If you completely shut down the DomU and then start it cold, the PCI
devices will be present. But if you do an xm reboot, or reboot from within the
DomU, the PCI devices will not be present after the reboot.

Comment 4 Issue Tracker 2007-08-10 06:22:33 UTC
Hi,

A customer of ours would like to know the status of this issue.
Thanks!


This event sent from IssueTracker by mnapolis 
 issue 122374

Comment 5 RHEL Program Management 2007-10-16 04:03:04 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Stephen Tweedie 2007-12-11 17:02:07 UTC
Problem reproduced on RHEL-5.1, although it needs the fix from bug 339421 to
enable pci passthrough.

Comment 7 Stephen Tweedie 2007-12-12 15:59:28 UTC
OK, looks like we are simply not setting up any of the pci xenstore entries when
a domain reboots.  xend.log also indicates that the domain info being used to
recreate the domain after the reboot lacks the pci entries from the config file.
Will attach logs.


Comment 8 Stephen Tweedie 2007-12-12 16:02:32 UTC
Created attachment 285821 [details]
xenstore listing showing working PCI config before reboot

Output of "xenstore-ls" with one domain running using PCI passthrough.  Config
entry used was

    pci = [ "0000:00:09.0" ]

to pass a single forcedeth NIC to domU.

The entire pci config (front and back end) is missing from the xenstore listing
after reboot.

Comment 9 Stephen Tweedie 2007-12-12 16:04:46 UTC
Created attachment 285831 [details]
xend.log output for domain create and reboot

xend.log log output showing the full logs from the initial creation of the
domain, through reboot.  Reboot is marked clearly within the logs; the
"XendDomainInfo.create" entry immediately after the reboot can be seen not to
have the pci config set.

Comment 10 Stephen Tweedie 2007-12-12 16:51:14 UTC
Upstream xen-unstable 9968 may be related:

 The PciController class lacks a configuration method to re-generate the
 configuration of an existing domain. This is needed for a domain to be
 able to reboot and retain its PCI device configuration. This patch adds
 such support.

The symptoms appear to be the same here: we completely lose the pci config when
we reboot.  But the code from this cset is definitely present in our xend.



Comment 11 Bill Burns 2008-01-23 21:35:00 UTC
Reassign to Dan, set flags.


Comment 12 Daniel Berrangé 2008-01-23 21:50:11 UTC
I can't reproduce this on the pending 5.2 xen RPMs.

Using xen-3.0.3-50.el5 + the patch from bug 339421.

Host kernel is 2.6.18-58.el5xen and guest is 2.6.18-20.el5xen

I can reboot with 'xm reboot' (from dom0) and 'shutdown -r now' (from domU). In
both cases the PCI device is present upon completion of the reboot.

I notice this bug was originally opened against RHEL-5.0. I suspect we got the
necessary fixes during the updates for 5.1.

Comment 13 Andreas Thienemann 2008-02-19 07:31:48 UTC
*** Bug 431442 has been marked as a duplicate of this bug. ***

Comment 14 Daniel Berrangé 2008-03-19 20:06:02 UTC
I am still unable to reproduce this problem on RHEL-5.2 beta packages. Unless
someone can provide a reliable reproducer, I'm going to close this ticket as
WORKSFORME.


Comment 15 Daniel Berrangé 2008-03-20 20:28:45 UTC
Finally figured out what's going on here...

When restarting a guest, XenD calls into server/pciif.py to get details of the
configured PCI devices. This reads the data about PCI devices out of XenStore.

Meanwhile, the guest which has just shut down has its hotplug scripts triggered.
The /etc/xen/scripts/xen-hotplug-cleanup script is run by the PCI backend and
blows away all the XenStore entries for PCI devices.

These two things run in parallel and in fact race with each other. On some
machines the hotplug scripts always win (and thus PCI devices disappear on
reboot); on others XenD always wins (and PCI devices stay around on reboot).

Two questions remain:

 - What is triggering the hotplug scripts to run, and can this be delayed to a
safer time?
 - Why are the hotplug scripts removing device data from XenStore when XenD
already does this too?
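
The race can be illustrated with a toy model that runs the two steps in either order. This is only a sketch: the dictionary stands in for XenStore, the key path is illustrative, and the function names are made up for this example. None of it is the real XenD or hotplug code.

```python
PCI_BACKEND = "/local/domain/0/backend/pci"  # illustrative path prefix


def make_store():
    """Build a toy XenStore holding one passed-through PCI device."""
    return {PCI_BACKEND + "/1/0/dev-0": "0000:00:09.0"}


def xend_read_pci_config(store):
    """Stand-in for XenD reading the PCI device list back from XenStore."""
    return sorted(v for k, v in store.items() if k.startswith(PCI_BACKEND))


def hotplug_cleanup(store):
    """Stand-in for xen-hotplug-cleanup removing the PCI backend entries."""
    for key in [k for k in store if k.startswith(PCI_BACKEND)]:
        del store[key]


def reboot(cleanup_first):
    """Run the two steps in the given order and return what XenD saw."""
    store = make_store()
    if cleanup_first:
        hotplug_cleanup(store)
        return xend_read_pci_config(store)
    devices = xend_read_pci_config(store)
    hotplug_cleanup(store)
    return devices


print("XenD wins:   ", reboot(cleanup_first=False))
print("cleanup wins:", reboot(cleanup_first=True))
```

When the cleanup runs first, the reader finds no PCI entries, which matches the symptom of the rebooted domain being created without its pci config.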



Comment 17 RHEL Program Management 2008-06-02 20:38:21 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 20 Jiri Denemark 2009-01-07 15:28:40 UTC
This is a kernel issue.

The difference between block/net devices, which work well, and pci devices, which disappear after rebooting a domain, is that pciback calls device_unregister() when the frontend's state changes to XenbusStateClosed, while blkback/netback do not. As a result, when a domain reboots, information about pci devices may be removed from xenstore too early, before xend reads it back to create a configuration for the rebooted domain.

The code referred to is in pciback_frontend_changed() in drivers/{blkback,netback,pciback}/xenbus.c.

A note on reproducing the bug: to reproduce it reliably, I had to put time.sleep(5) as the first line of the XendDomainInfo::restart() method in xend/XendDomainInfo.py, to delay xend a bit and let the xen-hotplug-cleanup script always win the race.
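
The behavioural difference between the backends can be sketched with a toy model. This is an illustration under stated assumptions, not the kernel code: ToyBackend, its unregister_on_closed flag, and the dictionary used as a store are all hypothetical, and the real removal of xenstore entries happens through the hotplug cleanup rather than directly in the state handler.

```python
import enum


class XenbusState(enum.Enum):
    CONNECTED = "Connected"
    CLOSED = "Closed"


class ToyBackend:
    """Toy backend device (hypothetical; not the real kernel driver)."""

    def __init__(self, kind, unregister_on_closed, store):
        self.kind = kind
        self.unregister_on_closed = unregister_on_closed
        self.store = store
        self.store[kind] = "device config"

    def frontend_changed(self, state):
        # Only a backend that unregisters its device when the frontend goes
        # to Closed loses its entry at this point in the model.
        if state is XenbusState.CLOSED and self.unregister_on_closed:
            self.store.pop(self.kind, None)


def surviving_config_after_close():
    """Return the device kinds a delayed reader would still find."""
    store = {}
    backends = [
        ToyBackend("vbd", False, store),  # block: no unregister on Closed
        ToyBackend("vif", False, store),  # net: no unregister on Closed
        ToyBackend("pci", True, store),   # pci: unregisters on Closed
    ]
    for backend in backends:
        backend.frontend_changed(XenbusState.CLOSED)
    return sorted(store)


print(surviving_config_after_close())
```

In this model a reader that arrives after the frontends have closed, like xend delayed by the sleep, still sees the block and net entries but no pci entry.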

Comment 21 Jiri Denemark 2009-01-07 15:30:30 UTC
Created attachment 328394 [details]
A works-for-me patch

Comment 22 Bill Burns 2009-01-12 02:15:35 UTC
Located upstream patch set that addresses the issue. It has the change Jiri provided. See http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/5644f68a7912

Comment 23 Bill Burns 2009-01-12 02:18:53 UTC
Created attachment 328690 [details]
Proposed patch

Port of upstream patch.

Comment 24 Bill Burns 2009-01-12 13:51:10 UTC
Created attachment 328735 [details]
Posted patch.

Actual posted patch.

Comment 25 Chris Lalancette 2009-01-15 14:42:51 UTC
I've uploaded a test kernel that contains this fix (along with several others) to this location:

http://people.redhat.com/clalance/virttest

Could the original reporter try out the test kernels there, and report back if it fixes the problem?

Thanks,
Chris Lalancette

Comment 28 Don Zickus 2009-02-23 20:00:31 UTC
in kernel-2.6.18-132.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 31 Chris Ward 2009-07-03 17:57:09 UTC
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.

Comment 33 errata-xmlrpc 2009-09-02 08:17:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

Comment 34 Chris Lalancette 2010-07-19 13:06:44 UTC
Clearing a needinfo request.

Chris Lalancette

