Bug 294811 - kernel BUG at mm/rmap.c:590 during suspend/resume testing
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Chris Lalancette
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: 311431
Reported: 2007-09-18 14:33 UTC by Ian Campbell
Modified: 2008-05-21 14:55 UTC (History)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 14:55:27 UTC
Target Upstream Version:
Embargoed:


Attachments
test case to reproduce issue (1.38 KB, text/x-csrc), 2007-09-18 14:33 UTC, Ian Campbell
Patch to fix protnone problem during PV save/restore (18.67 KB, patch), 2007-12-13 22:08 UTC, Chris Lalancette
xen-unstable 12402:ade94aa072c5 ported to 2.6.9-67.EL (5.20 KB, patch), 2007-12-14 09:06 UTC, Ian Campbell
xen-unstable 12545:50467f56ed65 ported to 2.6.9-67.EL (4.05 KB, patch), 2007-12-14 09:07 UTC, Ian Campbell
xen-unstable 13998:d2dff286994d ported to 2.6.9-67.EL (16.90 KB, patch), 2007-12-14 09:08 UTC, Ian Campbell
xen-unstable 12402:ade94aa072c5 ported to 2.6.18-53.el5 (6.51 KB, patch), 2007-12-14 09:09 UTC, Ian Campbell
xen-unstable 12545:50467f56ed65 ported to 2.6.18-53.el5 (4.05 KB, patch), 2007-12-14 09:10 UTC, Ian Campbell
xen-unstable 13998:d2dff286994d ported to 2.6.18-53.el5 (16.80 KB, patch), 2007-12-14 09:11 UTC, Ian Campbell


Links
System: Red Hat Product Errata
ID: RHBA-2008:0314
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages for Red Hat Enterprise Linux 5.2
Last Updated: 2008-05-20 18:43:34 UTC

Description Ian Campbell 2007-09-18 14:33:57 UTC
The kernel needs to ensure that not-present PTEs contain a PFN and not an MFN.
This is because the suspend/resume code does not canonicalize not-present PTEs,
since they may contain values which are neither PFNs nor MFNs.
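
To make the failure mode concrete, here is a toy model of the canonicalization
step. It is only a sketch under stated assumptions: frame_map, make_pte,
canonicalize_pte and the other names are hypothetical and are not taken from
the kernel or the Xen tools, and the PTE layout is simplified to a frame number
plus a present bit.

```c
#include <assert.h>
#include <stdint.h>

#define NR_FRAMES   8u
#define PTE_PRESENT 0x1u
#define PTE_SHIFT   12

/* Hypothetical toy tables, not real kernel or hypervisor structures. */
typedef struct {
    uint32_t p2m[NR_FRAMES];  /* pseudo-physical frame (PFN) -> machine frame (MFN) */
    uint32_t m2p[NR_FRAMES];  /* machine frame (MFN) -> pseudo-physical frame (PFN) */
} frame_map;

/* Build a simple rotated mapping so that different offsets give different MFNs. */
static void frame_map_init(frame_map *map, uint32_t offset)
{
    for (uint32_t pfn = 0; pfn < NR_FRAMES; pfn++) {
        uint32_t mfn = (pfn + offset) % NR_FRAMES;
        map->p2m[pfn] = mfn;
        map->m2p[mfn] = pfn;
    }
}

static uint64_t make_pte(uint32_t frame, uint32_t flags)
{
    assert(frame < NR_FRAMES);
    return ((uint64_t)frame << PTE_SHIFT) | flags;
}

static uint32_t pte_frame(uint64_t pte)
{
    return (uint32_t)(pte >> PTE_SHIFT);
}

static uint32_t pte_flags(uint64_t pte)
{
    return (uint32_t)(pte & ((1u << PTE_SHIFT) - 1));
}

/* Save side: rewrite MFN -> PFN, but only for present PTEs. */
static uint64_t canonicalize_pte(const frame_map *old_map, uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return pte;  /* not present: contents are opaque, left untouched */
    return make_pte(old_map->m2p[pte_frame(pte)], pte_flags(pte));
}

/* Restore side: rewrite PFN -> new MFN, again only for present PTEs. */
static uint64_t uncanonicalize_pte(const frame_map *new_map, uint64_t pte)
{
    if (!(pte & PTE_PRESENT))
        return pte;
    return make_pte(new_map->p2m[pte_frame(pte)], pte_flags(pte));
}

/* One full save/restore cycle as seen by a single PTE. */
static uint64_t save_restore_pte(const frame_map *old_map,
                                 const frame_map *new_map, uint64_t pte)
{
    return uncanonicalize_pte(new_map, canonicalize_pte(old_map, pte));
}
```

In this model a present PTE follows its page to the new machine frame, a
not-present PTE that holds an MFN comes back pointing at a stale frame, and a
not-present PTE that holds a PFN is unaffected because PFNs do not change
across the cycle.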

This was observed with 2.6.18-8.1.10.el5 but I think it might apply to the
rhel4u5 Xen kernel as well.

The problem was solved upstream with
http://xenbits.xensource.com/xen-unstable.hg?rev/d2dff286994d
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/4fd6832bb54f

Reproducible by running the attached main.c (built with gcc -O2 main.c) while
the domain goes through a save/restore cycle.
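
For readers without access to the attachment, the following is a minimal sketch
of the kind of userspace load that leaves not-present PTEs behind for populated
pages. It is NOT the attached main.c. It assumes the problem is exercised by
PROT_NONE mappings, as the title of the later patch attachment ("protnone
problem") suggests, and protnone_round_trip is a hypothetical helper name.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical stress helper, not the attached test case.
 * Populates npages anonymous pages, makes them PROT_NONE, optionally
 * waits hold_seconds (a window in which a save/restore cycle could be
 * run), then makes them accessible again and checks the contents.
 * Returns 0 if the contents survived, -1 on any failure.
 */
static int protnone_round_trip(size_t npages, unsigned hold_seconds)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    if (npages == 0)
        return -1;
    size_t len = npages * (size_t)page;

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xA5, len);               /* fault every page in */

    int rc = -1;
    if (mprotect(buf, len, PROT_NONE) != 0)
        goto out;

    if (hold_seconds)
        sleep(hold_seconds);              /* pages stay PROT_NONE here */

    if (mprotect(buf, len, PROT_READ | PROT_WRITE) != 0)
        goto out;

    rc = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0xA5) {             /* contents must be unchanged */
            rc = -1;
            break;
        }
    }
out:
    munmap(buf, len);
    return rc;
}
```

Calling protnone_round_trip(1024, 30) from a small main() in a loop inside the
guest, while the domain is saved and restored from the host, would be one way
to try this. Whether it triggers the same BUG as the attached test case is
untested.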

Comment 1 Ian Campbell 2007-09-18 14:33:57 UTC
Created attachment 198411 [details]
test case to reproduce issue

Comment 4 Chris Lalancette 2007-12-13 22:08:01 UTC
Created attachment 288051 [details]
Patch to fix protnone problem during PV save/restore

OK, I was able to easily reproduce the problem and come up with a backport.
This backport is really a combination of 3 changesets from upstream
Xen-unstable: 12402, 13998, and 14006.  It's a pretty straightforward backport;
I only had to remove one chunk (which removed a line we didn't have to begin
with) and adjust line numbers appropriately.  Before the patch, running the
test program in a loop would reliably oops when doing a save/restore cycle on
the domain.  I'm still testing, but at least on i686 I've gone 20 save/restore
cycles already without oopsing.

Chris Lalancette

Comment 5 Ian Campbell 2007-12-14 09:06:09 UTC
Created attachment 288751 [details]
xen-unstable 12402:ade94aa072c5 ported to 2.6.9-67.EL

Comment 6 Ian Campbell 2007-12-14 09:07:18 UTC
Created attachment 288761 [details]
xen-unstable 12545:50467f56ed65 ported to 2.6.9-67.EL

Comment 7 Ian Campbell 2007-12-14 09:08:09 UTC
Created attachment 288771 [details]
xen-unstable 13998:d2dff286994d ported to 2.6.9-67.EL

Comment 8 Ian Campbell 2007-12-14 09:09:35 UTC
Created attachment 288781 [details]
xen-unstable 12402:ade94aa072c5 ported to 2.6.18-53.el5

Comment 9 Ian Campbell 2007-12-14 09:10:15 UTC
Created attachment 288791 [details]
xen-unstable 12545:50467f56ed65 ported to 2.6.18-53.el5

Comment 10 Ian Campbell 2007-12-14 09:11:00 UTC
Created attachment 288801 [details]
xen-unstable 13998:d2dff286994d ported to 2.6.18-53.el5

Comment 11 Ian Campbell 2007-12-14 09:12:21 UTC
We recently stopped using the rhel4x.hg port from xenbits and switched to using
a set of targeted fixes to your kernels. I have attached the patches from our
queue that are relevant to this issue.

I'm not sure why it never occurred to me to attach our rhel5x version of the
patches here (it seems a pretty obvious thing to do now I think about it!). I
see you've got a patch of your own now, but I've attached our versions in case
they are of use.

Comment 12 Don Zickus 2008-01-10 20:42:35 UTC
The fix is in 2.6.18-66.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 14 Scott LaCroix 2008-04-17 18:53:10 UTC
Is there a fix for this in the PAE kernel as well? I am seeing the problem in
2.6.18-53PAE.

Thanks

Comment 15 Chris Lalancette 2008-04-17 19:03:45 UTC
No, that doesn't make sense.  This is specifically for a RHEL-5 Xen PV kernel,
not for a bare-metal kernel.  You seem to be having a different issue; please
open a different BZ about it.

Chris Lalancette

Comment 16 John Madden 2008-04-25 19:07:31 UTC
Don, do your more recent kernel builds still contain the fix?  I'm seeing this
bug and would be happy to test, just wanted to make sure they're patched
appropriately.


Comment 17 Chris Lalancette 2008-04-25 19:27:14 UTC
Yep, these fixes should be in later 5.2 kernel builds.  Note that there were
additional patches that went into -89 to fix up a regression caused by this
patchset, so you'll want to test later than that.  Any testing is welcome!

Thanks,
Chris Lalancette

Comment 18 John Madden 2008-04-25 20:47:57 UTC
Tested with -91, no failures.  This appears to fix the crash.  



Comment 20 errata-xmlrpc 2008-05-21 14:55:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


