Bug 311431

Summary: kernel BUG at mm/rmap.c:479 during suspend/resume testing
Product: Red Hat Enterprise Linux 4 Reporter: Chris Lalancette <clalance>
Component: kernel-xen Assignee: Chris Lalancette <clalance>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: low Docs Contact:
Priority: low    
Version: 4.5 CC: ddutile, sputhenp, xen-maint
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: RHSA-2008-0665 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2008-07-24 19:17:29 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 294811    
Bug Blocks:    
Attachments:
Description Flags
Test case to reproduce the crash across save/restore
none
Patch based on upstream Xen to fix the PROT_NONE crash
none
Updated patch, with x86_64 fixes
none
Another version of the PROT_NONE patch that actually compiles on x86_64
none

Description Chris Lalancette 2007-09-28 17:58:25 UTC
+++ This bug was initially created as a clone of Bug #294811 +++

The kernel needs to ensure that not-present PTEs contain a PFN and not an MFN.
This is because the suspend/resume code does not canonicalize not-present PTEs,
since they may contain values that are neither PFNs nor MFNs.
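
To make the PFN/MFN point concrete, here is a small userspace toy model in C. It is only a sketch under stated assumptions: every name in it (toy_pte_t, toy_save_restore, the 4-entry tables, the bit layout) is invented for illustration and is not taken from the kernel, from Xen, or from the attached patches. It assumes that save/restore translates the frame number of present PTEs from the old machine frames to the new ones and leaves not-present PTEs untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names, sizes, and bit layout are illustrative assumptions. */
#define TOY_PAGE_SHIFT  12
#define TOY_PTE_PRESENT 0x1ULL
#define TOY_NR_PAGES    4
#define TOY_INVALID     UINT64_MAX

typedef uint64_t toy_pte_t;

/* Pseudo-physical (PFN) to machine (MFN) table of one guest. */
struct toy_p2m { uint64_t mfn[TOY_NR_PAGES]; };

/* The guest gets different machine frames after a restore. */
static const struct toy_p2m toy_before = { { 100, 101, 102, 103 } };
static const struct toy_p2m toy_after  = { { 200, 201, 202, 203 } };

static toy_pte_t toy_mk_pte(uint64_t frame, int present)
{
    return (frame << TOY_PAGE_SHIFT) | (present ? TOY_PTE_PRESENT : 0);
}

static uint64_t toy_pte_frame(toy_pte_t pte) { return pte >> TOY_PAGE_SHIFT; }

static int toy_pte_present(toy_pte_t pte) { return (pte & TOY_PTE_PRESENT) != 0; }

/* Reverse lookup: which PFN does this MFN back, if any? */
static uint64_t toy_mfn_to_pfn(const struct toy_p2m *p2m, uint64_t mfn)
{
    for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++)
        if (p2m->mfn[pfn] == mfn)
            return pfn;
    return TOY_INVALID;
}

/* Present PTEs are translated old MFN -> PFN -> new MFN; others are skipped. */
static toy_pte_t toy_save_restore(toy_pte_t pte)
{
    if (!toy_pte_present(pte))
        return pte; /* not canonicalized: the value is carried over verbatim */
    uint64_t pfn = toy_mfn_to_pfn(&toy_before, toy_pte_frame(pte));
    assert(pfn != TOY_INVALID);
    return toy_mk_pte(toy_after.mfn[pfn], 1);
}

/* PFN the guest recovers from a present PTE after save/restore. */
static uint64_t toy_present_roundtrip(uint64_t pfn)
{
    toy_pte_t pte = toy_save_restore(toy_mk_pte(toy_before.mfn[pfn], 1));
    return toy_mfn_to_pfn(&toy_after, toy_pte_frame(pte));
}

/* Same for a not-present PTE that stores either a PFN or an MFN. */
static uint64_t toy_not_present_roundtrip(uint64_t pfn, int store_pfn)
{
    uint64_t frame = store_pfn ? pfn : toy_before.mfn[pfn];
    toy_pte_t pte = toy_save_restore(toy_mk_pte(frame, 0));
    return store_pfn ? toy_pte_frame(pte)
                     : toy_mfn_to_pfn(&toy_after, toy_pte_frame(pte));
}
```

In this model, toy_present_roundtrip(2) and toy_not_present_roundtrip(2, 1) both return 2, while toy_not_present_roundtrip(2, 0) returns TOY_INVALID because the stale MFN no longer belongs to the guest. That is the situation the first sentence above asks the kernel to avoid.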

This was observed with 2.6.18-8.1.10.el5 but I think it might apply to the
rhel4u5 Xen kernel as well.

The problem was solved upstream with
http://xenbits.xensource.com/xen-unstable.hg?rev/d2dff286994d
http://xenbits.xensource.com/kernels/rhel4x.hg?rev/4fd6832bb54f

Reproducible by running the attached main.c (gcc -O2 main.c) across a
save/restore iteration.
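
For readers without access to the attachment, the following is a rough sketch of the kind of userspace loop that exercises PROT_NONE mappings. It is NOT the attached main.c, whose contents are not shown in this report. The helper name prot_none_cycle and its parameters are hypothetical, and the sketch assumes that toggling an anonymous mapping between PROT_NONE and read/write while a save/restore happens is enough to hit the affected code path.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the attached test case.
 * Maps npages anonymous pages, then repeatedly fills them, marks them
 * PROT_NONE, makes them accessible again, and checks the contents.
 * Returns 0 on success and -1 on any failure or mismatch.
 */
static int prot_none_cycle(size_t npages, int iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0)
        return -1;

    size_t len = npages * (size_t)page;
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    int rc = 0;
    for (int i = 0; i < iterations && rc == 0; i++) {
        unsigned char fill = (unsigned char)(i + 1);

        memset(buf, fill, len); /* fault every page in */

        if (mprotect(buf, len, PROT_NONE) != 0) {
            rc = -1;
            break;
        }
        /* A real test would sit here while the guest is saved and restored. */
        if (mprotect(buf, len, PROT_READ | PROT_WRITE) != 0) {
            rc = -1;
            break;
        }
        /* Check one byte per page to confirm the data is still reachable. */
        for (size_t off = 0; off < len; off += (size_t)page) {
            if (buf[off] != fill) {
                rc = -1;
                break;
            }
        }
    }

    if (munmap(buf, len) != 0)
        rc = -1;
    return rc;
}
```

On a machine that is not being saved and restored, a call such as prot_none_cycle(4, 3) should simply return 0. To be useful as a reproducer, the loop would need to run inside the PV guest for long enough that a save and restore of the guest can be issued from dom0 in the meantime.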

-- Additional comment from ijc.uk on 2007-09-18 10:33 EST --
Created an attachment (id=198411)
test case to reproduce issue

-------------------------
Verified that this is a problem on the RHEL-4 PV kernel as well.  Test case will
be attached.

Comment 1 Chris Lalancette 2007-09-28 17:58:25 UTC
Created attachment 210741 [details]
Test case to reproduce the crash across save/restore

Comment 2 Chris Lalancette 2007-09-28 18:11:29 UTC
Created attachment 210751 [details]
Patch based on upstream Xen to fix the PROT_NONE crash

This patch is a backport from the upstream RHEL-4 tree that fixes the mm/rmap.c
crash on i386 when running the test case during save/restore.  Note that this
patch is a combination of the upstream
http://xenbits.xensource.com/kernels/rhel4x.hg changesets 219, 229, and 277.
With this patch applied, save/restore completes successfully while running the
above test case.

Chris Lalancette

Comment 4 Bill Burns 2008-01-05 14:11:39 UTC
Set dev ack for Chris Lalancette.


Comment 5 Chris Lalancette 2008-02-24 18:46:54 UTC
Created attachment 295746 [details]
Updated patch, with x86_64 fixes

This is an updated version of the patch to fix the PROT_NONE crash.  This
version fixes the pte_same bug in the previous version, as well as adding all
of the fixes that should be necessary for x86_64.  With this version, the
RHEL-4 code is much closer to the RHEL-5 code.

Chris Lalancette

Comment 6 Chris Lalancette 2008-02-25 05:12:54 UTC
Created attachment 295765 [details]
Another version of the PROT_NONE patch that actually compiles on x86_64

This is the same as the previous patch, except this one actually compiles on
x86_64.

Chris Lalancette

Comment 7 Vivek Goyal 2008-03-03 20:40:07 UTC
Committed in 68.16.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/

Comment 9 Ian Campbell 2008-05-23 15:27:46 UTC
I'd like to draw your attention to bug 448115, which contains an additional
patch that fixes those attached here when the host has >= 64G of RAM:
http://xenbits.xensource.com/xen-unstable.hg?rev/f36700819453

I haven't actually tried the test kernels given here but I see no reason why the
fix shouldn't be necessary.

Comment 12 errata-xmlrpc 2008-07-24 19:17:29 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html