Bug 130843 - (IT_47550) not-present translations for region 5 (vmalloc'd area) not handled
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: ia64 Linux
Priority: medium
Severity: high
Assigned To: Dave Anderson
QA Contact: Brian Brock
URL: http://linux.bkbits.net:8080/linux-2....
Blocks: 123574

Reported: 2004-08-24 22:39 EDT by Suresh Siddha
Modified: 2007-11-30 17:07 EST

Doc Type: Bug Fix
Last Closed: 2004-12-20 15:55:59 EST


Attachments
Patch handling not-present faults for region 5 (1.35 KB, patch)
2004-08-24 22:41 EDT, Suresh Siddha

Description Suresh Siddha 2004-08-24 22:39:02 EDT

Description of problem:
This bug is already fixed in the latest 2.4 and 2.6 BitKeeper trees.

The page fault handler can be entered even when a region 5 address 
is valid, because the VHPT walker may have inserted a not-present 
translation that later becomes stale. Since the page fault handler 
in EL3 does not handle not-present translations for region 5, the 
kernel oopses.
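The decision the EL3 fault handler is missing can be sketched in user-space C. This is a hypothetical model, not the patch attached in comment 1. The names region_of(), kernel_mode_fault(), demo_pte_present() and the FAULT_* outcomes are illustrative, and the kernel page-table lookup is replaced by a caller-supplied function. The sketch relies only on the ia64 convention that bits 63..61 of a virtual address select the region, with Linux/ia64 placing the vmalloc area in region 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ia64: bits 63..61 of a virtual address select one of eight regions.
 * Linux/ia64 puts the vmalloc area (module text) in region 5. */
#define REGION_SHIFT   61
#define VMALLOC_REGION 5u

static unsigned region_of(uint64_t addr)
{
    return (unsigned)(addr >> REGION_SHIFT);
}

/* Hypothetical outcomes for this sketch only. */
enum fault_action {
    FAULT_RETRY, /* return from the fault; the access re-walks the page table */
    FAULT_OOPS   /* fall through to the existing no-context (oops) path */
};

/* Hypothetical model of the fallback reached when no VMA covers the
 * faulting kernel address (vmalloc space has no VMAs).
 * kernel_pte_present stands in for a lookup in the kernel page table
 * (swapper_pg_dir). If the region 5 page is actually mapped, the fault
 * came from a stale not-present TLB entry that the not-present handler
 * has already purged, so returning lets the retried access succeed. */
static enum fault_action
kernel_mode_fault(uint64_t addr, bool (*kernel_pte_present)(uint64_t))
{
    assert(kernel_pte_present != NULL);
    if (region_of(addr) == VMALLOC_REGION && kernel_pte_present(addr))
        return FAULT_RETRY;
    return FAULT_OOPS;
}

/* Demo lookup: pretend only the first 16 KB of region 5 is mapped. */
static bool demo_pte_present(uint64_t addr)
{
    const uint64_t base = 0xa000000000000000ULL; /* start of region 5 */
    return addr >= base && addr < base + 0x4000;
}
```

With demo_pte_present, kernel_mode_fault(0xa000000000001000ULL, demo_pte_present) returns FAULT_RETRY. A region 5 address outside the pretend mapping, or any region 7 address, returns FAULT_OOPS. Checking the page table before oopsing keeps genuine bad-address oopses intact. It only forgives faults whose translation is in fact valid.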


How reproducible:
Always

Steps to Reproduce:
1. The kernel fails to boot if there is a lot of interrupt activity 
handled by modules (vmalloc'd text).
Comment 1 Suresh Siddha 2004-08-24 22:41:08 EDT
Created attachment 103049
Patch handling not-present faults for region 5

The patch is taken straight from bkbits:

http://linux.bkbits.net:8080/linux-2.4/gnupatch@3ec5621bdgHJtDWJBf1fOJP3ZZA8hA
Comment 2 Larry Woodman 2004-08-27 07:34:17 EDT
Suresh, pardon my ignorance here, but how does this happen? If the
kernel only performs atomic updates to the PTEs (never clearing one
bit at a time and leaving the PTE in an inconsistent state), how does
the VHPT walker insert a TLB entry that's half-baked? If there are
cases where the PTE is in some inconsistent or interim state, should
we fix that instead?

Thanks, Larry 
Comment 3 Suresh Siddha 2004-08-27 13:21:04 EDT
Here is the failing sequence:

t0: On cpu1, while the kernel is servicing requests from driver 
module 'A', the hardware VHPT walker inserts empty PTEs (page-not-
present entries) around the code address of module 'A' into the TLB.

t1: On cpu0, as part of loading a new module 'B', vmalloc_area_pages() 
sets up the PTEs for module 'B' in swapper_pg_dir without doing a 
flush_tlb_all() (this is OK because we do flush_tlb_all() in 
vmfree_area_pages()). But module 'B' happens to be loaded at the same 
address as the empty PTEs (page-not-present entries) that were loaded 
into cpu1's TLB in step 't0' above.

t2: When module 'B' code starts executing on cpu1, the page-not-present 
entries in cpu1's TLB cause a page_not_present fault. Because the 
page fault handler doesn't handle faults in region 5, the kernel 
simply oopses.

Since the page_not_present handler purges the corresponding not-present 
TLB entry, the next page-table walk will succeed, provided the page 
fault handler returns instead of oopsing.
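The t0 to t2 sequence can be replayed in a toy, single-threaded user-space model. All names here are hypothetical: pte_present[] stands in for swapper_pg_dir, and vhpt_walk(), map_page(), purge_local(), access_page() and replay_race() are illustrative helpers, not kernel functions. Real TLBs, the VHPT walker and TLB purges are far more involved. The model only shows why a not-present entry cached before the PTE was set causes a fault, and why purging plus retrying recovers.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8 /* toy vmalloc area, in pages */
#define NCPUS  2

/* Shared kernel page table (stands in for swapper_pg_dir). */
static bool pte_present[NPAGES];

/* Per-CPU TLB; a valid entry may cache a *not-present* translation. */
struct tlb_entry { bool valid; bool present; };
static struct tlb_entry tlb[NCPUS][NPAGES];

/* Simulated VHPT walker: copies the PTE state into the TLB,
 * even when the page is not present. */
static void vhpt_walk(int cpu, int page)
{
    tlb[cpu][page].valid = true;
    tlb[cpu][page].present = pte_present[page];
}

/* Simulated vmalloc_area_pages(): set the PTE, no TLB flush. */
static void map_page(int page) { pte_present[page] = true; }

/* Simulated not-present handler: purge the stale local TLB entry. */
static void purge_local(int cpu, int page) { tlb[cpu][page].valid = false; }

/* Access a page from 'cpu'; returns true on success. With
 * handle_region5 == false a not-present fault is fatal (the EL3 oops);
 * otherwise the handler returns and the access retries after a rewalk. */
static bool access_page(int cpu, int page, bool handle_region5)
{
    assert(cpu >= 0 && cpu < NCPUS && page >= 0 && page < NPAGES);
    if (!tlb[cpu][page].valid)
        vhpt_walk(cpu, page);          /* TLB miss: hardware walk */
    if (tlb[cpu][page].present)
        return true;
    purge_local(cpu, page);            /* not-present fault */
    if (!handle_region5 || !pte_present[page])
        return false;                  /* oops (or a genuinely bad address) */
    vhpt_walk(cpu, page);              /* rewalk now sees the new PTE */
    return tlb[cpu][page].present;
}

/* Replay t0..t2 on one page; returns whether cpu1's access at t2 works. */
static bool replay_race(bool handle_region5)
{
    const int page = 3; /* arbitrary page later reused by module 'B' */
    for (int c = 0; c < NCPUS; c++)
        for (int p = 0; p < NPAGES; p++)
            tlb[c][p].valid = false;
    for (int p = 0; p < NPAGES; p++)
        pte_present[p] = false;

    vhpt_walk(1, page);                          /* t0: cpu1 caches not-present */
    map_page(page);                              /* t1: cpu0 maps 'B', no flush */
    return access_page(1, page, handle_region5); /* t2: cpu1 runs 'B' code */
}
```

replay_race(false) models the unpatched EL3 behaviour and returns false (the oops). replay_race(true) returns true, because the rewalk after the purge picks up module 'B''s new PTE. No flush_tlb_all() is needed at map time in either case, which matches the reasoning in step t1.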
Comment 4 Larry Woodman 2004-08-27 15:33:38 EDT
OK.

Larry
Comment 5 Dave Anderson 2004-09-13 14:09:52 EDT
Either my patch:

http://post-office.corp.redhat.com/archives/rhkernel-list/2004-August/msg00394.html

or Norm Murray's patch:

http://post-office.corp.redhat.com/archives/rhkernel-list/2004-August/msg00405.html

will address this issue.  Norm's was generated from an LLNL IT, but
is identical except for the addition of a KERN_CRIT to the beginning
of a printk() in do_page_fault().

Comment 8 Suresh Siddha 2004-09-13 21:25:08 EDT
I can't access the post-office URLs mentioned above. Please let me 
know if you need any more info, or if you think the patch posted in 
comment #1 isn't enough.
Comment 9 Ernie Petrides 2004-09-13 21:51:12 EDT
Hi, Suresh.  The URLs in comment #5 are restricted to Red Hat.
A minor variation of your patch (due to a RHEL3 porting issue)
is on track for U4.  I'll update this bug report when the patch
is committed (in the next day or two).

Thanks for isolating the problem and providing the patch.
Comment 10 Ernie Petrides 2004-09-14 20:10:22 EDT
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.6.EL).
Comment 11 John Flanagan 2004-12-20 15:55:59 EST
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
