130843 – (IT_47550) not-present translations for region 5 (vmalloc'd area) not handled

Summary: not-present translations for region 5 (vmalloc'd area) not handled

Status:	CLOSED ERRATA
Alias:	IT_47550
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Version:	3.0
Hardware:	ia64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Dave Anderson
QA Contact:	Brian Brock
URL:	http://linux.bkbits.net:8080/linux-2....
Blocks:	123574

Reported:	2004-08-25 02:39 UTC by Suresh Siddha
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Doc Type:	Bug Fix
Last Closed:	2004-12-20 20:55:59 UTC

Attachments
Patch handling not-present faults for region 5 (1.35 KB, patch) 2004-08-25 02:41 UTC, Suresh Siddha	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:550	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4	2004-12-20 05:00:00 UTC

Description Suresh Siddha 2004-08-25 02:39:02 UTC

From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)

Description of problem:
This bug is fixed in the latest 2.4 and 2.6 BitKeeper (bk) trees.

We might enter the page fault handler even if the region 5 address is
valid, because the VHPT walker can insert a not-present translation
that later becomes stale. Since the page fault handler in EL3 doesn't
handle not-present translations for region 5, the kernel oopses.



How reproducible:
Always

Steps to Reproduce:
1. The kernel will fail to boot if there is a lot of interrupt
activity handled by modules (vmalloc'd text).

Comment 1 Suresh Siddha 2004-08-25 02:41:08 UTC

Created attachment 103049
Patch handling not-present faults for region 5

The patch is taken straight from bkbits.

http://linux.bkbits.net:8080/linux-2.4/gnupatch@3ec5621bdgHJtDWJBf1fOJP3ZZA8hA

Comment 2 Larry Woodman 2004-08-27 11:34:17 UTC

Suresh, pardon my ignorance here, but how does this happen? If the
kernel only performs atomic updates to the PTEs (never clearing one
bit at a time and leaving the PTE in an inconsistent state), how does
the VHPT walker insert a TLB entry that's half-baked? If there are
cases where the PTE is in some inconsistent/interim state, should we
fix that instead?

Thanks, Larry

Comment 3 Suresh Siddha 2004-08-27 17:21:04 UTC

Here is the failing sequence:

t0: On cpu1, while the kernel is servicing requests from driver
module 'A', the hardware VHPT walker inserts the empty PTEs
(page-not-present entries) around module 'A's code address into the TLB.

t1: On cpu0, as part of loading a new module 'B', vmalloc_area_pages()
sets up the PTEs for module 'B' in swapper_pg_dir without calling
flush_tlb_all(). (This is OK because we do call flush_tlb_all() in
vmfree_area_pages().) However, module 'B' happens to be mapped at the
same address as the empty PTEs (page-not-present entries) that were
loaded into cpu1's TLB in step 't0' above.

t2: When module 'B's code starts executing on cpu1, the
page-not-present entries in cpu1's TLB cause a page_not_present
fault. Because the page_fault handler doesn't handle faults in
region '5', the kernel simply oopses.

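Because the page_not_present handler already purges the corresponding
not-present TLB entry, the next page-table rewalk will succeed once
the page fault handler handles the region 5 fault instead of oopsing.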
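The t0/t1/t2 sequence above can be modeled in a few lines. This is a
hypothetical user-space sketch in Python, not the attached patch or
any kernel code: the class and function names are illustrative, a
per-CPU TLB is modeled as a dict that can cache not-present entries,
and 16 KB pages are assumed. It contrasts a handler that oopses on
every region 5 fault with one that returns when the kernel page table
now holds a present entry, letting the re-executed access rewalk.

```python
# Hypothetical model of the race in comment 3; all names are illustrative.

REGION_SHIFT = 61  # ia64: bits 63..61 of a virtual address select the region
PAGE_SHIFT = 14    # assumption: 16 KB pages


class Oops(Exception):
    """Stands in for a kernel oops on an unhandled fault."""


def region_number(addr):
    return addr >> REGION_SHIFT


class Cpu:
    def __init__(self, page_table, handles_region5):
        self.page_table = page_table  # shared kernel page table (swapper_pg_dir stand-in)
        self.tlb = {}                 # vpn -> present flag cached by the walker
        self.handles_region5 = handles_region5

    def walk(self, vpn):
        # Model of the VHPT walker: it caches the entry even if it is not present.
        self.tlb[vpn] = self.page_table.get(vpn, False)

    def access(self, addr):
        vpn = addr >> PAGE_SHIFT
        if vpn not in self.tlb:
            self.walk(vpn)
        if self.tlb[vpn]:
            return "ok"
        del self.tlb[vpn]          # not-present fault path purges the stale entry
        self.page_fault(addr)      # raises Oops if the fault is not handled
        return self.access(addr)   # re-execute: TLB miss, so the walker rewalks

    def page_fault(self, addr):
        vpn = addr >> PAGE_SHIFT
        if (self.handles_region5 and region_number(addr) == 5
                and self.page_table.get(vpn, False)):
            return  # translation is valid now; returning lets the access retry
        raise Oops(f"unhandled fault at {addr:#x}")


def run(handles_region5):
    page_table = {}
    cpu1 = Cpu(page_table, handles_region5)
    addr_b = 0xA000000000100000  # hypothetical region 5 address reused by module B
    vpn = addr_b >> PAGE_SHIFT
    cpu1.walk(vpn)               # t0: cpu1 caches a not-present entry
    page_table[vpn] = True       # t1: cpu0 maps module B without a global TLB flush
    try:
        return cpu1.access(addr_b)  # t2: cpu1 executes module B's code
    except Oops:
        return "oops"
```

With run(False) the stale entry leads to an oops; with run(True) the
retried access succeeds. The model only captures the point made in
comment 3: the purge already happens on the not-present path, so the
fault handler just needs to recognize a now-valid region 5 translation
and return.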

Comment 4 Larry Woodman 2004-08-27 19:33:38 UTC

OK.

Larry

Comment 5 Dave Anderson 2004-09-13 18:09:52 UTC

Either my patch:

http://post-office.corp.redhat.com/archives/rhkernel-list/2004-August/msg00394.html

or Norm Murray's patch:

http://post-office.corp.redhat.com/archives/rhkernel-list/2004-August/msg00405.html

will address this issue.  Norm's was generated from an LLNL IT, but
is identical except for the addition of a KERN_CRIT to the beginning
of a printk() in do_page_fault().

Comment 8 Suresh Siddha 2004-09-14 01:25:08 UTC

I can't access the post-office URLs mentioned above. Please let me
know if you need any more information or if you think the patch
posted in comment #1 isn't enough.

Comment 9 Ernie Petrides 2004-09-14 01:51:12 UTC

Hi, Suresh.  The URLs in comment #5 are restricted to Red Hat.
A minor variation of your patch (due to a RHEL3 porting issue)
is on track for U4.  I'll update this bug report when the patch
is committed (in the next day or two).

Thanks for isolating the problem and providing the patch.

Comment 10 Ernie Petrides 2004-09-15 00:10:22 UTC

A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.6.EL).

Comment 11 John Flanagan 2004-12-20 20:55:59 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html
