Bug 154639 (IT_71391) - kernel thread current->mm dereference in grab_swap_token causes oops
Summary: kernel thread current->mm dereference in grab_swap_token causes oops
Keywords:
Status: CLOSED ERRATA
Alias: IT_71391
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Rik van Riel
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 231639
Reported: 2005-04-13 09:08 UTC by Keith Holder
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-06-08 15:14:07 UTC
Target Upstream Version:
Embargoed:




Links
System ID: Red Hat Product Errata RHSA-2005:420
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 1
Last Updated: 2005-06-08 04:00:00 UTC

Description Keith Holder 2005-04-13 09:08:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0

Description of problem:
Veritas has a kernel thread that performs asynchronous direct I/O on behalf of user programs. When the I/O data is available, the kernel thread calls get_user_pages() to make sure the user pages are present before moving the data into user buffers. As a side effect, if the user program happens to be paged out, we end up in grab_swap_token() with current->mm set to NULL. Unfortunately, the code dereferences current->mm without checking whether it is NULL.

Stack backtrace:

grab_swap_token()
do_swap_page()
handle_pte_fault()
handle_mm_fault()
get_user_pages()
...
-----

A suitable fix would be to add the following to the start of grab_swap_token():


if (!current->mm)
        return;
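
The effect of the guard can be sketched in user space. This is only a simplified model, not the RHEL4 kernel source: the struct layouts, the `swap_token_mm` variable, and the name `grab_swap_token_model` are assumptions made for illustration, and the real function takes no argument and uses `current` directly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumed layouts, not the real ones). */
struct mm_struct {
    int recent_pagein;          /* marks that this mm recently paged in */
};

struct task_struct {
    struct mm_struct *mm;       /* NULL for a kernel thread with no address space */
};

/* Hypothetical model of "which mm currently holds the swap token". */
struct mm_struct *swap_token_mm;

/*
 * Model of grab_swap_token() with the proposed guard.
 * Returns 1 if the token was taken, 0 if the call was a no-op.
 */
int grab_swap_token_model(struct task_struct *current_task)
{
    assert(current_task != NULL);

    /* The check proposed in this report: a kernel thread has no mm to credit. */
    if (!current_task->mm)
        return 0;

    /* Without the guard, this dereference is where the oops would occur. */
    current_task->mm->recent_pagein = 1;
    swap_token_mm = current_task->mm;
    return 1;
}
```

In this model a task whose `mm` is NULL leaves the token untouched, while an ordinary user task takes it as before.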



Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Install the Veritas software stack on the system
2. Run the Veritas Oracle Data Manager stress suite


Actual Results:  After several hours under heavy stress, the system oopses in grab_swap_token().

Expected Results:  The function should just return if current->mm is NULL, *or* allow an mm_struct
pointer to be passed in as an argument. In the latter case, it still needs to check whether the
mm_struct pointer is NULL.
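
The second option, passing the mm_struct in explicitly, might look roughly like the sketch below. It is a user-space illustration only: the function name `grab_swap_token_for`, the `token_holder` variable, and the accessor are hypothetical and do not come from the kernel source or from the patch that was shipped.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified stand-in for the kernel's mm_struct. */
struct mm_struct {
    int recent_pagein;
};

/* Hypothetical record of the mm that holds the token. */
static struct mm_struct *token_holder;

/* Accessor so callers can inspect the holder in this model. */
struct mm_struct *swap_token_holder(void)
{
    return token_holder;
}

/*
 * Variant that takes the target mm as an argument instead of reading
 * current->mm. The NULL check is still required, as the report notes.
 * Returns 1 if the token was taken, 0 otherwise.
 */
int grab_swap_token_for(struct mm_struct *mm)
{
    if (mm == NULL)
        return 0;

    mm->recent_pagein = 1;
    token_holder = mm;
    return 1;
}
```

With this shape, a kernel thread faulting in pages for a user process could hand over that process's mm rather than its own NULL one, assuming the caller has it available.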

Additional info:

Comment 1 Rik van Riel 2005-04-13 12:55:14 UTC
I'll submit a (trivial) patch for the RHEL4 kernel to ignore a current->mm of NULL.

Btw, note that for most kernel threads current->mm tends to be &init_mm...

Comment 2 Rik van Riel 2005-04-13 14:24:16 UTC
Btw, have you verified that adding the check really fixes the issue, or does the
kernel simply crash elsewhere?  I am not 100% sure that running any task with a
NULL ->mm is valid...

Comment 3 Keith Holder 2005-04-14 10:44:42 UTC
When calling daemonize() to create a kernel thread, it calls exit_mm().
This sets tsk->mm to NULL and the thread/process runs with a 'lazy_tlb'.
Also, I thought all kernel threads didn't have an address space, hence mm
is supposed to be NULL.
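
A rough user-space model of the behaviour described in this comment is given below. It assumes a task carries both an `mm` and a borrowed `active_mm`; the struct layouts and the `_model` function names are illustrative, not the kernel's own code, and reference counting is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, simplified stand-ins for the kernel types. */
struct mm_struct {
    int id;
};

struct task_struct {
    struct mm_struct *mm;         /* the task's own address space, or NULL */
    struct mm_struct *active_mm;  /* address space in use while running (lazy TLB) */
};

/*
 * Model of what exit_mm() does to the task when called from daemonize():
 * the task gives up its own mm but keeps running on a borrowed one.
 */
void exit_mm_model(struct task_struct *tsk)
{
    assert(tsk != NULL);
    tsk->mm = NULL;               /* active_mm is deliberately left untouched */
}

/* In this model, a task with no mm of its own counts as a kernel thread. */
int is_kernel_thread_model(const struct task_struct *tsk)
{
    return tsk->mm == NULL;
}
```

Any code that reads `tsk->mm` after this point has to tolerate NULL, which is the situation grab_swap_token() ran into.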

Comment 4 Rik van Riel 2005-04-14 10:54:12 UTC
You're right.  Hmmm, I could've sworn they got moved to &init_mm.

Anyway, the patch was submitted for inclusion into RHEL4 yesterday and was
approved. Thank you for alerting us to this bug.

Comment 14 Keith Holder 2005-05-09 13:43:59 UTC
I have run a Veritas-specific stress test (odmstress) non-stop for over 100
hours with this fix (U1-kernel-2.6.9-6.43), and the testing was successful.

Comment 15 Tim Powers 2005-06-08 15:14:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html


