Bug 154639 (IT_71391)
| Summary: | kernel thread current->mm dereference in grab_swap_token causes oops | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Keith Holder <keith.holder> |
| Component: | kernel | Assignee: | Rik van Riel <riel> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.0 | CC: | davej, linux26port, rkenna, tao |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2005-06-08 15:14:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 231639 | ||
I'll submit a (trivial) patch for the RHEL4 kernel to ignore a current->mm of NULL. Btw, note that for most kernel threads current->mm tends to be &init_mm...

Btw, have you verified that adding the check really fixes the issue, or does the kernel simply crash elsewhere? I am not 100% sure that running any task with a NULL ->mm is valid...

When calling daemonize() to create a kernel thread, it calls exit_mm(). This sets tsk->mm to NULL and the thread/process runs with a 'lazy_tlb'. Also, I thought all kernel threads didn't have an address space, hence mm is supposed to be NULL.

You're right. Hmmm, I could've sworn they got moved to &init_mm. Anyway, the patch has been submitted for inclusion into RHEL4 yesterday, and got approved. Thank you for alerting us to this bug.

I have run a Veritas specific stress test (odmstress) non-stop for over 100 hours with this patch fix (U1-kernel-2.6.9-6.43) and the testing was successful.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html
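The point raised in the discussion above, that a thread which has gone through daemonize()/exit_mm() runs with tsk->mm set to NULL in lazy-TLB mode, can be illustrated with a small userspace model. This is only a sketch: the structs are simplified stand-ins, and model_exit_mm() and task_has_user_mm() are hypothetical names invented for the illustration, not kernel functions. It assumes, as in the usual Linux arrangement, that a separate active_mm pointer keeps referring to the borrowed address space while mm itself is NULL.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * two pointers relevant to this discussion are modelled). */
struct mm_struct {
    int users;
};

struct task_struct {
    struct mm_struct *mm;        /* address space owned by the task, or NULL */
    struct mm_struct *active_mm; /* address space currently loaded, possibly borrowed */
};

/* Hypothetical model of the effect described in the comment: the task
 * gives up its own address space, so mm becomes NULL, while active_mm
 * is left pointing at the old one (lazy TLB). */
void model_exit_mm(struct task_struct *tsk)
{
    tsk->mm = NULL;
}

/* Hypothetical helper: code that may run in kernel-thread context has to
 * test mm before dereferencing it. Returns 1 if the task owns an mm. */
int task_has_user_mm(const struct task_struct *tsk)
{
    return tsk->mm != NULL;
}
```

In this model, any code path reachable from such a thread (here, the fault path under get_user_pages()) has to treat a NULL mm as a normal case rather than as an error.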
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0

Description of problem:
Veritas has a kernel thread that performs asynchronous direct I/O on behalf of user programs. When the I/O data is available, the kernel thread calls get_user_pages() to make sure the user pages are present before moving the data into user buffers. A side effect is that, if the user program happens to be paged out, we end up in grab_swap_token() with current->mm set to NULL. Unfortunately the code dereferences current->mm without checking whether it is NULL.

Stack back trace:
grab_swap_token()
do_swap_page()
handle_pte_fault()
handle_mm_fault()
get_user_pages()
...

-----
A suitable fix would be to add the following to the start of grab_swap_token():

    if (!current->mm)
        return;

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Install the Veritas software stack onto the system
2. Run Veritas' Oracle Data Manager stress suite
3.

Actual Results:
After several hours under heavy stress the system oopses in grab_swap_token()

Expected Results:
The function should just return if current->mm is NULL, *or* allow an mm_struct pointer to be passed in as an argument. However, it still needs to check for the mm_struct pointer being NULL.

Additional info:
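The two options named under Expected Results can be sketched in a userspace model as follows. This is not the RHEL4 patch itself: the token logic is reduced to "take the token if nobody else is marked as holder", the timing checks of the real code are omitted, and current_task, model_grab_swap_token() and model_grab_swap_token_mm() are hypothetical names used only for this illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed for the illustration are present). */
struct mm_struct {
    int recent_pagein;
};

struct task_struct {
    struct mm_struct *mm;   /* NULL for a kernel thread */
};

/* Stand-ins for the kernel's `current` and the global token holder. */
struct task_struct *current_task;
struct mm_struct *swap_token_mm;

/* Option 2 from the report: the caller passes the mm explicitly.
 * The NULL check is still required. */
void model_grab_swap_token_mm(struct mm_struct *mm)
{
    if (!mm)
        return;                 /* no address space, nothing to protect */

    if (swap_token_mm == mm) {  /* already holding the token */
        mm->recent_pagein = 1;
        return;
    }

    swap_token_mm = mm;         /* simplified: take the token unconditionally */
}

/* Option 1 from the report: guard at the top, then proceed as before. */
void model_grab_swap_token(void)
{
    if (!current_task->mm)
        return;                 /* kernel thread: just return */

    model_grab_swap_token_mm(current_task->mm);
}
```

With a kernel thread as the current task (mm == NULL), both variants return without touching the token; with a user task, the first call takes the token and a second call only sets recent_pagein.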