Bug 174174 - kernel crash in page_referenced() in kswapd context
Summary: kernel crash in page_referenced() in kswapd context
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rik van Riel
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-11-25 13:09 UTC by norbert
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-29 00:14:13 UTC
Target Upstream Version:
Embargoed:


Attachments
some data structures I extracted from the dump (7.77 KB, text/plain)
2005-11-25 13:32 UTC, norbert

Description norbert 2005-11-25 13:09:38 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20050920

Description of problem:
I could not find this particular stack trace in any of the existing bug reports involving page_referenced(). I may not have checked all of them, so I am reporting our kernel crash here:

  SYSTEM MAP: ../rh_3.1/System.map-2.4.21-9.ELsmp                      
DEBUG KERNEL: ../rh_3.1/vmlinux_smp_dbg (2.4.21-9.ELsmp)
    DUMPFILE: ./vmcore
        CPUS: 4
        DATE: Tue Nov 15 09:08:40 2005
      UPTIME: 129 days, 12:38:58
LOAD AVERAGE: 1.41, 1.24, 1.10
       TASKS: 335
    NODENAME: XXXXX
     RELEASE: 2.4.21-9.ELsmp
     VERSION: #1 SMP Thu Jan 8 17:08:56 EST 2004
     MACHINE: i686  (3065 Mhz)
      MEMORY: 3 GB
       PANIC: ""
         PID: 11
     COMMAND: "kswapd"
        TASK: c3d70000  
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

The kernel crashes in page_referenced:
Unable to handle kernel NULL pointer dereference at virtual address 00000084
 printing eip:
c015c67d
*pde = 256f4001
*pte = 00000000
Oops: 0000
Tam ocl cpqci netconsole autofs bcm5700 bonding 8021q sg microcode keybdev mousedev hid input usb-ohci usbcore ext3 jbd qla2300_conf cciss sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c015c67d>]    Tainted: P
EFLAGS: 00010216

stack trace:
PID: 11     TASK: c3d70000  CPU: 1   COMMAND: "kswapd"
 #0 [c3d71c70] netconsole_netdump at fe78263e
 #1 [c3d71e04] die at c010c4b1
 #2 [c3d71e18] do_page_fault at c011f785
 #3 [c3d71edc] error_code (via page_fault) at c03f21b0
    EAX: c29127cc  EBX: fff94fd8  ECX: 00000000  EDX: c100002c  EBP: 00000fd8 
    DS:  0068      ESI: c29127cc  ES:  0068      EDI: c100002c 
    CS:  0060      EIP: c015c67d  ERR: ffffffff  EFLAGS: 00010216 
 #4 [c3d71f18] page_referenced at c015c67d
 #5 [c3d71f4c] refill_inactive_zone at c015269d
 #6 [c3d71f98] rebalance_inactive_zone at c0153518
 #7 [c3d71fac] do_try_to_free_pages_kswapd at c01537f0
 #8 [c3d71fd0] kswapd at c0153a33
 #9 [c3d71ff0] kernel_thread_helper at c010958b

I think it crashes at the instruction at eip=c015c67d (marked below):
0xc015c677 <page_referenced+199>:       lea    (%eax,%edx,4),%eax; eax is of type struct page
0xc015c67a <page_referenced+202>:       mov    0x8(%eax),%ecx; ecx=mm=page->mapping
0xc015c67d <page_referenced+205>:       mov    0x84(%ecx),%eax; eax=mm->rlimit_rss  <===

This corresponds to the marked lines in the PageDirect branch of page_referenced():
int page_referenced(struct page * page, int * rsslimit)
{
        int referenced = 0, under_rsslimit = 0;
        struct mm_struct * mm;
        struct pte_chain * pc;

        if (PageTestandClearReferenced(page))
                referenced++;

        if (PageDirect(page)) {
                pte_t *pte = rmap_ptep_map(page->pte.direct);
                if (pte_young(*pte) && ptep_test_and_clear_young(pte))
                        referenced++;

                mm = ptep_to_mm(pte);            <===
                if (mm->rss < mm->rlimit_rss)    <===
                        under_rsslimit++;
                rmap_ptep_unmap(pte);
        } else {
...
}
The problem is that ptep_to_mm() returns 0 (a NULL mm pointer).
crash> p/x &((struct mm_struct*)0)->rlimit_rss
$14 = 0x84
Thus, reading mm->rlimit_rss faults at address 0x84, as the oops message reports.
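
The address arithmetic above can be reproduced in user space. The following is only a sketch: struct mm_model and rlimit_rss_address() are hypothetical names made up for illustration, and the padding array stands in for the real mm_struct fields, whose layout is not modeled. The only value taken from the dump is the 0x84 offset of rlimit_rss.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct mm_struct. Only the offset
 * of rlimit_rss (0x84, taken from the crash session above) is modeled; the
 * padding does not reflect the real field layout. */
struct mm_model {
    unsigned char other_fields[0x84]; /* placeholder for everything before rlimit_rss */
    uint32_t rlimit_rss;              /* assumed 32 bits wide, as on i386 */
};

/* Compute the address that a read of mm->rlimit_rss would touch for a given
 * mm pointer value: base plus field offset. Nothing is dereferenced here. */
static uintptr_t rlimit_rss_address(uintptr_t mm_base)
{
    return mm_base + offsetof(struct mm_model, rlimit_rss);
}
```

Calling rlimit_rss_address(0) models the case where ptep_to_mm() returned NULL: the result is the field offset itself, which is why the oops reports a NULL pointer dereference at a small non-zero address rather than at 0.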

Version-Release number of selected component (if applicable):
kernel 2.4.21-9.ELsmp

How reproducible:
Didn't try


Additional info:

Comment 1 norbert 2005-11-25 13:32:03 UTC
Created attachment 121486
some data structures I extracted from the dump

Comment 2 Ernie Petrides 2005-11-29 00:14:13 UTC
This problem was fixed in RHEL3 U3 with a change committed on 25-Jun-2004.

Please upgrade to a recent kernel (latest is U6, kernel version 2.4.21-37.EL).

