Bug 149543
Summary: The writepage() race
Product: Red Hat Enterprise Linux 3
Component: kernel
Version: 3.0
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: high
Reporter: Wendy Cheng <nobody+wcheng>
Assignee: Larry Woodman <lwoodman>
CC: anderson, cel, k.georgiou, petrides, riel, sct, tao
Doc Type: Bug Fix
Last Closed: 2007-10-19 19:07:07 UTC
Description
Wendy Cheng
2005-02-23 21:14:54 UTC
Created attachment 111352
Test program to reproduce the hang.
Test program to recreate the hang.
Created attachment 111353
An experimental kernel patch
Larry Woodman drafted a test patch for this issue but it *doesn't* stop the
hang.
A test kernel with this patch has been installed on the customer's test
machine; waiting for their results.
Looks like GFS can trigger this too (a GFS crash via IT#65377):

Unable to handle kernel paging request at virtual address 00460032
 printing eip:
c0154641
*pde = 00000000
Oops: 0002
netconsole gfs lock_gulm crc32 lock_harness pool e1000 bonding ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables floppy microcode loop lvm-mod keybde
CPU:    0
EIP:    0060:[<c0154641>]    Not tainted
EFLAGS: 00010246
EIP is at launder_page [kernel] 0x81 (2.4.21-27.0.2.ELsmp/i686)
eax: c03a8240   ebx: c48e5fe0   ecx: c03a7080   edx: 00460032
esi: c48e5ffc   edi: 000011c4   ebp: c03a7080   esp: c6671f64
ds: 0068   es: 0068   ss: 0068
Process kswapd (pid: 11, stackpage=c6671000)
Stack: c0134ef0 00000000 c48e5ff4 00000000 0008bd88 c03a7080 000011c4 00000001
       c015660b c03a7080 000001d0 c48e5fe0 c03a8240 00000067 c03a7080 00026bdd
       00000001 00000040 c0156c0b c03a7080 00000100 000001d0 0002fb14 00000000
Call Trace:
[<c0134ef0>] process_timeout [kernel] 0x0 (0xc6671f64)
[<c015660b>] rebalance_dirty_zone [kernel] 0xab (0xc6671f84)
[<c0156c0b>] do_try_to_free_pages_kswapd [kernel] 0x1eb (0xc6671fac)
[<c0156d38>] kswapd [kernel] 0x68 (0xc6671fd0)
[<c0156cd0>] kswapd [kernel] 0x0 (0xc6671fe4)
[<c01095ad>] kernel_thread_helper [kernel] 0x5

1) Conf call with Anthony Golia (the customer that has the hang): we discussed the possibility that the deadlock is caused by the network stack running short of the memory it needs to ship the dirty pages back to the NFS server (which is the only way to free these pages). Two hours later, he reported that the hang went away with the following VM tuning:

vm.kswapd='16384 1024 256'
vm.pagecache='2 15 30'

2) The panic customer hasn't reported any new panics using Larry's patch (they used to crash on a daily basis).

In short, I think we're in good shape on the problem so far. Will double-check the status sometime next week.

The test kernel with Larry's patch passed the customer's QA testing and was added to their production system this morning. Waiting for further results. So far so good.
Any further results on this testing?

Larry

Here's the latest info that I posted in the associated IT, re: today's vmcore, which was still running stock 2.4.21-27.

In this dumpfile, kswapd was simply trying to free an available slab cache page:

crash> bt
PID: 11     TASK: cbb62000  CPU: 3   COMMAND: "kswapd"
 #0 [cbb63cc0] netconsole_netdump at f8a1c77a
 #1 [cbb63e58] try_crashdump at c0128c83
 #2 [cbb63e68] die at c010c672
 #3 [cbb63e7c] do_page_fault at c011fff9
 #4 [cbb63f40] error_code (via page_fault) at c03f21c0
    EAX: cbb3f648  EBX: cbb3f638  ECX: e7756000  EDX: 00000000  EBP: 00000040
    DS:  0068      ESI: 0000071e  ES:  0068      EDI: cbb3f648
    CS:  0060      EIP: c0151a9f  ERR: ffffffff  EFLAGS: 00010016
 #5 [cbb63f7c] __kmem_cache_shrink_locked at c0151a9f
 #6 [cbb63f94] kmem_cache_shrink at c0151b74
 #7 [cbb63fa0] shrink_dcache_memory at c017e008
 #8 [cbb63fac] do_try_to_free_pages_kswapd at c0156adb
 #9 [cbb63fd0] kswapd at c0156ca3
#10 [cbb63ff0] kernel_thread_helper at c01095ab
crash>

A full "kmem -s" showed that the dentry cache, which is the one being shrunk, has a corrupted slab; looking at that cache alone shows:

crash> kmem -s dentry_cache
CACHE    NAME          OBJSIZE  ALLOCATED   TOTAL  SLABS  SSIZE
kmem: dentry_cache: free list: slab: e7756000  bad prev pointer: 0
kmem: dentry_cache: free list: slab: e7756000  bad s_mem pointer: 0
cbb3f638 dentry_cache      128      82228  118710   3957     4k
crash>

__kmem_cache_shrink_locked() was in the process of unlinking the last slab contained on the dentry_cache's slabs_free list:

    slabp = list_entry(cachep->slabs_free.prev, slab_t, list);
#if DEBUG
    if (slabp->inuse)
        BUG();
#endif
    list_del(&slabp->list);     <== oops occurred here

The dentry_cache's "slabs_free" chain head shows that its "prev" pointer has just been updated with the contents of the "prev" pointer contained in the slab at e7756000 that it is unlinking, which is a zero (not good):

crash> kmem_cache_s cbb3f638
struct kmem_cache_s {
  slabs_full = {
    next = 0xe2dbf000,
    prev = 0xe678e000
  },
  slabs_partial = {
    next = 0xf24e8000,
    prev = 0xdc9d2000
  },
  slabs_free = {
    next = 0xd0a25000,
    prev = 0x0              <== copied from the prev pointer of the
  },                            slab being unlinked.
  objsize = 0x80,
  flags = 0x22000,
  num = 0x1e,
  spinlock = {
    lock = 0x0
  },
  batchcount = 0xfc,
  gfporder = 0x0,
  gfpflags = 0x0,
  colour = 0x0,
  colour_off = 0x80,
  colour_next = 0x0,
  slabp_cache = 0x0,
  growing = 0x0,
  dflags = 0x1,
  ctor = 0,
  dtor = 0,
  failures = 0x0,
  name = "dentry_cache\000\000\000\000\000\000\000",
  ...

And examining the actual slab_t being unlinked, the corruption is evident. The slab cache page should contain a slab_t data structure at the beginning, followed by 30 dentry structures. However, it looks like this:

crash> rd e7756000 1024
e7756000: cbb3f648 00000000 00000000 00000000 H...............
e7756010: 00000000 00000000 00000000 00000000 ................
e7756020: 00000000 00000000 00000000 00000000 ................
e7756030: 00000000 00000000 00000000 00000000 ................
e7756040: 00000000 00000000 00000000 00000000 ................
e7756050: 00000000 00000000 00000000 00000000 ................
e7756060: 00000000 00000000 00000000 00000000 ................
e7756070: 00000000 00000000 00000000 00000000 ................
e7756080: 00000000 00000000 00000000 00000000 ................
e7756090: 00000000 00000000 00000000 00000000 ................
e77560a0: 00000000 00000000 00000000 00000000 ................
e77560b0: 00000000 00000000 00000000 00000000 ................
e77560c0: 00000000 00000000 00000000 00000000 ................
e77560d0: 00000000 00000000 00000000 00000000 ................
e77560e0: 00000000 00000000 00000000 00000000 ................
e77560f0: 00000000 00000000 00000000 00000000 ................
e7756100: 00000000 00000000 00000000 00000000 ................
e7756110: 00000000 00000000 00000000 00000000 ................
e7756120: 00000000 00000000 00000000 00000000 ................
e7756130: 00000000 00000000 00000000 00000000 ................
e7756140: 00000000 00000000 00000000 00000000 ................ e7756150: 00000000 00000000 00000000 00000000 ................ e7756160: 00000000 00000000 00000000 00000000 ................ e7756170: 00000000 00000000 00000000 00000000 ................ e7756180: 00000000 00000000 00000000 00000000 ................ e7756190: 00000000 00000000 00000000 00000000 ................ e77561a0: 00000000 00000000 00000000 00000000 ................ e77561b0: 00000000 00000000 00000000 00000000 ................ e77561c0: 00000000 00000000 00000000 00000000 ................ e77561d0: 00000000 00000000 00000000 00000000 ................ e77561e0: 00000000 00000000 00000000 00000000 ................ e77561f0: 00000000 00000000 00000000 00000000 ................ e7756200: 00000000 00000000 00000000 00000000 ................ e7756210: 00000000 00000000 00000000 00000000 ................ e7756220: 00000000 00000000 00000000 00000000 ................ e7756230: 00000000 00000000 00000000 00000000 ................ e7756240: 00000000 00000000 00000000 00000000 ................ e7756250: 00000000 00000000 00000000 00000000 ................ e7756260: 00000000 00000000 00000000 00000000 ................ e7756270: 00000000 00000000 00000000 00000000 ................ e7756280: 00000000 00000000 00000000 00000000 ................ e7756290: 00000000 00000000 00000000 00000000 ................ e77562a0: 00000000 00000000 00000000 00000000 ................ e77562b0: 00000000 00000000 00000000 00000000 ................ e77562c0: 00000000 00000000 00000000 00000000 ................ e77562d0: 00000000 00000000 00000000 00000000 ................ e77562e0: 00000000 00000000 00000000 00000000 ................ e77562f0: 00000000 00000000 00000000 00000000 ................ e7756300: 00000000 00000000 00000000 00000000 ................ e7756310: 00000000 00000000 00000000 00000000 ................ e7756320: 00000000 00000000 00000000 00000000 ................ 
e7756330: 00000000 00000000 00000000 00000000 ................ e7756340: 00000000 00000000 046597b9 f8a6ad24 ..........e.$... e7756350: e4f9e400 00000000 00000000 00000000 ................ e7756360: 7119b738 7119b778 7119b7b8 7119b7f8 8..qx..q...q...q e7756370: 7119b838 7119b878 00000000 7119b8f8 8..qx..q.......q e7756380: 00000000 00000000 00000000 e0eff380 ................ e7756390: e7756390 e7756390 e7756398 e7756398 .cu..cu..cu..cu. e77563a0: 00000000 00000000 e77563a8 e77563a8 .........cu..cu. e77563b0: e77563b0 e77563b0 00000000 d54ab700 .cu..cu.......J. e77563c0: 00000025 e8adef84 046597b9 f8a6ad24 %.........e.$... e77563d0: e4f9e400 00000000 00000000 00000000 ................ e77563e0: 711a1748 711a1cd0 711a1d10 711a1d50 H..q...q...qP..q e77563f0: 711a1d90 711a1e10 00000000 711a1f30 ...q...q....0..q e7756400: 00000000 00000000 00000000 e0eff380 ................ e7756410: e7756410 e7756410 e7756418 e7756418 .du..du..du..du. e7756420: 00000000 00000000 e7756428 e7756428 ........(du.(du. e7756430: e7756430 e7756430 00000000 d54ab780 0du.0du.......J. e7756440: 00000025 e8869dd4 046597b9 f8a6ad24 %.........e.$... e7756450: e4f9e400 00000000 00000000 00000000 ................ e7756460: 711cec20 711cec60 711ceca0 711cece0 ..q`..q...q...q e7756470: 711ced20 711ced60 00000000 711d8278 ..q`..q....x..q e7756480: 00000000 00000000 00000000 e0eff380 ................ e7756490: e7756490 e7756490 e7756498 e7756498 .du..du..du..du. e77564a0: 00000000 00000000 e77564a8 e77564a8 .........du..du. e77564b0: e77564b0 e77564b0 00000000 d54ab800 .du..du.......J. e77564c0: 00000025 e85f4c24 046597b9 f8a6ad24 %...$L_...e.$... e77564d0: e4f9e400 00000000 00000000 00000000 ................ e77564e0: 711d8938 711d89b8 711d89f8 711d8a38 8..q...q...q8..q e77564f0: 711d8a78 711d8ab8 00000000 711d8b38 x..q...q....8..q e7756500: 00000000 00000000 00000000 e0eff380 ................ e7756510: e7756510 e7756510 e7756518 e7756518 .eu..eu..eu..eu. 
e7756520: 00000000 00000000 e7756528 e7756528 ........(eu.(eu. e7756530: e7756530 e7756530 00000000 d54ab880 0eu.0eu.......J. e7756540: 00000025 e837fa74 046597b9 f8a6ad24 %...t.7...e.$... e7756550: e4f9e400 00000000 00000000 00000000 ................ e7756560: 711d91f8 711d9238 71205e20 71205e60 ...q8..q ^ q`^ q e7756570: 71205ea0 71205ee0 00000000 71205f60 .^ q.^ q....`_ q e7756580: 00000000 00000000 00000000 e0eff380 ................ e7756590: e7756590 e7756590 e7756598 e7756598 .eu..eu..eu..eu. e77565a0: 00000000 00000000 e77565a8 e77565a8 .........eu..eu. e77565b0: e77565b0 e77565b0 00000000 d54ab900 .eu..eu.......J. e77565c0: 00000025 e810a8c4 046597b9 f8a6ad24 %.........e.$... e77565d0: e4f9e400 00000000 00000000 00000000 ................ e77565e0: 71206b38 71206b78 71206bb8 71206ec0 8k qxk q.k q.n q e77565f0: 71206f00 71206f40 00000000 71206fc0 .o q@o q.....o q e7756600: 00000000 00000000 00000000 e0eff380 ................ e7756610: e7756610 e7756610 e7756618 e7756618 .fu..fu..fu..fu. e7756620: 00000000 00000000 e7756628 e7756628 ........(fu.(fu. e7756630: e7756630 e7756630 00000000 d54ab980 0fu.0fu.......J. e7756640: 00000025 e7e95714 046597b9 f8a6ad24 %....W....e.$... e7756650: e4f9e400 00000000 00000000 00000000 ................ e7756660: 71207740 71207780 712077c0 71207800 @w q.w q.w q.x q e7756670: 71207840 71207880 00000000 71208568 @x q.x q....h. q e7756680: 00000000 00000000 00000000 e0eff380 ................ e7756690: e7756690 e7756690 e7756698 e7756698 .fu..fu..fu..fu. e77566a0: 00000000 00000000 e77566a8 e77566a8 .........fu..fu. e77566b0: e77566b0 e77566b0 00000000 d54aba00 .fu..fu.......J. e77566c0: 00000025 e7c20564 046597b9 f8a6ad24 %...d.....e.$... e77566d0: e4f9e400 00000000 00000000 00000000 ................ e77566e0: 71209588 712095c8 71209608 71209648 .. q.. q.. qH. q e77566f0: 71209688 712096c8 00000000 71209f10 .. q.. q...... q e7756700: 00000000 00000000 00000000 e0eff380 ................ 
e7756710: e7756710 e7756710 e7756718 e7756718 .gu..gu..gu..gu. e7756720: 00000000 00000000 e7756728 e7756728 ........(gu.(gu. e7756730: e7756730 e7756730 00000000 d54aba80 0gu.0gu.......J. e7756740: 00000025 e79ab3b4 046597b9 f8a6ad24 %.........e.$... e7756750: e4f9e400 00000000 00000000 00000000 ................ e7756760: 0000006c ffffffff 00000001 00000067 l...........g... e7756770: ffffffff 00000002 00000000 ffffffff ................ e7756780: 00000000 00000000 00000000 e0eff380 ................ e7756790: e7756790 e7756790 e7756798 e7756798 .gu..gu..gu..gu. e77567a0: 00000000 00000000 e77567a8 e77567a8 .........gu..gu. e77567b0: e77567b0 e77567b0 00000000 d54abb00 .gu..gu.......J. e77567c0: 00000025 e74c1054 046597b9 f8a6ad24 %...T.L...e.$... e77567d0: e4f9e400 00000000 00000000 00000000 ................ e77567e0: 00000051 00000023 ffffffff 00000066 Q...#.......f... e77567f0: 0000003c ffffffff 00000000 00000078 <...........x... e7756800: 00000000 00000000 00000000 e0eff380 ................ e7756810: e7756810 e7756810 e7756818 e7756818 .hu..hu..hu..hu. e7756820: 00000000 00000000 e7756828 e7756828 ........(hu.(hu. e7756830: e7756830 e7756830 00000000 d54abb80 0hu.0hu.......J. e7756840: 00000025 e724bea4 046597b9 f8a6ad24 %.....$...e.$... e7756850: e4f9e400 00000000 00000000 00000000 ................ e7756860: 00000000 00000000 00000000 00000000 ................ e7756870: 00000000 00000000 00000000 00000000 ................ e7756880: 00000000 00000000 00000000 e0eff380 ................ e7756890: e7756890 e7756890 e7756898 e7756898 .hu..hu..hu..hu. e77568a0: 00000000 00000000 e77568a8 e77568a8 .........hu..hu. e77568b0: e77568b0 e77568b0 00000000 d54abc00 .hu..hu.......J. e77568c0: 00000025 e6fd6cf4 046597b9 f8a6ad24 %....l....e.$... e77568d0: e4f9e400 00000000 00000000 00000000 ................ e77568e0: 00000000 00000000 00000000 00000000 ................ e77568f0: 00000000 00000000 00000000 00000000 ................ 
e7756900: 00000000 00000000 00000000 e0eff380 ................ e7756910: e7756910 e7756910 e7756918 e7756918 .iu..iu..iu..iu. e7756920: 00000000 00000000 e7756928 e7756928 ........(iu.(iu. e7756930: e7756930 e7756930 00000000 d54abc80 0iu.0iu.......J. e7756940: 00000025 e6d61b44 046597b9 f8a6ad24 %...D.....e.$... e7756950: e4f9e400 00000000 00000000 00000000 ................ e7756960: 00000000 00000000 00000000 00000000 ................ e7756970: 00000000 00000000 00000000 00000000 ................ e7756980: 00000000 00000000 00000000 e0eff380 ................ e7756990: e7756990 e7756990 e7756998 e7756998 .iu..iu..iu..iu. e77569a0: 00000000 00000000 e77569a8 e77569a8 .........iu..iu. e77569b0: e77569b0 e77569b0 00000000 d54abd00 .iu..iu.......J. e77569c0: 00000025 e6aec994 046597b9 f8a6ad24 %.........e.$... e77569d0: e4f9e400 00000000 00000000 00000000 ................ e77569e0: 00000000 00000000 00000000 00000000 ................ e77569f0: 00000000 00000000 00000000 00000000 ................ e7756a00: 00000000 00000000 00000000 e0eff380 ................ e7756a10: e7756a10 e7756a10 e7756a18 e7756a18 .ju..ju..ju..ju. e7756a20: 00000000 00000000 e7756a28 e7756a28 ........(ju.(ju. e7756a30: e7756a30 e7756a30 00000000 d54abd80 0ju.0ju.......J. e7756a40: 00000025 e68777e4 046597b9 f8a6ad24 %....w....e.$... e7756a50: e4f9e400 00000000 00000000 00000000 ................ e7756a60: 00000000 00000000 00000000 00000000 ................ e7756a70: 00000000 00000000 00000000 00000000 ................ e7756a80: 00000000 00000000 00000000 e0eff380 ................ e7756a90: e7756a90 e7756a90 e7756a98 e7756a98 .ju..ju..ju..ju. e7756aa0: 00000000 00000000 e7756aa8 e7756aa8 .........ju..ju. e7756ab0: e7756ab0 e7756ab0 00000000 d54abe00 .ju..ju.......J. e7756ac0: 00000025 e6602634 046597b9 f8a6ad24 %...4&`...e.$... e7756ad0: e4f9e400 00000000 00000000 00000000 ................ e7756ae0: 00000000 00000000 00000000 00000000 ................ 
e7756af0: 00000000 00000000 00000000 00000000 ................ e7756b00: 00000000 00000000 00000000 e0eff380 ................ e7756b10: e7756b10 e7756b10 e7756b18 e7756b18 .ku..ku..ku..ku. e7756b20: 00000000 00000000 e7756b28 e7756b28 ........(ku.(ku. e7756b30: e7756b30 e7756b30 00000000 d54abe80 0ku.0ku.......J. e7756b40: 00000025 e638d484 046597b9 f8a6ad24 %.....8...e.$... e7756b50: e4f9e400 00000000 00000000 00000000 ................ e7756b60: 00000001 aeff6218 00000100 aeff16d8 .....b.......... e7756b70: af0d3018 af1fac18 00000000 af1fac68 .0..........h... e7756b80: 00000000 00000000 00000000 e0eff380 ................ e7756b90: e7756b90 e7756b90 e7756b98 e7756b98 .ku..ku..ku..ku. e7756ba0: 00000000 00000000 e7756ba8 e7756ba8 .........ku..ku. e7756bb0: e7756bb0 e7756bb0 00000000 d54abf00 .ku..ku.......J. e7756bc0: 00000025 e61182d4 046597b9 f8a6ad24 %.........e.$... e7756bd0: e4f9e400 00000000 00000000 00000000 ................ e7756be0: af235728 af235760 711f01e8 71205938 (W#.`W#....q8Y q e7756bf0: 71206020 71205dd8 00000000 712061f8 ` q.] q.....a q e7756c00: 00000000 00000000 00000000 e0eff380 ................ e7756c10: e7756c10 e7756c10 e7756c18 e7756c18 .lu..lu..lu..lu. e7756c20: 00000000 00000000 e7756c28 e7756c28 ........(lu.(lu. e7756c30: e7756c30 e7756c30 00000000 d54abf80 0lu.0lu.......J. e7756c40: 00000025 e5ea3124 046597b9 f8a6ad24 %...$1....e.$... e7756c50: e4f9e400 00000000 00000000 00000000 ................ e7756c60: 00000000 00000000 00000000 00000000 ................ e7756c70: 00000000 00000000 00000000 00000000 ................ e7756c80: 00000000 00000000 00000000 e0eff380 ................ e7756c90: e7756c90 e7756c90 e7756c98 e7756c98 .lu..lu..lu..lu. e7756ca0: 00000000 00000000 e7756ca8 e7756ca8 .........lu..lu. e7756cb0: e7756cb0 e7756cb0 00000000 e20bf100 .lu..lu......... e7756cc0: 00000025 e59b8dc4 046597b9 f8a6ad24 %.........e.$... e7756cd0: e4f9e400 00000000 00000000 00000000 ................ 
e7756ce0: 00000000 00000000 00000000 00000000 ................ e7756cf0: 00000000 00000000 00000000 00000000 ................ e7756d00: 00000000 00000000 00000000 e0eff380 ................ e7756d10: e7756d10 e7756d10 e7756d18 e7756d18 .mu..mu..mu..mu. e7756d20: 00000000 00000000 e7756d28 e7756d28 ........(mu.(mu. e7756d30: e7756d30 e7756d30 00000000 e20bf180 0mu.0mu......... e7756d40: 00000025 e5743c14 046597b9 f8a6ad24 %....<t...e.$... e7756d50: e4f9e400 00000000 00000000 00000000 ................ e7756d60: 00000000 00000000 00000000 00000000 ................ e7756d70: 00000000 00000000 00000000 00000000 ................ e7756d80: 00000000 00000000 00000000 e0eff380 ................ e7756d90: e7756d90 e7756d90 e7756d98 e7756d98 .mu..mu..mu..mu. e7756da0: 00000000 00000000 e7756da8 e7756da8 .........mu..mu. e7756db0: e7756db0 e7756db0 00000000 e20bf200 .mu..mu......... e7756dc0: 00000025 e54cea64 046597b9 f8a6ad24 %...d.L...e.$... e7756dd0: e4f9e400 00000000 00000000 00000000 ................ e7756de0: 00000000 00000000 00000000 00000000 ................ e7756df0: 00000000 00000000 00000000 00000000 ................ e7756e00: 00000000 00000000 00000000 e0eff380 ................ e7756e10: e7756e10 e7756e10 e7756e18 e7756e18 .nu..nu..nu..nu. e7756e20: 00000000 00000000 e7756e28 e7756e28 ........(nu.(nu. e7756e30: e7756e30 e7756e30 00000000 e20bf280 0nu.0nu......... e7756e40: 00000025 e52598b4 046597b9 f8a6ad24 %.....%...e.$... e7756e50: e4f9e400 00000000 00000000 00000000 ................ e7756e60: 00000000 00000000 00000000 00000000 ................ e7756e70: 00000000 00000000 00000000 00000000 ................ e7756e80: 00000000 00000000 00000000 e0eff380 ................ e7756e90: e7756e90 e7756e90 e7756e98 e7756e98 .nu..nu..nu..nu. e7756ea0: 00000000 00000000 e7756ea8 e7756ea8 .........nu..nu. e7756eb0: e7756eb0 e7756eb0 00000000 e20bf300 .nu..nu......... e7756ec0: 00000025 e4fe4704 046597b9 f8a6ad24 %....G....e.$... 
e7756ed0: e4f9e400 00000000 00000000 00000000 ................
e7756ee0: 00000000 00000000 00000000 00000000 ................
e7756ef0: 00000000 00000000 00000000 00000000 ................
e7756f00: 00000000 00000000 00000000 e0eff380 ................
e7756f10: e7756f10 e7756f10 e7756f18 e7756f18 .ou..ou..ou..ou.
e7756f20: 00000000 00000000 e7756f28 e7756f28 ........(ou.(ou.
e7756f30: e7756f30 e7756f30 00000000 e20bf380 0ou.0ou.........
e7756f40: 00000025 e4d6f554 046597b9 f8a6ad24 %...T.....e.$...
e7756f50: e4f9e400 00000000 00000000 00000000 ................
e7756f60: 00000000 00000000 00000000 00000000 ................
e7756f70: 00000001 af1ee720 00000000 00000000 .... ...........
e7756f80: 00000000 00000000 00000000 e0eff380 ................
e7756f90: e7756f90 e7756f90 e7756f98 e7756f98 .ou..ou..ou..ou.
e7756fa0: 00000000 00000000 e7756fa8 e7756fa8 .........ou..ou.
e7756fb0: e7756fb0 e7756fb0 00000000 e20bf400 .ou..ou.........
e7756fc0: 00000025 e4afa3a4 046597b9 f8a6ad24 %.........e.$...
e7756fd0: e4f9e400 00000000 00000000 00000000 ................
e7756fe0: 00000000 00000000 00000000 00000000 ................
e7756ff0: 00000000 00000000 00000000 00000000 ................
crash>

Obviously the first 0x348 (840) bytes have been corrupted (i.e., up to e7756348), *except* for the first word in the buffer (cbb3f648), which correctly points back to the "slabs_free" list_head in the dentry_cache's kmem_cache_s. The problem is that the second and subsequent words contain zeroes; the second word is supposed to contain the prev pointer back to the previous slab_t in the chain. That value was transferred to dentry_cache's slabs_free.prev as seen above, and the oops occurred when list_del() then tried to use that NULL pointer.

What is kind of interesting, but not illuminating in any way, is that the first word in the corrupted slab contains a valid pointer. But the remaining run of zeroed-out memory is completely unexplainable.
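The unlink failure described above can be sketched in user space. This is only an illustration, not the kernel source: it assumes list_del() performs the usual two stores (next->prev = prev, then prev->next = next), and the names checked_list_del() and demo_corrupted_unlink() are hypothetical helpers invented for this sketch. The guard before the second store stands in for the point where the kernel took the NULL pointer write.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/*
 * Hypothetical helper: performs the same two stores a doubly-linked
 * unlink does, but checks the second one instead of faulting.
 * Returns 0 on success, -1 if the second store would write through NULL.
 */
static int checked_list_del(struct list_head *entry)
{
    struct list_head *prev = entry->prev;
    struct list_head *next = entry->next;

    next->prev = prev;      /* first store: entry's prev lands in its neighbour */
    if (prev == NULL)
        return -1;          /* the kernel would store to NULL->next here */
    prev->next = next;      /* second store */
    return 0;
}

/*
 * Hypothetical helper: builds a two-node ring (list head plus one slab
 * entry; an assumption, the real slabs_free list held more slabs),
 * zeroes the entry's prev pointer the way the corrupted slab page had
 * it, then unlinks the entry.
 */
static int demo_corrupted_unlink(struct list_head *head, struct list_head *slab)
{
    assert(head != NULL && slab != NULL);

    head->next = slab;
    head->prev = slab;
    slab->next = head;      /* intact, like the valid first word cbb3f648 */
    slab->prev = head;

    slab->prev = NULL;      /* the corruption: second word of the page is zero */
    return checked_list_del(slab);
}
```

Calling demo_corrupted_unlink() on two caller-provided nodes returns -1 and leaves head->prev set to NULL, which matches the slabs_free.prev = 0x0 value seen in the kmem_cache_s dump: the first store succeeds and poisons the list head, and only the second store faults.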
It is somewhat reminiscent of the /proc/kcore issue, but in that case the last 496 bytes of a task_struct would be copied over the first 496 bytes of a slab page (or any other unlucky page, for that matter). There would also be a definite signature in the corruption data that could be recognized, i.e., not a bunch of zeroes, but rather recognizable task_struct data.

In any case, I haven't a clue as to how the slab page got corrupted in such a manner. There are other "known" corrupters out there, but none with this signature.

That being said, I still wish that they would please upgrade to 2.4.21-31 from 2.4.21-27, so that we can debug from a current kernel.

Also, the messages just prior to the oops are a bit troubling, and they harken back to the NFS discussions earlier in this case:

...
NFS: Buggy server - nlink == 0!
__nfs_fhget: iget failed
NFS: Buggy server - nlink == 0!
__nfs_fhget: iget failed
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
c0151a9f
*pde = 1300f001
*pte = 00000000
Oops: 0002
...

Did those messages appear prior to any of the other crashes?

The original problem traces did show race conditions between kswapd and kupdated across three different customer reports. However, with Larry's patch, we got different results:

1. IT65377 (GFS panic): the customer pounded heavily on the test kernel built with Larry's patch, and the system held up. They seem to be happy and have asked for the patch to be included in our formal releases.

2. IT65627: the system still encounters different panics and crashes (as described in the previous update by Dave Anderson).

3. IT65377 (hang in kswapd/kupdated): the problem proved to be a deadlock that occurred when kswapd tried to sync dirty pages back to the NFS server but the network stack ran out of memory (kmalloc). I've removed this IT ticket from this bugzilla.
I would say we leave this bugzilla for IT65377 (so Larry can prepare to get the patch into a formal release) and open other bugzillas for (2) and (3)?

Apparently IT 65377 was not related to this bug, so unlinking it.

This bug is filed against RHEL 3, which is in maintenance phase. During the maintenance phase, only security errata and select mission critical bug fixes will be released for enterprise products. Since this bug does not meet those criteria, it is now being closed.

For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/

If you feel this bug is indeed mission critical, please contact your support representative. You may be asked to provide detailed information on how this bug is affecting you.