Bug 678175
Summary: [Intel 6.2 Bug] Lustre 1.8.5 causes Kernel Panics

Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 6.2
Keywords: Reopened
Reporter: Michael Hebenstreit <michael.hebenstreit>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: hui.xiao, jane.lv, jvillalo, jwilleford, keve.a.gabbert, luyu, lwoodman, rdoty, rwheeler
Doc Type: Bug Fix
Last Closed: 2011-04-26 19:09:43 UTC
Bug Blocks: 670196

Description
Michael Hebenstreit
2011-02-17 00:34:20 UTC
Created attachment 479243 [details]
possibly helpful files
###########################################################################
attached file containing sources, compiled modules, kernel config file, objdumps
############################################################################
objdump -M intel -S -d -r llite_lloop.ko > objdump_llite_lloop.ko
20110215_b/
20110215_b/src/
20110215_b/src/dcache.c
20110215_b/src/dir.c
20110215_b/src/file.c
20110215_b/src/llite_close.c
20110215_b/src/llite_lib.c
20110215_b/src/llite_lloop.mod.c
20110215_b/src/llite_mmap.c
20110215_b/src/llite_nfs.c
20110215_b/src/lloop.c
20110215_b/src/lproc_llite.c
20110215_b/src/lustre.mod.c
20110215_b/src/namei.c
20110215_b/src/rw.c
20110215_b/src/rw26.c
20110215_b/src/statahead.c
20110215_b/src/super25.c
20110215_b/src/symlink.c
20110215_b/src/xattr.c
20110215_b/src/llite_internal.h
20110215_b/llite_lloop.ko
20110215_b/lustre.ko
20110215_b/config
20110215_b/et06_x.log
20110215_b/objdump_lustre.ko
20110215_b/objdump_llite_lloop.ko

We don't build or support Lustre - using it requires rebuilding the kernel. I think your best bet is to raise this with the upstream lists. Please reopen this BZ if you can reproduce the issue with our kernel, thanks!

Incorrect - this is the patchless Lustre client, so the kernel was NOT patched. Essentially it's the RHEL 6 kernel, with IB support removed and some other options set. InfiniBand was installed separately (OFED 1.5.3 RC4), but as you can see from the traces we are on the VFS layer; the IB modules have not been touched.

It would be nice if you would answer my questions regarding grab_cache_page_nowait(mapping, index):

a) If the page was not in the cache and has been freshly allocated, should page->private be 0? It looks to me as if this is not always the case; the page->private == 2 error is already present when the function returns.

b) Is there a simple way to distinguish between a page freshly allocated and one already residing in the page cache?

best regards
Michael

(In reply to comment #4)
> a) If the page was not in the cache and has been freshly allocated, should
> page->private be 0? It looks to me as if this is not always the case; the
> page->private == 2 error is already present when the function returns.

1. You would also have to dump page->flags to know what page->private means.
2. A crash dump would be more helpful.
3. Does upstream work or not?

I've uploaded a crash dump onto intel-s3e36-02.lab.bos.redhat.com. Would anyone take a look? Please let me know.

[root@intel-s3e36-02 Lustre_kernel_crash]# uname -a
Linux intel-s3e36-02.lab.bos.redhat.com 2.6.32 #1 SMP Fri Feb 18 00:25:20 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-02 Lustre_kernel_crash]# pwd
/root/Lustre_kernel_crash
[root@intel-s3e36-02 Lustre_kernel_crash]# crash vmcore3 ./vmlinux

The following are the back traces of the relevant tasks dumped from the crash dump:

PID: 5330   TASK: ffff8806303e6b30  CPU: 2   COMMAND: "bonnie"
 --- <NMI exception stack> ---
 #6 [ffff88034dc5d468] _spin_lock at ffffffff814cb09e
 #7 [ffff88034dc5d470] osc_teardown_async_page at ffffffffa06cbdbf
 #8 [ffff88034dc5d510] lov_teardown_async_page at ffffffffa0762d3e
 #9 [ffff88034dc5d590] ll_removepage at ffffffffa07f82c5
#10 [ffff88034dc5d630] ll_invalidatepage at ffffffffa0813575
#11 [ffff88034dc5d640] llap_shrink_cache_internal at ffffffffa07f626d
#12 [ffff88034dc5d790] llap_from_page_with_lockh.clone.8 at ffffffffa07f6f7f
#13 [ffff88034dc5d870] ll_readahead at ffffffffa07f9c89
#14 [ffff88034dc5d9f0] ll_readpage at ffffffffa07fc4d0
#15 [ffff88034dc5daf0] generic_file_aio_read at ffffffff8110d3e0
#16 [ffff88034dc5dbd0] ll_file_aio_read at ffffffffa07d6e8c
#17 [ffff88034dc5ddd0] ll_file_read at ffffffffa07d88e0
#18 [ffff88034dc5def0] vfs_read at ffffffff8116d905
#19 [ffff88034dc5df30] sys_read at ffffffff8116da41
#20 [ffff88034dc5df80] system_call_fastpath at ffffffff81013172

PID: 5334   TASK: ffff880216ada100  CPU: 11  COMMAND: "bonnie"
    RIP: 0000000000401c60  RSP: 00007fffe095eb50  RFLAGS: 00000246
    RAX: 0000000000000001  RBX: 0000000003f70381  RCX: 00000000ffffffff
    RDX: 00000000005f3e30  RSI: 0000000000000001  RDI: 0000000000000001
    RBP: 00007fffe0a5ec18   R8: 0000000000000001   R9: 00002b542d9f7b20
    R10: 00000000ffffffff  R11: 0000000000000246  R12: 000000000000000d
    R13: 00002b542d478000  R14: 0000000000000007  R15: 000000007d000000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
 --- <NMI exception stack> ---

PID: 5337   TASK: ffff88062ae44b30  CPU: 13  COMMAND: "bonnie"
    [exception RIP: llap_cast_private+87]
    RIP: ffffffffa07f1d47  RSP: ffff8804170fb738  RFLAGS: 00010202
    RAX: 0000000000000022  RBX: 0000000000000002  RCX: 00000000000025ce
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000246
    RBP: ffff8804170fb788   R8: ffffffff818a7d60   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffea001540d6f8
    R13: ffffea001540d6f8  R14: ffff8805bbdd4190  R15: ffff880216239000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8804170fb790] llap_from_page_with_lockh.clone.8 at ffffffffa07f6961
 #9 [ffff8804170fb870] ll_readahead at ffffffffa07f9c89
#10 [ffff8804170fb9f0] ll_readpage at ffffffffa07fc4d0
#11 [ffff8804170fbaf0] generic_file_aio_read at ffffffff8110d3e0
#12 [ffff8804170fbbd0] ll_file_aio_read at ffffffffa07d6e8c
#13 [ffff8804170fbdd0] ll_file_read at ffffffffa07d88e0
#14 [ffff8804170fbef0] vfs_read at ffffffff8116d905
#15 [ffff8804170fbf30] sys_read at ffffffff8116da41
#16 [ffff8804170fbf80] system_call_fastpath at ffffffff81013172
    RIP: 00002abdc6cfe1b0  RSP: 00007fffee410d10  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffffffff81013172  RCX: 00002abdc6cfe1b0
    RDX: 0000000000100000  RSI: 00002abdc6a2a000  RDI: 000000000000000e
    RBP: 00007fffee510df0   R8: 00002abdc6fa9b20   R9: 00002abdc6fa9b20
    R10: 00007fffee410a90  R11: 0000000000000246  R12: 00002abdc6a2a000
    R13: 0000000000000042  R14: 000000000000000b  R15: 00002abdc6a2a000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

PID: 5339   TASK: ffff88032e6c60c0  CPU: 19  COMMAND: "bonnie"
    [exception RIP: _spin_lock_irq+25]
    RIP: ffffffff814cafb9  RSP: ffff88016af397c8  RFLAGS: 00000086
    RAX: 000000008d6a8d6a  RBX: ffffea000e462178  RCX: 0000000000001000
    RDX: 000000000000000b  RSI: ffff88063caa17e8  RDI: ffff8805d8466ac8
    RBP: ffff88016af397c8   R8: ffffea000e462178   R9: ffff8804759b9cb0
    R10: 0000000000000000  R11: 0000000000000010  R12: ffff8805d8466ab0
    R13: 0000000000049a59  R14: ffff8805d8466ac8  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #6 [ffff88016af397c8] _spin_lock_irq at ffffffff814cafb9
 #7 [ffff88016af397d0] add_to_page_cache_locked at ffffffff8110c5ff
 #8 [ffff88016af39810] add_to_page_cache_lru at ffffffff8110c6cc
 #9 [ffff88016af39840] grab_cache_page_nowait at ffffffff8110d1cb
#10 [ffff88016af39870] ll_readahead at ffffffffa07f9703
#11 [ffff88016af399f0] ll_readpage at ffffffffa07fc4d0
#12 [ffff88016af39af0] generic_file_aio_read at ffffffff8110d3e0
#13 [ffff88016af39bd0] ll_file_aio_read at ffffffffa07d6e8c
#14 [ffff88016af39dd0] ll_file_read at ffffffffa07d88e0
#15 [ffff88016af39ef0] vfs_read at ffffffff8116d905
#16 [ffff88016af39f30] sys_read at ffffffff8116da41
#17 [ffff88016af39f80] system_call_fastpath at ffffffff81013172

From the dump, the following task is interesting:

PID: 5330   TASK: ffff8806303e6b30  CPU: 2   COMMAND: "bonnie"

which is doing llap_shrink_cache_internal. Following the related code path, I spotted an interesting piece of code in the path where Lustre evicts a page from its internal cache:

    /* this unconditional free is only safe because the page lock
     * is providing exclusivity to memory pressure/truncate/writeback..*/
    __clear_page_ll_data(page);

I'm wondering whether the assumption above frees a page that is still referenced, causing a use-after-free problem. I'm not a page cache expert. Would any MM expert confirm whether comment #8 makes sense for this problem, and point out any suggestions to resolve it?

Thanks,
Luming

Luming, __clear_page_ll_data() is in Lustre, not the kernel, therefore I can't tell what it's doing internally.
Larry Woodman

here you go:

    #define __clear_page_ll_data(page)         \
            do {                               \
                    ClearPagePrivate(page);    \
                    set_page_private(page, 0); \
                    page_cache_release(page);  \
            } while(0)

    #define set_page_private(page, v) ((page)->private = (v))
    #define page_cache_release(page) __free_pages(page, 0)

thanks for your help
Michael

p.s. I have the nasty suspicion one should not redefine page_cache_release()

Michael, you really should take this up with the Lustre community - their client code needs to understand the locking requirements for pages, etc. I suggest testing against upstream and reposting to the right people/development list. Thanks!

Some updates here:

a) After a BIOS upgrade we could not reproduce this error any more at its original frequency.

b) As someone from Lustre showed me, the page_cache_release() redefinition applies only to the user-space Lustre tools and does not happen for the kernel modules.

c) The code works on RHEL 5.4, and we did not find a place in the Lustre sources that would set page->private to 2. That means the change to page->private most likely occurred within the standard RH kernel. To me this raises the possibility that something is wrong in the RH kernel; I think there are very few people around who have stress-tested it the way I did (hammering it with 12 parallel processes doing reads/writes to a backend over a 3.2 GB/s link). This raises the possibility of a race condition like the one described in https://patchwork.kernel.org/patch/564801/

I do NOT expect RH to solve this bug. I documented it here on the odd chance someone would step forward and say "oh, yes, there might be an issue with....". And I think you can close this, even if it is not really resolved (i.e. we never found the real root cause).

happy hacking
Michael

re: comment 13

Michael, thank you for filing this bug. We are always interested in feedback and potential issues with RHEL. And, as you noted, there is a good chance that someone has seen this problem (or a similar one) before. As Ric noted, it would be helpful to have the Lustre upstream community involved. If you have the chance, it would also be very helpful to know if the upstream kernel has the same issue.

Lustre is not supported for anything > 2.6.32, so there is no chance to test that. The Lustre community - especially Whamcloud - is informed and involved.

best regards
Michael

Michael, do you want to leave this bug open or can we close it as WONTFIX?

as this is mostly for documentation - sure, close it
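
For reference, the questions in comment 4 and the "dump page->flags" hint above can be pictured with a small sketch against the stock 2.6.32 page-cache API. The wrapper name dbg_grab_page() is hypothetical and is not part of Lustre or the kernel; grab_cache_page_nowait(), PagePrivate(), page_private(), PageUptodate() and page->flags are the standard interfaces. The point is that page->private is only defined to carry filesystem data while PG_private is set, and that "freshly allocated" versus "already in the cache" is normally inferred from those bits rather than reported by grab_cache_page_nowait() itself.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /*
     * Hypothetical debugging wrapper (not from the Lustre sources):
     * grab a page the way a readahead path would and report suspicious
     * private data.  Returns the page locked, or NULL.
     */
    static struct page *dbg_grab_page(struct address_space *mapping,
                                      pgoff_t index)
    {
            struct page *page = grab_cache_page_nowait(mapping, index);

            if (!page)
                    return NULL;

            /*
             * page->private only carries filesystem data while the
             * PG_private bit is set, so flags and private have to be
             * inspected together -- a non-zero private without
             * PG_private is the suspicious case described in this bug.
             */
            if (!PagePrivate(page) && page_private(page) != 0)
                    printk(KERN_WARNING "page %p flags %#lx private %#lx\n",
                           page, page->flags, page_private(page));

            /*
             * grab_cache_page_nowait() gives no direct "was this page
             * just allocated?" answer; callers usually treat
             * !PageUptodate(page) && !PagePrivate(page) as "new or
             * never read", which is a heuristic, not a guarantee.
             */
            return page;
    }

Assuming the RHEL 6 kernel follows upstream 2.6.32 here, the page allocator clears page->private when a page leaves the buddy lists, so a value of 2 on return would point at a page found in the radix tree with stale private data, or at corruption elsewhere.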
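Likewise, the use-after-free hypothesis around the unconditional __clear_page_ll_data() call can be illustrated with a guarded variant of the same teardown. This is only a sketch of the pattern the quoted Lustre comment alludes to ("the page lock is providing exclusivity"), not Lustre's actual code or a proposed fix; clear_page_private_data() is a made-up name, while PageLocked(), ClearPagePrivate(), set_page_private() and page_cache_release() are the regular 2.6.32 kernel interfaces (page_cache_release() being put_page() in the kernel proper, unlike the user-space redefinition quoted above).

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /*
     * Sketch only: drop filesystem-private data from a page-cache page.
     * The caller must hold the page lock; that lock is the exclusivity
     * the Lustre comment relies on and the assumption questioned above.
     */
    static void clear_page_private_data(struct page *page)
    {
            BUG_ON(!PageLocked(page));

            if (!PagePrivate(page))
                    return;                 /* nothing attached */

            ClearPagePrivate(page);
            set_page_private(page, 0);
            /*
             * Drop the reference taken when the private data was
             * attached.  If another path still uses that private data
             * without holding its own reference, this release is where
             * a use-after-free would originate.
             */
            page_cache_release(page);
    }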