Bug 678175
Summary: [Intel 6.2 Bug] Lustre 1.8.5 causes Kernel Panics

Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Hardware: x86_64
OS: Linux
Status: CLOSED WONTFIX
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 6.2
Keywords: Reopened
Reporter: Michael Hebenstreit <michael.hebenstreit>
Assignee: Red Hat Kernel Manager <kernel-mgr>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: hui.xiao, jane.lv, jvillalo, jwilleford, keve.a.gabbert, luyu, lwoodman, rdoty, rwheeler
Doc Type: Bug Fix
Last Closed: 2011-04-26 19:09:43 UTC
Bug Blocks: 670196

Description
Michael Hebenstreit
2011-02-17 00:34:20 UTC
Created attachment 479243 [details]
possibly helpful files
###########################################################################
attached file containing sources, compiled modules, kernel config file, objdumps
############################################################################
objdump -M intel -S -d -r llite_lloop.ko > objdump_llite_lloop.ko
20110215_b/
20110215_b/src/
20110215_b/src/dcache.c
20110215_b/src/dir.c
20110215_b/src/file.c
20110215_b/src/llite_close.c
20110215_b/src/llite_lib.c
20110215_b/src/llite_lloop.mod.c
20110215_b/src/llite_mmap.c
20110215_b/src/llite_nfs.c
20110215_b/src/lloop.c
20110215_b/src/lproc_llite.c
20110215_b/src/lustre.mod.c
20110215_b/src/namei.c
20110215_b/src/rw.c
20110215_b/src/rw26.c
20110215_b/src/statahead.c
20110215_b/src/super25.c
20110215_b/src/symlink.c
20110215_b/src/xattr.c
20110215_b/src/llite_internal.h
20110215_b/llite_lloop.ko
20110215_b/lustre.ko
20110215_b/config
20110215_b/et06_x.log
20110215_b/objdump_lustre.ko
20110215_b/objdump_llite_lloop.ko

We don't build or support Lustre - using it requires rebuilding the kernel. I think your best bet is to raise this with the upstream lists. Please reopen this BZ if you can reproduce the issue with our kernel, thanks!

Incorrect - this is the patchless Lustre client, so the kernel was NOT patched. Essentially it's the RHEL 6 kernel, with IB support removed and some other options set. InfiniBand was installed separately (OFED 1.5.3 RC4), but as you can see from the traces we are on the VFS layer; the IB modules have not been touched.

It would be nice if you would answer my questions regarding grab_cache_page_nowait(mapping, index):

a) If the page was not in the cache and has been freshly allocated, should page->private be 0? It looks to me as if this is not always the case; the page->private == 2 error is already present when the function returns.

b) Is there a simple way to distinguish between a page freshly allocated and one already residing in the page cache?

best regards
Michael

(In reply to comment #4)
> a) If the page was not in the cache and has been freshly allocated, should
> page->private be 0? It looks to me as if this is not always the case; the
> page->private == 2 error is already present when the function returns.

1. You would also have to dump page->flags to know what page->private means.
2. A crash dump would be more helpful.
3. Does upstream work or not?

I've uploaded a crash dump onto intel-s3e36-02.lab.bos.redhat.com. Would anyone take a look? Please let me know.

[root@intel-s3e36-02 Lustre_kernel_crash]# uname -a
Linux intel-s3e36-02.lab.bos.redhat.com 2.6.32 #1 SMP Fri Feb 18 00:25:20 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-02 Lustre_kernel_crash]# pwd
/root/Lustre_kernel_crash
[root@intel-s3e36-02 Lustre_kernel_crash]# crash vmcore3 ./vmlinux

The following are the back traces of the relevant tasks dumped from the crash dump:

PID: 5330   TASK: ffff8806303e6b30  CPU: 2   COMMAND: "bonnie"
 --- <NMI exception stack> ---
 #6 [ffff88034dc5d468] _spin_lock at ffffffff814cb09e
 #7 [ffff88034dc5d470] osc_teardown_async_page at ffffffffa06cbdbf
 #8 [ffff88034dc5d510] lov_teardown_async_page at ffffffffa0762d3e
 #9 [ffff88034dc5d590] ll_removepage at ffffffffa07f82c5
#10 [ffff88034dc5d630] ll_invalidatepage at ffffffffa0813575
#11 [ffff88034dc5d640] llap_shrink_cache_internal at ffffffffa07f626d
#12 [ffff88034dc5d790] llap_from_page_with_lockh.clone.8 at ffffffffa07f6f7f
#13 [ffff88034dc5d870] ll_readahead at ffffffffa07f9c89
#14 [ffff88034dc5d9f0] ll_readpage at ffffffffa07fc4d0
#15 [ffff88034dc5daf0] generic_file_aio_read at ffffffff8110d3e0
#16 [ffff88034dc5dbd0] ll_file_aio_read at ffffffffa07d6e8c
#17 [ffff88034dc5ddd0] ll_file_read at ffffffffa07d88e0
#18 [ffff88034dc5def0] vfs_read at ffffffff8116d905
#19 [ffff88034dc5df30] sys_read at ffffffff8116da41
#20 [ffff88034dc5df80] system_call_fastpath at ffffffff81013172

PID: 5334   TASK: ffff880216ada100  CPU: 11  COMMAND: "bonnie"
    RIP: 0000000000401c60  RSP: 00007fffe095eb50  RFLAGS: 00000246
    RAX: 0000000000000001  RBX: 0000000003f70381  RCX: 00000000ffffffff
    RDX: 00000000005f3e30  RSI: 0000000000000001  RDI: 0000000000000001
    RBP: 00007fffe0a5ec18   R8: 0000000000000001   R9: 00002b542d9f7b20
    R10: 00000000ffffffff  R11: 0000000000000246  R12: 000000000000000d
    R13: 00002b542d478000  R14: 0000000000000007  R15: 000000007d000000
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
 --- <NMI exception stack> ---

PID: 5337   TASK: ffff88062ae44b30  CPU: 13  COMMAND: "bonnie"
    [exception RIP: llap_cast_private+87]
    RIP: ffffffffa07f1d47  RSP: ffff8804170fb738  RFLAGS: 00010202
    RAX: 0000000000000022  RBX: 0000000000000002  RCX: 00000000000025ce
    RDX: 0000000000000000  RSI: 0000000000000046  RDI: 0000000000000246
    RBP: ffff8804170fb788   R8: ffffffff818a7d60   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffffea001540d6f8
    R13: ffffea001540d6f8  R14: ffff8805bbdd4190  R15: ffff880216239000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8804170fb790] llap_from_page_with_lockh.clone.8 at ffffffffa07f6961
 #9 [ffff8804170fb870] ll_readahead at ffffffffa07f9c89
#10 [ffff8804170fb9f0] ll_readpage at ffffffffa07fc4d0
#11 [ffff8804170fbaf0] generic_file_aio_read at ffffffff8110d3e0
#12 [ffff8804170fbbd0] ll_file_aio_read at ffffffffa07d6e8c
#13 [ffff8804170fbdd0] ll_file_read at ffffffffa07d88e0
#14 [ffff8804170fbef0] vfs_read at ffffffff8116d905
#15 [ffff8804170fbf30] sys_read at ffffffff8116da41
#16 [ffff8804170fbf80] system_call_fastpath at ffffffff81013172
    RIP: 00002abdc6cfe1b0  RSP: 00007fffee410d10  RFLAGS: 00000206
    RAX: 0000000000000000  RBX: ffffffff81013172  RCX: 00002abdc6cfe1b0
    RDX: 0000000000100000  RSI: 00002abdc6a2a000  RDI: 000000000000000e
    RBP: 00007fffee510df0   R8: 00002abdc6fa9b20   R9: 00002abdc6fa9b20
    R10: 00007fffee410a90  R11: 0000000000000246  R12: 00002abdc6a2a000
    R13: 0000000000000042  R14: 000000000000000b  R15: 00002abdc6a2a000
    ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

PID: 5339   TASK: ffff88032e6c60c0  CPU: 19  COMMAND: "bonnie"
    [exception RIP: _spin_lock_irq+25]
    RIP: ffffffff814cafb9  RSP: ffff88016af397c8  RFLAGS: 00000086
    RAX: 000000008d6a8d6a  RBX: ffffea000e462178  RCX: 0000000000001000
    RDX: 000000000000000b  RSI: ffff88063caa17e8  RDI: ffff8805d8466ac8
    RBP: ffff88016af397c8   R8: ffffea000e462178   R9: ffff8804759b9cb0
    R10: 0000000000000000  R11: 0000000000000010  R12: ffff8805d8466ab0
    R13: 0000000000049a59  R14: ffff8805d8466ac8  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 --- <NMI exception stack> ---
 #6 [ffff88016af397c8] _spin_lock_irq at ffffffff814cafb9
 #7 [ffff88016af397d0] add_to_page_cache_locked at ffffffff8110c5ff
 #8 [ffff88016af39810] add_to_page_cache_lru at ffffffff8110c6cc
 #9 [ffff88016af39840] grab_cache_page_nowait at ffffffff8110d1cb
#10 [ffff88016af39870] ll_readahead at ffffffffa07f9703
#11 [ffff88016af399f0] ll_readpage at ffffffffa07fc4d0
#12 [ffff88016af39af0] generic_file_aio_read at ffffffff8110d3e0
#13 [ffff88016af39bd0] ll_file_aio_read at ffffffffa07d6e8c
#14 [ffff88016af39dd0] ll_file_read at ffffffffa07d88e0
#15 [ffff88016af39ef0] vfs_read at ffffffff8116d905
#16 [ffff88016af39f30] sys_read at ffffffff8116da41
#17 [ffff88016af39f80] system_call_fastpath at ffffffff81013172

From the dump, the following task is interesting:

PID: 5330   TASK: ffff8806303e6b30  CPU: 2   COMMAND: "bonnie"

which is doing llap_shrink_cache_internal. Following the related code path, I spotted an interesting piece of code in the path where Lustre evicts a page from its internal cache:

    /* this unconditional free is only safe because the page lock
     * is providing exclusivity to memory pressure/truncate/writeback..*/
    __clear_page_ll_data(page);

I'm wondering whether the assumption above frees a page that is still referenced, causing a use-after-free problem. I'm not a page cache expert. Would any MM expert confirm whether comment #8 makes sense for this problem, and point out any suggestions to resolve it?

Thanks,
Luming

Luming, __clear_page_ll_data() is in Lustre, not the kernel, therefore I can't tell what it's doing internally.
Larry Woodman

here you go:

    #define __clear_page_ll_data(page)         \
            do {                               \
                    ClearPagePrivate(page);    \
                    set_page_private(page, 0); \
                    page_cache_release(page);  \
            } while(0)

    #define set_page_private(page, v) ((page)->private = (v))
    #define page_cache_release(page) __free_pages(page, 0)

thanks for your help
Michael

p.s. I have the nasty suspicion one should not redefine page_cache_release()

Michael, you really should take this up with the Lustre community - their client code needs to understand the locking requirements for pages, etc. I suggest testing against upstream and reposting to the right people/development list. Thanks!

Some updates here:

a) After a BIOS upgrade we could not reproduce this error any more at its original frequency.

b) As someone from Lustre showed me, the page_cache_release() redefinition applies only to the user-space Lustre tools and does not happen for the kernel modules.

c) The code works on RHEL 5.4, and we did not find a place in the Lustre sources that would set page->private to 2. That means the change to page->private most likely occurred within the standard RH kernel. To me this raises the possibility that something is wrong in the RH kernel; I think there are very few people around who have stress-tested it the way I did (hammering it with 12 parallel processes doing reads/writes to a backend over a 3.2 GB/s link). This raises the possibility of a race condition like the one described in https://patchwork.kernel.org/patch/564801/

I do NOT expect RH to solve this bug. I documented it here on the odd chance someone would step forward and say "oh, yes, there might be an issue with....". And I think you can close this, even if it is not really resolved (i.e. we never found the real root cause).

happy hacking
Michael

re: comment 13

Michael, thank you for filing this bug. We are always interested in feedback and potential issues with RHEL. And, as you noted, there is a good chance that someone has seen this problem (or a similar one) before. As Ric noted, it would be helpful to have the Lustre upstream community involved. If you have the chance, it would also be very helpful to know if the upstream kernel has the same issue.

Lustre is not supported for anything > 2.6.32, so there is no chance to test that. The Lustre community - especially Whamcloud - is informed and involved.

best regards
Michael

Michael, do you want to leave this bug open or can we close it as WONTFIX?

as this is mostly for documentation - sure, close it
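
For reference, the questions in comment 4 and the "dump page->flags" hint above can be pictured with a small sketch against the stock 2.6.32 page-cache API. The wrapper name dbg_grab_page() is hypothetical and is not part of Lustre or the kernel; grab_cache_page_nowait(), PagePrivate(), page_private(), PageUptodate() and page->flags are the standard interfaces. The point is that page->private is only defined to carry filesystem data while PG_private is set, and that "freshly allocated" versus "already in the cache" is normally inferred from those bits rather than reported by grab_cache_page_nowait() itself.

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /*
     * Hypothetical debugging wrapper (not from the Lustre sources):
     * grab a page the way a readahead path would and report suspicious
     * private data.  Returns the page locked, or NULL.
     */
    static struct page *dbg_grab_page(struct address_space *mapping,
                                      pgoff_t index)
    {
            struct page *page = grab_cache_page_nowait(mapping, index);

            if (!page)
                    return NULL;

            /*
             * page->private only carries filesystem data while the
             * PG_private bit is set, so flags and private have to be
             * inspected together -- a non-zero private without
             * PG_private is the suspicious case described in this bug.
             */
            if (!PagePrivate(page) && page_private(page) != 0)
                    printk(KERN_WARNING "page %p flags %#lx private %#lx\n",
                           page, page->flags, page_private(page));

            /*
             * grab_cache_page_nowait() gives no direct "was this page
             * just allocated?" answer; callers usually treat
             * !PageUptodate(page) && !PagePrivate(page) as "new or
             * never read", which is a heuristic, not a guarantee.
             */
            return page;
    }

Assuming the RHEL 6 kernel follows upstream 2.6.32 here, the page allocator clears page->private when a page leaves the buddy lists, so a value of 2 on return would point at a page found in the radix tree with stale private data, or at corruption elsewhere.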
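Likewise, the use-after-free hypothesis around the unconditional __clear_page_ll_data() call can be illustrated with a guarded variant of the same teardown. This is only a sketch of the pattern the quoted Lustre comment alludes to ("the page lock is providing exclusivity"), not Lustre's actual code or a proposed fix; clear_page_private_data() is a made-up name, while PageLocked(), ClearPagePrivate(), set_page_private() and page_cache_release() are the regular 2.6.32 kernel interfaces (page_cache_release() being put_page() in the kernel proper, unlike the user-space redefinition quoted above).

    #include <linux/kernel.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /*
     * Sketch only: drop filesystem-private data from a page-cache page.
     * The caller must hold the page lock; that lock is the exclusivity
     * the Lustre comment relies on and the assumption questioned above.
     */
    static void clear_page_private_data(struct page *page)
    {
            BUG_ON(!PageLocked(page));

            if (!PagePrivate(page))
                    return;                 /* nothing attached */

            ClearPagePrivate(page);
            set_page_private(page, 0);
            /*
             * Drop the reference taken when the private data was
             * attached.  If another path still uses that private data
             * without holding its own reference, this release is where
             * a use-after-free would originate.
             */
            page_cache_release(page);
    }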