Bug 593689 - kernel panic on NFSv4 server running bonnie++ over NFS
Status: CLOSED DUPLICATE of bug 576202
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assigned To: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
Reported: 2010-05-19 09:58 EDT by Matt Bernstein
Modified: 2010-05-19 11:43 EDT
CC List: 1 user

Doc Type: Bug Fix
Last Closed: 2010-05-19 11:43:16 EDT


Attachments
all I could capture of the kernel panic (5.89 KB, image/png)
2010-05-19 09:58 EDT, Matt Bernstein

Description Matt Bernstein 2010-05-19 09:58:53 EDT
Created attachment 415118
all I could capture of the kernel panic

Description of problem:

With bonnie++ 1.93 running on an RHEL6 beta NFS client against an RHEL6 beta NFS server, the benchmark writes 15GB and then the server kernel panics.

Version-Release number of selected component (if applicable):

2.6.32-19.el6.x86_64

How reproducible:

not reliably

Steps to Reproduce:
1. install bonnie++ from Fedora 13 src RPM on RHEL6 beta NFSv4 client
2. run bonnie++ -f -d /path/to/nfsmount
3. wait a few minutes
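Because the crash does not reproduce reliably, the steps above may need to be repeated several times. Below is a minimal Python sketch of a repeat-until-failure harness. The mount path is the same placeholder used in step 2, and the function names, the 30-minute per-run timeout, and the rule of treating a timeout as a hang are all assumptions, not part of the original report. The reasoning: on a hard mount, a client whose server has died blocks instead of receiving an error, so a run that never finishes counts as a failure. One caveat is that a bonnie++ process stuck in an uninterruptible NFS wait may survive the kill that subprocess.run sends on timeout, and in that case the harness can block as well.

```python
import shlex
import subprocess

# Placeholder path, as in step 2 of the report (hypothetical).
MOUNT_DIR = "/path/to/nfsmount"

# Hypothetical per-run budget; the report only says "a few minutes".
RUN_TIMEOUT_S = 30 * 60


def bonnie_command(target_dir):
    """Build the step-2 invocation: -f (fast mode) skips the per-char tests,
    and -d selects the directory where bonnie++ writes its test files."""
    return ["bonnie++", "-f", "-d", target_dir]


def run_until_failure(max_runs, target_dir=MOUNT_DIR, timeout_s=RUN_TIMEOUT_S,
                      run=subprocess.run):
    """Run the benchmark up to max_runs times, stopping at the first bad run.

    Returns (run_number, reason) for the first run that times out or exits
    non-zero, or None if every run completes. `run` can be replaced for testing.
    """
    cmd = bonnie_command(target_dir)
    for i in range(1, max_runs + 1):
        try:
            result = run(cmd, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # On a hard NFS mount a dead server usually shows up as a hang.
            return (i, "timeout")
        if result.returncode != 0:
            return (i, "exit %d" % result.returncode)
    return None


if __name__ == "__main__":
    # Print the command only; call run_until_failure(5) to actually run it.
    print(shlex.join(bonnie_command(MOUNT_DIR)))
```

On the client, run_until_failure(5) reports which attempt first hung or failed. The server-side oops still has to be collected from the server's console or logs.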
  
Actual results:

The client process freezes and the server kernel panics (see the attachment; nothing appears in the logs).

Expected results:

No crashes, and bonnie++ reports its benchmark results.

Additional info:

Here are the mount options:

landin:/iso on /import/iso type nfs4 (rw,nosuid,hard,intr,proto=tcp,rsize=65536,wsize=65536,sloppy,addr=138.37.88.245,clientaddr=138.37.88.218)
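The line above is mount(8) output. Splitting it into fields makes the relevant settings easy to check: NFSv4 over TCP, 64 KiB rsize/wsize, and a hard mount. With the hard option the NFS client retries requests indefinitely rather than returning an error, which plausibly explains why the client process froze instead of failing when the server went down. The helper below is a hypothetical Python sketch written for this report, not part of any NFS tool.

```python
import re

# The mount(8) line quoted above, split over two literals for width.
MOUNT_LINE = (
    "landin:/iso on /import/iso type nfs4 (rw,nosuid,hard,intr,proto=tcp,"
    "rsize=65536,wsize=65536,sloppy,addr=138.37.88.245,clientaddr=138.37.88.218)"
)

# "<source> on <target> type <fstype> (<comma-separated options>)"
_MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<target>\S+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def parse_mount_line(line):
    """Split a mount(8) output line into source, target, fstype and options.

    Flag options such as 'hard' map to True. key=value options keep their
    value as a string.
    """
    m = _MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError("not a mount(8)-style line: %r" % line)
    opts = {}
    for item in m.group("opts").split(","):
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return m.group("source"), m.group("target"), m.group("fstype"), opts


if __name__ == "__main__":
    source, target, fstype, opts = parse_mount_line(MOUNT_LINE)
    print(source, target, fstype)
    print("transport:", opts["proto"], "hard:", opts.get("hard", False))
    print("rsize/wsize KiB:", int(opts["rsize"]) // 1024, int(opts["wsize"]) // 1024)
```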

Client and server are Dell PowerEdge M910 and R810 machines respectively, both with Xeon 6542 CPUs. The client has 128GB of RAM, the server 64GB.
Comment 2 Matt Bernstein 2010-05-19 11:20:27 EDT
I ran it a second time, and it wrote 225GB before the server crashed. This time I captured more debugging output:

May 19 15:35:56 landin kernel: kernel BUG at fs/ext4/inode.c:1852!
May 19 15:35:56 landin kernel: invalid opcode: 0000 [#1] SMP 
May 19 15:35:56 landin kernel: last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
May 19 15:35:56 landin kernel: CPU 9 
May 19 15:35:56 landin kernel: Modules linked in: mptctl(U) mptbase(U) ipmi_msghandler(U) dell_rbu(U) nfsd(U) nfs_acl(U) auth_rpcgss(U) exportfs(U) lockd(U) sunrpc(U) bonding(U) nf_conntrack_ftp(U) ts_kmp(U) nf_conntrack_amanda(U) ip6t_REJECT(U) nf_conntrack_ipv6(U) ip6table_filter(U) ip6_tables(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) sr_mod(U) cdrom(U) bnx2(U) iTCO_wdt(U) iTCO_vendor_support(U) ses(U) serio_raw(U) power_meter(U) enclosure(U) joydev(U) dcdbas(U) hwmon(U) sg(U) ext4(U) mbcache(U) jbd2(U) pata_acpi(U) ata_generic(U) dm_multipath(U) sd_mod(U) crc_t10dif(U) ata_piix(U) megaraid_sas(U) dm_mod(U) [last unloaded: speedstep_lib]
May 19 15:35:56 landin kernel: Pid: 2757, comm: nfsd Not tainted 2.6.32-19.el6.x86_64 #1 PowerEdge R810
May 19 15:35:56 landin kernel: RIP: 0010:[<ffffffffa009dcd3>]  [<ffffffffa009dcd3>] ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: RSP: 0018:ffff8808473d3590  EFLAGS: 00010297
May 19 15:35:56 landin kernel: RAX: 0000000000001c94 RBX: ffff881052eb24e0 RCX: 0000000000000154
May 19 15:35:56 landin kernel: RDX: 0000000000001c95 RSI: 0000000000001c94 RDI: 0000000000000153
May 19 15:35:56 landin kernel: RBP: ffff8808473d35f0 R08: 0000000000001c94 R09: ffff881059548ce0
May 19 15:35:56 landin kernel: R10: 0000000004198000 R11: 0000000000000000 R12: ffff88068dbca408
May 19 15:35:56 landin kernel: R13: ffff881052eb27b0 R14: ffff881052eb2430 R15: 0000000000000000
May 19 15:35:56 landin kernel: FS:  0000000000000000(0000) GS:ffff88089c480000(0000) knlGS:0000000000000000
May 19 15:35:56 landin kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 19 15:35:56 landin kernel: CR2: 00000000f77ab000 CR3: 0000000001001000 CR4: 00000000000006e0
May 19 15:35:56 landin kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 19 15:35:56 landin kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 19 15:35:56 landin kernel: Process nfsd (pid: 2757, threadinfo ffff8808473d2000, task ffff8808473d1580)
May 19 15:35:56 landin kernel: Stack:
May 19 15:35:56 landin kernel: ffff880880010c80 ffffea00014121c0 ffff88105b9a9000 ffffffffffff0000
May 19 15:35:56 landin kernel: <0> ffff881052eb24e0 00000000c2100000 ffff8808473d35f0 0000000000001000
May 19 15:35:56 landin kernel: <0> 0000000000001000 0000000000001000 00000000037c2100 0000000000000000
May 19 15:35:56 landin kernel: Call Trace:
May 19 15:35:56 landin kernel: [<ffffffff8118dde3>] __block_prepare_write+0x1e3/0x590
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118e334>] block_write_begin+0x64/0x100
May 19 15:35:56 landin kernel: [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff81102c0e>] generic_file_buffered_write+0x10e/0x2a0
May 19 15:35:56 landin kernel: [<ffffffff811047a0>] __generic_file_aio_write+0x250/0x480
May 19 15:35:56 landin kernel: [<ffffffff81104a3f>] generic_file_aio_write+0x6f/0xe0
May 19 15:35:56 landin kernel: [<ffffffffa0096160>] ? ext4_file_write+0x0/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa0096199>] ext4_file_write+0x39/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8115e29b>] do_sync_readv_writev+0xfb/0x140
May 19 15:35:56 landin kernel: [<ffffffff813edb2f>] ? release_sock+0xaf/0xc0
May 19 15:35:56 landin kernel: [<ffffffff8108dc00>] ? autoremove_wake_function+0x0/0x40
May 19 15:35:56 landin kernel: [<ffffffff811f967b>] ? selinux_file_permission+0xfb/0x150
May 19 15:35:56 landin kernel: [<ffffffff811ec646>] ? security_file_permission+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff8115f24f>] do_readv_writev+0xcf/0x1f0
May 19 15:35:56 landin kernel: [<ffffffff81120ea9>] ? kmemdup+0x29/0x50
May 19 15:35:56 landin kernel: [<ffffffff81096396>] ? groups_alloc+0x46/0xf0
May 19 15:35:56 landin kernel: [<ffffffff811ec9c6>] ? security_task_setgroups+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff81096165>] ? set_groups+0x25/0x1a0
May 19 15:35:56 landin kernel: [<ffffffff8115f3b6>] vfs_writev+0x46/0x60
May 19 15:35:56 landin kernel: [<ffffffffa0271fd0>] nfsd_vfs_write+0xe0/0x440 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa02707b2>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff810d4a15>] ? call_rcu_sched+0x15/0x20
May 19 15:35:56 landin kernel: [<ffffffffa0271aa1>] ? nfsd_permission+0x131/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0274589>] nfsd_write+0x99/0x100 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027f4d0>] nfsd4_write+0x100/0x130 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027fe51>] nfsd4_proc_compound+0x3d1/0x4d0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026d3fa>] nfsd_dispatch+0xba/0x250 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0208ec4>] svc_process_common+0x344/0x610 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa02094d0>] svc_process+0x110/0x150 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa026daf6>] nfsd+0xd6/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026da20>] ? nfsd+0x0/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff8108d8a6>] kthread+0x96/0xa0
May 19 15:35:56 landin kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
May 19 15:35:56 landin kernel: [<ffffffff8108d810>] ? kthread+0x0/0xa0
May 19 15:35:56 landin kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
May 19 15:35:56 landin kernel: Code: 48 8b 40 18 49 89 44 24 20 f0 41 80 0c 24 40 f0 41 80 4c 24 01 02 e9 e2 fd ff ff 0f 1f 44 00 00 41 bf e4 ff ff ff e9 d2 fd ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 
May 19 15:35:56 landin kernel: RIP  [<ffffffffa009dcd3>] ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: RSP <ffff8808473d3590>
May 19 15:35:56 landin kernel: ---[ end trace 234f986d30d8da3c ]---
May 19 15:35:56 landin kernel: Kernel panic - not syncing: Fatal exception
May 19 15:35:56 landin kernel: Pid: 2757, comm: nfsd Tainted: G      D    2.6.32-19.el6.x86_64 #1
May 19 15:35:56 landin kernel: Call Trace:
May 19 15:35:56 landin kernel: [<ffffffff814bfd69>] panic+0x78/0x137
May 19 15:35:56 landin kernel: [<ffffffff814c3d1c>] oops_end+0xdc/0xf0
May 19 15:35:56 landin kernel: [<ffffffff8101723b>] die+0x5b/0x90
May 19 15:35:56 landin kernel: [<ffffffff814c35c4>] do_trap+0xc4/0x160
May 19 15:35:56 landin kernel: [<ffffffff81014cb5>] do_invalid_op+0x95/0xb0
May 19 15:35:56 landin kernel: [<ffffffffa009dcd3>] ? ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff811489e5>] ? ____cache_alloc_node+0x95/0x150
May 19 15:35:56 landin kernel: [<ffffffff81013f5b>] invalid_op+0x1b/0x20
May 19 15:35:56 landin kernel: [<ffffffffa009dcd3>] ? ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118dde3>] __block_prepare_write+0x1e3/0x590
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118e334>] block_write_begin+0x64/0x100
May 19 15:35:56 landin kernel: [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff81102c0e>] generic_file_buffered_write+0x10e/0x2a0
May 19 15:35:56 landin kernel: [<ffffffff811047a0>] __generic_file_aio_write+0x250/0x480
May 19 15:35:56 landin kernel: [<ffffffff81104a3f>] generic_file_aio_write+0x6f/0xe0
May 19 15:35:56 landin kernel: [<ffffffffa0096160>] ? ext4_file_write+0x0/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa0096199>] ext4_file_write+0x39/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8115e29b>] do_sync_readv_writev+0xfb/0x140
May 19 15:35:56 landin kernel: [<ffffffff813edb2f>] ? release_sock+0xaf/0xc0
May 19 15:35:56 landin kernel: [<ffffffff8108dc00>] ? autoremove_wake_function+0x0/0x40
May 19 15:35:56 landin kernel: [<ffffffff811f967b>] ? selinux_file_permission+0xfb/0x150
May 19 15:35:56 landin kernel: [<ffffffff811ec646>] ? security_file_permission+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff8115f24f>] do_readv_writev+0xcf/0x1f0
May 19 15:35:56 landin kernel: [<ffffffff81120ea9>] ? kmemdup+0x29/0x50
May 19 15:35:56 landin kernel: [<ffffffff81096396>] ? groups_alloc+0x46/0xf0
May 19 15:35:56 landin kernel: [<ffffffff811ec9c6>] ? security_task_setgroups+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff81096165>] ? set_groups+0x25/0x1a0
May 19 15:35:56 landin kernel: [<ffffffff8115f3b6>] vfs_writev+0x46/0x60
May 19 15:35:56 landin kernel: [<ffffffffa0271fd0>] nfsd_vfs_write+0xe0/0x440 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa02707b2>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff810d4a15>] ? call_rcu_sched+0x15/0x20
May 19 15:35:56 landin kernel: [<ffffffffa0271aa1>] ? nfsd_permission+0x131/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0274589>] nfsd_write+0x99/0x100 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027f4d0>] nfsd4_write+0x100/0x130 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027fe51>] nfsd4_proc_compound+0x3d1/0x4d0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026d3fa>] nfsd_dispatch+0xba/0x250 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0208ec4>] svc_process_common+0x344/0x610 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa02094d0>] svc_process+0x110/0x150 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa026daf6>] nfsd+0xd6/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026da20>] ? nfsd+0x0/0x190 [nfsd]
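In the call traces above, frames are listed innermost first. Entries marked with "?" are addresses the unwinder found on the stack but could not confirm as part of the active call chain. Dropping them leaves the real path: from the nfsd thread through nfsd4_write, vfs_writev and ext4_da_write_begin down to __block_prepare_write, which called the faulting ext4_da_get_block_prep shown in the RIP line. The "? ext4_da_get_block_prep+0x0" entries sit at offset 0, which suggests they are the function's address passed as a get_block callback rather than return addresses. The parser below is a hypothetical Python sketch that assumes the x86 trace format seen in this log.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One x86 stack-trace entry, for example:
#   [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
#   [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
_FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


@dataclass
class Frame:
    addr: int
    func: str
    offset: int            # offset of the address within func
    size: int              # total size of func
    module: Optional[str]  # None for code built into the kernel image
    reliable: bool         # False for entries printed with a leading '?'


def parse_trace(text: str) -> List[Frame]:
    """Parse every stack-trace entry found in a block of syslog lines."""
    frames = []
    for line in text.splitlines():
        m = _FRAME_RE.search(line)
        if m is None:
            continue  # e.g. "Call Trace:" headers, register dumps
        frames.append(Frame(
            addr=int(m.group("addr"), 16),
            func=m.group("func"),
            offset=int(m.group("offset"), 16),
            size=int(m.group("size"), 16),
            module=m.group("module"),
            reliable=m.group("unreliable") is None,
        ))
    return frames


def reliable_chain(frames: List[Frame]) -> List[str]:
    """Names of the confirmed frames, innermost call first."""
    return [f.func for f in frames if f.reliable]
```

When given the first Call Trace block from the log, reliable_chain() should return __block_prepare_write, block_write_begin, ext4_da_write_begin and so on, up to nfsd, kthread and child_rip, with every "?" entry removed.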

FWIW, the memory on both machines passes memtest86+.
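The first two lines of the oops, "kernel BUG at fs/ext4/inode.c:1852!" followed by "invalid opcode: 0000", go together. On x86, the kernel's BUG() macro is built on the ud2 instruction (bytes 0f 0b), whose purpose is to raise an invalid-opcode exception. The oops handler then reports the file and line recorded for that BUG() site. The byte marked <0f> in the Code: line is the one at RIP, which points to a deliberate BUG()/BUG_ON() assertion in ext4_da_get_block_prep rather than a random bad instruction. The following eb fe is a short jump to itself, the usual infinite loop emitted after ud2 so the compiler treats BUG() as never returning. The helper below is a hypothetical Python sketch that checks this on an excerpt of the Code: line.

```python
# Excerpt of the oops "Code:" line; the byte in <..> is at the faulting RIP.
CODE_EXCERPT = "Code: e9 d2 fd ff ff <0f> 0b eb fe 0f 0b eb fe"

UD2 = bytes([0x0F, 0x0B])       # x86 "ud2": always raises invalid opcode
JMP_SELF = bytes([0xEB, 0xFE])  # "jmp rel8 -2": jumps to its own start


def bytes_at_rip(code_line: str, count: int = 2) -> bytes:
    """Return `count` bytes starting at the <..>-marked byte of a Code: line."""
    tokens = code_line.split("Code:", 1)[-1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            hex_bytes = [tok[1:-1]] + tokens[i + 1 : i + count]
            return bytes.fromhex("".join(hex_bytes))
    raise ValueError("no <..> marker in Code: line")


def is_bug_trap(code_line: str) -> bool:
    """True if the instruction at RIP is ud2, i.e. a BUG()-style trap."""
    return bytes_at_rip(code_line, 2) == UD2


if __name__ == "__main__":
    print(bytes_at_rip(CODE_EXCERPT, 4).hex(), is_bug_trap(CODE_EXCERPT))
```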
Comment 3 Josef Bacik 2010-05-19 11:43:16 EDT

*** This bug has been marked as a duplicate of bug 576202 ***
