Red Hat Bugzilla – Bug 486645
[NFS] Bug in shared page handling over NFS
Last modified: 2016-05-22 19:28:00 EDT
Description of problem:
A bug exists in the NFS implementation within the 2.6.24 kernel under certain conditions where multiple nodes have the same shared file mapped at the same time. This can result in a backtrace similar to:

Jan 22 08:06:33 lft0104 kernel: ------------[ cut here ]------------
Jan 22 08:06:33 lft0104 kernel: kernel BUG at fs/nfs/pagelist.c:82!
Jan 22 08:06:33 lft0104 kernel: invalid opcode: 0000 [1] PREEMPT SMP
Jan 22 08:06:33 lft0104 kernel: CPU 3
Jan 22 08:06:33 lft0104 kernel: Modules linked in: nfs lockd nfs_acl autofs4 sunrpc bonding dm_multipath video output sbs sbshc battery ac parport_pc lp parport tg3 bnx2 button serio_raw ipmi_si ipmi_msghandler iTCO_wdt i5000_edac iTCO_vendor_support edac_core pcspkr shpchp dm_snapshot dm_zero dm_mirror dm_mod cciss sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Jan 22 08:06:33 lft0104 kernel: Pid: 3386, comm: tail Not tainted 2.6.24.7-81.el5rt #1
Jan 22 08:06:33 lft0104 kernel: RIP: 0010:[<ffffffff8820ea73>] [<ffffffff8820ea73>] :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel: RSP: 0018:ffff81076c091bc8 EFLAGS: 00010202
Jan 22 08:06:33 lft0104 kernel: RAX: 0000000000000821 RBX: ffff81075a9daec0 RCX: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: RDX: ffffe2002c705580 RSI: 0000000000000000 RDI: ffff81075a9daf18
Jan 22 08:06:33 lft0104 kernel: RBP: ffff81076c091c08 R08: 0000000000000000 R09: ffff81075a9daec0
Jan 22 08:06:33 lft0104 kernel: R10: 0000000000000040 R11: 0000000000000040 R12: ffff81075a9daec0
Jan 22 08:06:33 lft0104 kernel: R13: ffffe2002c705580 R14: ffff81077dc5bb80 R15: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: FS: 00002af5bdce26e0(0000) GS:ffff81082c9453c0(0000) knlGS:0000000000000000
Jan 22 08:06:33 lft0104 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 22 08:06:33 lft0104 kernel: CR2: 0000000005d36010 CR3: 000000076c1ec000 CR4: 00000000000006e0
Jan 22 08:06:33 lft0104 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Jan 22 08:06:33 lft0104 kernel: Process tail (pid: 3386, threadinfo ffff81076c090000, task ffff81075a978b20)
Jan 22 08:06:33 lft0104 kernel: Stack: 0000000000000000 000008770006f000 ffff810829dfd7c0 ffff8107ea1a4c40
Jan 22 08:06:33 lft0104 kernel: ffffe2002c705580 ffff81077dc5bb80 0000000000000877 0000000000000000
Jan 22 08:06:33 lft0104 kernel: ffff81076c091c68 ffffffff88210e76 ffff81076c091c38 ffff810829dfd7c0
Jan 22 08:06:33 lft0104 kernel: Call Trace:
Jan 22 08:06:33 lft0104 kernel: [<ffffffff88210e76>] :nfs:nfs_readpage+0x1ce/0x2c0
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81084c7e>] do_generic_mapping_read+0x1f0/0x335
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81083c61>] ? file_read_actor+0x0/0x12a
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810864b1>] generic_file_aio_read+0x122/0x161
Jan 22 08:06:33 lft0104 kernel: [<ffffffff88208ce3>] :nfs:nfs_file_read+0x126/0x135
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810afd59>] do_sync_read+0xe2/0x126
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b349c>] ? cp_new_stat+0xf6/0x10f
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8105158f>] ? autoremove_wake_function+0x0/0x38
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81077905>] ? audit_syscall_entry+0x148/0x17e
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b05e6>] vfs_read+0xc4/0x16d
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b0a14>] sys_read+0x4a/0x75
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8100c37e>] traceret+0x0/0x5
Jan 22 08:06:33 lft0104 kernel:
Jan 22 08:06:33 lft0104 kernel:
Jan 22 08:06:33 lft0104 kernel: Code: 18 01 00 00 02 74 09 48 c7 c3 00 fe ff ff eb 62 e8 f2 6d 07 f9 e9 71 ff ff ff 49 8b 55 10 f0 ff 42 08 41 8b 45 00 f6 c4 08 74 04 <0f> 0b eb fe 41 8b 45 00 a8 01 75 04 0f 0b eb fe 49 8b 45 18 4c
Jan 22 08:06:33 lft0104 kernel: RIP [<ffffffff8820ea73>] :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel: RSP <ffff81076c091bc8>
Jan 22 08:06:33 lft0104 kernel: Kernel panic - not syncing: Fatal exception
Jan 22 08:06:33 lft0104 kernel: Pid: 3386, comm: tail Tainted: G D 2.6.24.7-81.el5rt #1
Jan 22 08:06:33 lft0104 kernel:
Jan 22 08:06:33 lft0104 kernel: Call Trace:
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8103ca9c>] panic+0xaf/0x160
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8100c846>] ? retint_kernel+0x26/0x30
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81288054>] ? oops_end+0x3d/0x5d
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8128806b>] oops_end+0x54/0x5d
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8100db96>] die+0x4a/0x54
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8128847e>] do_trap+0x101/0x110
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8100e0a7>] do_invalid_op+0x93/0x9c
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8820ea73>] ? :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81083589>] ? cpupri_set+0xc5/0xd8
Jan 22 08:06:33 lft0104 kernel: [<ffffffff88211e52>] ? :nfs:nfs_scan_commit+0x28/0x3b
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81287cf9>] error_exit+0x0/0x51
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8820ea73>] ? :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8820e9fe>] ? :nfs:nfs_create_request+0x5b/0x129
Jan 22 08:06:33 lft0104 kernel: [<ffffffff88210e76>] ? :nfs:nfs_readpage+0x1ce/0x2c0
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81084c7e>] ? do_generic_mapping_read+0x1f0/0x335
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81083c61>] ? file_read_actor+0x0/0x12a
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810864b1>] ? generic_file_aio_read+0x122/0x161
Jan 22 08:06:33 lft0104 kernel: [<ffffffff88208ce3>] ? :nfs:nfs_file_read+0x126/0x135
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810afd59>] ? do_sync_read+0xe2/0x126
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b349c>] ? cp_new_stat+0xf6/0x10f
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8105158f>] ? autoremove_wake_function+0x0/0x38
Jan 22 08:06:33 lft0104 kernel: [<ffffffff81077905>] ? audit_syscall_entry+0x148/0x17e
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b05e6>] ? vfs_read+0xc4/0x16d
Jan 22 08:06:33 lft0104 kernel: [<ffffffff810b0a14>] ? sys_read+0x4a/0x75
Jan 22 08:06:33 lft0104 kernel: [<ffffffff8100c37e>] ? traceret+0x0/0x5
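
When triaging similar reports, it can help to pull the BUG location and the faulting symbol out of the oops text automatically. The following is a minimal sketch, not part of the original report: the function name summarize_oops and the regular expressions are hypothetical, and they assume the 2.6.24-era oops layout shown in the backtrace above ("kernel BUG at file:line!" and a trailing "RIP [<addr>] :module:symbol+offset/size" line).

```python
import re

# Hypothetical helper (an assumption, not from the bug report): extracts the
# BUG location and the faulting symbol from oops text in the format above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")

# Matches the trailing "RIP [<addr>] ..." line. The ":module:" prefix is
# optional because core-kernel symbols carry no module name.
RIP_RE = re.compile(
    r"RIP\s+\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?(?P<symbol>\w+)"
    r"\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def summarize_oops(text):
    """Return a dict with whatever BUG/RIP fields could be found in text."""
    summary = {}
    bug = BUG_RE.search(text)
    if bug:
        summary["file"] = bug.group("file")
        summary["line"] = int(bug.group("line"))
    rip = RIP_RE.search(text)
    if rip:
        summary["module"] = rip.group("module")  # None for core-kernel code
        summary["symbol"] = rip.group("symbol")
        summary["offset"] = int(rip.group("offset"), 16)
        summary["size"] = int(rip.group("size"), 16)
    return summary


# Two lines copied from the backtrace above.
SAMPLE = (
    "Jan 22 08:06:33 lft0104 kernel: kernel BUG at fs/nfs/pagelist.c:82!\n"
    "Jan 22 08:06:33 lft0104 kernel: RIP [<ffffffff8820ea73>] "
    ":nfs:nfs_create_request+0xd0/0x129\n"
)

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

For the sample lines, the summary points at fs/nfs/pagelist.c line 82 and at nfs_create_request in the nfs module, which matches the location reported in the trace.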
Created attachment 332750
Fix to nfs_wb_page

The attached fix is a direct backport from the upstream fix for this bug.
Patch in -104.
Found JCM's patch as mrg-rt-v1.git commit d852bb3a0ff425a644b03c1fcbf513737d8f99fe implemented in 2.6.24.7-107. Upstream patch discussed here: http://www.spinics.net/lists/linux-nfs/msg00298.html
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0360.html