Bug 486645

Summary: [NFS] Bug in shared page handling over NFS
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: 1.1
Target Milestone: 1.1.1
Target Release: ---
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: urgent
Reporter: Jon Masters <jcm>
Assignee: Jon Masters <jcm>
QA Contact: David Sommerseth <davids>
CC: bhu, jcm, ovasik
Doc Type: Bug Fix
Last Closed: 2009-03-27 00:15:03 UTC
Attachments: Fix to nfs_wb_page (flags: none)

Description Jon Masters 2009-02-20 20:25:18 UTC
Description of problem:

The NFS client in the 2.6.24 kernel can hit a BUG when multiple nodes have the same shared file mapped at the same time. The result is a kernel panic with a backtrace similar to:

Jan 22 08:06:33 lft0104 kernel: ------------[ cut here ]------------
Jan 22 08:06:33 lft0104 kernel: kernel BUG at fs/nfs/pagelist.c:82!
Jan 22 08:06:33 lft0104 kernel: invalid opcode: 0000 [1] PREEMPT SMP 
Jan 22 08:06:33 lft0104 kernel: CPU 3 
Jan 22 08:06:33 lft0104 kernel: Modules linked in: nfs lockd nfs_acl autofs4 sunrpc bonding dm_multipath video output sbs sbshc battery ac parport_pc lp parport tg3 bnx2 button serio_raw ipmi_si ipmi_msghandler iTCO_wdt i5000_edac iTCO_vendor_support edac_core pcspkr shpchp dm_snapshot dm_zero dm_mirror dm_mod cciss sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
Jan 22 08:06:33 lft0104 kernel: Pid: 3386, comm: tail Not tainted 2.6.24.7-81.el5rt #1
Jan 22 08:06:33 lft0104 kernel: RIP: 0010:[<ffffffff8820ea73>]  [<ffffffff8820ea73>] :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel: RSP: 0018:ffff81076c091bc8  EFLAGS: 00010202
Jan 22 08:06:33 lft0104 kernel: RAX: 0000000000000821 RBX: ffff81075a9daec0 RCX: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: RDX: ffffe2002c705580 RSI: 0000000000000000 RDI: ffff81075a9daf18
Jan 22 08:06:33 lft0104 kernel: RBP: ffff81076c091c08 R08: 0000000000000000 R09: ffff81075a9daec0
Jan 22 08:06:33 lft0104 kernel: R10: 0000000000000040 R11: 0000000000000040 R12: ffff81075a9daec0
Jan 22 08:06:33 lft0104 kernel: R13: ffffe2002c705580 R14: ffff81077dc5bb80 R15: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: FS:  00002af5bdce26e0(0000) GS:ffff81082c9453c0(0000) knlGS:0000000000000000
Jan 22 08:06:33 lft0104 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jan 22 08:06:33 lft0104 kernel: CR2: 0000000005d36010 CR3: 000000076c1ec000 CR4: 00000000000006e0
Jan 22 08:06:33 lft0104 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 22 08:06:33 lft0104 kernel: DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Jan 22 08:06:33 lft0104 kernel: Process tail (pid: 3386, threadinfo ffff81076c090000, task ffff81075a978b20)
Jan 22 08:06:33 lft0104 kernel: Stack:  0000000000000000 000008770006f000 ffff810829dfd7c0 ffff8107ea1a4c40
Jan 22 08:06:33 lft0104 kernel:  ffffe2002c705580 ffff81077dc5bb80 0000000000000877 0000000000000000
Jan 22 08:06:33 lft0104 kernel:  ffff81076c091c68 ffffffff88210e76 ffff81076c091c38 ffff810829dfd7c0
Jan 22 08:06:33 lft0104 kernel: Call Trace:
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff88210e76>] :nfs:nfs_readpage+0x1ce/0x2c0
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81084c7e>] do_generic_mapping_read+0x1f0/0x335
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81083c61>] ? file_read_actor+0x0/0x12a
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810864b1>] generic_file_aio_read+0x122/0x161
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff88208ce3>] :nfs:nfs_file_read+0x126/0x135
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810afd59>] do_sync_read+0xe2/0x126
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b349c>] ? cp_new_stat+0xf6/0x10f
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8105158f>] ? autoremove_wake_function+0x0/0x38
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81077905>] ? audit_syscall_entry+0x148/0x17e
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b05e6>] vfs_read+0xc4/0x16d
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b0a14>] sys_read+0x4a/0x75
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8100c37e>] traceret+0x0/0x5
Jan 22 08:06:33 lft0104 kernel: 
Jan 22 08:06:33 lft0104 kernel: 
Jan 22 08:06:33 lft0104 kernel: Code: 18 01 00 00 02 74 09 48 c7 c3 00 fe ff ff eb 62 e8 f2 6d 07 f9 e9 71 ff ff ff 49 8b 55 10 f0 ff 42 08 41 8b 45 00 f6 c4 08 74 04 <0f> 0b eb fe 41 8b 45 00 a8 01 75 04 0f 0b eb fe 49 8b 45 18 4c 
Jan 22 08:06:33 lft0104 kernel: RIP  [<ffffffff8820ea73>] :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel:  RSP <ffff81076c091bc8>
Jan 22 08:06:33 lft0104 kernel: Kernel panic - not syncing: Fatal exception
Jan 22 08:06:33 lft0104 kernel: Pid: 3386, comm: tail Tainted: G      D  2.6.24.7-81.el5rt #1
Jan 22 08:06:33 lft0104 kernel: 
Jan 22 08:06:33 lft0104 kernel: Call Trace:
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8103ca9c>] panic+0xaf/0x160
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8100c846>] ? retint_kernel+0x26/0x30
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81288054>] ? oops_end+0x3d/0x5d
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8128806b>] oops_end+0x54/0x5d
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8100db96>] die+0x4a/0x54
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8128847e>] do_trap+0x101/0x110
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8100e0a7>] do_invalid_op+0x93/0x9c
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8820ea73>] ? :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81083589>] ? cpupri_set+0xc5/0xd8
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff88211e52>] ? :nfs:nfs_scan_commit+0x28/0x3b
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81287cf9>] error_exit+0x0/0x51
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8820ea73>] ? :nfs:nfs_create_request+0xd0/0x129
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8820e9fe>] ? :nfs:nfs_create_request+0x5b/0x129
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff88210e76>] ? :nfs:nfs_readpage+0x1ce/0x2c0
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81084c7e>] ? do_generic_mapping_read+0x1f0/0x335
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81083c61>] ? file_read_actor+0x0/0x12a
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810864b1>] ? generic_file_aio_read+0x122/0x161
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff88208ce3>] ? :nfs:nfs_file_read+0x126/0x135
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810afd59>] ? do_sync_read+0xe2/0x126
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b349c>] ? cp_new_stat+0xf6/0x10f
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8105158f>] ? autoremove_wake_function+0x0/0x38
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff81077905>] ? audit_syscall_entry+0x148/0x17e
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b05e6>] ? vfs_read+0xc4/0x16d
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff810b0a14>] ? sys_read+0x4a/0x75
Jan 22 08:06:33 lft0104 kernel:  [<ffffffff8100c37e>] ? traceret+0x0/0x5
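A rough reading of the oops above, offered as a sketch rather than a confirmed diagnosis: the bytes just before the faulting instruction in the Code: line (f6 c4 08 74 04 <0f> 0b) decode as "test $0x8,%ah; je +4; ud2", so the BUG fires when bit 11 of the value in RAX is set. The snippet below checks that against RAX from the register dump. The flag names and bit positions (PG_locked = 0, PG_private = 11) are assumptions based on the 2.6.24-era page-flags layout and are not stated in this report.

```python
# Sketch: decode the page-flags value seen in RAX at the time of the BUG.
# ASSUMPTION: 2.6.24-era bit positions (PG_locked = 0, PG_private = 11);
# these names and numbers do not come from the bug report itself.
PG_LOCKED = 0
PG_PRIVATE = 11

RAX = 0x0000000000000821  # from the register dump in the oops


def bit_set(value, bit):
    """Return True if the given bit position is set in value."""
    return (value >> bit) & 1 == 1


# "f6 c4 08" is `test $0x8,%ah`. AH holds bits 8..15 of RAX, so a mask
# of 0x08 within AH corresponds to bit 8 + 3 = 11 of the full register.
ah_mask = 0x08
tested_bit = 8 + ah_mask.bit_length() - 1

print("bit tested before ud2:", tested_bit)
print("PG_private set:", bit_set(RAX, PG_PRIVATE))
print("PG_locked set:", bit_set(RAX, PG_LOCKED))
```

If those bit positions hold, the page passed to nfs_create_request was locked but still had private data attached when the read path reached it. That would be consistent with a fix in the write-back path (nfs_wb_page).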

Comment 1 Jon Masters 2009-02-20 20:27:07 UTC
Created attachment 332750 [details]
Fix to nfs_wb_page

The attached patch is a direct backport of the upstream fix for this bug.

Comment 2 Beth Uptagrafft 2009-03-06 15:36:48 UTC
Patch in -104.

Comment 5 David Sommerseth 2009-03-24 15:32:06 UTC
Found JCM's patch as mrg-rt-v1.git commit d852bb3a0ff425a644b03c1fcbf513737d8f99fe implemented in 2.6.24.7-107.

Upstream patch discussed here: http://www.spinics.net/lists/linux-nfs/msg00298.html
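For checking whether a given MRG realtime kernel predates the fix, a minimal sketch follows. It assumes release strings shaped like the one in the oops (2.6.24.7-81.el5rt) and takes -104 as the first build carrying the patch, per comment 2; comment 5 confirmed it in -107. The function names and the .el5rt suffix on newer builds are assumptions for illustration.

```python
import re


def rt_build_number(release):
    """Return N from a release string shaped like '2.6.24.7-N.el5rt', else None.

    ASSUMPTION: the '.el5rt' suffix seen in the oops applies to all builds.
    """
    match = re.fullmatch(r"2\.6\.24\.7-(\d+)\.el5rt", release)
    return int(match.group(1)) if match else None


def has_nfs_fix(release, first_fixed_build=104):
    """Hypothetical helper: True if the build number is at or past the fix.

    first_fixed_build defaults to 104 (comment 2); pass 107 to require the
    build in which the patch was verified (comment 5).
    """
    build = rt_build_number(release)
    return build is not None and build >= first_fixed_build


print(has_nfs_fix("2.6.24.7-81.el5rt"))
print(has_nfs_fix("2.6.24.7-107.el5rt"))
```

On a live system the release string would come from `uname -r`. Strings that do not match the expected shape are treated as not fixed, not guessed at.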

Comment 7 errata-xmlrpc 2009-03-27 00:15:03 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0360.html