
Bug 660580

Summary: [REG][5.6] kernel panic occurs by writing a file on optional mount "sync/noac" of NFSv4.
Product: Red Hat Enterprise Linux 5
Reporter: Masayoshi Yamazaki <myamazak>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA
QA Contact: Petr Beňas <pbenas>
Severity: high
Priority: urgent
Version: 5.6
CC: bfields, dhoward, dhowells, jlayton, jpirko, pbenas, pstehlik, rwheeler, sprabhu, steved, yanwang
Target Milestone: rc
Keywords: Regression, ZStream
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-01-13 22:03:14 UTC
Bug Blocks: 502912, 640580, 663381
Attachments: patch -- set lock_context field in nfs_writepage_sync (flags: none)

Comment 4 Sachin Prabhu 2010-12-07 15:14:47 UTC
The backtrace is as follows.

crash> bt
PID: 9181 TASK: ffff8100bec32860 CPU: 0 COMMAND: "runhello.sh"
#0 [ffff8100a05316f0] crash_kexec at ffffffff800af83a
#1 [ffff8100a05317b0] __die at ffffffff80065117
#2 [ffff8100a05317f0] do_page_fault at ffffffff8006748d
#3 [ffff8100a05318e0] error_exit at ffffffff8005dde9
[exception RIP: encode_stateid+78]
RIP: ffffffff888cf4fc RSP: ffff8100a0531998 RFLAGS: 00010282
RAX: ffff8100984bb098 RBX: 0000000000000000 RCX: ffff8100984bb098
RDX: 0000000000000010 RSI: ffff8100acee8840 RDI: ffff8100a05319c8
RBP: ffff8100984bb098 R8: ffff8100984bb094 R9: 0000000000000004
R10: ffff8100aac6be00 R11: ffffffff888cf531 R12: ffff8100acf51bc0
R13: ffff81009efc8000 R14: ffff81008e98fa80 R15: ffff8100acf51bc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#4 [ffff8100a05319c0] nfs4_xdr_enc_write at ffffffff888cf5dc
#5 [ffff8100a0531a20] call_transmit at ffffffff886b33f0
#6 [ffff8100a0531a50] __rpc_execute at ffffffff886b8ae4
#7 [ffff8100a0531a70] rpc_call_sync at ffffffff886b3aa5
#8 [ffff8100a0531aa0] nfs4_proc_write at ffffffff888c5b71
#9 [ffff8100a0531b00] nfs_writepage_sync at ffffffff888be6e3
#10 [ffff8100a0531b50] nfs_updatepage at ffffffff888be865
#11 [ffff8100a0531b90] nfs_write_end at ffffffff888b4074
#12 [ffff8100a0531bc0] generic_file_buffered_write at ffffffff8000fe24
#13 [ffff8100a0531cc0] __generic_file_aio_write_nolock at ffffffff800166e8
#14 [ffff8100a0531d70] generic_file_aio_write at ffffffff8002187e
#15 [ffff8100a0531dc0] nfs_file_write at ffffffff888b4805
#16 [ffff8100a0531e00] do_sync_write at ffffffff80018338
#17 [ffff8100a0531f10] vfs_write at ffffffff80016af0
#18 [ffff8100a0531f40] sys_write at ffffffff800173a8
#19 [ffff8100a0531f80] system_call at ffffffff8005d116
RIP: 00000037c86c6420 RSP: 00007fff3c32bf50 RFLAGS: 00000206
RAX: 0000000000000001 RBX: ffffffff8005d116 RCX: 00000037c86c5ca0
RDX: 0000000000000008 RSI: 00002b1f34e33000 RDI: 0000000000000001
RBP: 0000000000000008 R8: 00000000ffffffff R9: 00002b1f3185ef50
R10: 000000000000001a R11: 0000000000000246 R12: 00000037c8952780
R13: 00002b1f34e33000 R14: 0000000000000008 R15: 00000000bdc046c0
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
crash>

Comment 5 Jeff Layton 2010-12-07 15:24:02 UTC
Looks like it crashed here:

nfs4_copy_stateid(&stateid, ctx->state, l_ctx->lockowner, l_ctx->pid);

...probably indicating that either ctx or l_ctx is NULL. I can reproduce and I think this could be related to the other regressions from the lockowner patches.

Comment 6 Jeff Layton 2010-12-07 15:53:37 UTC
Created attachment 465259
patch -- set lock_context field in nfs_writepage_sync

This patch seems to fix the problem for me. Would you be able to test it too?

The problem was that the patches to add the lock_context fixes didn't account for nfs_writepage_sync. Upstream, this function has been removed altogether, but it currently remains in RHEL5.

Comment 7 RHEL Program Management 2010-12-07 16:19:46 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 13 Jarod Wilson 2010-12-14 14:28:45 UTC
in kernel-2.6.18-237.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcome.

Comment 15 Petr Beňas 2010-12-14 15:42:51 UTC
Reproduced in 2.6.18-236.el5 and verified in 2.6.18-237.el5.

Comment 24 errata-xmlrpc 2011-01-13 22:03:14 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html