Bug 229780

Summary: kernel panic when attempting to mount nfs4 filesystem twice
Product: Red Hat Enterprise Linux 4
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Keywords: Regression
Reporter: Jeff Layton <jlayton>
Assignee: Jeff Layton <jlayton>
QA Contact: Brian Brock <bbrock>
CC: jbaron, staubach, steved
Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-08 04:53:07 UTC
Bug Blocks: 234547
Attachments (description; flags):
patch to check for NULL pointer before freeing; flags: none
backported upstream patch for same problem; flags: none
updated patch -- remove added nfs_free_iostats that would have caused double-free; flags: none
patch -- don't free NULL pointer on error, also don't leak iostats; flags: none

Description Jeff Layton 2007-02-23 14:33:01 UTC
Easily reproducible kernel panic: mount an NFSv4 filesystem, and then attempt
to mount it again.

For instance, run this twice:

# mount -t nfs4 server:/ /mnt/server

Oops looks like this:

Unable to handle kernel paging request at ffffffffffffffff RIP: 
<ffffffff8015bd5a>{free_percpu+24}
PML4 103067 PGD 1727067 PMD 0 
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5
auth_rpcgss des sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
xenblk sd_mod scsi_mod
Pid: 2133, comm: mount Not tainted 2.6.9-48.EL.mntcrash.1xenU
RIP: e030:[<ffffffff8015bd5a>] <ffffffff8015bd5a>{free_percpu+24}
RSP: e02b:ffffff801b7d5c18  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffffffffffffffff RCX: ffffff801fe2be00
RDX: ffffff8001000000 RSI: 0000000000000042 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000c43910ac R09: ffffff801fe2be00
R10: ffffff801fe2be00 R11: ffffff801fe2be00 R12: 0000000000000000
R13: ffffff801b037000 R14: ffffffffa019a1e0 R15: ffffff801b029000
FS:  0000002a95573b00(0000) GS:ffffffff8041d700(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process mount (pid: 2133, threadinfo ffffff801b7d4000, task ffffff801bdab030)
Stack: ffffffffff5fd000 ffffff801fe2be00 ffffffffa019a1e0 ffffffffa016561c 
       ffffff801fe03400 ffffffff801ce4de 0000000000000000 0000000000000000 
       0000000000000000 0000000000000000 
Call Trace:<ffffffffa016561c>{:nfs:nfs4_get_sb+1759}
<ffffffff801ce4de>{selinux_sb_copy_data+47} 
       <ffffffff8017b02d>{do_kern_mount+161} <ffffffff80190d33>{do_mount+1690} 
       <ffffffff802360c9>{sock_common_recvmsg+48}
<ffffffff80232d8a>{sock_aio_read+297} 
       <ffffffff80265895>{tcp_transmit_skb+2037}
<ffffffff80235e1f>{sk_reset_timer+15} 
       <ffffffff802663b0>{tcp_write_xmit+314}
<ffffffff80156ff4>{buffered_rmqueue+384} 
       <ffffffff8010ddc3>{error_exit+0} <ffffffff801571e4>{__alloc_pages+200} 
       <ffffffff801910d6>{sys_mount+186} <ffffffff8010d66e>{system_call+134} 
       <ffffffff8010d5e8>{system_call+0} 

Code: 48 8b 3b e8 9d f4 ff ff ff c5 48 83 c3 08 83 fd 1f 7e e0 58 
RIP <ffffffff8015bd5a>{free_percpu+24} RSP <ffffff801b7d5c18>
CR2: ffffffffffffffff
 <0>Kernel panic - not syncing: Oops

Reproduced so far on an x86_64 Xen guest running a -48.EL kernel with the patch
for bz 226983. Not yet certain whether other arches are affected.

Comment 1 Jeff Layton 2007-02-23 14:53:01 UTC
To clarify, I've also seen the same panic on a stock -48 xenU kernel. I just
tried the patch in 226983 to see if it might fix this as well, but it didn't.


Comment 2 Jeff Layton 2007-02-23 15:15:21 UTC
Same panic on i686 xen guest as well:

general protection fault: 0000 [#1]
SMP 
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 sunrpc dm_mirror dm_mod
xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c01424a3>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-48.ELxenU) 
EIP is at free_percpu+0x17/0x29
eax: ffffffff   ebx: 00000000   ecx: df357000   edx: f5392000
esi: ffffffff   edi: c1627200   ebp: dcf621a0   esp: df357e98
ds: 007b   es: 007b   ss: 0068
Process mount (pid: 2556, threadinfo=df357000 task=de933970)
Stack: c16244f8 c1624400 e1231c45 00000000 dceef000 00000000 c1630980 e125f7c0 
       c015eb35 e125f7c0 00000000 dceef000 dcf16000 dded5000 dd2bc000 dcf16000 
       00000015 dceef000 c0172c9d dd2bc000 00000000 dceef000 dcf16000 00000000 
Call Trace:
 [<e1231c45>] nfs4_get_sb+0x265/0x275 [nfs]
 [<c015eb35>] do_kern_mount+0x85/0x143
 [<c0172c9d>] do_new_mount+0x67/0xa4
 [<c01732ea>] do_mount+0x15f/0x179
 [<c0107507>] error_code+0x2b/0x30
 [<c026a298>] iret_exc+0xeb4/0x159c
 [<c0173140>] copy_mount_options+0x49/0x94
 [<c0173655>] sys_mount+0x9b/0x115
 [<c010737f>] syscall_call+0x7/0xb
Code: 0e 80 3a 00 74 09 5b 5e 5f 5d e9 b1 4b 0b 00 5b 5e 5f 5d c3 56 53 8b 74 24
0c 31 db f7 d6 0f a3 1d 24 59 3a c0 19 c0 85 c0 74 09 <ff> 34 9e e8 55 ff ff ff
59 43 83 fb 1f 7e e4 5b 5e c3 8b 44 24 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

It looks like -42.26 does not panic on i686, so my guess is that this is a
regression introduced somewhere between those two releases. I'll see if I can
confirm when it was introduced.


Comment 3 Jeff Layton 2007-02-23 16:32:41 UTC
Looks like this was introduced in -42.27. The most likely culprit is the
nfs-stats patch detailed here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199263

I'm going to try backing this out and seeing if it fixes the problem.


Comment 4 Jeff Layton 2007-02-23 17:33:12 UTC
Created attachment 148684
patch to check for NULL pointer before freeing

This patch corrected the oops. nfs4_get_sb needs to check if server->io_stats
is NULL before trying to free it.
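
A minimal user-space sketch of the guard described above, not the actual patch. The names model_server, model_free_iostats, model_cleanup and model_decode are hypothetical stand-ins, and the code only models the idea that the error path must skip the free when io_stats was never allocated. The model_decode helper reflects an observation from the oops output: the i686 code bytes contain a "not" of the argument (f7 d6), and the faulting registers (RBX on x86_64, esi on i686) are all ones, which is consistent with free_percpu complementing a NULL pointer before dereferencing it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-mount NFS server state. */
struct model_server {
    void *io_stats;   /* NULL until the statistics block is allocated */
};

static int frees_performed;   /* counts calls that actually reached free() */

/* Complementing a NULL pointer yields an all-ones address, matching the
 * faulting address seen in the oops (assumption: this is what the "not"
 * instruction in the i686 code bytes is doing). */
static uintptr_t model_decode(const void *objp)
{
    return ~(uintptr_t)objp;
}

/* Stand-in for the real free routine; like the kernel function in this
 * report, it must never be handed NULL. */
static void model_free_iostats(void *stats)
{
    assert(stats != NULL);
    free(stats);
    frees_performed++;
}

/* Error-path cleanup with the NULL check in place. */
static void model_cleanup(struct model_server *server)
{
    if (server->io_stats != NULL) {
        model_free_iostats(server->io_stats);
        server->io_stats = NULL;   /* a second cleanup is then harmless */
    }
}
```

With the guard, calling model_cleanup on a server whose io_stats was never set does nothing, which corresponds to the second mount attempt failing cleanly instead of panicking.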

Comment 6 RHEL Program Management 2007-02-23 17:44:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Program Management 2007-02-23 17:44:52 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 9 Jeff Layton 2007-02-26 17:31:21 UTC
Created attachment 148816
backported upstream patch for same problem

Actually, this patch, backported from the upstream commit below, might be better:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=01d0ae8beaee75d954900109619b700fe68707d9


This looks like it fixes the original problem and also addresses Peter's
concerns.

In addition to what was in the upstream patch, I also added a call to
nfs_free_iostats in the error path of nfs_sb_init. It looked like the iostats
would leak if getting a root inode or dentry failed.

Comment 13 Jeff Layton 2007-02-26 23:03:35 UTC
Created attachment 148838
updated patch -- remove added nfs_free_iostats that would have caused double-free

Peter pointed out that the nfs_free_iostats call I added could cause a double
free, since kill_sb gets called in an error condition anyway. This patch removes
that call and should be pretty much the same as the upstream patch.
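
The ownership rule behind this change can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: model_sb, model_sb_init, model_kill_sb and model_mount are hypothetical names, and the sketch assumes only what the comment says, namely that the teardown hook already runs on the error path and so should be the single place that frees the statistics.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a superblock that owns an iostats buffer. */
struct model_sb {
    void *io_stats;
    int   free_calls;   /* how many times the buffer was actually freed */
};

/* Models the teardown hook: the only place that frees io_stats. */
static void model_kill_sb(struct model_sb *sb)
{
    if (sb->io_stats != NULL) {
        free(sb->io_stats);
        sb->io_stats = NULL;
        sb->free_calls++;
    }
}

/* Models setup that can fail after the allocation succeeded.
 * It deliberately does NOT free io_stats on failure: the caller's
 * teardown does that, and freeing here as well would be a double free. */
static int model_sb_init(struct model_sb *sb, int fail_late)
{
    sb->io_stats = malloc(64);
    if (sb->io_stats == NULL)
        return -1;
    if (fail_late)
        return -1;   /* e.g. a later setup step failed */
    return 0;
}

/* Models the mount path: on any setup error, teardown runs exactly once. */
static int model_mount(struct model_sb *sb, int fail_late)
{
    int err = model_sb_init(sb, fail_late);
    if (err != 0)
        model_kill_sb(sb);
    return err;
}
```

Keeping the free in one place means a failed setup neither leaks the buffer nor frees it twice, which is the property the updated patch is after.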

Comment 14 Jeff Layton 2007-03-06 19:46:00 UTC
Created attachment 149372
patch -- don't free NULL pointer on error, also don't leak iostats

This patch should fix the problem as well, and doesn't pull in the changes to
nfs_sb_init.

Comment 15 Jason Baron 2007-03-07 19:13:57 UTC
Committed in stream U5 build 50. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/


Comment 18 Mike Gahagan 2007-04-02 17:40:35 UTC
Patch is in -52, already verified by at least one partner.


Comment 21 Red Hat Bugzilla 2007-05-08 04:53:07 UTC
An advisory has been issued which should address the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html