Bug 229780
| Summary: | kernel panic when attempting to mount nfs4 filesystem twice | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Jeff Layton <jlayton> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.4 | CC: | jbaron, staubach, steved |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2007-05-08 04:53:07 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 234547 | ||
| Attachments: | |||
To clarify, I've also seen the same panic on a stock -48 xenU kernel. I just tried the patch in 226983 to see if it might fix this as well, but it didn't. Same panic on i686 xen guest as well:
general protection fault: 0000 [#1]
SMP
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 sunrpc dm_mirror dm_mod
xennet ext3 jbd xenblk sd_mod scsi_mod
CPU: 0
EIP: 0061:[<c01424a3>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-48.ELxenU)
EIP is at free_percpu+0x17/0x29
eax: ffffffff ebx: 00000000 ecx: df357000 edx: f5392000
esi: ffffffff edi: c1627200 ebp: dcf621a0 esp: df357e98
ds: 007b es: 007b ss: 0068
Process mount (pid: 2556, threadinfo=df357000 task=de933970)
Stack: c16244f8 c1624400 e1231c45 00000000 dceef000 00000000 c1630980 e125f7c0
c015eb35 e125f7c0 00000000 dceef000 dcf16000 dded5000 dd2bc000 dcf16000
00000015 dceef000 c0172c9d dd2bc000 00000000 dceef000 dcf16000 00000000
Call Trace:
[<e1231c45>] nfs4_get_sb+0x265/0x275 [nfs]
[<c015eb35>] do_kern_mount+0x85/0x143
[<c0172c9d>] do_new_mount+0x67/0xa4
[<c01732ea>] do_mount+0x15f/0x179
[<c0107507>] error_code+0x2b/0x30
[<c026a298>] iret_exc+0xeb4/0x159c
[<c0173140>] copy_mount_options+0x49/0x94
[<c0173655>] sys_mount+0x9b/0x115
[<c010737f>] syscall_call+0x7/0xb
Code: 0e 80 3a 00 74 09 5b 5e 5f 5d e9 b1 4b 0b 00 5b 5e 5f 5d c3 56 53 8b 74 24
0c 31 db f7 d6 0f a3 1d 24 59 3a c0 19 c0 85 c0 74 09 <ff> 34 9e e8 55 ff ff ff
59 43 83 fb 1f 7e e4 5b 5e c3 8b 44 24
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
It looks like -42.26 does not panic on i686, so my guess is that this is a
regression introduced somewhere between those two releases. I'll see if I can
confirm when it was introduced.
Looks like this was introduced in -42.27. The most likely culprit is the nfs-stats patch detailed here:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199263
I'm going to try backing this out and seeing if it fixes the problem.
Created attachment 148684 [details]
patch to check for NULL pointer before freeing
This patch corrected the oops. nfs4_get_sb needs to check if server->io_stats
is NULL before trying to free it.
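As a rough userspace illustration of the check described above (not the attached patch itself), the sketch below guards a cleanup helper against a NULL stats pointer. The names demo_server, demo_decode and demo_free_iostats are made up for this example. demo_decode assumes the per-CPU allocator hands out complemented pointers, which would fit the `not %esi` (f7 d6) in the Code: bytes and the esi: ffffffff in the register dump, but treat that as an assumption rather than a statement about the kernel source.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mount's server state; not the real struct nfs_server. */
struct demo_server {
    long *io_stats;   /* stays NULL if the mount fails before stats are allocated */
};

/* Address that a complement-encoded handle decodes to. ASSUMPTION: the
 * per-CPU free routine complements the pointer before dereferencing it. */
static uintptr_t demo_decode(const void *handle)
{
    return ~(uintptr_t)handle;
}

/* Guarded cleanup: returns 1 if the stats were freed, 0 if there was nothing to free. */
static int demo_free_iostats(struct demo_server *server)
{
    assert(server != NULL);
    if (server->io_stats == NULL)
        return 0;               /* nothing allocated yet, so skip the free */
    free(server->io_stats);
    server->io_stats = NULL;    /* make a repeated call harmless */
    return 1;
}
```

Under that assumption, a NULL handle decodes to an all-ones address, which is the value the faulting instruction was working with in both oopses, so skipping the free when io_stats is NULL avoids the bad dereference.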
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
This bugzilla has Keywords: Regression. Since no regressions are allowed between releases, it is also being proposed as a blocker for this release. Please resolve ASAP.
Created attachment 148816 [details]
backported upstream patch for same problem
Actually, this patch, backported from here, might be better:
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=01d0ae8beaee75d954900109619b700fe68707d9
This looks like it fixes the original problem and also addresses Peter's concerns. In addition to what was in the upstream patch, I also added a call to nfs_free_iostats in the error path of nfs_sb_init. It looked like if getting a root inode or dentry failed then the iostats would leak.
Created attachment 148838 [details]
updated patch -- remove added nfs_free_iostats that would have caused double-free
Peter pointed out that the nfs_free_iostats call I added could cause a double
free, since kill_sb gets called in an error condition anyway. This patch gets
rid of that call and should be pretty much the same as the upstream patch.
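To illustrate the double-free concern, here is a small userspace sketch in which only the teardown routine releases the stats, so the init error path must not free them as well. All names (demo_sb, demo_sb_init, demo_kill_sb, demo_mount) are hypothetical stand-ins and the control flow is a simplification, not the actual NFS code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a superblock that owns per-mount statistics. */
struct demo_sb {
    long *io_stats;
    int free_calls;   /* counts how many times the stats were actually released */
};

/* Teardown: the single place that releases io_stats. */
static void demo_kill_sb(struct demo_sb *sb)
{
    assert(sb != NULL);
    if (sb->io_stats != NULL) {
        free(sb->io_stats);
        sb->io_stats = NULL;
        sb->free_calls++;
    }
}

/* Init: allocates the stats; 'fail' simulates a later step going wrong.
 * Returns 0 on success, -1 on failure. */
static int demo_sb_init(struct demo_sb *sb, int fail)
{
    sb->io_stats = calloc(1, sizeof(long));
    if (sb->io_stats == NULL)
        return -1;
    if (fail)
        return -1;    /* no free here: the caller's teardown handles it */
    return 0;
}

/* Mount: on any init failure, teardown runs exactly once. */
static int demo_mount(struct demo_sb *sb, int fail)
{
    sb->io_stats = NULL;
    sb->free_calls = 0;
    if (demo_sb_init(sb, fail) != 0) {
        demo_kill_sb(sb);
        return -1;
    }
    return 0;
}
```

If demo_sb_init also freed io_stats on failure without clearing the pointer, demo_kill_sb would release the same memory a second time, which is the pattern the updated patch avoids.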
Created attachment 149372 [details]
patch -- don't free NULL pointer on error, also don't leak iostats
This patch should fix the problem as well, and doesn't pull in the changes to
nfs_sb_init.
Committed in stream U5 build 50. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/
Patch is in -52, already verified by at least one partner.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2007-0304.html
Easily reproducible kernel panic: mount an NFSv4 filesystem, and then attempt to mount it again. For instance, run this twice:
# mount -t nfs4 server:/ /mnt/server
The oops looks like this:
Unable to handle kernel paging request at ffffffffffffffff RIP:
<ffffffff8015bd5a>{free_percpu+24}
PML4 103067 PGD 1727067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5 auth_rpcgss
des sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod xenblk
sd_mod scsi_mod
Pid: 2133, comm: mount Not tainted 2.6.9-48.EL.mntcrash.1xenU
RIP: e030:[<ffffffff8015bd5a>] <ffffffff8015bd5a>{free_percpu+24}
RSP: e02b:ffffff801b7d5c18 EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffffffffffffffff RCX: ffffff801fe2be00
RDX: ffffff8001000000 RSI: 0000000000000042 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000c43910ac R09: ffffff801fe2be00
R10: ffffff801fe2be00 R11: ffffff801fe2be00 R12: 0000000000000000
R13: ffffff801b037000 R14: ffffffffa019a1e0 R15: ffffff801b029000
FS: 0000002a95573b00(0000) GS:ffffffff8041d700(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000
Process mount (pid: 2133, threadinfo ffffff801b7d4000, task ffffff801bdab030)
Stack: ffffffffff5fd000 ffffff801fe2be00 ffffffffa019a1e0 ffffffffa016561c
ffffff801fe03400 ffffffff801ce4de 0000000000000000 0000000000000000
0000000000000000 0000000000000000
Call Trace:<ffffffffa016561c>{:nfs:nfs4_get_sb+1759}
<ffffffff801ce4de>{selinux_sb_copy_data+47}
<ffffffff8017b02d>{do_kern_mount+161}
<ffffffff80190d33>{do_mount+1690}
<ffffffff802360c9>{sock_common_recvmsg+48}
<ffffffff80232d8a>{sock_aio_read+297}
<ffffffff80265895>{tcp_transmit_skb+2037}
<ffffffff80235e1f>{sk_reset_timer+15}
<ffffffff802663b0>{tcp_write_xmit+314}
<ffffffff80156ff4>{buffered_rmqueue+384}
<ffffffff8010ddc3>{error_exit+0}
<ffffffff801571e4>{__alloc_pages+200}
<ffffffff801910d6>{sys_mount+186}
<ffffffff8010d66e>{system_call+134}
<ffffffff8010d5e8>{system_call+0}
Code: 48 8b 3b e8 9d f4 ff ff ff c5 48 83 c3 08 83 fd 1f 7e e0 58
RIP <ffffffff8015bd5a>{free_percpu+24} RSP <ffffff801b7d5c18>
CR2: ffffffffffffffff
<0>Kernel panic - not syncing: Oops
Reproduced so far on an x86_64 xen guest running a -48.EL kernel with the patch for bz 226983. Not certain yet if other arches are affected.