Bug 209419 - kernel BUG at fs/nfs/inode.c:1650!
Summary: kernel BUG at fs/nfs/inode.c:1650!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Layton
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 161868 235384
Depends On: 245197
Blocks: 234251 234587 245198
 
Reported: 2006-10-05 10:42 UTC by Rene Hennequin
Modified: 2018-10-19 22:43 UTC
CC List: 10 users

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:15:30 UTC
Target Upstream Version:
Embargoed:


Attachments
patch -- increment and decrement i_count when getting and putting open state (748 bytes, patch)
2007-03-30 11:45 UTC, Jeff Layton
no flags


Links
System: Red Hat Product Errata
ID: RHBA-2007:0791
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6
Last Updated: 2007-11-14 18:25:55 UTC

Description Rene Hennequin 2006-10-05 10:42:24 UTC
Description of problem:

Kernel crash, see the log from /var/log/messages below

Oct  5 04:09:56 fieldarcher kernel: ------------[ cut here ]------------
Oct  5 04:09:56 fieldarcher kernel: kernel BUG at fs/nfs/inode.c:1650!
Oct  5 04:09:56 fieldarcher kernel: invalid operand: 0000 [#1]
Oct  5 04:09:56 fieldarcher kernel: SMP 
Oct  5 04:09:56 fieldarcher kernel: Modules linked in: nfs ide_scsi nfsd exportfs lockd nfs_acl md5 ipv6 i2c_dev i2c_core sunrpc ipt_LOG ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac pcnet32 mii floppy ext3 jbd mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Oct  5 04:09:56 fieldarcher kernel: CPU:    0
Oct  5 04:09:56 fieldarcher kernel: EIP:    0060:[<f89ac6d8>]    Not tainted VLI
Oct  5 04:09:56 fieldarcher kernel: EFLAGS: 00010202   (2.6.9-42.0.2.ELsmp) 
Oct  5 04:09:56 fieldarcher kernel: EIP is at nfs4_clear_inode+0x6b/0x90 [nfs]
Oct  5 04:09:56 fieldarcher kernel: eax: 00000001   ebx: ceafbe5c   ecx: 00000000   edx: ceafbec4
Oct  5 04:09:56 fieldarcher kernel: esi: f7daf980   edi: ceafbd1c   ebp: c032de78   esp: f7cd7eb4
Oct  5 04:09:56 fieldarcher kernel: ds: 007b   es: 007b   ss: 0068
Oct  5 04:09:56 fieldarcher kernel: Process kswapd0 (pid: 41, threadinfo=f7cd7000 task=f7d076f0)
Oct  5 04:09:56 fieldarcher kernel: Stack: ceafbe5c f7cd7ee8 0000000e c0170d78 ceafbe5c c0170df5 ede74164 ede7416c
Oct  5 04:09:56 fieldarcher kernel:        00000054 c017117a 00000080 00000080 00000080 e23b9404 e4e8dbcc 00000000
Oct  5 04:09:56 fieldarcher kernel:        00000083 00000000 f7ffe9a0 c01711e7 c01497f8 006aef00 00000000 0000002c
Oct  5 04:09:56 fieldarcher kernel: Call Trace:
Oct  5 04:09:56 fieldarcher kernel:  [<c0170d78>] clear_inode+0xcc/0x102
Oct  5 04:09:56 fieldarcher kernel:  [<c0170df5>] dispose_list+0x47/0x6d
Oct  5 04:09:56 fieldarcher kernel:  [<c017117a>] prune_icache+0x193/0x1ec
Oct  5 04:09:56 fieldarcher kernel:  [<c01711e7>] shrink_icache_memory+0x14/0x2b
Oct  5 04:09:56 fieldarcher kernel:  [<c01497f8>] shrink_slab+0xf8/0x161
Oct  5 04:09:56 fieldarcher kernel:  [<c014aa93>] balance_pgdat+0x1e1/0x30e
Oct  5 04:09:56 fieldarcher kernel:  [<c02d26b1>] schedule+0x86d/0x8db
Oct  5 04:09:56 fieldarcher kernel:  [<c0120420>] prepare_to_wait+0x12/0x4c
Oct  5 04:09:56 fieldarcher kernel:  [<c014ac8a>] kswapd+0xca/0xcc
Oct  5 04:09:56 fieldarcher kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Oct  5 04:09:56 fieldarcher kernel:  [<c02d46da>] ret_from_fork+0x6/0x14
Oct  5 04:09:56 fieldarcher kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Oct  5 04:09:56 fieldarcher kernel:  [<c014abc0>] kswapd+0x0/0xcc
Oct  5 04:09:56 fieldarcher kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Oct  5 04:09:56 fieldarcher kernel: Code: 00 ff b3 c4 fe ff ff 05 60 01 00 00 ff b3 c0 fe ff ff 50 68 e5 84 9c f8 68 e2 98 9c f8 e8 cf 61 77 c7 83 c4 18 8b 46 60 48 74 08 <0f> 0b 72 06 74 94 9c f8 0f b7 56 5c 89 f0 e8 cd 7c 01 00 8d 87
Oct  5 04:09:56 fieldarcher kernel:  <0>Fatal exception: panic in 5 seconds

Version-Release number of selected component (if applicable):


How reproducible:

The machine has an NFSv4-over-TCP mount from a NetApp filer. The kernel has crashed twice with the same message. After a reboot, the problem recurs after approximately a week.

Steps to Reproduce:
1. 
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Peter Staubach 2006-12-08 14:07:56 UTC
I don't suppose that you have any idea what the machine was doing
when it crashed these two times, do you?

Comment 2 Rene Hennequin 2006-12-08 14:21:09 UTC
The machine is running in VMware Server; it mainly runs Apache with PHP. The two crashes were not at the same time of day. We switched to NFSv3, and the server has been running stably since.

Comment 3 Peter Staubach 2006-12-08 14:35:17 UTC
Well, I wouldn't know how to approach this without more information,
so I will close it.  If it reoccurs and there is more information for
me to use, then please reopen this bz and I will look at it again then.

Comment 6 Rene Hennequin 2007-03-26 13:10:41 UTC
I moved from NFSv4 to NFSv3, and the problem hasn't occurred again.

Comment 7 Jeff Layton 2007-03-26 13:18:16 UTC
Thanks Rene. We've had another person report this and I'm hoping they have a
reproducer, so I'm reopening this bug for that reason.

Comment 8 Jeff Layton 2007-03-27 12:14:16 UTC
I've been chasing a not-too-dissimilar race in bug 228292. Now that I have a patch for that deadlock, it seems to have brought out at least one crash similar to this one, as well as a "VFS: Busy inodes after umount" problem. My guess is that this is not due to the patch itself, but rather that clearing the deadlock there has made the race occur more frequently.

I'm going to keep pursuing that case for now, with the hope that I'll be able to
track down the race that's occurring here as well.


Comment 9 Jeff Layton 2007-03-27 18:14:19 UTC
This is the post I just made in bug 228292. My thinking is that the patch for that BZ just happens to make this bug occur more frequently but is not responsible for it. I'll continue work on this race here. Note that I've seen what appears to be a number of reference-count-related races with that reproducer, and not all of them are alike. Still, I think they may have a common root...

I think I'm starting to see the problem, and it is related to signal handling. In the most common case I've seen, open_namei ends up returning -ERESTARTSYS, likely because I'm killing the process.

I think the way this is supposed to work is that the lookup ends up calling nfs4_do_open, and then later __dentry_open initiates a sequence of calls that ends in nfs4_proc_file_open. That calls nfs4_close_state, which would clean up the i_count, etc.

Because open_namei exits with an error, however, this is not occurring, and that's causing this particular reference count leak. That said, I'm thinking this is a different bug, and one that exists with or without the workqueue patch. I'll continue work on that problem in bug 209419.

It's not yet clear to me what the correct solution for this problem is...


Comment 10 Jeff Layton 2007-03-27 21:22:50 UTC
Replicating the post I just made to bug 228292:

----------[snip]-----------
Actually, after looking over the code, the "VFS Busy inodes" problem may be
introduced by the first patch in this set (the one to convert nfs4_close_state
to do an async call). That patch adds an igrab/iput in nfs4_get/put_open_state.

Essentially, an NFSv4 open seems to be a two-stage process. The first part is
handled by a chain like this:

filp_open
  open_namei
    __lookup_hash
      i_op->lookup (nfs_atomic_lookup)
         nfs4_atomic_open
           nfs4_do_open
              _nfs4_do_open
                   nfs4_get_open_state

nfs4_get_open_state does an igrab, and that bumps i_count by 1. _nfs4_do_open
increments the nreaders and nwriters for the state as well (as appropriate for
the flags it's passed).

The second part of the open has a call chain something like this:

filp_open
  __dentry_open
     f_op->open  (nfs_file_open)
       rpc_ops->file_open (nfs4_proc_file_open)
          nfs4_close_state
            nfs4_do_close
               nfs4_close_done
                  nfs4_put_open_state

nfs4_put_open_state does an iput and decrements the i_count. nfs4_close_done
decrements the nreaders/nwriters as appropriate as well.

I believe the problem is that if open_namei returns an error to filp_open, the second call chain never happens and the i_count and nreaders/nwriters remain artificially high. I think this may also explain bug 209419.
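
As a rough userspace illustration of that imbalance, here is a toy model; the struct and function names only mirror the kernel ones described above, and the reference counts are plain integers standing in for the real igrab()/iput() calls:

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel objects involved. */
struct toy_inode { int i_count; };
struct toy_state { struct toy_inode *inode; int nreaders; };

/* Stage one (lookup): nfs4_do_open -> nfs4_get_open_state.
 * Takes a reference on the inode and bumps nreaders. */
static struct toy_state *toy_get_open_state(struct toy_inode *inode)
{
        struct toy_state *state = calloc(1, sizeof(*state));
        inode->i_count++;               /* stands in for igrab() */
        state->inode = inode;
        state->nreaders++;
        return state;
}

/* Stage two (file open): nfs4_proc_file_open -> nfs4_close_state.
 * Drops the nreaders count and the inode reference. */
static void toy_put_open_state(struct toy_state *state)
{
        state->nreaders--;
        state->inode->i_count--;        /* stands in for iput() */
}

int main(void)
{
        struct toy_inode inode = { 1 };          /* one reference already held elsewhere, say */
        struct toy_state *state = toy_get_open_state(&inode);

        int open_namei_failed = 1;               /* pretend the open caught a signal (-ERESTARTSYS) */
        if (!open_namei_failed)
                toy_put_open_state(state);       /* the second call chain; never runs here */

        printf("i_count=%d nreaders=%d\n", inode.i_count, state->nreaders);
        free(state);
        return 0;
}

Run as written, it prints i_count=2 nreaders=1: the counts taken in stage one are never balanced because stage two is skipped, which is exactly the leak being described.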

When we send a signal to these processes (most likely ones in the middle of an open() call), we sometimes make open_namei return -ERESTARTSYS.

I still need to do a bit of investigation to see where the -ERESTARTSYS is ultimately coming from. This seems like a sticky problem, however. It seems like we need a hook to get an nfs4_close_state in if open_namei is going to return an error after doing its lookup, but we don't have anything like that. We may have to force it in the dput, or find a way to make sure that we don't return -ERESTARTSYS at all.
----------[snip]-----------

So, for anyone who has seen this particular crash: could you let us know definitively whether you had processes doing I/O on the filesystem that may have caught a signal? That would help confirm whether my speculation about the above situation is true. In the meantime, I'm doing more investigation to see where the -ERESTARTSYS is ultimately coming from.


Comment 11 Martin Flint 2007-03-28 12:31:17 UTC
It seems that no I/O was occurring in our case. The customer was running the automounter, and it was unmounting a filesystem at the time. We have the following messages prior to the panic:

NFS: mount path /auto/home/oracle does not exist!
NFS: suggestion: try mounting '/' instead.
------------[ cut here ]------------
kernel BUG at fs/nfs/inode.c:1650!
invalid operand: 0000 [#1]
SMP 
Modules linked in: iptable_filter ip_tables nfsd exportfs nfs lockd md5 ipv6
parport_pc lp parport autofs4 i2c_dev i2c_core hw_random nfs_acl sunrpc
egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod
egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U)
egenera_virtual_bus(U) egenera_fs(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<f96486d8>]    Tainted: PF     VLI
EFLAGS: 00010202   (2.6.9-42.0.3.ELsmp) 
EIP is at nfs4_clear_inode+0x6b/0x90 [nfs]
eax: 00000002   ebx: e53953fc   ecx: 000001c7   edx: 00000000
esi: f38f9d80   edi: e53952bc   ebp: c032de78   esp: e7a67ee4
ds: 007b   es: 007b   ss: 0068
Process umount (pid: 20118, threadinfo=e7a67000 task=f74683b0)
Stack: e53953fc e7a67f0c 0000002c c0170d78 e53953fc c0170df5 00000000 f00f7400 
       f00f7400 c0170f5e e539569c f1433e64 e7a67000 f00f7400 00000000 f9674a60 
       c0160396 00000011 f028d400 f9674ac0 e7a67000 c0160c13 f00f7400 f9648ea2 
Call Trace:
 [<c0170d78>] clear_inode+0xcc/0x102
 [<c0170df5>] dispose_list+0x47/0x6d
 [<c0170f5e>] invalidate_inodes+0x96/0xae
 [<c0160396>] generic_shutdown_super+0x88/0x150
 [<c0160c13>] kill_anon_super+0x9/0x30
 [<f9648ea2>] nfs4_kill_super+0x16/0x50 [nfs]
 [<c016025c>] deactivate_super+0x5b/0x70
 [<c017389f>] sys_umount+0x65/0x6c
 [<c016314a>] sys_stat64+0xf/0x23
 [<c01507f5>] unmap_vma_list+0xe/0x17
 [<c0150b2f>] do_munmap+0x129/0x137
 [<c01738b1>] sys_oldumount+0xb/0xe
 [<c02d47cb>] syscall_call+0x7/0xb


Comment 12 Jeff Layton 2007-03-29 15:49:44 UTC
I wasn't so much concerned with I/O happening at the time of the umount (by definition there wouldn't be any, or you wouldn't be able to umount). My question was more along the lines of "did any application that was doing I/O to this filesystem prior to the unmount end up being killed by a signal?" Unfortunately, that may be hard to tell.

I have a reproducer of sorts for this issue, but it involves a much larger NFSv4 patch that's still in progress. Essentially, I run a hefty amount of filesystem activity via several concurrent threads, then kill off the processes doing it and unmount. After several attempts at this I end up with a panic similar to the one above. I believe the issue there is that the i_count for the inode is bad at the time an iput is done, and we end up trying to delete an inode that's still being actively worked with.
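
A very rough sketch of that kind of stress reproducer, for illustration only (the mount point path, process count, and timings here are placeholders; this is not the actual test setup):

#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define NPROCS 8
#define MNT "/mnt/nfs4test"             /* placeholder for the NFSv4 mount point */

/* Child: hammer the filesystem with create/write/unlink until killed. */
static void worker(int id)
{
        char path[256];
        snprintf(path, sizeof(path), MNT "/stress.%d", id);
        for (;;) {
                FILE *f = fopen(path, "w");
                if (f) {
                        fputs("some data\n", f);
                        fclose(f);
                }
                unlink(path);
        }
}

int main(void)
{
        pid_t pids[NPROCS];
        int i;

        for (i = 0; i < NPROCS; i++) {
                pids[i] = fork();
                if (pids[i] < 0) {
                        perror("fork");
                        break;
                }
                if (pids[i] == 0)
                        worker(i);      /* never returns */
        }

        sleep(30);                      /* let the I/O run for a while */

        while (--i >= 0) {              /* kill the workers mid-flight... */
                kill(pids[i], SIGKILL);
                waitpid(pids[i], NULL, 0);
        }

        /* ...then run "umount /mnt/nfs4test" by hand and repeat until it reproduces. */
        return 0;
}

Signals delivered while a worker is blocked in open() are what appear to trigger the -ERESTARTSYS path discussed above.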

There's a later patch that adds an igrab/iput in nfs4_get/put_open_state. That seems to correct the panic in nfs4_clear_inode for me, but I still see a "VFS: Busy inodes at umount" error and some other panics related to reference counting, so the problem is not completely fixed by that.

There seems to be a reference count leak that's occurring and I believe it's
tied up with signal handling while in the middle of an open call, but tracking
this down is tricky since it's pretty racy.


Comment 13 Jeff Layton 2007-03-29 18:48:28 UTC
I instrumented a kernel to watch for the case where may_open/open_namei returns with an error and to print the inode address it had been working with at the time, and also to dump the address of any inode that was still busy at unmount. In my reproducer, it appears that may_open returning an error (-ERESTARTSYS in this case) is causing the problem:

open_namei: returning -512, inode ffffff803baf4a20 left dangling
invalidate_list2[inode_in_use]: busy inode=ffffff803baf4a20 i_count=1 i_state=0x0 i_debug=0x33, i_mode=0100666 i_dentry=empty
VFS: [ffffff8000981400:10] 1 busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...

...I've yet to come up with an elegant way to "clean up" in this situation, however. We really need some way to call down into NFS from the VFS and clean up the state, but there are no suitable hooks for that...


Comment 14 Jeff Layton 2007-03-30 11:45:26 UTC
Created attachment 151265 [details]
patch -- increment and decrement i_count when getting and putting open state

I feel pretty confident that this patch will correct this panic. It's a
backported piece of an upstream patch that converts the NFSv4 close call to be
asynchronous:

http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=09ca6dde884139015027ea102cbb560de5836eec;hp=cedf3820493f7fefec00da3699507bf7cdfd6544


This piece simply makes sure that we increment the inode reference count when we get the open state and decrement it when we put the open state. I believe this will prevent the panic, since it should prevent the code from ever calling clear_inode on the inode until the nfs4_state has been put for the last time.
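
To illustrate the idea (again only a simplified userspace stand-in, not the attached backport itself): holding a reference for the lifetime of the open state keeps the inode from being evicted, so the BUG check in nfs4_clear_inode can no longer be reached while open state is still attached.

#include <stdio.h>

/* Toy stand-in: an inode plus a flag for attached NFSv4 open state. */
struct toy_inode { int i_count; int has_open_state; };

/* With the attached change, getting open state pins the inode... */
static void toy_get_open_state(struct toy_inode *inode)
{
        inode->i_count++;               /* the added igrab()-style reference */
        inode->has_open_state = 1;
}

/* ...and putting the open state for the last time releases it. */
static void toy_put_open_state(struct toy_inode *inode)
{
        inode->has_open_state = 0;
        inode->i_count--;               /* the matching iput()-style release */
}

/* Roughly what eviction looks like: the VFS only clears inodes whose
 * reference count has dropped to zero, and nfs4_clear_inode BUGs if
 * open state is still attached at that point. */
static void toy_try_to_clear(struct toy_inode *inode)
{
        if (inode->i_count > 0) {
                printf("i_count=%d, still pinned, eviction deferred\n", inode->i_count);
                return;
        }
        if (inode->has_open_state) {
                printf("BUG: clearing inode with open state attached!\n");
                return;
        }
        printf("inode cleared cleanly\n");
}

int main(void)
{
        struct toy_inode inode = { 0, 0 };

        toy_get_open_state(&inode);
        toy_try_to_clear(&inode);       /* interrupted open: the put hasn't happened yet */
        toy_put_open_state(&inode);
        toy_try_to_clear(&inode);       /* now the inode can be cleared safely */
        return 0;
}

If the final put never happens at all (the remaining reference-count leak), the inode simply stays pinned until unmount, which matches the "VFS: Busy inodes" symptom this patch trades the panic for.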

Unfortunately, in the testing I've done, this seems to simply trade this particular panic for a "VFS: Busy inodes at unmount" problem when the fs is eventually unmounted. I think there are still some reference-counting problems with the nfs4_state, but this does cure the immediate problem and seems to be the right thing to do.

Comment 16 Jeff Layton 2007-03-30 12:29:08 UTC
I'm leaving this bug to handle this patch and am going to pursue the "VFS Busy
inodes" problems in bug 234587.


Comment 17 RHEL Program Management 2007-04-12 13:28:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Jeff Layton 2007-04-12 18:59:12 UTC
*** Bug 235384 has been marked as a duplicate of this bug. ***

Comment 19 RHEL Program Management 2007-04-18 22:53:10 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 22 Don Howard 2007-07-17 19:53:39 UTC
A patch addressing this issue has been included in kernel-2.6.9-55.19.EL.

Comment 24 Jeff Layton 2007-08-10 19:50:25 UTC
*** Bug 161868 has been marked as a duplicate of this bug. ***

Comment 25 John Poelstra 2007-08-29 04:28:18 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6 Beta released on RHN (also available at partners.redhat.com).  

Requested action: Please verify that your issue is fixed to ensure that it is
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message to Issue Tracker and
I will change the status for you.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 26 John Poelstra 2007-09-05 22:23:42 UTC
A fix for this issue should have been included in the packages contained in 
the RHEL4.6-Snapshot1 on partners.redhat.com.  

Requested action: Please verify that your issue is fixed to ensure that it is 
included in this update release.

After you (Red Hat Partner) have verified that this issue has been addressed, 
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent 
symptoms of the problem you are having and change the status of the bug to 
FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test 
results to Issue Tracker.  If you need assistance accessing 
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 27 John Poelstra 2007-09-12 00:44:08 UTC
A fix for this issue should be included in RHEL4.6-Snapshot2--available soon on
partners.redhat.com.  

Please verify that your issue is fixed to ensure that it is included in this
update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test
results to Issue Tracker.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.

Comment 28 John Poelstra 2007-09-20 04:32:06 UTC
A fix for this issue should have been included in the packages contained in the
RHEL4.6-Snapshot3 on partners.redhat.com.  

Please verify that your issue is fixed to ensure that it is included in this
update release.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to FAILS_QA.

If you cannot access bugzilla, please reply with a message about your test
results to Issue Tracker.  If you need assistance accessing
ftp://partners.redhat.com, please contact your Partner Manager.


Comment 31 errata-xmlrpc 2007-11-15 16:15:30 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


