Description of problem:
Kernel crash; see the log from /var/log/messages below.

Oct 5 04:09:56 fieldarcher kernel: ------------[ cut here ]------------
Oct 5 04:09:56 fieldarcher kernel: kernel BUG at fs/nfs/inode.c:1650!
Oct 5 04:09:56 fieldarcher kernel: invalid operand: 0000 [#1]
Oct 5 04:09:56 fieldarcher kernel: SMP
Oct 5 04:09:56 fieldarcher kernel: Modules linked in: nfs ide_scsi nfsd exportfs lockd nfs_acl md5 ipv6 i2c_dev i2c_core sunrpc ipt_LOG ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mirror dm_mod button battery ac pcnet32 mii floppy ext3 jbd mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Oct 5 04:09:56 fieldarcher kernel: CPU: 0
Oct 5 04:09:56 fieldarcher kernel: EIP: 0060:[<f89ac6d8>] Not tainted VLI
Oct 5 04:09:56 fieldarcher kernel: EFLAGS: 00010202 (2.6.9-42.0.2.ELsmp)
Oct 5 04:09:56 fieldarcher kernel: EIP is at nfs4_clear_inode+0x6b/0x90 [nfs]
Oct 5 04:09:56 fieldarcher kernel: eax: 00000001 ebx: ceafbe5c ecx: 00000000 edx: ceafbec4
Oct 5 04:09:56 fieldarcher kernel: esi: f7daf980 edi: ceafbd1c ebp: c032de78 esp: f7cd7eb4
Oct 5 04:09:56 fieldarcher kernel: ds: 007b es: 007b ss: 0068
Oct 5 04:09:56 fieldarcher kernel: Process kswapd0 (pid: 41, threadinfo=f7cd7000 task=f7d076f0)
Oct 5 04:09:56 fieldarcher kernel: Stack: ceafbe5c f7cd7ee8 0000000e c0170d78 ceafbe5c c0170df5 ede74164 ede7416c
Oct 5 04:09:56 fieldarcher kernel: 00000054 c017117a 00000080 00000080 00000080 e23b9404 e4e8dbcc 00000000
Oct 5 04:09:56 fieldarcher kernel: 00000083 00000000 f7ffe9a0 c01711e7 c01497f8 006aef00 00000000 0000002c
Oct 5 04:09:56 fieldarcher kernel: Call Trace:
Oct 5 04:09:56 fieldarcher kernel: [<c0170d78>] clear_inode+0xcc/0x102
Oct 5 04:09:56 fieldarcher kernel: [<c0170df5>] dispose_list+0x47/0x6d
Oct 5 04:09:56 fieldarcher kernel: [<c017117a>] prune_icache+0x193/0x1ec
Oct 5 04:09:56 fieldarcher kernel: [<c01711e7>] shrink_icache_memory+0x14/0x2b
Oct 5 04:09:56 fieldarcher kernel: [<c01497f8>] shrink_slab+0xf8/0x161
Oct 5 04:09:56 fieldarcher kernel: [<c014aa93>] balance_pgdat+0x1e1/0x30e
Oct 5 04:09:56 fieldarcher kernel: [<c02d26b1>] schedule+0x86d/0x8db
Oct 5 04:09:56 fieldarcher kernel: [<c0120420>] prepare_to_wait+0x12/0x4c
Oct 5 04:09:56 fieldarcher kernel: [<c014ac8a>] kswapd+0xca/0xcc
Oct 5 04:09:56 fieldarcher kernel: [<c01204f5>] autoremove_wake_function+0x0/0x2d
Oct 5 04:09:56 fieldarcher kernel: [<c02d46da>] ret_from_fork+0x6/0x14
Oct 5 04:09:56 fieldarcher kernel: [<c01204f5>] autoremove_wake_function+0x0/0x2d
Oct 5 04:09:56 fieldarcher kernel: [<c014abc0>] kswapd+0x0/0xcc
Oct 5 04:09:56 fieldarcher kernel: [<c01041f5>] kernel_thread_helper+0x5/0xb
Oct 5 04:09:56 fieldarcher kernel: Code: 00 ff b3 c4 fe ff ff 05 60 01 00 00 ff b3 c0 fe ff ff 50 68 e5 84 9c f8 68 e2 98 9c f8 e8 cf 61 77 c7 83 c4 18 8b 46 60 48 74 08 <0f> 0b 72 06 74 94 9c f8 0f b7 56 5c 89 f0 e8 cd 7c 01 00 8d 87
Oct 5 04:09:56 fieldarcher kernel: <0>Fatal exception: panic in 5 seconds

Version-Release number of selected component (if applicable):
kernel-2.6.9-42.0.2.ELsmp

How reproducible:
The machine has an NFSv4-over-TCP mount from a NetApp filer. The kernel has crashed twice with the same message. After a reboot, the problem recurs after approximately a week.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I don't suppose that you have any idea what the machine was doing when it crashed these two times, do you?
The machine is running in VMware Server; it runs mainly Apache with PHP. The two crashes were not at the same time of day. We switched to NFSv3, and the server now runs stable.
Well, I wouldn't know how to approach this without more information, so I will close it. If it reoccurs and there is more information for me to use, then please reopen this bz and I will look at it again then.
I moved from NFSv4 to NFSv3, and the problem hasn't occurred again.
Thanks Rene. We've had another person report this and I'm hoping they have a reproducer, so I'm reopening this bug for that reason.
I've been chasing a not too dissimilar race in bug 228292. Now that I have a patch for that deadlock, it seems to have brought out at least one case of a crash similar to this one, and a "VFS: Busy inodes after umount" problem as well. My guess is that this is not due to the patch itself, but rather that clearing the deadlock there has made the race occur with more frequency. I'm going to keep pursuing that case for now, with the hope that I'll be able to track down the race that's occurring here as well.
This is the post I just made in bug 228292. My thinking is that the patch for that BZ just happens to make this bug occur more frequently, but is not responsible for it. I'll continue work on this race here. Note that I've seen what appears to be a number of reference-count-related races with that reproducer, and not all of them are alike. Still, I think they may have a common root...

I think I'm starting to see the problem, and it is related to signal handling. In the most common case I've seen, open_namei ends up returning -ERESTARTSYS, likely due to the fact that I'm killing the process. The way this is supposed to work is that the lookup ends up calling nfs4_do_open, and then later __dentry_open is supposed to initiate a sequence of calls that does an nfs4_proc_file_open. That does an nfs4_close_state and would likely clean up the i_count, etc. Because open_namei exits with an error, however, this is not occurring, and that's causing this particular reference count leak.

That said, I'm thinking this is a different bug, and one that exists with or without the workqueue patch. I'll continue work on that problem in bug 209419. It's not yet clear to me what the correct solution for this problem is...
Replicating the post I just made to bug 228292:

----------[snip]-----------
Actually, after looking over the code, the "VFS: Busy inodes" problem may be introduced by the first patch in this set (the one that converts nfs4_close_state to do an async call). That patch adds an igrab/iput in nfs4_get/put_open_state.

Essentially, an NFSv4 open is a two-stage process. The first part is handled by a chain like this:

filp_open
  open_namei
    __lookup_hash
      i_op->lookup (nfs_atomic_lookup)
        nfs4_atomic_open
          nfs4_do_open
            _nfs4_do_open
              nfs4_get_open_state

nfs4_get_open_state does an igrab, and that bumps i_count by 1. _nfs4_do_open also increments the nreaders and nwriters for the state (as appropriate for the flags it's passed).

The second part of the open has a call chain something like this:

filp_open
  __dentry_open
    f_op->open (nfs_file_open)
      rpc_ops->file_open (nfs4_proc_file_open)
        nfs4_close_state
          nfs4_do_close
            nfs4_close_done
              nfs4_put_open_state

nfs4_put_open_state does an iput and decrements the i_count. nfs4_close_done decrements the nreaders/nwriters as appropriate as well.

I believe the problem is that if open_namei returns an error to filp_open, then the second call chain never happens, and the i_count and nreaders/nwriters remain artificially high. I think this problem may also explain bug 209419. When we send a signal to these processes (most likely ones in the middle of an open() call), we sometimes make open_namei return a -ERESTARTSYS. I still need to do a bit of investigation to see where the ERESTARTSYS is ultimately coming from.

This seems like a sticky problem, however. It seems like we need a hook to get in a nfs4_close_state if open_namei is going to return an error after doing its lookup, but we don't have anything like that. We may have to do something to force it in the dput, or find a way to make sure that we don't return a -ERESTARTSYS at all.
----------[snip]----------- So anyone who's seen this particular crash, could you let us know definitively whether you had processes doing I/O on the filesystem that may have caught a signal? That may help confirm whether my speculation about the above situation is true. In the meantime, I'm doing more investigation to see if I can tell where the -ERESTARTSYS is ultimately coming from.
It seems that no I/O was occurring in our case. The customer was running the automounter, and it was unmounting a filesystem at the time. We have the following messages prior to the panic:

NFS: mount path /auto/home/oracle does not exist!
NFS: suggestion: try mounting '/' instead.
------------[ cut here ]------------
kernel BUG at fs/nfs/inode.c:1650!
invalid operand: 0000 [#1]
SMP
Modules linked in: iptable_filter ip_tables nfsd exportfs nfs lockd md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core hw_random nfs_acl sunrpc egenera_nmi(U) egenera_veth(U) sd_mod egenera_vscsi(U) scsi_mod egenera_vmdump(U) egenera_dumpdev(U) egenera_ipmi(U) egenera_base(U) egenera_virtual_bus(U) egenera_fs(U) dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU: 0
EIP: 0060:[<f96486d8>] Tainted: PF VLI
EFLAGS: 00010202 (2.6.9-42.0.3.ELsmp)
EIP is at nfs4_clear_inode+0x6b/0x90 [nfs]
eax: 00000002 ebx: e53953fc ecx: 000001c7 edx: 00000000
esi: f38f9d80 edi: e53952bc ebp: c032de78 esp: e7a67ee4
ds: 007b es: 007b ss: 0068
Process umount (pid: 20118, threadinfo=e7a67000 task=f74683b0)
Stack: e53953fc e7a67f0c 0000002c c0170d78 e53953fc c0170df5 00000000 f00f7400
       f00f7400 c0170f5e e539569c f1433e64 e7a67000 f00f7400 00000000 f9674a60
       c0160396 00000011 f028d400 f9674ac0 e7a67000 c0160c13 f00f7400 f9648ea2
Call Trace:
 [<c0170d78>] clear_inode+0xcc/0x102
 [<c0170df5>] dispose_list+0x47/0x6d
 [<c0170f5e>] invalidate_inodes+0x96/0xae
 [<c0160396>] generic_shutdown_super+0x88/0x150
 [<c0160c13>] kill_anon_super+0x9/0x30
 [<f9648ea2>] nfs4_kill_super+0x16/0x50 [nfs]
 [<c016025c>] deactivate_super+0x5b/0x70
 [<c017389f>] sys_umount+0x65/0x6c
 [<c016314a>] sys_stat64+0xf/0x23
 [<c01507f5>] unmap_vma_list+0xe/0x17
 [<c0150b2f>] do_munmap+0x129/0x137
 [<c01738b1>] sys_oldumount+0xb/0xe
 [<c02d47cb>] syscall_call+0x7/0xb
I wasn't so much concerned with I/O happening at the time of the umount (by definition there wouldn't be any, or you wouldn't be able to umount). My question was more along the lines of "did any application that was doing I/O to this filesystem prior to the unmount end up being killed by a signal?" Unfortunately, that may be hard to tell.

I have a reproducer of sorts for this issue, but it involves a much larger NFSv4 patch that's still in progress. Essentially I run a hefty amount of filesystem activity via several concurrent threads, then kill off the processes doing it and unmount. After several attempts at this I end up with a panic similar to the one above. I believe the issue there is that the i_count for the inode is bad at the time an iput is done, and we end up trying to delete an inode that's still being actively worked with.

There's a later patch that adds an igrab/iput in nfs4_get/put_open_state. That seems to correct the problem of panicking in nfs4_clear_inode for me, but I still see a "VFS: Busy inodes at umount" error and some other panics related to reference counting, so the problem is not completely fixed by that. There seems to be a reference count leak occurring, and I believe it's tied up with signal handling while in the middle of an open call, but tracking this down is tricky since it's pretty racy.
Instrumented a kernel to watch for the case where may_open/open_namei returns with an error and to print the inode address it had been working with at the time. I also made it dump out the inode address that had been busy at the time. In my reproducer, it appears that may_open returning with an error (-ERESTARTSYS in this case) is causing the problem:

open_namei: returning -512, inode ffffff803baf4a20 left dangling
invalidate_list2[inode_in_use]: busy inode=ffffff803baf4a20 i_count=1 i_state=0x0 i_debug=0x33, i_mode=0100666 i_dentry=empty
VFS: [ffffff8000981400:10] 1 busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...

...I've yet to come up with an elegant way to "clean up" in this situation, however. We really need some way to call down into NFS from the VFS and clean up the state, but there are no hooks suitable for that...
Created attachment 151265 [details]
patch -- increment and decrement i_count when getting and putting open state

I feel pretty confident that this patch will correct this panic. It's a backported piece of an upstream patch that converts the NFSv4 close call to be asynchronous:

http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=09ca6dde884139015027ea102cbb560de5836eec;hp=cedf3820493f7fefec00da3699507bf7cdfd6544

This piece simply makes sure that we increment the inode reference count when we do a get on the open state and decrement it when we put the open state. I believe this will prevent the panic, since it should prevent the code from ever calling clear_inode on the inode until the nfs4_state has been put for the last time.

Unfortunately, in the testing I've done, this seems to simply trade this particular panic for a "VFS: Busy inodes at unmount" problem when the fs is eventually unmounted. I think there are still some problems with reference counting on the nfs4_state, but this does cure the immediate problem and seems to be the right thing to do.
I'm leaving this bug to handle this patch and am going to pursue the "VFS Busy inodes" problems in bug 234587.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
*** Bug 235384 has been marked as a duplicate of this bug. ***
This request was evaluated by Red Hat Kernel Team for inclusion in a Red Hat Enterprise Linux maintenance release, and has moved to bugzilla status POST.
A patch addressing this issue has been included in kernel-2.6.9-55.19.EL.
*** Bug 161868 has been marked as a duplicate of this bug. ***
A fix for this issue should have been included in the packages contained in the RHEL4.6 Beta released on RHN (also available at partners.redhat.com). Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message to Issue Tracker and I will change the status for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot1 on partners.redhat.com. Requested action: Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should be included in RHEL4.6-Snapshot2, available soon on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
A fix for this issue should have been included in the packages contained in the RHEL4.6-Snapshot3 on partners.redhat.com. Please verify that your issue is fixed to ensure that it is included in this update release. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to FAILS_QA. If you cannot access bugzilla, please reply with a message about your test results to Issue Tracker. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0791.html