Description of problem:
Kernel hang when catting a file on an NFS share mounted with intr. The problem occurs on x86 and x86_64. Backtraces are attached below.
Version-Release number of selected component (if applicable):
2.4.21-20.EL (both UP and SMP); works with 2.4.21-15.EL
Created attachment 106209 [details] nfshang-x86_64-sysrq-t.txt
Created attachment 106210 [details] nfshang-x86-sysrq-t.txt
It appears that LLNL is also seeing a similar problem. This is reported in IT #54507.
Copying in info provided by LLNL in IT #54507:
--------------------------------------------------
The nfsbug.c reproducer will occasionally reproduce the problem on our machines. We think the key is that the last operation to have the silly-deleted file open must be the completion of an asynchronous write, so the reproducer attempts to write enough data to the deleted NFS file that the process closes the file before the writes complete. This doesn't always happen, and I've found that hitting ctrl-c in "nfsbug" before the write loop completes reproduces the problem more regularly, e.g.:
  [in nfs dir]
  ./nfsbug
  [hit ctrl-c]
  hang
This happens for me once every four or five times `nfsbug' is run. I'll try to work on something that doesn't need to be killed in order to reproduce the problem.
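The sequence LLNL describes (unlink a file that is still open, queue a lot of buffered writes, then close) can be sketched roughly as below. This is not the attached nfsbug.c; the function name write_to_unlinked_file and its parameters are hypothetical, and whether a given size leaves writes in flight at close time depends on the client and server.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, NOT the attached nfsbug.c: create `path`, unlink it
 * while it is still open, write `nbytes` of data, then close.
 * Returns 0 on success, -1 on any failure. */
int write_to_unlinked_file(const char *path, size_t nbytes)
{
    char buf[4096];
    size_t left = nbytes;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);

    if (fd < 0)
        return -1;

    /* On an NFS client, unlinking a file that is still open results in a
     * sillyrename rather than an immediate remove on the server. */
    if (unlink(path) < 0) {
        close(fd);
        return -1;
    }

    memset(buf, 'x', sizeof(buf));
    while (left > 0) {
        size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);

        if (n <= 0) {
            close(fd);
            return -1;
        }
        left -= (size_t)n;
    }

    /* The close is where dirty data gets flushed back to the server. */
    return close(fd);
}
```

To try it, call the helper with a path inside the NFS mount and a size large enough that data is still dirty when close() is reached; the size needed is an assumption to tune per setup.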
Created attachment 106965 [details] reproducer
After Mark told me about his reproducer, it makes sense to me how the problem could be occurring. When an unlink is done, the dentry count is checked. If the count is > 1, some other process has the file open, so we shouldn't do an unlink. A sillyrename is done instead and the flag DCACHE_NFSFS_RENAMED is set on the dentry. When the process with that file open is done, it closes the file and exits. Due to close-to-open (CTO) consistency, all the dirty pages need to be synced back to the NFS server. RPC tasks are set up to do that through nfs_file_flush() -> nfs_wb_all() -> nfs_sync_file() -> nfs_flush_list() -> nfs_flush_one(). nfs_flush_one() sets up the RPC task that calls nfs_writeback_done(). nfs_writeback_done(), when called, will call nfs_dentry_iput(), which checks the flag DCACHE_NFSFS_RENAMED and, if it is set, will call nfs_complete_unlink(). Now, nfs_flush_one() calls rpc_execute() on the task it just created, instead of sticking it on the schedq. I don't know much about the RPC scheduler, so if it's possible for the task to be put to sleep and back on the sched queue at some point, then rpciod would presumably reschedule it later on, possibly calling nfs_complete_unlink() down the road. I haven't been able to see it yet. Al
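The unlink decision described above can be pictured with a small userspace model. This is only a sketch: the toy_ types, the flag value, and the on_server field are made up for illustration and are not kernel code; only the overall flow (count > 1 means sillyrename plus flag, final put completes the unlink) comes from the comment above.

```c
#include <stdbool.h>

/* Illustrative flag; the value is arbitrary and not taken from the kernel. */
#define TOY_DCACHE_NFSFS_RENAMED 0x1u

/* Hypothetical stand-in for a dentry. */
struct toy_dentry {
    int count;        /* number of users holding the dentry */
    unsigned flags;
    bool on_server;   /* file (possibly sillyrenamed) still exists on the server */
};

enum toy_unlink_result { TOY_UNLINKED, TOY_SILLYRENAMED };

/* Models the unlink path: if someone else still holds the file,
 * defer the real remove and mark the dentry instead. */
enum toy_unlink_result toy_unlink(struct toy_dentry *d)
{
    if (d->count > 1) {
        d->flags |= TOY_DCACHE_NFSFS_RENAMED;
        return TOY_SILLYRENAMED;
    }
    d->on_server = false;
    return TOY_UNLINKED;
}

/* Models dropping a reference: the last put of a sillyrenamed dentry
 * performs the deferred remove (the role nfs_complete_unlink() plays). */
void toy_put(struct toy_dentry *d)
{
    if (--d->count == 0 && (d->flags & TOY_DCACHE_NFSFS_RENAMED)) {
        d->on_server = false;
        d->flags &= ~TOY_DCACHE_NFSFS_RENAMED;
    }
}
```

In the scenario from this bug, the last put would come from the write-completion path rather than from the closing process, which is why the deferred remove can end up running inside rpciod.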
the problem is that nfs_complete_unlink runs in rpciod as an async RPC task, but then calls wait_event. this makes rpciod sleep, which prevents any other async RPC tasks from running. the fix, i believe, is to make nfs_complete_unlink use the RPC client's sleep primitives rather than nfs_wait_event. these should do the right thing whether a sync or an async RPC task is calling nfs_complete_unlink.
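To illustrate why sleeping inside the single worker is fatal, here is a toy simulation, not the RPC scheduler itself. It assumes, for illustration only, that the event the first task waits for can only be produced by another async task queued behind it. All names (toy_task, toy_run_worker, block_in_worker) are hypothetical.

```c
#include <stdbool.h>

#define TOY_MAX_TASKS 8

/* Hypothetical async task: it may depend on another task finishing first. */
struct toy_task {
    int waits_for;   /* index of the task that must finish first, or -1 */
    bool done;
};

/* Runs tasks[0..n-1] in FIFO order on ONE simulated worker and returns how
 * many completed (or -1 if n is too large).
 * block_in_worker = true  models wait_event: the worker itself sleeps.
 * block_in_worker = false models an RPC-style sleep: the task is put back
 * on the queue and the worker moves on to the next task. */
int toy_run_worker(struct toy_task *tasks, int n, bool block_in_worker)
{
    int queue[TOY_MAX_TASKS];
    int head = 0, len = 0, completed = 0, idle_passes = 0;

    if (n > TOY_MAX_TASKS)
        return -1;
    for (int i = 0; i < n; i++) {
        tasks[i].done = false;
        queue[len++] = i;
    }

    /* Stop when the queue is empty or a full pass made no progress. */
    while (len > 0 && idle_passes < len) {
        int id = queue[head];
        struct toy_task *t = &tasks[id];
        bool ready;

        head = (head + 1) % TOY_MAX_TASKS;
        len--;
        ready = t->waits_for < 0 || tasks[t->waits_for].done;

        if (ready) {
            t->done = true;
            completed++;
            idle_passes = 0;
        } else if (block_in_worker) {
            /* The only worker is now asleep; nothing else can ever run. */
            break;
        } else {
            /* Requeue at the tail and let the worker pick another task. */
            queue[(head + len) % TOY_MAX_TASKS] = id;
            len++;
            idle_passes++;
        }
    }
    return completed;
}
```

With task 0 waiting on task 1, the blocking variant completes nothing, while the requeueing variant lets task 1 run and then finishes task 0.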
Created attachment 107280 [details] unlink-deadlock patch
LLNL was provided this patch by Trond via Chuck Lever.
the fix idea was from trond, the patch is mine. i say that because i'm still waiting for trond to review my work and post his version (the official version) on the client.linux-nfs.org web site.
there are two ways to address this bug. 1. apply the patch i gave to LLNL, or 2. revert the fix-unlink patch. the fix-unlink patch changes "rm -rf" on NFS to wait (or hang, depending on your perspective) for open files to be closed. this is more "unix-y" in that applications can continue to use unlinked files, but it might be troublesome for folks using "rm -rf" on very large directory trees. fix #1 retains the nicer unix-like behavior, but fix #2 will make RHEL 3.0 behave more like earlier versions of Linux.
Pending Stephen Tweedie's code review and approval, I'll queue SteveD's patch for the next U5 build (best case is tomorrow night). Personally, I'm in favor of committing SteveD's new fix (as well as retaining his original U3 fix), despite the slight divergence from Upstream NFS code (which doesn't resolve the original problem fixed in Taroon U3).
A fix for this problem has just been committed to the RHEL3 U5 patch pool this evening (in kernel version 2.4.21-27.7.EL).
Changes to revert the U3 silly-delete and the U5 rpciod-hang patches have just been committed to the RHEL3 U5 patch pool this afternoon (in kernel version 2.4.21-27.9.EL).
Is the patch in attachment (id=107280) the latest version of the patch that went into the U5 kernel 2.4.21-27.9.EL, or is there a more recent patch?
Created attachment 111450 [details] change committed to -27.9.EL
Jason, that's what was committed to -27.7.EL. In -27.9.EL, we reverted that change plus the U3 patch that it was trying to fix. Attached above is the patch that was committed to accomplish this.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-294.html
*** Bug 139570 has been marked as a duplicate of this bug. ***
*** Bug 145550 has been marked as a duplicate of this bug. ***