As Al pointed out recently, if a process doing a sillyrename receives a SIGKILL, it can return to userspace while the RENAME operation is still in flight on the wire. When this happens, it will release the parent's i_mutex prematurely, and nfs_async_rename_done will call d_move without holding it.
Holding the i_mutex is required to prevent dcache corruption. I sent a patch to Trond recently that fixes this by simply unhashing the old and new dentries in this situation, and he has pushed it to Linus for 3.1. I think we'll want this in 6.2 as well:
Author: Jeff Layton <email@example.com>
Date: Mon Jul 18 11:26:30 2011 -0400
nfs: don't use d_move in nfs_async_rename_done
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
QE needs to know how to reproduce and verify the problem with some test steps. Could you point them out? Thanks.
There's no reproducer that I'm aware of. This was noticed by inspection. The thing to do here is just to test that sillyrenames still work after the patch. I think the connectathon suite already tests this so making sure that it doesn't regress is probably the best you can do for this.
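For a quick sanity check outside connectathon, a minimal sketch along these lines may be enough. It unlinks a file that is still open, which is the operation that makes the NFS client do a sillyrename, and then confirms the open descriptor is still usable. The helper name check_unlink_while_open and the mount point in the usage note are hypothetical, and this does not exercise the SIGKILL race itself; it only checks that the ordinary sillyrename path has not regressed.

```python
import os
import tempfile


def check_unlink_while_open(directory):
    """Unlink a file that is still open, then keep using the descriptor.

    Hypothetical helper: on an NFS mount the unlink below is what makes
    the client sillyrename the file instead of removing it outright.
    """
    # mkstemp opens the new file read/write and returns (fd, path).
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"before unlink\n")
        os.unlink(path)
        # The original name must be gone from the directory.
        name_gone = not os.path.exists(path)
        # The data must still be reachable through the open descriptor.
        os.write(fd, b"after unlink\n")
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 4096)
    finally:
        # Closing the last reference lets the client remove the file for real.
        os.close(fd)
    return name_gone, data
```

To use it, call check_unlink_while_open() with a directory on the NFS mount under test (for example a hypothetical /mnt/nfs/scratch). While the descriptor is open, a .nfsXXXX entry should be visible in that directory on the client, and it should disappear again after the close.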
Patch(es) available on kernel-2.6.32-188.el6
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.