Bug 582321
Summary: VFS: Busy inodes after unmount issue.

Product: Red Hat Enterprise Linux 5
Reporter: Sachin Prabhu <sprabhu>
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 5.4
CC: bugzilla, dhoward, dhowells, iannis, ikent, jlayton, jpirko, jwest, lmcilroy, rwheeler, steved, tao, vincent, yugzhang
Target Milestone: rc
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
    When an NFS server exported a file system with an explicit "fsid=[file_system_ID]", and an NFS client mounted that file system on one mount point and a subdirectory of it on a separate mount point, then if the server re-exported that file system after un-exporting and unmounting it, the NFS client could receive the following error message when unmounting those mount points:
    "VFS: Busy inodes after unmount..."
    Additionally, it was possible to crash the NFS client's kernel in this situation.
Story Points: ---
Clone Of:
: 587225 (view as bug list)
Environment:
Last Closed: 2011-01-13 21:27:02 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 596384
Attachments:
Description
Sachin Prabhu
2010-04-14 16:11:31 UTC
Event posted on 04-14-2010 05:35pm BRT by fbl

Hi,

More interesting data from this vmcore:
...
Looking at the parent

crash> dentry 0xffff810515f84b70
struct dentry {
  d_count = { counter = 1 },
  d_flags = 20,              --> DCACHE_DISCONNECTED | DCACHE_UNHASHED
  ..
  d_inode = 0xffff810514f57260,
  ...

crash> eval 0xffff810514f57260 - 0x1e0
hexadecimal: ffff810514f57080

struct nfs_inode {
  fileid = 0x2,
  fh = {
    size = 0x8,
    data = "\001\000\001\000,\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  },
  flags = 0xe,
  cache_validity = 0x3b,
  read_cache_jiffies = 0x107ee4cba,
  attrtimeo = 0x7530,
  attrtimeo_timestamp = 0x10851de52,
  change_attr = 0x36827f90,
  last_updated = 0x107ee4cbc,
  cache_change_attribute = 0x107ee4cb7,
  data_updates = { counter = 0x0 },
  access_cache = { rb_node = 0xffff8103c34ca900 },
  access_cache_entry_lru = { next = 0xffff8103c34ca918, prev = 0xffff8103c34ca918 },
  access_cache_inode_lru = { next = 0xffff81035dce1990, prev = 0xffff810274811b50 },
  acl_access = 0xfffffffffffffff5,
  acl_default = 0x0,
  cookieverf = {0x0, 0x0},
  req_lock = { raw_lock = { slock = 0x1 } },
  dirty = { next = 0xffff810514f571a0, prev = 0xffff810514f571a0 },
  commit = { next = 0xffff810514f571b0, prev = 0xffff810514f571b0 },
  nfs_page_tree = { height = 0x0, gfp_mask = 0x220, rnode = 0x0 },
  ndirty = 0x0,
  ncommit = 0x0,
  npages = 0x0,
  open_files = { next = 0xffff810514f571e0, prev = 0xffff810514f571e0 },
  silly_count = { counter = 0x1 },
  silly_list = { first = 0x0 },
  waitqueue = { lock = { raw_lock = { slock = 0x1 } }, task_list = { next = 0xffff810514f57208, prev = 0xffff810514f57208 } },
  nfs4_acl = 0x0,
  open_states = { next = 0xffff810514f57220, prev = 0xffff810514f57220 },
  delegation = 0x0,
  delegation_state = 0x0,
  rwsem = { activity = 0x0, wait_lock = { raw_lock = { slock = 0x1 } }, wait_list = { next = 0xffff810514f57248, prev = 0xffff810514f57248 } },
  fscache = 0x0,
  vfs_inode = {
    i_hash = { next = 0x0, pprev = 0xffff81000b49a460 },
    ...
  <all remaining data matches with previous inode dump>

So, the parent inode fsid is 0x2 and it matches with the following log entry right before the crash:

...
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x2, got 0x1193a1
NFS: server fisias-lnx-vcs error: fileid changed
...

and it seems to come from:

/*
 * Many nfs protocol calls return the new file attributes after
 * an operation. Here we update the inode to reflect the state
 * of the server's inode.
 *
 * This is a bit tricky because we have to make sure all dirty pages
 * have been sent off to the server before calling invalidate_inode_pages.
 * To make sure no other process adds more write requests while we try
 * our best to flush them, we make them sleep during the attribute refresh.
 *
 * A very similar scenario holds for the dir cache.
 */
static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)

fbl

This event sent from IssueTracker by fbl
issue 759533
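As an aside, the "crash> eval 0xffff810514f57260 - 0x1e0" step above is just undoing, by hand, the embedding of the VFS inode inside the NFS inode; in the kernel, the NFS_I() helper does the same computation via container_of(). Below is a minimal user-space sketch of the idea with toy stand-in structures. The 0x1e0 subtraction is simply what offsetof() works out to on that particular kernel build, not a fixed constant, so the toy layout here prints a different number.

```c
#include <stdio.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures: struct nfs_inode embeds the
 * generic VFS inode as its "vfs_inode" member. */
struct inode { unsigned long i_ino; char pad[88]; };
struct nfs_inode {
    unsigned long long fileid;
    char fh[128];
    unsigned long flags;
    /* ...many other fields in the real structure... */
    struct inode vfs_inode;       /* embedded VFS inode */
};

/* container_of(): recover a pointer to the outer structure from a pointer
 * to one of its members, which is what the manual "eval" above did. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static struct nfs_inode *NFS_I(struct inode *inode)
{
    return container_of(inode, struct nfs_inode, vfs_inode);
}

int main(void)
{
    struct nfs_inode nfsi = { .fileid = 0x2 };
    struct inode *vfs = &nfsi.vfs_inode;

    printf("offsetof(struct nfs_inode, vfs_inode) = %#zx\n",
           offsetof(struct nfs_inode, vfs_inode));
    printf("fileid recovered via NFS_I(): %#llx\n", NFS_I(vfs)->fileid);
    return 0;
}
```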
Event posted on 04-14-2010 05:38pm BRT by fbl

Sorry, s/fsid/fileid/ in my previous post.

fbl

This event sent from IssueTracker by fbl
issue 759533

It does look like an NFS inode here... and the root cause of the problem does appear to be the "Busy inodes after umount" issue. Here's basically what happened, I think:

An NFS filesystem was unmounted, but at least one of the inodes in it couldn't be freed. That caused the "Busy inodes after umount" message to pop at umount time. The superblock was freed and the memory was recycled. The inodes and dentries associated with it were still sitting in the cache, however, and when the VM tried to prune the dcache due to memory pressure it ended up dereferencing the now recycled superblock.

The question, of course, is "Why weren't the inodes and dentries pruned at umount?"

The flags field in the root nfs_inode is 0xe, which is:

#define NFS_INO_ADVISE_RDPLUS  (1)  /* advise readdirplus */
#define NFS_INO_STALE          (2)  /* possible stale inode */
#define NFS_INO_ACL_LRU_SET    (3)  /* Inode is on the LRU list */

...which indicates that it was marked as stale at some point (as expected). The i_ino is also "2", which matches these messages:

NFS: server fisias-lnx-vcs error: fileid changed fsid 0:19: expected fileid 0x1193a1, got 0x2

...so I suspect that the problem is likely related to the fact that the fileid for the root inode of the mount changed.
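For reference, a small sketch of how the two flag words in the dump decode. The DCACHE_* masks below are assumed 2.6.18-era values from include/linux/dcache.h (DCACHE_DISCONNECTED = 0x0004, DCACHE_UNHASHED = 0x0010), and the NFS_INO_* values quoted above are bit numbers used with test_bit(), not masks:

```c
#include <stdio.h>

/* dentry->d_flags bit masks (assumed 2.6.18-era values) */
#define DCACHE_DISCONNECTED 0x0004
#define DCACHE_UNHASHED     0x0010

/* nfs_inode.flags bit numbers, as quoted above */
#define NFS_INO_ADVISE_RDPLUS 1
#define NFS_INO_STALE         2
#define NFS_INO_ACL_LRU_SET   3

int main(void)
{
    unsigned long d_flags   = 20;   /* from the dentry dump: 0x14 */
    unsigned long nfs_flags = 0xe;  /* from the nfs_inode dump */

    printf("d_flags 0x%lx: DISCONNECTED=%d UNHASHED=%d\n", d_flags,
           !!(d_flags & DCACHE_DISCONNECTED),
           !!(d_flags & DCACHE_UNHASHED));
    printf("nfs flags 0x%lx: ADVISE_RDPLUS=%d STALE=%d ACL_LRU_SET=%d\n",
           nfs_flags,
           !!(nfs_flags & (1UL << NFS_INO_ADVISE_RDPLUS)),
           !!(nfs_flags & (1UL << NFS_INO_STALE)),
           !!(nfs_flags & (1UL << NFS_INO_ACL_LRU_SET)));
    return 0;
}
```

All three NFS_INO bits being set is consistent with the reading above: a stale, LRU-listed root inode.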
(In reply to comment #0)
> The parent dentry which also appears to be the root dentry is as follows
>
> crash> dentry 0xffff810515f84b70
> struct dentry {
>   d_count = { counter = 1 },
>   d_flags = 20,              --> DCACHE_DISCONNECTED | DCACHE_UNHASHED
>   ..
>   d_inode = 0xffff810514f57260,
>   ..
>   d_parent = 0xffff810515f84b70,
>   ..
>   d_op = 0xffffffff88774ee0,
>   d_sb = 0xffff81051cc3e800,
>   ..
>   d_iname = "\000dmap\000\000\000\000\000\000\000/lib64/libc.so.6\000\000\000\000\000\000\000"
> }
>
> This has a valid inode which probably triggered the VFS: Busy inode message

Maybe my comment will be helpful, not sure, but here goes anyway.

Probably not, unless it has already been successfully umounted. The root dentry should remain valid until it is dput (dentry_iput() does the real business, I think) during umount.

The busy inode message usually results from dentries within the root still being active in some way, but the child above is either well and truly "gone" or is in the process of being created. If it is really "gone" then any process that tries to use it is in error. Except for the dcache garbage collection routines (that we see here), of course.

The other thing that comes to mind is that the VFS will not "see" unhashed dentries during lookups, so they shouldn't be available to modules unless they have socked an address away somewhere, and they shouldn't become negative unless a reference was not taken. Is something playing funny games with dentries here? Is VxFS trying to use an invalid dentry?

> crash> inode.i_count 0xffff810514f57260
> i_count = { counter = 1 },

Yes, as it should be if it hasn't finished being umounted.

Ian

Well... I can make these messages pop with the hypothetical scenario I came up with:

- export a filesystem from a server
- mount it on a client
- cd into it
- do some stuff (ls's and such)
- unexport it from the server
- umount the server filesystem
- export the directory with the same fsid
- stat the root of the share

...and I get this. At that point, I get stale NFS filehandles on the client. Unmounting it doesn't seem to cause problems with busy inodes, however, so perhaps there's another factor that contributes to this?

NFS: server dantu.rdu.redhat.com error: fileid changed fsid 0:18: expected fileid 0x2, got 0x378001

If it's suspected that the hypothetical situation I've laid out is part of the problem here, one thing they may want to consider in their clusters is using the "mp" export option, which will only export a directory if the filesystem is mounted. See the exports(5) manpage for info. That might help prevent the issue from occurring while we try to determine the cause.

(In reply to comment #20)
> (In reply to comment #15)
> >
> > Hypothetical situation:
> >
> > The service failed over and the underlying filesystem wasn't mounted, but the
> > export was set up. Because it was exported with an explicit fsid= value, it
> > looks like the same filesystem but the server ended up exporting the mountpoint
> > of the directory instead.
>
> That would mean that the VCS configuration is grossly wrong
> which is unlikely. The VCS configuration will consist of
> groups of resources that depend on other groups. For example
> the NFS service group will depend on the file systems group
> and will not be able to start until VCS is happy the filesystems
> are successfully setup.

On second thoughts, maybe not. IIRC, the failover activity depends on site-supplied scripts. If the ones responsible for the file system mounts returned an incorrect result, that could cause this. If there is a failure then the whole process stops waiting for admin intervention, so that is another opportunity for out-of-order actions.

What happens if in your test above you mount the filesystem over the top of the mount point after it is exported?
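To make the "mp" suggestion above concrete: the exports(5) mountpoint/mp option only exports a directory if a filesystem is actually mounted on it, which conceptually boils down to a device-number comparison between the directory and its parent. Here is a rough user-space illustration of that check; it is a sketch, not the nfs-utils code.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Rough idea behind the exports(5) "mp"/"mountpoint" option: a directory is
 * a mount point when it lives on a different device than its parent (or is
 * the filesystem root).  Simplified illustration only. */
static int looks_like_mountpoint(const char *path)
{
    char parent[4096];
    struct stat st, pst;

    if (stat(path, &st) != 0)
        return 0;
    snprintf(parent, sizeof(parent), "%s/..", path);
    if (stat(parent, &pst) != 0)
        return 0;
    /* different device, or same inode as parent (i.e. "/") => mount point */
    return st.st_dev != pst.st_dev || st.st_ino == pst.st_ino;
}

int main(int argc, char **argv)
{
    const char *p = argc > 1 ? argv[1] : "/";
    printf("%s %s a mount point\n", p,
           looks_like_mountpoint(p) ? "looks like" : "does not look like");
    return 0;
}
```

In the failover scenario discussed above, a check like this is what would stop the server from exporting an empty mountpoint directory while the underlying filesystem is not mounted.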
Created attachment 407030 [details]
suggested change to prevent dropping a disconnected dentry
The patch seems to fix the problem shown by the reproducer. What is less clear is whether this problem is the actual problem that the customer hit. There's no way to say that it is with 100% certainty. What we can say is that the problem they hit is consistent with the problem that the reproducer shows. That's probably the best we can hope for with this.

At this point, I'm just waiting for Al to push the patch upstream. Once that's done, we can propose it for RHEL5. I also need to see whether the problem is reproducible on rawhide/rhel6 as well.

I'm unable to reproduce this so far on rawhide kernels. This commit seems like it's probably papering over the bug:

commit 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
Author: Trond Myklebust <Trond.Myklebust>
Date:   Thu Mar 6 12:34:59 2008 -0500

    NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings

...that said, it seems a little fragile to depend on this. The code could change at some point, or there may be some way to avoid this check in the current code (exclusive create?). Either way, I think Al's proposed patch is reasonable.

I'll add my ACK on Al's patch.

in kernel-2.6.18-199.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
When an NFS server exported a file system with an explicit "fsid=[file_system_ID]", and an NFS client mounted that file system on one mount point and a subdirectory of it on a separate mount point, then if the server re-exported that file system after un-exporting and unmounting it, the NFS client could receive the following error message when unmounting those mount points:
"VFS: Busy inodes after unmount..."
Additionally, it was possible to crash the NFS client's kernel in this situation.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html