Bug 1272078
Summary: | Subdirectory of nfs mount can be removed even if it is also a mountpoint itself | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Frank Sorenson <fsorenso> |
Component: | kernel | Assignee: | nfs-maint |
kernel sub component: | NFS | QA Contact: | Filesystem QE <fs-qe> |
Status: | CLOSED WONTFIX | Docs Contact: | |
Severity: | unspecified | ||
Priority: | unspecified | CC: | bcodding, bfields, ikent, smayhew |
Version: | 7.1 | Keywords: | Reproducer |
Target Milestone: | rc | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2015-11-17 21:09:35 UTC | Type: | Bug |
Description
Frank Sorenson
2015-10-15 12:39:10 UTC
Note that the stale directory will successfully unmount:

# cd /

# umount /mnt2

# egrep -w '/mnt[12]' /proc/mounts
vm17:/export /mnt1 nfs4 rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.80,local_lock=none,addr=192.168.122.85 0 0

Note even if we fix this a user can probably still get into the same situation by removing subdir from the server or from another client (or possibly even from this client if it is mounted with -o nosharecache). As long as we can still successfully umount this doesn't sound so bad.

Is this causing you a problem in practice?

Maybe I can be convinced this is a bug, but if so I wonder whether it's really a priority.

(In reply to Frank Sorenson from comment #0)
> Description of problem:
>
> A subdirectory of an nfs mount can be removed, even if the subdirectory
> itself is also mounted. This results in ESTALE errors for anything
> accessing the mountpoint and /proc/mounts shows the mount with ' (deleted)'
> appended.
>
>
> Version-Release number of selected component (if applicable):
>
> RHEL 7.1 kernel version 3.10.0-229.11.1.el7.x86_64
>
> How reproducible:
>
> see reproducer
>
> Steps to Reproduce:
>
> on nfs server:
> mkdir -p /export/subdir
> echo "/export/ *(rw,no_root_squash)" >> /etc/exports
> exportfs -arv
>
> on nfs client:
> mkdir /mnt1 /mnt2
> mount server:/export /mnt1
> mount server:/export/subdir /mnt2
> cd /mnt2
> df .
>
> rmdir /mnt1/subdir
> df .
>
>
> Actual results:
>
> subdirectory (/mnt1/subdir) is removed, mountpoint (/mnt2) unavailable
> (ESTALE), and /proc/mounts appends ' (deleted)' to mount string in
> /proc/mounts ('/mnt2\040(deleted)')

Oops, I did misunderstand what you were describing.

Now I see that, on the client, the dentry of /mnt1/subdir is not a mount point; /mnt2 is. So d_unlinked() will return true and " (deleted)" will be added to the path following an rmdir.

For the same reason d_mountpoint() in vfs_rmdir() will return false and the rmdir will not fail because of it.

So, as Bruce points out, there are a number of ways this could happen, and it becomes an exercise in deciding how the case should be handled, since it can't be avoided.

>
> # cd /mnt2
>
> # df .
> Filesystem 1K-blocks Used Available Use% Mounted on
> vm17:/export/subdir 26767616 19196416 6204672 76% /mnt2
>
> # rmdir /mnt1/subdir
>
> # ls -l /mnt1
> total 0
>
> # df .
> df: ‘.’: Stale file handle
>
> # egrep -w '/mnt[12]' /proc/mounts
> shell-init: error retrieving current directory: getcwd: cannot access parent
> directories: No such file or directory
> vm17:/export /mnt1 nfs4
> rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,
> port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.80,local_lock=none,
> addr=192.168.122.85 0 0
> vm17:/export/subdir /mnt2\040(deleted) nfs4
> rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,
> port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.80,local_lock=none,
> addr=192.168.122.85 0 0
>
>
> Expected results:
>
> rmdir /mnt1/subdir returning EBUSY

It would be good if we could detect this case during the rmdir but I'm not sure how to do it.

Ian

(In reply to Frank Sorenson from comment #1)
> Note that the stale directory will successfully unmount:
>
> # cd /
>
> # umount /mnt2
>
> # egrep -w '/mnt[12]' /proc/mounts
> vm17:/export /mnt1 nfs4
> rw,relatime,vers=4.0,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,
> port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.80,local_lock=none,
> addr=192.168.122.85 0 0

Sure, and it's probably a good thing that people can clean up the mess on the client after doing something on the server that has unfortunate side effects.

I think that's due to the ESTALE handling patch series from Jeff Layton, which essentially avoids revalidation of the last path component in order to avoid umount hangs against servers that aren't responding.

This case might not have been considered at the time and is, I think, a bonus.

(In reply to Bruce Fields from comment #3)
> Note even if we fix this a user can probably still get into the same
> situation by removing subdir from the server or from another client (or
> possibly even from this client if it is mounted with -o nosharecache). As long
> as we can still successfully umount this doesn't sound so bad.

Understood and agreed.

As far as unmounting, the umount program included in RHEL 7 will succeed, while the umount in RHEL 6 will fail (without attempting the syscall) due to strings not matching between /etc/mtab, /proc/mounts, and the command-line argument. In both RHEL 6 and RHEL 7, the umount() syscall itself will succeed.

> Is this causing you a problem in practice?

We have one strategic customer who has opened 2 cases now, with 16 months or so between occurrences, so it's not an issue we're seeing regularly. Since the customer can't reproduce the issue, and since it's so infrequent, the condition in this bugzilla is the best and closest root cause we can provide (really, the first potential explanation for what they're seeing). It seems to match; however, we can't confirm it's the exact issue they're seeing.

We do periodically have other cases raised where the directory in /proc/mounts has ' (deleted)' appended, but I don't think we've come up with an explanation for those either (since umount in RHEL 6 and earlier won't unmount these, it's required a reboot to clear, which customers are obviously not thrilled about).

With this in mind, it's certainly understandable that this would be a lower-priority issue to fix, if indeed it can/should be fixed.

> Maybe I can be convinced this is a bug, but if so I wonder whether it's
> really a priority.

If this isn't a bug (or at least isn't a bug we want to tackle--perhaps WONTFIX over NOTABUG?), I suppose we'll work to document the issue and possible explanations for future cases, and I'll probably open a bz for RHEL 6 to fix unmounting these directories without requiring a reboot.

Closing this out. The frequency of occurrence of this is very low, and there are multiple ways the underlying issue could occur (this one probably being one of the less-common). There are dependencies on external factors (other nfs clients, for example), so this may ultimately be unfixable.
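
For documenting future cases, the ' (deleted)' condition discussed in this bug can be spotted by scanning /proc/mounts for mount points ending in the escaped suffix `\040(deleted)`. The following is only a minimal sketch: the function name `find_deleted_mounts` is made up for illustration, and it assumes the standard whitespace-separated /proc/mounts layout with the mount point in the second field. A directory whose real name ends in " (deleted)" would look the same, so treat matches as candidates, not proof.

```shell
#!/bin/sh
# Hypothetical helper (not part of any RHEL tool): reads /proc/mounts-format
# lines on stdin and prints "<device> <mount point>" for every entry whose
# mount point carries the kernel's " (deleted)" marker.
# In /proc/mounts a space inside a path is escaped as \040, so the marker
# appears as the literal text "\040(deleted)" at the end of field 2.
find_deleted_mounts() {
    awk '{
        suffix = "\\040(deleted)"      # awk reads \\ as one backslash
        n = length($2)
        m = length(suffix)
        # Match only when field 2 ends with the suffix, then strip it.
        if (n > m && substr($2, n - m + 1) == suffix)
            print $1, substr($2, 1, n - m)
    }'
}
```

On a live system this could be run as `find_deleted_mounts < /proc/mounts`; no output would mean no mount point is currently flagged.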