Description of problem:

We have an odd problem with automounts from failover nfs servers which is poorly understood, but I think it boils down to the following reproducible problem: if you lazily umount a filesystem then any processes with that filesystem still open will see ".." linked to "." in the root directory of the mount, and therefore getcwd will no longer work. A subsidiary bug is that getcwd pretends to return successfully in this situation instead of flagging an error.

Version-Release number of selected component (if applicable):
Currently running with kernel-2.6.23.1-49.fc8

How reproducible:
Always

Steps to Reproduce:
Note that /auto/scratch/imc is auto-mounted over nfs in the following.

$ cd /auto/scratch/imc/foo
$ ls -al
total 6
drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
drwxr-xr-x  2 imc support  512 Feb  6 16:12 bar
-rw-r--r--  1 imc support    6 Feb  6 16:12 hello
$ ls -ldi . .. ../.. /auto/scratch/imc
 290882 drwxr-xr-x  3 imc  support  512 Feb  6 16:12 .
   9148 drwxr-xr-x 59 imc  support 3072 Feb  6 16:12 ..
2644455 drwxr-xr-x  3 root root       0 Feb  6 16:12 ../..
   9148 drwxr-xr-x 59 imc  support 3072 Feb  6 16:12 /auto/scratch/imc
$ su -c 'umount -l /auto/scratch/imc'
Password:
umount: /auto/scratch/imc: not mounted
$ /bin/pwd
foo
$ ls -ldi . .. ../.. /auto/scratch/imc
 290882 drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
   9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
   9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ../..
   9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 /auto/scratch/imc
$ ls -al
total 6
drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
drwxr-xr-x  2 imc support  512 Feb  6 16:12 bar
-rw-r--r--  1 imc support    6 Feb  6 16:12 hello
$ (cd .. && /bin/pwd)
/auto/scratch/imc
$ (cd bar && /bin/pwd)
/auto/scratch/imc/foo/bar
$ /bin/pwd
foo

Note that after the umount the ".." entry of (what was) /auto/scratch/imc points to itself instead of to the real parent.
There may be a good reason for this, but if so I'd be interested to find out what it is.

The /bin/pwd command claims that "foo" is the correct name of the current directory. If I'd been in /auto/scratch/imc at the time of the umount then it would have claimed that "" is the correct name of the current directory without reporting an error.

Even though the filesystem is unmounted, you can still list it and change to directories within it (which seems to be the point of lazy unmounting). However, because the filesystem is automounted, doing "cd .." remounts it and things are fine from then on. However, any shell process which hasn't done a cd since the umount will still be affected.

Although we don't usually use "umount -l" in real-world situations, I quite often discover long-lived shells with no pwd so I'm assuming the auto-mounter has tried to umount the filesystem at some point, possibly when the nfs server became temporarily unavailable owing to a network problem or scheduled reboot.
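As a workaround sketch for the subsidiary bug described above (getcwd returning "foo" or "" without an error), a caller can refuse any result that is not an absolute path. The helper names `require_absolute` and `checked_getcwd` are hypothetical and only for illustration; this is not how /bin/pwd or any particular shell handles it.

```python
import os


def require_absolute(path):
    """Return path unchanged, or raise if it is not an absolute pathname.

    A current directory inside a lazily unmounted tree may come back as a
    relative name such as "foo", or as an empty string.
    """
    if not path.startswith("/"):
        raise OSError(f"current directory has no absolute name: {path!r}")
    return path


def checked_getcwd():
    """Hypothetical wrapper: getcwd that treats a non-absolute result as an error."""
    return require_absolute(os.getcwd())
```

A long-lived process could call `checked_getcwd()` wherever it would otherwise trust the result of getcwd, and treat the exception as "the cwd has lost its name".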
(In reply to comment #0)
> Note that after the umount the ".." entry of (what was) /auto/scratch/imc points
> to itself instead of to the real parent. There may be a good reason for this,
> but if so I'd be interested to find out what it is.

The parent is gone, so you are in the root of a disconnected tree. If you are in / and try 'cd ..' it silently succeeds there too (and leaves you in the same directory.)
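The ".." behaviour described here can be checked from userspace by comparing device and inode numbers, which is what the `ls -ldi` output in comment #0 shows by hand. A minimal sketch, with the hypothetical helper names `same_directory` and `parent_is_self`:

```python
import os


def same_directory(a, b):
    """True if both paths resolve to the same directory (same device and inode)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def parent_is_self(path="."):
    """True if '..' of path is path itself.

    This holds for / and, per the comment above, for the root of a
    disconnected (lazily unmounted) tree.
    """
    return same_directory(path, os.path.join(path, ".."))
```

The check alone cannot tell the real / apart from the root of a disconnected tree; both report that ".." is the directory itself.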
Ian, would it be okay with you if we closed this bug as a duplicate of bug 287411? Essentially, when restarting autofs, it will do a umount -l of busy mount points, since there is no way to re-attach to existing mounts currently. Ian Kent is working on a solution to fix that problem, so then we won't rely on umount -l for the forced restart case and I think that should resolve your problem.
It's in a disconnected tree, yes, but the parent is not gone - it is /auto/scratch in the local filesystem, same as before. However, once you leave the disconnected tree by changing to the parent, you can't get back - which of course means that getcwd still wouldn't work even if the parent link were preserved, I now realise. But since you are in a disconnected tree which by definition doesn't have a name within the filesystem, getcwd shouldn't succeed. If there is a good reason why it should succeed then I have a bug to raise against vim. ;-) Bug 287411 does indeed look like the same bug. (If I'd searched for "cwd" instead of "pwd" then I'd have found it, I guess.) Several of our workstations updated their copies of autofs in late January which is consistent with that being the cause of our current crop of lost cwds. Thanks.
(In reply to comment #3)
> Bug 287411 does indeed look like the same bug. (If I'd searched for "cwd"
> instead of "pwd" then I'd have found it, I guess.) Several of our workstations
> updated their copies of autofs in late January which is consistent with that
> being the cause of our current crop of lost cwds.

287411 is against F-7, so I guess we should keep both bugs open. I'll add a dependency.
(In reply to comment #3)
> It's in a disconnected tree, yes, but the parent is not gone - it is
> /auto/scratch in the local filesystem, same as before. However, once you leave
> the disconnected tree by changing to the parent, you can't get back - which of
> course means that getcwd still wouldn't work even if the parent link were
> preserved, I now realise.

As I've finally realized, "umount -l" is not appropriate for mounts that we expect to keep working as normal until they aren't in use any more. This is what autofs currently does for active mounts at restart, which causes the problem.

The lazy umount actually has very limited function, as no path lookups can be done within the mount point once it's lazy umounted. But once autofs is restarted, path lookups can be done again and new mounts performed, and this works well.

Unfortunately, anything that needs to walk back up the mount tree to construct a path, such as getcwd(2) and the proc filesystem /proc/<pid>/cwd, cannot work because the point from which the path is constructed has been detached from the mount tree. It's normal for detached mounts to point to themselves in the kernel; after all, they are essentially waiting to go away.

>
> But since you are in a disconnected tree which by definition doesn't have a name
> within the filesystem, getcwd shouldn't succeed. If there is a good reason why
> it should succeed then I have a bug to raise against vim. ;-)

The issue here is with checking the return of the kernel function d_path. I can't remember now what it returns in this case, but if it returns NULL (I think that was it) then that should be checked by system calls and filesystems that use it. But that's a different issue from being able to cleanly restart autofs in the presence of busy mounts.

The actual problem with autofs is that it can't reconnect to existing mounts. Immediately one thinks that just adding the ability to remount the autofs filesystem would solve it but, alas, that can't work.
This is because autofs direct mounts and the implementation of "on demand mount and expire" of nested mount trees (like those in multiple offset automount maps) have the mount location mounted on top of them. The upshot of this is that we can't use anything that needs to walk the path, because the VFS will skip over the autofs mount point we need and end up at the mount above it. For the same reason an ioctl file descriptor can't be opened on these mounts either.

The way that I'm resolving this is to add code to implement a device node for the autofs4 kernel module so it can be used to route ioctl control commands to these mounts. Then there are the changes to autofs itself to reconnect the mounts at startup, which is quite tricky.

>
> Bug 287411 does indeed look like the same bug. (If I'd searched for "cwd"
> instead of "pwd" then I'd have found it, I guess.) Several of our workstations
> updated their copies of autofs in late January which is consistent with that
> being the cause of our current crop of lost cwds.

I'm fairly sure it's the issue you're seeing.

I've been working on it for a while now and I've made a lot of progress. I can't commit to a time frame because I don't know what, if any, difficulties I will run into.

I can say that I have the foundation for this mostly done, that being the kernel device node code and the interface code for autofs to use it. The testing done so far shows it works fine, but not all functions have been exercised. Still to be done is the code in autofs to actually do the reconnect to active mounts at startup.

Sorry about the rant, but I wanted to try and give you an idea of what is really happening and where we are at with it. Hopefully it is, at least partly, understandable.

Ian
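The remark in this comment about getcwd(2) needing to walk back up the tree can be illustrated with a userspace analogue. The sketch below is not the kernel's d_path, and the function name `rebuild_path` is hypothetical. It climbs ".." one level at a time, looks up the name of each directory in its parent, and stops when ".." is the directory itself. That stopping rule is why a detached mount root, whose ".." points to itself, ends the walk early.

```python
import os


def rebuild_path(start):
    """Rebuild the name of directory `start` by walking up through '..'.

    Assumption for illustration: the walk ends where '.' and '..' are the
    same directory. On a normal tree that is /.
    """
    parts = []
    current = start
    while True:
        here = os.stat(current)
        parent_path = os.path.join(current, "..")
        parent = os.stat(parent_path)
        if (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino):
            break  # '..' is this directory itself: top of the tree
        name = None
        with os.scandir(parent_path) as entries:
            for entry in entries:
                try:
                    st = entry.stat(follow_symlinks=False)
                except OSError:
                    continue  # entry vanished or is unreadable; skip it
                if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                    name = entry.name
                    break
        if name is None:
            raise FileNotFoundError(f"no name found for {current!r} in its parent")
        parts.append(name)
        current = parent_path
    return "/" + "/".join(reversed(parts))
```

A walk of this kind started in a directory like foo below a detached mount root would stop after collecting only "foo", because nothing above that root is reachable through "..". That matches the components /bin/pwd reported in comment #0.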
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
This message is a reminder that Fedora 9 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 9. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '9'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 9's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 9 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.