431716 – lazy umount causes pwd to fail silently

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	9
Hardware:	All
OS:	Linux
Priority:	low
Severity:	low
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	287411

Reported:	2008-02-06 16:57 UTC by Ian Collier
Modified:	2009-07-14 17:58 UTC
CC List:	2 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-07-14 17:58:56 UTC
Type:	---
Embargoed:
Dependent Products:

Description Ian Collier 2008-02-06 16:57:11 UTC

Description of problem:

We have an odd problem with automounts from failover nfs servers which is poorly
understood but I think it boils down to the following reproducible problem: if
you lazily umount a filesystem then any processes with that filesystem still
open will see ".." linked to "." in the root directory of the mount and
therefore getcwd will no longer work.  A subsidiary bug is that getcwd pretends
to return successfully in this situation instead of flagging an error.

Version-Release number of selected component (if applicable):
Currently running with kernel-2.6.23.1-49.fc8

How reproducible:
Always

Steps to Reproduce:

Note that /auto/scratch/imc is auto-mounted over nfs in the following.

$ cd /auto/scratch/imc/foo
$ ls -al
total 6
drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
drwxr-xr-x  2 imc support  512 Feb  6 16:12 bar
-rw-r--r--  1 imc support    6 Feb  6 16:12 hello
$ ls -ldi . .. ../.. /auto/scratch/imc
 290882 drwxr-xr-x  3 imc  support  512 Feb  6 16:12 .
   9148 drwxr-xr-x 59 imc  support 3072 Feb  6 16:12 ..
2644455 drwxr-xr-x  3 root root       0 Feb  6 16:12 ../..
   9148 drwxr-xr-x 59 imc  support 3072 Feb  6 16:12 /auto/scratch/imc
$ su -c 'umount -l /auto/scratch/imc'
Password: 
umount: /auto/scratch/imc: not mounted
$ /bin/pwd
foo
$ ls -ldi . .. ../.. /auto/scratch/imc
290882 drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
  9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
  9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ../..
  9148 drwxr-xr-x 59 imc support 3072 Feb  6 16:12 /auto/scratch/imc
$ ls -al
total 6
drwxr-xr-x  3 imc support  512 Feb  6 16:12 .
drwxr-xr-x 59 imc support 3072 Feb  6 16:12 ..
drwxr-xr-x  2 imc support  512 Feb  6 16:12 bar
-rw-r--r--  1 imc support    6 Feb  6 16:12 hello
$ (cd .. && /bin/pwd)
/auto/scratch/imc
$ (cd bar && /bin/pwd)
/auto/scratch/imc/foo/bar
$ /bin/pwd
foo

Note that after the umount the ".." entry of (what was) /auto/scratch/imc points
to itself instead of to the real parent.  There may be a good reason for this,
but if so I'd be interested to find out what it is.  The /bin/pwd command claims
that "foo" is the correct name of the current directory.  If I'd been in
/auto/scratch/imc at the time of the umount then it would have claimed that ""
is the correct name of the current directory without reporting an error.
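
A caller can defend against this by refusing any result that is not an
absolute path.  The following is only a minimal sketch: checked_cwd is a
hypothetical helper name, and the injectable getcwd argument is there purely so
the check can be exercised without detaching a real mount.  Whether Python's
os.getcwd() would actually hand back "foo" or "" on a given kernel is not
something this report establishes.

```python
import os


def checked_cwd(getcwd=os.getcwd):
    """Return the current directory, rejecting a non-absolute result.

    `getcwd` is injectable (an assumption made for illustration) so the
    check can be tried with the values seen in the report.
    """
    path = getcwd()
    # A real working directory always has a name starting at "/".
    # "foo" or "" means the path walk stopped short of the root.
    if not path.startswith("/"):
        raise OSError(f"getcwd returned a non-absolute path: {path!r}")
    return path
```

With the values from the transcript above, checked_cwd(lambda: "foo") and
checked_cwd(lambda: "") both raise OSError, while an ordinary absolute path is
passed through unchanged.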

Even though the filesystem is unmounted, you can still list it and change to
directories within it (which seems to be the point of lazy unmounting).
Because the filesystem is automounted, though, doing "cd .." remounts it and
things are fine from then on.  Any shell process which hasn't done a cd since
the umount will still be affected.

Although we don't usually use "umount -l" in real-world situations, I quite
often discover long-lived shells with no pwd so I'm assuming the auto-mounter
has tried to umount the filesystem at some point, possibly when the nfs server
became temporarily unavailable owing to a network problem or scheduled reboot.

Comment 1 Chuck Ebbert 2008-02-06 20:11:57 UTC

(In reply to comment #0)
> Note that after the umount the ".." entry of (what was) /auto/scratch/imc points
> to itself instead of to the real parent.  There may be a good reason for this,
> but if so I'd be interested to find out what it is.

The parent is gone, so you are in the root of a disconnected tree. If you are in
/ and try 'cd ..' it silently succeeds there too (and leaves you in the same
directory.)
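
The "ls -ldi" comparison from comment #0 can be done programmatically by
comparing device and inode numbers of a directory and its "..".  This is a
small sketch only; is_own_parent is a hypothetical name, not an existing API.

```python
import os


def is_own_parent(path):
    """True if `path` and `path/..` are the same directory.

    This holds for "/" and, per the report, for the root of a lazily
    unmounted tree.
    """
    here = os.stat(path)
    parent = os.stat(os.path.join(path, ".."))
    # Same device and same inode means both names refer to one directory.
    return (here.st_dev, here.st_ino) == (parent.st_dev, parent.st_ino)
```

is_own_parent("/") is True on a normal system; a True result for any
directory that is not "/" would indicate the disconnected case described here.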

Comment 2 Jeff Moyer 2008-02-06 20:25:34 UTC

Ian, would it be okay with you if we closed this bug as a duplicate of bug 287411?

Essentially, when restarting autofs, it will do a umount -l of busy mount
points, since there is no way to re-attach to existing mounts currently.  Ian
Kent is working on a solution to fix that problem, so then we won't rely on
umount -l for the forced restart case and I think that should resolve your problem.

Comment 3 Ian Collier 2008-02-07 12:00:38 UTC

It's in a disconnected tree, yes, but the parent is not gone - it is
/auto/scratch in the local filesystem, same as before.  However,  once you leave
the disconnected tree by changing to the parent, you can't get back - which of
course means that getcwd still wouldn't work even if the parent link were
preserved, I now realise.

But since you are in a disconnected tree which by definition doesn't have a name
within the filesystem, getcwd shouldn't succeed.  If there is a good reason why
it should succeed then I have a bug to raise against vim. ;-)

Bug 287411 does indeed look like the same bug.  (If I'd searched for "cwd"
instead of "pwd" then I'd have found it, I guess.)  Several of our workstations
updated their copies of autofs in late January which is consistent with that
being the cause of our current crop of lost cwds.

Thanks.

Comment 4 Jeff Moyer 2008-02-07 21:40:30 UTC

(In reply to comment #3)
> Bug 287411 does indeed look like the same bug.  (If I'd searched for "cwd"
> instead of "pwd" then I'd have found it, I guess.)  Several of our workstations
> updated their copies of autofs in late January which is consistent with that
> being the cause of our current crop of lost cwds.

287411 is against F-7, so I guess we should keep both bugs open.  I'll add a
dependency.

Comment 5 Ian Kent 2008-02-08 05:46:45 UTC

(In reply to comment #3)
> It's in a disconnected tree, yes, but the parent is not gone - it is
> /auto/scratch in the local filesystem, same as before.  However,  once you leave
> the disconnected tree by changing to the parent, you can't get back - which of
> course means that getcwd still wouldn't work even if the parent link were
> preserved, I now realise.

As I've finally realized, "umount -l" is not appropriate for
mounts that we expect to keep working normally until they aren't
in use any more. This is what autofs currently does for active
mounts at restart, and it is what causes the problem.

The lazy umount actually has very limited function as no path
lookups can be done within the mount point once it's lazy
umounted. But once autofs is restarted, path lookups can
be done again and new mounts performed, and this
works well. Unfortunately, anything that needs to walk
back up the mount tree to construct a path, such as
getcwd(2) and the proc filesystem /proc/<pid>/cwd, cannot
work because the point from which the path is constructed has
been detached from the mount tree. It's normal for detached
mounts to point to themselves in the kernel; after all, they
are essentially waiting to go away.
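
To illustrate why the walk fails, here is a user-space sketch of building
a path by following ".." links. It is not the kernel's d_path and not
necessarily what /bin/pwd does; path_by_walking_up and same_dir are
hypothetical names. The walk stops at the first directory that is its
own parent. If that directory is not "/", the tree is detached and only
a relative fragment such as "foo" can be produced.

```python
import os


def same_dir(a, b):
    """Two stat results name the same directory if device and inode match."""
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def path_by_walking_up(start="."):
    """Return (path, connected) for `start` by following ".." upwards.

    `connected` is False when the walk ends at a self-parented directory
    other than "/"; the path then has no leading slash.
    """
    root = os.stat("/")
    here_path = start
    here = os.stat(here_path)
    names = []
    while True:
        parent_path = os.path.join(here_path, "..")
        parent = os.stat(parent_path)
        if same_dir(here, parent):
            break  # top of this tree: ".." leads nowhere new
        # Find which entry of the parent is the directory we came from.
        for name in os.listdir(parent_path):
            try:
                entry = os.lstat(os.path.join(parent_path, name))
            except OSError:
                continue  # entry vanished or is not accessible
            if same_dir(entry, here):
                names.append(name)
                break
        else:
            raise FileNotFoundError(f"no entry for {here_path!r} in its parent")
        here_path, here = parent_path, parent
    connected = same_dir(here, root)
    prefix = "/" if connected else ""
    return prefix + "/".join(reversed(names)), connected
```

On a normally mounted directory this returns an absolute path and True.
In the detached case from comment #0 the walk would end at the old mount
root, so the result would be a bare "foo" with connected set to False.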

> 
> But since you are in a disconnected tree which by definition doesn't have a name
> within the filesystem, getcwd shouldn't succeed.  If there is a good reason why
> it should succeed then I have a bug to raise against vim. ;-)

The issue here is with checking the return of the kernel
function d_path. I can't remember now what it returns in
this case but if it returns NULL (I think that was it)
then that should be checked by system calls and filesystems
that use it.

But that's a different issue to being able to cleanly restart
autofs in the presence of busy mounts.

The actual problem with autofs is that it can't reconnect
to existing mounts. One immediately thinks that just adding
the ability to remount the autofs filesystem would solve
it, but alas, that can't work. This is because autofs direct
mounts and the implementation of "on demand mount and expire"
of nested mount trees (like those in multiple offset automount
maps) have the mount location mounted on top of them. The
upshot of this is that we can't use anything that needs
to walk the path because the VFS will skip over the autofs
mount point we need and end up at the mount above it.
For the same reason an ioctl file descriptor can't be
opened on these mounts either.

The way that I'm resolving this is to add code to 
implement a device node for the autofs4 kernel
module so it can be used to route ioctl control
commands to these mounts. Then there are the changes
to autofs itself to reconnect the mounts at startup
which is quite tricky.

> 
> Bug 287411 does indeed look like the same bug.  (If I'd searched for "cwd"
> instead of "pwd" then I'd have found it, I guess.)  Several of our workstations
> updated their copies of autofs in late January which is consistent with that
> being the cause of our current crop of lost cwds.

I'm fairly sure it's the issue you're seeing.
I've been working on it for a while now and I've made a lot
of progress. I can't commit to a time frame because I don't
know what, if any, difficulties I will run into. I can say
that I have the foundation for this mostly done. That being
the kernel device node code and the interface code for autofs
to use this. The testing done so far shows it works fine but
not all functions have been exercised. Still to be done is
code in autofs to actually do the reconnect to active mounts
at startup.

Sorry about the rant but I wanted to try and give you an idea
of what is really happening and where we are at with it. Hopefully
it is, at least partly, understandable.

Ian

Comment 6 Bug Zapper 2008-11-26 09:42:54 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Bug Zapper 2009-06-09 23:30:36 UTC

This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2009-07-14 17:58:56 UTC

Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
