Bug 445056

Summary: autofs mounts NFS exported filesystems and leaves them unmountable
Product: Fedora
Component: autofs
Version: 9
Status: CLOSED WONTFIX
Severity: low
Priority: low
Hardware: All
OS: Linux
Reporter: Michal Jaegermann <michal>
Assignee: Ian Kent <ikent>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ikent, jmoyer
Last Closed: 2009-07-14 17:59:03 UTC
Attachments:
  debugging output from level 3
  debug output for a run from a desktop (level 5)
  the real debug output for a run from a desktop (level 5)

Description Michal Jaegermann 2008-05-03 01:20:54 UTC
Description of problem:

A server on a local net exports file systems like this
(as printed by 'showmount -e ...'):

/           192.168.23.0/255.255.255.0
/var        192.168.23.0/255.255.255.0
/opt        192.168.23.0/255.255.255.0
/usr        192.168.23.0/255.255.255.0
/boot       192.168.23.0/255.255.255.0
/home       192.168.23.0/255.255.255.0
/home/spare 192.168.23.0/255.255.255.0

A reference like 'ls /net/$server' mounts ALL of these file systems,
but attempts to umount them end up with "device is busy" for
/net/$server/home and /net/$server (the latter clearly because
the former is still mounted).

The condition persists even with autofs stopped, although doing
all these mounts and unmounts directly, i.e. without autofs in
the picture, does not present any problems.  Neither lsof nor
fuser show anything using those file systems, and strace on
/sbin/umount.nfs shows only that the umount() system call fails
with EBUSY, so this is not much help.
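
For the record, the failing sequence boils down to roughly this
(with $server standing for the NFS server, as above; the strace
invocation and its output line are illustrative):

ls /net/$server                  # automounts every exported file system
umount -a -t nfs                 # fails: "device is busy" for /net/$server/home
strace -f /sbin/umount.nfs /net/$server/home
# ... umount2("/net/$server/home", 0) = -1 EBUSY (Device or resource busy)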

After starting autofs with the '-v' flag, and after an experiment
like the above, /var/log/messages just gets spammed like this:

May  2 17:58:49 dyna0 automount[3824]: 2 remaining in /net
May  2 17:59:04 dyna0 automount[3824]: 2 remaining in /net
May  2 17:59:19 dyna0 automount[3824]: 2 remaining in /net
May  2 17:59:34 dyna0 automount[3824]: 2 remaining in /net
May  2 17:59:49 dyna0 automount[3824]: 2 remaining in /net
May  2 18:00:04 dyna0 automount[3824]: 2 remaining in /net
May  2 18:00:19 dyna0 automount[3824]: 2 remaining in /net
May  2 18:00:34 dyna0 automount[3824]: 2 remaining in /net

until autofs is stopped.  Probably this is because a timeout is
configured and has expired.
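
(For reference, that timeout on Fedora normally comes from
/etc/sysconfig/autofs; the value shown here is just the usual default:

TIMEOUT=300    # seconds before an unused automounted file system expires
)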

Hm, maybe the following would give a clue:

automount[3824]: do_mount_direct: can't stat direct mount trigger
/net/$server/home/spare
automount[3824]: send_fail:610: AUTOFS_IOC_FAIL: error Bad file descriptor

but these messages do not seem to show up on other occasions, so
they may be coincidental.

Taking the machine to single-user mode does perform these unmounts,
and it appears to be the only way to do so.  Somewhat too radical.

The above happens not only at runlevel 5 but at runlevel 3 as well.

Version-Release number of selected component (if applicable):
autofs-5.0.3-11

How reproducible:
on every try

Comment 1 Ian Kent 2008-05-05 08:05:44 UTC
(In reply to comment #0)
> A server on a local net exports file systems like this
> (as printed by 'showmount -e ...'):
> 
> /           192.168.23.0/255.255.255.0
> /var        192.168.23.0/255.255.255.0
> /opt        192.168.23.0/255.255.255.0
> /usr        192.168.23.0/255.255.255.0
> /boot       192.168.23.0/255.255.255.0
> /home       192.168.23.0/255.255.255.0
> /home/spare 192.168.23.0/255.255.255.0

What does the /etc/exports on the server have in it?


Comment 2 Michal Jaegermann 2008-05-05 15:24:06 UTC
> What does the /etc/exports on the server have in it?

The same export lines, even in the same order, only with
(rw,no_root_squash,sync) options added in every case.
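
(That is, lines of the form:

/           192.168.23.0/255.255.255.0(rw,no_root_squash,sync)
/home       192.168.23.0/255.255.255.0(rw,no_root_squash,sync)
/home/spare 192.168.23.0/255.255.255.0(rw,no_root_squash,sync)

and so on for the remaining exports.)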

Comment 3 Ian Kent 2008-05-06 02:39:43 UTC
(In reply to comment #2)
> > What does the /etc/exports on the server have in it?
> 
> The same export lines, even in the same order, only with
> (rw,no_root_squash,sync) options added in every case.

OK, so no fsid option anywhere?
Let's have a full debug log please.
See http://people.redhat.com/jmoyer if you need any information
about what's needed to capture all the output.
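
(The usual way to capture this on Fedora, assuming the stock
configuration files, is roughly:

# /etc/sysconfig/autofs: raise the daemon's log level
LOGGING="debug"

# /etc/rsyslog.conf (or syslog.conf): keep daemon debug messages
daemon.*        /var/log/autofs.debug

# then restart both services and reproduce the problem
service rsyslog restart
service autofs restart

The page above has the authoritative steps.)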

Comment 4 Michal Jaegermann 2008-05-06 05:31:17 UTC
Created attachment 304597 [details]
debugging output from level 3

I updated to autofs-5.0.3-13, as I was asked in bug 445060, and the
picture changed somewhat.

If I am running with a console only, then mount points eventually
expire and file systems are unmounted.  If I try 'umount -a -t nfs'
before that, I can still get "device is busy" if I do it "too
early", but eventually the mount points will expire and everything
will get unmounted.

Comment 5 Michal Jaegermann 2008-05-06 05:43:34 UTC
Created attachment 304598 [details]
debug output for a run from a desktop (level 5)

The situation is very different for a run from a desktop.
The same command which caused only 3 mounts from a console resulted
in 7 mounts from a desktop.  They apparently never expire, as
something repeatedly attempts to create /net/.Trash (or maybe
something else is "refreshing" it?) and we loop.

If, as root, I try 'umount -a -t nfs', then even with '-f' added
to that command the response is:

umount2: Device or resource busy
umount.nfs: /net/zeno/home: device is busy
umount2: Device or resource busy
umount.nfs: /net/zeno: device is busy

For a short while I am down to "2 remaining in /net", but in short
order this goes up to 4, and it does not look like it will ever go
below that on its own.

Comment 6 Michal Jaegermann 2008-05-06 05:48:55 UTC
Created attachment 304599 [details]
the real debug output for a run from a desktop (level 5)

Oops! It looks like I attached the same log twice.  Hopefully
this one is better.

Comment 7 Jeff Moyer 2008-05-06 18:10:09 UTC
(In reply to comment #0)
> /home       192.168.23.0/255.255.255.0
> /home/spare 192.168.23.0/255.255.255.0
> 
> A reference like 'ls /net/$server' mounts ALL of these file systems,
> but attempts to umount them end up with "device is busy" for
> /net/$server/home and /net/$server (the latter clearly because
> the former is still mounted).
> 
> The condition persists even with autofs stopped, although doing
> all these mounts and unmounts directly, i.e. without autofs in
> the picture, does not present any problems.  Neither lsof nor
> fuser show anything using those file systems, and strace on
> /sbin/umount.nfs shows only that the umount() system call fails
> with EBUSY, so this is not much help.
> 
> After starting autofs with the '-v' flag, and after an experiment
> like the above, /var/log/messages just gets spammed like this:
> 

> Hm, maybe the following would give a clue:
> 
> automount[3824]: do_mount_direct: can't stat direct mount trigger
> /net/$server/home/spare
> automount[3824]: send_fail:610: AUTOFS_IOC_FAIL: error Bad file descriptor

Hmm, Ian, this looks a bit like the issues we're chasing with the expiration of
submounts, does it not?

> but these messages do not seem to show up on other occasions, so
> they may be coincidental.

> Version-Release number of selected component (if applicable):
> autofs-5.0.3-11

Kernel version?

(In reply to comment #4)
> Created an attachment (id=304597)
> debugging output from level 3
> 
> I updated to autofs-5.0.3-13, as I was asked in bug 445060, and the
> picture changed somewhat.
> 
> If I am running with a console only, then mount points eventually
> expire and file systems are unmounted.  If I try 'umount -a -t nfs'
> before that, I can still get "device is busy" if I do it "too
> early", but eventually the mount points will expire and everything
> will get unmounted.

This isn't really supported with autofs.  You're not supposed to unmount things
out from under it.  If you need to get things unmounted, you can send SIGUSR1 to
the daemon.
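
(Something like:

kill -USR1 $(pidof automount)    # prompt automount to expire unused mounts

asks the daemon to attempt an orderly expire of everything not in use.)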

Comment 8 Michal Jaegermann 2008-05-06 19:13:46 UTC
> > automount[3824]: send_fail:610: AUTOFS_IOC_FAIL: error Bad file descriptor

> Hmm, Ian, this looks a bit like the issues we're chasing with the expiration of
> submounts, does it not?

Possibly that too, but in principle this now looks like a duplicate
of bug 445060.  That "send_fail" apparently showed up here by
accident.  A few things got conflated, confusing me quite a bit in
the process.

> Kernel version?
2.6.25-14.fc9.x86_64

Comment 9 Ian Kent 2008-05-07 01:08:14 UTC
(In reply to comment #4)
> Created an attachment (id=304597)
> debugging output from level 3

I'm a bit confused.
This looks like a log of a well behaved session.
The point of obtaining a debug log is to capture a log
of the problem happening.


Comment 10 Michal Jaegermann 2008-05-07 02:25:37 UTC
> debugging output from level 3
....
> This looks like a log of a well behaved session.

In a comment to that log, and with autofs-5.0.3-13 running on
a console only, I wrote: "If I am running with a console only then
mount points eventually expire and file systems are unmounted".
So, yes, in this case this is indeed a well behaved session.
OTOH that one from attachment (id=304599) is not.

I still had problems on a console with autofs-5.0.3-11, but I was
asked to update that package.

Comment 11 Ian Kent 2008-05-07 08:43:02 UTC
(In reply to comment #10)
> > debugging output from level 3
> ....
> > This looks like a log of a well behaved session.
> 
> In a comment to that log, and with autofs-5.0.3-13 running on
> a console only, I wrote: "If I am running with a console only then
> mount points eventually expire and file systems are unmounted".
> So, yes, in this case this is indeed a well behaved session.
> OTOH that one from attachment (id=304599) is not.

Right, that attachment is much better, thanks.
Hopefully we'll get some help from the Gnome folks for the
bug that Jeff has logged.

I saw a couple of instances in the log where autofs mounted a
file system, and then mounted it again within the same second.
For example, /net/zeno/home/spare. So this may well be an
instance of another bug we're trying to resolve. What is different
about this example is that autofs isn't reporting that the
file system is already mounted, which is likely leading to
those bad file descriptor messages.

So, although this is a pain for you, it may be helpful to us.

Ian

Comment 12 Ian Kent 2008-05-08 07:24:39 UTC
(In reply to comment #11)
> (In reply to comment #10)
> > > debugging output from level 3
> > ....
> > > This looks like a log of a well behaved session.
> > 
> > In a comment to that log, and with autofs-5.0.3-13 running on
> > a console only, I wrote: "If I am running with a console only then
> > mount points eventually expire and file systems are unmounted".
> > So, yes, in this case this is indeed a well behaved session.
> > OTOH that one from attachment (id=304599) is not.
> 
> Right, that attachment is much better, thanks.
> Hopefully we'll get some help from the Gnome folks for the
> bug that Jeff has logged.
> 
> I saw a couple of instances in the log where autofs mounted a
> file system, and then mounted it again within the same second.
> For example, /net/zeno/home/spare. So this may well be an
> instance of another bug we're trying to resolve. What is different
> about this example is that autofs isn't reporting that the
> file system is already mounted, which is likely leading to
> those bad file descriptor messages.

Just to let you know what's happening.

I've installed a current Rawhide and I'm seeing the aggressive
scanning that you're seeing. At first I didn't, but when I added
the root directory of the server as an export the gvfsd thingy
went crazy at expire, probably trying to stat .Trash* in each
umounted directory. I don't yet understand why it tries to
access /net/.Trash, as this directory isn't being umounted.
Perhaps it's due to a directory within it being removed as
the mounts expire. I haven't seen the multiple mounting that
showed in your log yet but it's early days. This might take
a while to resolve and ultimately may need to be addressed
mostly in the Gnome area.

Ian


Comment 13 Michal Jaegermann 2008-05-08 15:27:16 UTC
> This might take a while to resolve and ultimately may
> need to be addressed mostly in the Gnome area.

At this moment, killing gvfsd-trash at session startup
looks like a possible workaround.
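
(I.e., something along the lines of:

pkill -u "$USER" gvfsd-trash    # stop the per-session trash monitor

run from the session startup scripts.)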

Comment 14 Ian Kent 2008-05-08 18:10:13 UTC
(In reply to comment #13)
> > This might take a while to resolve and ultimately may
> > need to be addressed mostly in the Gnome area.
> 
> At this moment killing gvfsd-trash in a session startup
> looks like a possible workaround.

Yeah, and I'll work on the file handles getting corrupted.
That shouldn't be happening. Once it happens, and there's
an actual mount present, there's no way to ask it to expire.

Ian


Comment 15 Bug Zapper 2008-05-14 10:34:08 UTC
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 16 Bug Zapper 2009-06-10 00:35:28 UTC
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '9'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 9's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 9 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 17 Bug Zapper 2009-07-14 17:59:03 UTC
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.