Bug 796422

Summary: autofs mounts never expire
Product: [Fedora] Fedora
Reporter: Jason Tibbitts <j>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 16
CC: brunoc, ikent, orion
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-02-13 14:50:07 UTC
Attachments:
Description                                              Flags
Script to reproduce "autofs never expire" issue          none
Excerpt from /var/log/messages for failing expiration    none

Description Jason Tibbitts 2012-02-22 20:40:33 UTC
It appears that on F16 the mounts made by autofs do not expire.

I am using a configuration unchanged since the stone age; /etc/auto.master just has entries for /misc and /net, and ends with "+auto.master". nsswitch.conf has "automount:  files ldap".  In ldap is a simple nisObject configuration; no expiry times are specified.  /etc/sysconfig/autofs specifies TIMEOUT=300

Mounting things works fine, and on older releases (F14, at least) the same configuration had no problems expiring mounts.  Just dug out an F15 machine and tested it to find it appears to have the same problem.

Trying to understand what's happening, I set LOGGING="debug" and see that it does at least attempt to expire a mount:

Feb 22 13:37:58 epithumia automount[3696]: st_expire: state 1 path /home
Feb 22 13:37:58 epithumia automount[3696]: expire_proc: exp_proc = 140160062736128 path /home
Feb 22 13:37:58 epithumia automount[3696]: expire_proc_indirect: expire /home/tibbs
Feb 22 13:37:58 epithumia automount[3696]: expire_proc_indirect: expire /home/dave
Feb 22 13:37:58 epithumia automount[3696]: 2 remaining in /home
Feb 22 13:37:58 epithumia automount[3696]: expire_cleanup: got thid 140160062736128 path /home stat 5
Feb 22 13:37:58 epithumia automount[3696]: expire_cleanup: sigchld: exp 140160062736128 finished, switching from 2 to 1
Feb 22 13:37:58 epithumia automount[3696]: st_ready: st_ready(): state = 2 path /home

/home/dave is completely unreferenced; I just did df /home/dave five minutes before that message.  But nothing seems to be unmounted and I'm not sure why.  Is there a way to get any more information about why it decided not to umount that filesystem?  I suppose it is possible that something is doing a stat on it and thus keeping it active, but the entries persist even when nobody is logged in at all.  (Which still doesn't rule out the possibility, I guess.)
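
When comparing many machines, the "N remaining in <path>" lines in the debug log above are a quick way to see which expire runs unmounted nothing. A minimal sketch for pulling them out of a log excerpt follows. The function name expire_summary and the regular expressions are illustrative only and are based solely on the message format quoted above; other autofs versions may word these messages differently.

```python
import re

# Patterns based only on the debug lines quoted in this report; the exact
# wording in other autofs versions is an assumption.
CANDIDATE = re.compile(r"expire_proc_indirect: expire (\S+)")
REMAINING = re.compile(r"automount\[\d+\]: (\d+) remaining in (\S+)")


def expire_summary(log_text):
    """Return (candidates, remaining) parsed from automount debug output.

    candidates: list of paths automount tried to expire
    remaining:  dict mapping map path -> mounts still present afterwards
    """
    candidates = []
    remaining = {}
    for line in log_text.splitlines():
        m = CANDIDATE.search(line)
        if m:
            candidates.append(m.group(1))
            continue
        m = REMAINING.search(line)
        if m:
            remaining[m.group(2)] = int(m.group(1))
    return candidates, remaining


# Three lines copied from the log excerpt in the description.
SAMPLE = """\
Feb 22 13:37:58 epithumia automount[3696]: expire_proc_indirect: expire /home/tibbs
Feb 22 13:37:58 epithumia automount[3696]: expire_proc_indirect: expire /home/dave
Feb 22 13:37:58 epithumia automount[3696]: 2 remaining in /home
"""

if __name__ == "__main__":
    paths, left = expire_summary(SAMPLE)
    print("expire candidates:", paths)
    print("still mounted:", left)
```

A run where the "remaining" count equals the number of candidates, as here, means the expire pass found every mount busy or recently used.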

Comment 1 Jason Tibbitts 2012-02-22 20:42:11 UTC
Ugh, I'm a terrible reporter.  Currently running kernel 3.2.6-3.fc16.x86_64 (though this was an issue on previous F16 kernels as well) and autofs-5.0.6-5.fc16.x86_64.  Not sure if it matters, but util-linux-2.20.1-2.2.fc16.x86_64 (for mount/umount) and nfs-utils-1.2.5-4.fc16.x86_64.

Comment 2 Ian Kent 2012-02-23 02:16:46 UTC
(In reply to comment #1)
> Ugh, I'm a terrible reporter.  Currently running kernel 3.2.6-3.fc16.x86_64
> (though this was an issue on previous F16 kernels as well) and
> autofs-5.0.6-5.fc16.x86_64.  Not sure if it matters, but
> util-linux-2.20.1-2.2.fc16.x86_64 (for mount/umount) and
> nfs-utils-1.2.5-4.fc16.x86_64.

Don't know what's going on there.
I have F16 with kernel-3.2.6-3 and I don't see a problem with
expires even if I install autofs-5.0.6-5.

But atm. I have a timeout of 60 seconds, I'll try later with
the default of 300, maybe something is scanning file systems.

Comment 3 Jason Tibbitts 2012-02-23 02:47:32 UTC
Just to show it's not just one weird machine, I'm seeing this on about 120 machines running various F16 kernels, and on the 15 or so F15 machines I still have around as well, so it must be something specific about my setup instead of some unfortunate random set of circumstances.  I found someone else on IRC who appears to be having the same problem.

However, I know that at some point mounts must expire somehow, because on some machines I can see that people logged in, say, a week ago and their home directories aren't mounted even though there's been no intervening reboot or autofs update.  But it certainly doesn't happen after five minutes as configured.

I'll configure the timeout down a bit and see what happens.

Comment 4 Jason Tibbitts 2012-02-23 03:12:38 UTC
Problem solved.

I set timeouts down to 30 seconds and did an experiment.  On two machines I ran "df /home/dave" to mount it, then on one machine ran
  watch grep dave /proc/mounts
and on the other
  watch df\|grep dave

On the former, /home/dave unmounts properly after about 30 seconds.  On the latter it never unmounts.  So it seems that simply running df is sufficient to mark the filesystem as "accessed" and prevent it from being unmounted.

Why is this important?  Because I have something scanning every machine on the network every two minutes, and one of the things it does is pull a list of filesystems.

Now, this system has been in place for many years now, so at some time in the not too distant past running df turned into enough of an "access" to reset the filesystem expiry.  No big deal; I'll just crank the timeout way down.  I just wonder if this was intentional.
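
For a monitoring job that only needs the list of mounted filesystems, reading /proc/mounts (as the first watch command above does) did not keep the mount alive in this experiment. A rough sketch of that approach follows. The helper names parse_mounts and mounted_under are hypothetical, and the sample text (server name, mount options) is made up to resemble /proc/mounts output.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mountpoint, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[0], fields[1], fields[2]))
    return entries


def mounted_under(text, prefix):
    """Mount points at or below prefix, skipping the autofs trigger entries."""
    prefix = prefix.rstrip("/")
    return [
        mnt
        for _dev, mnt, fstype in parse_mounts(text)
        if fstype != "autofs" and (mnt == prefix or mnt.startswith(prefix + "/"))
    ]


# Made-up sample in /proc/mounts format; not taken from the reporter's machine.
SAMPLE = """\
auto.home /home autofs rw,relatime,fd=7,timeout=300,indirect 0 0
server:/export/home/dave /home/dave nfs rw,relatime,vers=3 0 0
"""

if __name__ == "__main__":
    # On a live system the text would come from: open("/proc/mounts").read()
    print(mounted_under(SAMPLE, "/home"))
```

This only reads the mount table and never walks a path below the automount point, which is the property the experiment relied on.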

Comment 5 Ian Kent 2012-02-23 05:35:00 UTC
(In reply to comment #4)
> Problem solved.
> 
> I set timeouts down to 30 seconds and did an experiment.  On two machines I ran
> "df /home/dave" to mount it, then on one machine ran
>   watch grep dave /proc/mounts
> and on the other
>   watch df\|grep dave
> 
> On the former, /home/dave unmounts properly after about 30 seconds.  On the
> latter it never unmounts.  So it seems that simply running df is sufficient to
> mark the filesystem as "accessed" and prevent it from being unmounted.

Yes, that has changed back to what it used to be (quite a long time
ago), from about 2.6.39. So any access will prevent the mount from
expiring. This reduces expire/mount activity quite a bit and, well,
I had some complaints about the change originally as well.

I'm reluctant to revert that change because the way it is now is I
believe the way it should be and is the way it originally was.

Perhaps a kernel module load parameter to enable use of the previous
semantic would be sufficient?

Comment 6 Jason Tibbitts 2012-02-23 07:09:03 UTC
Man, I've been running this homebrew monitoring system for a really long time.  Back in the Red Hat 7.0, 2.2 kernel days, even.  I don't recall df ever preventing autofs unmounting like that, but who knows.  It's a perfectly reasonable behavior, just unexpected and I'm certainly not going to worry about getting the old behavior back.

Comment 7 Ian Kent 2012-02-23 08:55:13 UTC
(In reply to comment #6)
> Man, I've been running this homebrew monitoring system for a really long time. 
> Back in the Red Hat 7.0, 2.2 kernel days, even.  I don't recall df ever
> preventing autofs unmounting like that, but who knows.  It's a perfectly
> reasonable behavior, just unexpected and I'm certainly not going to worry about
> getting the old behavior back.

Phew, that's a relief, thanks.

A lot has changed since 2.2, of course.
It is still puzzling though.

It's the actual traversal of a path that will update the expire
counter and I don't think df by itself will do that unless you
supply a path to it.

Comment 8 Jason Tibbitts 2012-02-24 18:06:29 UTC
Just a plain 'df' is sufficient to reset the expiry counter as far as I can tell.  If I run it more frequently than the autofs expiry time, nothing will ever unmount.

Now, maybe df is doing more than just calling statfs, but a quick strace doesn't show that.  So I guess simply calling statfs is indeed sufficient to reset the expiry counter.  I'm not really sure if that's the expected behavior, but it does seem a bit counterintuitive.
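
To take df out of the picture, the same kind of call can be issued directly. The sketch below uses Python's os.statvfs, which wraps statvfs(3); on Linux with glibc that is expected to end up in statfs(2), but that mapping is an assumption and is worth confirming with strace. The default target "/" is a placeholder; on an affected machine it would be replaced with an automounted path such as /home/dave.

```python
import os

# Placeholder; point this at an automounted path when testing expiry.
TARGET = "/"


def usage(path):
    """Return (total_bytes, available_bytes) for the filesystem holding path."""
    st = os.statvfs(path)  # the only filesystem call made on `path`
    total = st.f_frsize * st.f_blocks
    avail = st.f_frsize * st.f_bavail
    return total, avail


if __name__ == "__main__":
    total, avail = usage(TARGET)
    print("%s: %d bytes total, %d bytes available" % (TARGET, total, avail))
```

Running this in a loop at an interval shorter than the autofs timeout, while watching /proc/mounts from another terminal, would show whether the statfs call alone is enough to hold the mount.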

Comment 9 Ian Kent 2012-02-26 07:01:43 UTC
(In reply to comment #8)
> Just a plain 'df' is sufficient to reset the expiry counter as far as I can
> tell.  If I run it more frequently than the autofs expiry time, nothing will
> ever unmount.
> 
> Now, maybe df is doing more than just calling statfs, but a quick strace
> doesn't show that.  So I guess simply calling statfs is indeed sufficient to
> reset the expiry counter.  I'm not really sure if that's the expected behavior,
> but it does seem a bit counterintuitive.

I'll have a look at old and new code and see if I can understand
why this has changed.

Comment 10 Ian Kent 2012-02-27 04:16:40 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Just a plain 'df' is sufficient to reset the expiry counter as far as I can
> > tell.  If I run it more frequently than the autofs expiry time, nothing will
> > ever unmount.

It is and looking at as far back as 2.6.9 the expire counter would
also be updated for every path walk. But that would have changed
at about 2.6.18 to only updating the counter if the dentry was really
busy, meaning belonging to an open file or the subject of a process
working directory.
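
The difference between the two semantics can be illustrated with a toy model. This is not kernel code: the function simulate and its parameters are invented for illustration, and it assumes a mount is unmounted once timeout seconds have passed since the last access that counted.

```python
def simulate(timeout, poll_interval, duration, count_light_access):
    """Return the second at which the mount expires, or None if it never does.

    A monitoring job touches the mount every `poll_interval` seconds with a
    light access (a path walk, nothing held open). `count_light_access`
    selects the semantic being modelled:
      True  -> every path walk refreshes the last-used time
      False -> only really busy dentries would count, so the polls are ignored
    """
    last_used = 0
    for now in range(1, duration + 1):
        if now - last_used >= timeout:
            return now
        if count_light_access and now % poll_interval == 0:
            last_used = now
    return None


if __name__ == "__main__":
    # Numbers from this report: 300 s timeout, a scan every two minutes.
    print("every walk counts:", simulate(300, 120, 3600, True))
    print("only busy counts: ", simulate(300, 120, 3600, False))
```

With a 300 second timeout and a 120 second scan, the first semantic never lets the mount expire within the simulated hour, while the second expires it at 300 seconds.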

> > 
> > Now, maybe df is doing more than just calling statfs, but a quick strace
> > doesn't show that.  So I guess simply calling statfs is indeed sufficient to
> > reset the expiry counter.  I'm not really sure if that's the expected behavior,
> > but it does seem a bit counterintuitive.

It could be a combination of things.

Perhaps, somehow the symlinking of /etc/mtab to /proc/mounts is
causing this. Possibly in combination with a change that made
statfs(2) trigger automounts which it didn't do before.

When I saw the statfs(2) patch I thought it was reasonable since
a statfs(2) of an autofs file system is not really useful and you
want the info. of the file system that would be mounted. Obviously
someone had a problem like that since there was a patch posted.

Neither of the above changes were my doing so if we want to make
further changes we will need clear evidence of problems and reasons
why we want it changed, especially the mtab symlink change.

Ian

Comment 11 Ian Kent 2012-03-28 23:26:25 UTC
I did a yum update the other day and now I see I have the same
problem.

I don't see the problem at run level 3 and the fact that I did
a yum update has to mean it's something in the GUI that is
causing this.

Comment 12 Jason Tibbitts 2012-03-29 14:41:18 UTC
Somehow the component got changed from Fedora to Entitlements and all sorts of things changed with it.  Trying to get it set back properly.

Comment 13 Ian Kent 2012-03-29 14:55:28 UTC
(In reply to comment #12)
> Somehow the component got changed from Fedora to Entitlements and all sorts of
> things changed with it.  Trying to get it set back properly.

Oops, I didn't notice, but I didn't change it myself either....

Comment 14 Jason Tibbitts 2012-03-29 15:49:20 UTC
The ticket status seems good now.

In any case, I think everyone can agree that software (perhaps the desktop environment, perhaps some monitoring system) is quite justified in occasionally calling statfs to keep track of disk usage.  A reasonable frequency for this call is up for debate, of course, but if the interval between calls happens to be any shorter than the autofs expiry time then no mount will ever go away.

I guess it then remains for someone to decide whether there is any benefit to statfs resetting the expiry time, especially in light of the above.  Personally I don't see it, but it is certain that there are plenty of facts I'm not aware of.

Comment 15 Ian Kent 2012-03-30 00:30:40 UTC
(In reply to comment #14)
> The ticket status seems good now.
> 
> In any case, I think everyone can agree that software (perhaps the desktop
> environment, perhaps some monitoring system) is quite justified in occasionally
> calling statfs to keep track of disk usage.  A reasonable frequency for this
> call is up for debate, of course, but if the interval between calls happens to
> be any shorter than the autofs expiry time then no mount will ever go away.

That's the way it has been since the statfs(2) kernel change and
probably isn't unreasonable.

Although it also means that if you use the browse option and
statfs(2) a mount point path it will cause it to mount. That's
pretty much the stat(2) mount storm problem all over again.
Fortunately statfs(2) is not normally called in this way and
when it is called you probably do want to know about the mount
that is mounted since the autofs entry information is from a
pseudo file system and generally isn't useful.

OTOH many system monitoring systems probably use statfs(2) a
lot and if they run frequently and cause many mounts or prevent
mounts from being umounted that could be enough to warrant the
statfs(2) change be reverted.

> 
> I guess it then remains for someone to decide whether there is any benefit to
> statfs resetting the expiry time, especially in light of the above.  Personally
> I don't see it, but it is certain that there are plenty of facts I'm not aware
> of.

All we need is a couple of bugs with a root cause of the statfs(2)
change and I can post a revert and see who complains.

But there's something else going on here.

The testing that I've done due to comment #11 shows that there
are no frequent path walks occurring and the last_used counter
doesn't get checked because the dentry looks busy before it
even gets to it.

Using a simple single indirect automount that should have a single
open file handle on it, when I do an lsof I see 4 occurrences of
the file handle. That can't be due to a thread not closing the
file handle because that particular one is opened in a thread
created "after" the other three threads. Now I'm not sure why things
appear to work in run level three, I'll have to check that again.
At the moment I'm trying to go back in glibc revisions to see if
that changes anything.
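
The lsof check can be approximated by reading /proc/<pid>/fd for the automount process. The sketch below assumes a Linux /proc and enough privilege to read the target process's fd directory; the helper name handles_under is hypothetical. It counts file descriptors per process, so it will not necessarily match the per-thread lines that lsof may print.

```python
import os
import tempfile


def handles_under(pid, prefix):
    """List (fd, target) pairs of `pid` whose open files live under prefix."""
    prefix = os.path.realpath(prefix).rstrip("/")
    fd_dir = "/proc/%d/fd" % pid
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed while we were looking
        if target == prefix or target.startswith(prefix + "/"):
            found.append((int(name), target))
    return sorted(found)


if __name__ == "__main__":
    # Self-check against this process: hold one file open and look for it.
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "probe"), "w") as fh:
            print(handles_under(os.getpid(), d))
```

Against the daemon, the call would take the automount PID and the mount point path in place of the temporary directory used in the self-check.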

This is a really weird problem.

Comment 16 brunoc 2012-11-29 19:48:12 UTC
Similar issue here (see below for detail on system info):

Autofs sysconfig has been set to TIMEOUT=4 and NEGATIVE_TIMEOUT=1 to make sure the DVD is unmounted and allows for ejection using the button on the drive shortly after reading. In some conditions, autofs does not timeout and never releases the mount. The consequence of this problem for us is that the physical eject button never ejects the disk. Restarting autofs or issuing an eject works but is not an option for us at the moment.

With Autofs 5.0.7 (compiled and installed using rpmbuild and yum on Fedora 15), problem occurs only when trying to access DVD mount folder shortly after DVD is inserted. With 5.0.5-38 or -39, occurs sporadically, even after fresh restart of autofs service.

I have attached a script testautofs to reproduce as well as a debug output of automount in testautofs.log . Note that you can see the proper expiration happening after 4s for the scenario without eject around line 69 (handle_packet: type = 6), however, no such thing for the second test with eject prior to accessing the DVD. In that case, we see the system being stuck in a loop ("expire_proc_direct: send expire to trigger /usr/BDV/Interfaces/DVD" remains without response?)

Any idea?

------------------------- INFO --------------------------------

$ cat /etc/auto.master
/-	/etc/auto.misc
+auto.master

$ cat /etc/auto.misc
/usr/BDV/Interfaces/DVD	  -fstype=iso9660,ro,nosuid,nodev	:/dev/sr0

$ cat /etc/sysconfig/autofs|grep -v "#"
TIMEOUT=4
NEGATIVE_TIMEOUT=1
BROWSE_MODE="no"
MOUNT_NFS_DEFAULT_PROTOCOL=4
LOGGING="debug"
USE_MISC_DEVICE="yes"

$ cat /etc/fedora-release 
Fedora release 15 (Lovelock)

$ automount -V

Linux automount version 5.0.7-1

Directories:
        config dir:     /etc/sysconfig
        maps dir:       /etc
        modules dir:    /usr/lib64/autofs

Compile options:
  DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_HESIOD WITH_LDAP
  WITH_SASL LIBXML2_WORKAROUND

$ ./ver_linux
Linux doris 2.6.39.1 #1 SMP PREEMPT Wed Oct 5 17:26:29 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
 
Gnu C                  4.6.1
Gnu make               3.82
binutils               2.21.51.0.6
util-linux             2.19.1
mount                  support
module-init-tools      3.16
e2fsprogs              1.41.14
xfsprogs               3.1.4
pcmciautils            017
quota-tools            4.00-pre1.
PPP                    2.4.5
Linux C Library        2.14
Dynamic linker (ldd)   2.14
Procps                 3.2.8
Net-tools              1.60
Kbd                    1.15.2
Sh-utils               8.10
wireless-tools         29
Modules Loaded         nls_utf8 ppdev parport_pc lp parport sunrpc snd_hda_codec_realtek nvidia snd_hda_intel snd_hda_codec snd_hwdep usblp snd_seq ftdi_sio snd_seq_device snd_pcm usbserial snd_timer snd i7core_edac mxser iTCO_wdt serio_raw soundcore edac_core iTCO_vendor_support blackmagic snd_page_alloc wmi i2c_i801 pcspkr microcode ipv6 usb_storage

Comment 17 brunoc 2012-11-29 19:54:10 UTC
Created attachment 654513
Script to reproduce "autofs never expire" issue

See also testautofs.log for an excerpt of /var/log/messages.

Here is the output on my system:
$ ~/testautofs.sh
---------- test without eject ---------
Closing tray...
Waiting for DVD to be recognized.case1_Anonymous

/dev/sr0 on /usr/BDV/Interfaces/DVD type iso9660 (ro,nosuid,nodev,relatime)
Restarting autofs...
Redirecting to /bin/systemctl  restart autofs.service
case1_Anonymous
Waiting for 5s (which is greater than autofs configured timeout of 4 sec)
YEAH!!!
---------- test with eject ---------
Restarting autofs...
Redirecting to /bin/systemctl  restart autofs.service
Ejecting...
Closing tray
Waiting for DVD to be recognized...........case1_Anonymous

Waiting for 5s (which is greater than autofs configured timeout of 4 sec)
/dev/sr0 on /usr/BDV/Interfaces/DVD type iso9660 (ro,nosuid,nodev,relatime)
Waiting another 5s just in case
/dev/sr0 on /usr/BDV/Interfaces/DVD type iso9660 (ro,nosuid,nodev,relatime)
sr0 should be unmounted!

Comment 18 brunoc 2012-11-29 19:55:59 UTC
Created attachment 654514
Excerpt from /var/log/messages for failing expiration

see other attachment script producing this output.

Comment 19 Fedora End Of Life 2013-01-16 13:54:29 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 20 Fedora End Of Life 2013-02-13 14:50:09 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.