Bug 794486

Summary: Direct mapping in auto.master are mounted instantly without accessing the files
Product: [Fedora] Fedora
Reporter: Kobi <kobi.cohenarazi>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Docs Contact:
Priority: unspecified
Version: 16
CC: aviro, ikent, kobi.cohenarazi, kzak, lpoetter
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2013-02-13 14:49:59 UTC
Type: Bug

Description Kobi 2012-02-16 22:52:38 UTC
Description of problem:
When using a direct map in /etc/auto.master ("/-"), all mount points from the direct map file are mounted immediately, without any attempt to access the directories or files under them.

Version-Release number of selected component (if applicable):
autofs-5.0.6-5.fc16.x86_64.rpm  

How reproducible: Always

Steps to Reproduce:
0. service autofs stop
1. have something like this in /etc/auto.master:
/- auto.direct
2. and have an /etc/auto.direct file with some entries like:
/mnt/foo -rw server:/some/remote/mount/point
3. service autofs start.
4. run mount and you will see that /mnt/foo is mounted already even though no attempt to access it has been made. Check /etc/mtab.
  
Actual results:
/mnt/foo is instantly mounted. Check /etc/mtab.

Expected results:
/mnt/foo should only be mounted if it is being accessed.

Additional info:
This worked correctly in FC14 (for sure). I also noticed the following:
1. The behavior is the same when running autofs in single-user mode.
2. It does not happen with an indirect map. With indirect maps, mount points are mounted only when there is an attempt to access them.
With direct maps, there seems to be an automatic trigger that mounts all the entries.
The problem is acute when the direct map is a huge list (e.g. 10K+ entries) and autofs tries to mount everything.

Thanks for looking at it.

Comment 1 Ian Kent 2012-02-17 00:32:21 UTC
(In reply to comment #0)
> Description of problem:
> When using direct mapping in /etc/auto.master ("/-"), all mounting points from
> the direct map file, will be instantly mounted without even having any attempt
> to access the dirs/files of these mounting points.
> 
> Version-Release number of selected component (if applicable):
> autofs-5.0.6-5.fc16.x86_64.rpm  
> 
> How reproducible: Always
> 
> Steps to Reproduce:
> 0. service autofs stop
> 1. have something like that in /etc/auto.master:
> /- auto.direct
> 2. and have /etc/auto.direct file with some entries like:
> /mnt/foo -rw server:/some/remote/mount/point
> 3. service autofs start.
> 4. run mount and you will see that /mnt/foo is mounted already even though no
> attempt to access it has been made. Check /etc/mtab.

Rubbish, that does not happen.

What does happen is that the direct mount triggers are mounted
which is required (and has always been required) for direct
mounts. You can verify that this is the case by looking at
/proc/mounts on older distributions where you will see the
direct mount triggers.

It is the symlinking of /etc/mtab to /proc/mounts that now
causes them to show up when you run mount.
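
A minimal sketch of how the trigger mounts can be told apart from real mounts, assuming the usual /proc/mounts layout in which the second field is the mount point and the third field is the file system type, with trigger mounts carrying the type "autofs". The function names are hypothetical and not part of autofs; the optional argument lets them read a saved copy of the table instead of the live one.

```shell
# List the mount points of autofs trigger mounts (fstype "autofs").
# $1: mount table to read; defaults to /proc/mounts.
autofs_triggers() {
    awk '$3 == "autofs" { print $2 }' "${1:-/proc/mounts}"
}

# List the mount points of every entry that is not an autofs trigger,
# i.e. what a user would normally think of as "mounted".
# $1: mount table to read; defaults to /proc/mounts.
non_autofs_mounts() {
    awk '$3 != "autofs" { print $2 }' "${1:-/proc/mounts}"
}
```

For example, `autofs_triggers | wc -l` gives the number of trigger mounts currently listed, and `non_autofs_mounts` shows what remains once they are left out.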

Despite having mentioned this problem when that change was
being done no-one seemed to understand that this was going
to be a problem and is absolutely unacceptable for people
that have direct maps that are even modestly large.

The bottom line is that this is not an autofs user space
problem.

Comment 2 Kobi 2012-02-17 00:41:23 UTC
Ian, Thanks for looking at it.

>Despite having mentioned this problem when that change was
>being done no-one seemed to understand that this was going
>to be a problem and is absolutely unacceptable for people
>that have direct maps that are even modestly large.

Just to understand - what change are you referring to?
I guess it is a change done between FC14 and FC16. (not an autofs change I assume).

Thanks,
Kobi

Comment 3 Ian Kent 2012-02-17 00:52:39 UTC
(In reply to comment #2)
> Ian, Thanks for looking at it.
> 
> >Despite having mentioned this problem when that change was
> >being done no-one seemed to understand that this was going
> >to be a problem and is absolutely unacceptable for people
> >that have direct maps that are even modestly large.
> 
> Just to understand - what change are you referring to?
> I guess it is a change done between FC14 and FC16. (not an autofs change I
> assume).

As I said, it's the change that symlinked /etc/mtab to
/proc/mounts.

You should be able to work around it by removing the
symlink but that is a bit tricky because you do need
some (most) of the contents of /etc/mtab to exist and
I don't know what effect it will have on systemd which
uses the kernel module direct mount functionality.
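
Before trying that workaround it may help to confirm how /etc/mtab is set up on a given machine. The following is only a sketch: the function name is hypothetical, and it merely reports what it finds without changing anything.

```shell
# Report whether a mount table path is a symlink, a regular file, or missing.
# $1: path to inspect; defaults to /etc/mtab.
describe_mtab() {
    path=${1:-/etc/mtab}
    if [ -L "$path" ]; then
        # readlink prints the target the symlink points to
        printf 'symlink -> %s\n' "$(readlink "$path")"
    elif [ -e "$path" ]; then
        printf 'regular file\n'
    else
        printf 'missing\n'
    fi
}
```

Calling `describe_mtab` with no argument inspects /etc/mtab itself; on a system with the symlink described above it would report the link target rather than "regular file".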

Comment 4 Kobi 2012-02-17 00:58:25 UTC
Thanks Ian. Perfectly clear now.
I compared with older dist and saw the diff.

Kobi

Comment 5 Kobi 2012-02-17 01:11:14 UTC
Ian, just one additional data point.
On older distributions, when mtab was not a symlink, I did not see autofs hang because of the long direct map list.

What I'm saying is that on older distributions, where /proc/mounts was populated with the direct map entries but mtab was not a symlink to /proc/mounts, this did not hurt performance and did not prevent other indirect mounts from being created.

So the issue is not really the long "mount" output, but that autofs hangs/chokes and is *unable* to load other indirect maps when you actually need them.

In other words, on FC16, a long (10K+) list of direct map entries causes autofs to fail, and other important directories are not created.

Not sure if increasing the timeout would help here. I can try it if you think it makes sense.

Kobi

Comment 6 Ian Kent 2012-02-17 01:36:10 UTC
(In reply to comment #5)
> 
> What I'm saying is that on older dist, the fact that /proc/mounts is populated
> with the direct mapping, and mtab is not a symlink to /proc/mounts, did not
> hurt performance and did not prevent other indirect mapping to be created.

That is another problem with using the symlink approach and it
has no easy solution since, even if utilities are aware of, say,
some pseudo mount option to tell them to ignore the entries, the
entries will still be present and will still be read.

> 
> So the issue is really not a long "mount" output, but autofs is hanging/choking
> and *unable* to load other indirect mapping when you actually need them.

Yes, but I don't think it is autofs that is a problem here.
You will see an entry in /etc/sysconfig/autofs, USE_MISC_DEVICE="yes",
near the bottom of the default installed configuration. As long
as that is set to "yes" autofs will rarely, if ever, use /etc/mtab.
It will use /proc/mounts (as it always has) and will use autofs4
kernel module functionality to avoid reading mtab.
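
A quick way to check that setting, sketched under the assumption that the file uses plain KEY=value or KEY="value" lines and that commented-out lines start with "#". The helper name is hypothetical.

```shell
# Print the value of the last uncommented KEY=... line in a sysconfig-style
# file, with one pair of surrounding double quotes removed.
# $1: key name; $2: file to read, defaults to /etc/sysconfig/autofs.
sysconfig_value() {
    awk -F= -v key="$1" '
        $1 == key {
            val = substr($0, length(key) + 2)   # text after "KEY="
            gsub(/^"|"$/, "", val)              # strip surrounding quotes
            last = val
        }
        END { if (last != "") print last }
    ' "${2:-/etc/sysconfig/autofs}"
}
```

`sysconfig_value USE_MISC_DEVICE` should print `yes` on a default installation as described above; no output means the key is unset or commented out.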

That is not to say that other applications will not scan the mtab
and, for those with large mount maps, they can easily bring the
system to a halt. This has always been a problem and is one reason
autofs file system mounts were kept out of mtab when autofs version
5 was implemented.

I don't know yet how to handle this because autofs, at this point,
does need the contents of /proc/mounts and we have other users of
the kernel module so we can't just modify the kernel to not list
autofs file system mounts.

Let's wait for a bit, perhaps others on the cc list will comment.

Comment 7 Karel Zak 2012-02-21 12:24:39 UTC
I understand the pain that the symlink to /proc/mounts causes, but it's still better than classic mtab...

We could add a hack to mount(8) to filter out the autofs mounts, but I don't think that's a proper solution. It would only hide the problem, and the mount points would still be visible in many other tools such as df(1) or systemd.

If you don't want userspace to read your private autofs stuff, then move it to some autofs-specific /proc file (e.g. /proc/automounts) and don't export it through the generic /proc files ;-)  Yes, this sounds like utopia...

How does it hurt performance? How many autofs mount points does a typical system with large mount maps contain? I'd like to test it.


BTW, you can filter out mountpoints by findmnt, for example

   findmnt -t noautofs

Comment 8 Ian Kent 2012-02-21 14:35:31 UTC
(In reply to comment #7)
> 
> How does it hurt performance? How many autofs mountpoints contains typical
> system with large mount maps? I'd like to test it.
> 

Comment #5 gives an example, to quote it:

> In another words, on FC16, long (10K+) direct mapping entries would run autofs
> to fail and other important dirs would not be created.

10K+ direct mount maps are fairly large but it isn't that
uncommon for people to have several thousand entries in
their direct maps.

Comment 9 Kobi 2012-02-21 19:03:59 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > 
> > How does it hurt performance? How many autofs mountpoints contains typical
> > system with large mount maps? I'd like to test it.
> > 
> 
> Comment #5 gives an example, to quote it:
> 
> > In another words, on FC16, long (10K+) direct mapping entries would run autofs
> > to fail and other important dirs would not be created.
> 
> 10K+ direct mount maps are fairly large but it isn't that
> uncommon for people to have several thousand entries in
> their direct maps.

Exactly. For *big* enterprise environment it is not uncommon.

What I see is that autofs times out after a few minutes, saying it failed, and when I run something like this:
$ mount | wc -l
I can see the number of entries increasing every few seconds. It means that entries continue to be added to mtab slowly.

autofs is busted at that point, of course. Trying to use autofs for all kinds of stuff does not work. It just seems to hang.
When I reboot the machine, I see many, many messages like:
Unmounting /somedir/dir1
Unmounting /somedir/dir2
....

It takes a long time for the machine to reboot since it appears to be unmounting all the entries.

As I mentioned before, it happens to all of my FC16 machines. FC14 is not broken in that sense.

Kobi

Comment 10 Ian Kent 2012-02-22 00:39:45 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > 
> > > How does it hurt performance? How many autofs mountpoints contains typical
> > > system with large mount maps? I'd like to test it.
> > > 
> > 
> > Comment #5 gives an example, to quote it:
> > 
> > > In another words, on FC16, long (10K+) direct mapping entries would run autofs
> > > to fail and other important dirs would not be created.
> > 
> > 10K+ direct mount maps are fairly large but it isn't that
> > uncommon for people to have several thousand entries in
> > their direct maps.
> 
> Exactly. For *big* enterprise environment it is not uncommon.
> 
> What I see is that autofs will timeout after few minutes saying it failed and
> when I run something like that:
> $ mount | wc -l
> I can see increasing number of entries every few seconds. It means that things
> continue to slowly added to mtab.

When symlinked to /proc/mounts nothing is added to mtab, it's
a read-only view of kernel state. This just means that things
are being mounted. Perhaps you are seeing a mount storm for some
reason. What kernel revision are you using, and what are the
coreutils and acl versions?
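
To see what kind of entries are actually growing, the table can be broken down by file system type instead of counting every line. This is a sketch only: the function name is hypothetical, and it assumes the third field of each /proc/mounts line is the file system type.

```shell
# Print "fstype count" for each file system type in a mount table,
# sorted by type name.
# $1: mount table to read; defaults to /proc/mounts.
mounts_by_fstype() {
    awk '{ n[$3]++ } END { for (t in n) print t, n[t] }' \
        "${1:-/proc/mounts}" | sort
}
```

Running it a few times in a row shows whether the "autofs" count (trigger mounts) or another type such as "nfs" (real mounts) is the one increasing.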

> 
> autofs is busted at that point of course. Trying to use autofs for all kinds of
> stuff would not work. It just seems to hang.
> When I reboot the machine, I can see many many messages like:
> Unmounting /somedir/dir1
> Unmounting /somedir/dir2
> ....

Which autofs does, obviously, since that's its job in life.
That hasn't changed.

The not so obvious point is that autofs is a slave to
processes walking paths that contain automount points in
that it is duty bound to send a mount request when processes
walk on mount points. There is no way to throttle it without
breaking things.

It might not be autofs that is causing the actual problem.

Any application that scans mount tables when there is a large
number of mounts will cripple the system. Even autofs itself
has suffered from that in the past to some degree.
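
To put a rough number on this, suppose an application rescans the whole mount table each time one more mount is added, starting from an empty table. That is an assumed worst case for illustration, not a measurement of any particular program, and the function name is hypothetical.

```shell
# Total mount-table lines read if the whole table is rescanned after each
# of N mounts is added one at a time: 1 + 2 + ... + N = N * (N + 1) / 2.
# $1: number of mounts N.
rescan_cost() {
    n=$1
    echo $(( n * (n + 1) / 2 ))
}
```

Under that assumption, `rescan_cost 10000` prints 50005000, about fifty million line reads for a single application while a 10K entry direct map is being set up.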

What is particularly bad is when an application sets some sort
of notification of changes to a mount table and scans the mount
table every time it gets a notification. That's just not scalable
"at all" and will fairly quickly cripple a system.

As long as you are using the autofs miscellaneous device for
control operations autofs is essentially independent of the size
of the mtab. Since the systemd unit doesn't do anything with respect to
the miscellaneous device it should get created when the kernel
module loads and be used by autofs without you needing to do
anything.
  
> 
> It takes a lot of time for the machine to reboot since it looks like it is
> Unmounting all the entries.
> 
> As I mentioned before, it happens to all of my FC16 machines. FC14 is not
> broken in that sense.

How about we have a look at what is happening by checking
a debug log.

In order to get a proper debug log you need to ensure that
the system facility daemon.* is being logged to syslog as
well as setting LOGGING="debug" in /etc/sysconfig/autofs.

Comment 11 Fedora End Of Life 2013-01-16 13:54:26 UTC
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 12 Fedora End Of Life 2013-02-13 14:50:03 UTC
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.