Bug 490142 - statfs returns invalid results
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: autofs5
Version: 4.7
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Ian Kent
QA Contact: BaseOS QE
Reported: 2009-03-13 10:43 EDT by Ion Badulescu
Modified: 2009-03-13 20:51 EDT (History)
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2009-03-13 19:47:09 EDT


Attachments: None

Description Ion Badulescu 2009-03-13 10:43:46 EDT
Description of problem:
statfs can return invalid results for automounted nfs filesystems if it's the first system call that triggers the mount.

Version-Release number of selected component (if applicable):
autofs5-5.0.1-0.rc2.88.el4.1
kernel-smp-2.6.9-78.0.13.EL

How reproducible:
always

Steps to Reproduce:
1. run 'df' on an autofs5-controlled mount point for an nfs share when that share is not yet mounted. 
  
Actual results:
First run triggers the mount but returns before the mount is completed:
$ df /apps/arca
Filesystem           1K-blocks      Used Available Use% Mounted on
-                            0         0         0   -  /apps

The second run returns the correct information:
$ df /apps/arca
Filesystem           1K-blocks      Used Available Use% Mounted on
isilonx:/ifs/data/arca
                     162550008768 67976130784 94573877984  42% /apps/arca

Expected results:
With autofs v4, statfs returned correct information from the first invocation. In this respect,
this bug is a regression between autofs v4 and autofs v5.

Additional info:
I'm not sure how relevant this is or whether it's a different bug, but here goes anyway. Under certain conditions, and unfortunately not very reproducibly, df can also return:
df: `/apps/stonefs1-ny': Device or resource busy

Since it's not very reproducible and usually happens to scripts running from cron, it is unclear whether the mount point is already mounted, in the process of being mounted, or in the process of being unmounted when this occurs. But, once again, this never used to happen with autofs v4, and EBUSY is not an error that makes much sense in this context.
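
The thread never establishes where the EBUSY comes from, so the following is only a hypothetical defensive sketch for cron scripts, not a fix: a C wrapper (the name `statfs_retry_ebusy` and its retry parameters are illustrative assumptions) that retries statfs(2) a few times when it fails with EBUSY and reports any other error immediately.

```c
#include <errno.h>
#include <sys/vfs.h>
#include <time.h>

/* Hypothetical wrapper: call statfs(2), retrying while it fails with EBUSY.
   Always makes at least one attempt. On final failure, errno holds the
   error from the last statfs(2) call. */
int statfs_retry_ebusy(const char *path, struct statfs *st,
                       int attempts, long delay_ms)
{
    struct timespec delay;
    int i;

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 1; ; i++) {
        if (statfs(path, st) == 0)
            return 0;
        if (errno != EBUSY || i >= attempts)
            return -1;              /* other errors, or out of attempts */
        nanosleep(&delay, NULL);    /* back off before trying again */
    }
}
```

A script-facing caller might use `statfs_retry_ebusy(path, &st, 5, 200)` and treat a final EBUSY as "unknown" rather than as a disk-space problem.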

Another data point: using vanilla 2.6.23 and 2.6.28 kernels instead of the RH-supplied 2.6.9-78.0.13.EL results in the exact same behavior, so it doesn't appear to be kernel version-specific.
Comment 1 Ian Kent 2009-03-13 11:19:23 EDT
(In reply to comment #0)
> Description of problem:
> statfs can return invalid results for automounted nfs filesystems if it's the
> first system call that triggers the mount.

What is the master map entry for this autofs mount point?

snip ...

> Expected results:
> With autofs v4 the first statfs used to return correct information from the
> first invocation. In this respect,
> this bug is a regression between autofs v4 and autofs v5.

I'm not sure this is correct.
I suspect v4 will behave the same with the kernels you mention.
You should check.
Comment 2 Ian Kent 2009-03-13 11:50:22 EDT
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > statfs can return invalid results for automounted nfs filesystems if it's the
> > first system call that triggers the mount.
> 
> What is the master map entry for this autofs mount point?

And if this master map entry does not have "browse" or --ghost
options what is the setting of BROWSE_MODE (or DEFAULT_BROWSE_MODE)
in /etc/sysconfig/autofs.
Comment 3 Ion Badulescu 2009-03-13 12:52:34 EDT
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > statfs can return invalid results for automounted nfs filesystems if it's the
> > first system call that triggers the mount.
> 
> What is the master map entry for this autofs mount point?

/apps           /etc/auto_apps  vers=3,proto=tcp

and the map entry is

arca                    isilonx.jc.tower-research.com:/ifs/data/arca

> > Expected results:
> > With autofs v4 the first statfs used to return correct information from the
> > first invocation. In this respect,
> > this bug is a regression between autofs v4 and autofs v5.
> 
> I'm not sure this is correct.
> I suspect v4 will behave the same with the kernels you mention.
> You should check.  

I did... in fact that's what we used to use and it wasn't having this issue. We switched to autofs v5 in order to get working autofs submaps, basically stuff like:

home -fstype=autofs  file:/etc/auto_home

which, with autofs v4, would not work properly if the entries are symlink/bind mounts to other automounted things. The same thing as type:=auto in amd, if you're familiar with it.

The day after the switch, a whole bunch of scripts started reporting out-of-disk-space errors because df was returning all zeros.

(In reply to comment #2)
> And if this master map entry does not have "browse" or --ghost
> options what is the setting of BROWSE_MODE (or DEFAULT_BROWSE_MODE)
> in /etc/sysconfig/autofs.  

Hmm. So here is where things get more interesting. If /etc/sysconfig/autofs5 has the default contents from the rpm, namely:

TIMEOUT=300
BROWSE_MODE="no"

then the issue does not manifest itself. If BROWSE_MODE is commented out or the sysconfig file is missing, then the issue is 100% reproducible. I'm guessing BROWSE_MODE defaults to something other than "no"?...
Comment 4 Ian Kent 2009-03-13 13:56:34 EDT
(In reply to comment #3)
> If BROWSE_MODE is commented out or the
> sysconfig file is missing, then the issue is 100% reproducible. I'm guessing
> BROWSE_MODE defaults to something other than "no"?...  

Exactly so, BROWSE_MODE defaults to "yes" internally.

The uncommented entry is present to maintain compatibility with
the behaviour of v4 as well as to prevent a potential performance
issue with the expire of very large indirect maps that use the
browse option.
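
To make the default concrete, here is a hypothetical C helper (`effective_browse_mode` is an illustrative name, and its parsing is an assumption that only approximates the shell-style sysconfig file) that reports which browse mode a host would get: a missing file or a commented-out line leaves the internal default of "yes", and only an uncommented `BROWSE_MODE="no"` turns browsing off. It does not look at `DEFAULT_BROWSE_MODE` or at per-mount `browse`/`--ghost` options.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if autofs would browse, 0 if not.
   Simplified parsing: leading blanks are skipped, lines that do not start
   with BROWSE_MODE= (including '#' comments) are ignored, and the last
   assignment wins, as it would when a shell sources the file. */
int effective_browse_mode(const char *conf_path)
{
    FILE *f = fopen(conf_path, "r");
    char line[512];
    int browse = 1;                 /* internal default is "yes" */

    if (f == NULL)
        return browse;              /* missing file: default applies */

    while (fgets(line, sizeof line, f) != NULL) {
        const char *p = line;

        while (*p == ' ' || *p == '\t')
            p++;
        if (strncmp(p, "BROWSE_MODE=", 12) != 0)
            continue;
        p += 12;
        if (*p == '"')
            p++;                    /* accept BROWSE_MODE="no" and BROWSE_MODE=no */
        browse = strncmp(p, "no", 2) != 0;
    }
    fclose(f);
    return browse;
}
```

With the rpm's default file, `effective_browse_mode("/etc/sysconfig/autofs5")` would return 0; with the line commented out or the file missing it would return 1, matching the behaviour described in comment #3.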

What you're seeing has been present for years and we simply don't
have a way around it. For mounts that use the browse option,
there is just no way for the kernel module to know when triggering
a mount for system calls such as stat(2) will cause a mount storm
and when it won't (and statfs(2) looks exactly the same to the
kernel module). For example, a colour ls of an autofs mount point
would mount every directory contained in the map. Not good if you
have several hundred entries in the map (or more).
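
A colour ls issues an lstat(2) for every entry, which is exactly the kind of call that, in browse mode, must not trigger a mount. The same property can be put to use: the hypothetical C sketch below (function names are illustrative) walks an autofs directory and reports which entries are currently mounted by comparing each entry's st_dev with the directory's own, using only readdir(3) and lstat(2). It assumes the browse-mode behaviour described above, where lstat(2) of an unmounted entry does not trigger a mount. A device comparison cannot detect bind mounts from the same filesystem.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Returns 1 if dir/name is currently a mount point (its st_dev differs
   from dir's), 0 if it is not, -1 on error. Uses lstat(2) only. */
int entry_is_mounted(const char *dir, const char *name)
{
    struct stat parent, child;
    char path[4096];

    if (lstat(dir, &parent) != 0)
        return -1;
    if (snprintf(path, sizeof path, "%s/%s", dir, name) >= (int)sizeof path)
        return -1;                  /* path too long */
    if (lstat(path, &child) != 0)
        return -1;
    return parent.st_dev != child.st_dev;
}

/* Print each entry of dir with its mount state, looking only at inode
   metadata. Returns the number of mounted entries, or -1 on error. */
int list_mount_states(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *de;
    int mounted = 0;

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL) {
        int m;

        if (de->d_name[0] == '.')
            continue;               /* skip ".", ".." and hidden names */
        m = entry_is_mounted(dir, de->d_name);
        if (m < 0)
            continue;
        printf("%-20s %s\n", de->d_name, m ? "mounted" : "not mounted");
        mounted += m;
    }
    closedir(d);
    return mounted;
}
```

For the map in this report, `list_mount_states("/apps")` would list `arca` as "not mounted" until something actually triggers its mount.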

There are two ways to work around this. The first is to not use
the browse option (obviously); the second, if you do want to use
browse mounts, is to add a "/" to the end of the path, which should
trigger a mount when using stat(2) or statfs(2).

It's not a good situation and I've thought about it many times but,
given that the VFS folks want to see even less use of lookup flags
passed down in file system callbacks, I can't see any hope for the
future.

I guess we could change the internal default but that also makes
me nervous because it may also catch people unaware, as it has
been this way since v5 was released.
Comment 5 Ion Badulescu 2009-03-13 14:23:34 EDT
(In reply to comment #4)
> (In reply to comment #3)
> > I'm guessing
> > BROWSE_MODE defaults to something other than "no"?...  
> 
> Exactly so, BROWSE_MODE defaults to "yes" internally.
> 
> The uncommented entry is present to maintain compatibility with
> the behaviour of v4 as well as prevent a potential performance
> issue with the expire of very large indirect maps that use the
> browse option.
> 
> What you're seeing has been present for years and we simply don't
> have a way around it. For mounts that use the browse option
> there is just no way for the kernel module to know when it will
> cause a mount storm by triggering a mount for system calls such
> as stat(2) (and statfs(2) looks exactly the same to the kernel
> module) and when it won't. For example, a colour ls of an autofs
> mount point would mount every directory contained in the map. Not
> good if you have several hundred entries in the map (or more).

Yeah, mount storms are not good. I ended up having one caused by updatedb after this switch to v5, which was not particularly pleasant.

> There are two ways to work around this, the first is to not use
> the browse option (obviously) and, if you do want to use browse
> mounts, then adding a "/" to the end of the path should trigger
> a mount when using stat(2) or statfs(2).

Actually the ending "/" doesn't help (tried it with an strace of stat -f to avoid the other crap that df does):

statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0, f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0

and the mount doesn't get triggered.
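
For scripts like the cron jobs mentioned earlier, the zero-filled result can at least be recognised. In the strace output above, f_type=0x187 is the autofs filesystem magic number (the value of AUTOFS_SUPER_MAGIC in the kernel headers), which means statfs(2) reported on the autofs directory itself rather than on the NFS filesystem. A minimal C sketch, assuming that an autofs f_type together with zero f_blocks always means "not mounted yet" (the function names are hypothetical):

```c
#include <errno.h>
#include <sys/vfs.h>

/* Same value as the kernel's AUTOFS_SUPER_MAGIC (f_type=0x187 above). */
#define AUTOFS_MAGIC_NUMBER 0x0187

/* True when a statfs(2) result describes the unmounted autofs directory:
   autofs f_type and zero block counts, as in the strace output above. */
int is_unmounted_autofs(const struct statfs *st)
{
    return st->f_type == AUTOFS_MAGIC_NUMBER && st->f_blocks == 0;
}

/* statfs(2) wrapper that fails with EAGAIN instead of returning the
   zero-filled placeholder, so callers never mistake it for a full disk. */
int statfs_mounted(const char *path, struct statfs *st)
{
    if (statfs(path, st) != 0)
        return -1;
    if (is_unmounted_autofs(st)) {
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

A caller that gets -1 with errno == EAGAIN knows the numbers are meaningless and can retry later or report "unknown" instead of "out of space".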

But disabling browsing is a perfectly acceptable solution for us, so we'll use that. We just need to restart the autofs5 service everywhere... (ouch).

Any luck making the autofs mounts restartable? We talked about this a long time ago (on the am-utils mailing list) but not much came out of it.

> I guess we could change the internal default but that also makes
> me nervous because it may also catch people unaware, as it has
> been this way since v5 was released.  

No, I agree, changing established defaults is bad. Maybe just add a more prominent warning in some visible place (the sysconfig file, maybe?) that enabling browsing causes [l]stat() and statfs() to no longer trigger a mount.

I guess you can close the bug for now, since we do have a good workaround.

Thanks,
-Ion
Comment 6 Ian Kent 2009-03-13 19:47:09 EDT
(In reply to comment #5)
> 
> > There are two ways to work around this, the first is to not use
> > the browse option (obviously) and, if you do want to use browse
> > mounts, then adding a "/" to the end of the path should trigger
> > a mount when using stat(2) or statfs(2).
> 
> Actually the ending "/" doesn't help (tried it with an strace of stat -f to
> avoid the other crap that df does):
> 
> statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0,
> f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096})
> = 0
> 
> and the mount doesn't get triggered.

Mmmm ... it looked like it would.
I'll have another look at that and see if I can work out why that
didn't work how I thought it would.

> 
> But disabling browsing is a perfectly acceptable solution for us, so we'll use
> that. We just need to restart the autofs5 service everywhere... (ouch).

Be careful: that can still be a problem if there are processes
that have an automounted directory as their pwd at the time of
the restart.

> 
> Any luck making the autofs mounts restartable? We talked about this a long time
> ago (on the am-utils mailing list) but not much came out of it.

Not sure we're talking about the same thing, but if you mean
making autofs able to restart and reconnect to existing busy
mounts, then yes. It is a huge change, though, so I don't think we'll
see it in RHEL-4 autofs5. It will be in RHEL-5 update 4 autofs.

> 
> > I guess we could change the internal default but that also makes
> > me nervous because it may also catch people unaware, as it has
> > been this way since v5 was released.  
> 
> No, I agree, changing established defaults is bad. Maybe just adding a more
> prominent warning in some visible place (the sysconfig file maybe?) that
> enabling browsing causes [l]stat() and statfs() to no longer trigger a mount.
> 
> I guess you can close the bug for now, since we do have a good workaround.

OK.
Comment 7 Ian Kent 2009-03-13 20:51:34 EDT
(In reply to comment #6)
> (In reply to comment #5)
> > 
> > > There are two ways to work around this, the first is to not use
> > > the browse option (obviously) and, if you do want to use browse
> > > mounts, then adding a "/" to the end of the path should trigger
> > > a mount when using stat(2) or statfs(2).
> > 
> > Actually the ending "/" doesn't help (tried it with an strace of stat -f to
> > avoid the other crap that df does):
> > 
> > statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0,
> > f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096})
> > = 0
> > 
> > and the mount doesn't get triggered.
> 
> Mmmm ... it looked like it would.
> I'll have another look at that and see if I can work out why that
> didn't work how I thought it would.

Oops, I was wrong.

The flags setting in that case is internal to
fs/namei.c:__link_path_walk() so the lookup flag doesn't get passed
down after all. Sorry about that.
