Description of problem:
statfs can return invalid results for automounted nfs filesystems if it's the
first system call that triggers the mount.

Version-Release number of selected component (if applicable):
autofs5-5.0.1-0.rc2.88.el4.1
kernel-smp-2.6.9-78.0.13.EL

How reproducible:
always

Steps to Reproduce:
1. run 'df' on an autofs5-controlled mount point for an nfs share when that
   share is not yet mounted.

Actual results:
First run triggers the mount but returns before the mount is completed:

$ df /apps/arca
Filesystem              1K-blocks        Used   Available Use% Mounted on
-                               0           0           0   -  /apps

The second run returns the correct information:

$ df /apps/arca
Filesystem              1K-blocks        Used   Available Use% Mounted on
isilonx:/ifs/data/arca 162550008768 67976130784 94573877984  42% /apps/arca

Expected results:
With autofs v4, statfs returned correct information from the first
invocation. In this respect, this bug is a regression between autofs v4 and
autofs v5.

Additional info:
I'm not sure how relevant this is or whether it's a different bug... but here
it goes anyway. Under certain conditions, and unfortunately not very
reproducibly, df can also return:

df: `/apps/stonefs1-ny': Device or resource busy

Since it's not very reproducible and it usually happens to scripts running
from cron, it is unclear whether the mount point is already mounted, in the
process of being mounted, or in the process of being unmounted. But, once
again, this never used to happen with autofs v4, and EBUSY is not an error that
makes much sense in this context.

Another data point: using vanilla 2.6.23 and 2.6.28 kernels instead of the
RH-supplied 2.6.9-78.0.13.EL results in the exact same behavior, so it doesn't
appear to be kernel version-specific.
(In reply to comment #0)
> Description of problem:
> statfs can return invalid results for automounted nfs filesystems if it's the
> first system call that triggers the mount.

What is the master map entry for this autofs mount point?

snip ...

> Expected results:
> With autofs v4 the first statfs used to return correct information from the
> first invocation. In this respect,
> this bug is a regression between autofs v4 and autofs v5.

I'm not sure this is correct.
I suspect v4 will behave the same with the kernels you mention.
You should check.
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > statfs can return invalid results for automounted nfs filesystems if it's the
> > first system call that triggers the mount.
>
> What is the master map entry for this autofs mount point?

And if this master map entry does not have the "browse" or --ghost
option, what is the setting of BROWSE_MODE (or DEFAULT_BROWSE_MODE)
in /etc/sysconfig/autofs?
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > statfs can return invalid results for automounted nfs filesystems if it's the
> > first system call that triggers the mount.
>
> What is the master map entry for this autofs mount point?

/apps /etc/auto_apps vers=3,proto=tcp

and the map entry is

arca isilonx.jc.tower-research.com:/ifs/data/arca

> > Expected results:
> > With autofs v4 the first statfs used to return correct information from the
> > first invocation. In this respect,
> > this bug is a regression between autofs v4 and autofs v5.
>
> I'm not sure this is correct.
> I suspect v4 will behave the same with the kernels you mention.
> You should check.

I did... in fact that's what we used to use and it wasn't having this issue. We
switched to autofs v5 in order to get working autofs submaps, basically stuff
like:

home -fstype=autofs file:/etc/auto_home

which, with autofs v4, would not work properly if the entries are symlink/bind
mounts to other automounted things. The same thing as type:=auto in amd, if
you're familiar with it.

The day after the switch, a whole bunch of scripts started reporting out of
disk space errors because the df was returning all zeros.

(In reply to comment #2)
> And if this master map entry does not have "browse" or --ghost
> options what is the setting of BROWSE_MODE (or DEFAULT_BROWSE_MODE)
> in /etc/sysconfig/autofs.

Hmm. So here is where things get more interesting. If /etc/sysconfig/autofs5
has the default contents from the rpm, namely:

TIMEOUT=300
BROWSE_MODE="no"

then the issue does not manifest itself. If BROWSE_MODE is commented out or the
sysconfig file is missing, then the issue is 100% reproducible. I'm guessing
BROWSE_MODE defaults to something other than "no"?...
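For scripts like the ones mentioned above that read free space from df, one possible guard is to treat an all-zero result as "not mounted yet" and retry. This is only a sketch: the helper name statvfs_when_mounted, the retry values, and the assumption that reading the directory triggers the automount are illustrative and not confirmed anywhere in this report.

```python
import os
import time


def statvfs_when_mounted(path, retries=5, delay=0.5,
                         statvfs=os.statvfs, touch=os.listdir):
    """Return statvfs(path), retrying while the result looks unmounted.

    Assumption: a result with f_blocks == 0 means we are still looking at
    the autofs placeholder rather than the real filesystem.
    """
    for attempt in range(retries + 1):
        result = statvfs(path)
        if result.f_blocks != 0:
            return result
        if attempt == retries:
            break
        try:
            # Assumption: reading the directory triggers the automount,
            # unlike stat(2)/statfs(2) on a browse-mode mount point.
            touch(path)
        except OSError:
            pass  # the mount attempt may still be in progress
        time.sleep(delay)
    raise OSError(
        f"{path}: statvfs still reports zero blocks after {retries} retries"
    )
```

A caller would use statvfs_when_mounted("/apps/arca") in place of a bare os.statvfs() call. The statvfs and touch parameters exist only so the retry logic can be exercised without a real automount.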
(In reply to comment #3)
> (In reply to comment #2)
> > And if this master map entry does not have "browse" or --ghost
> > options what is the setting of BROWSE_MODE (or DEFAULT_BROWSE_MODE)
> > in /etc/sysconfig/autofs.
>
> Hmm. So here is where things get more interesting. If /etc/sysconfig/autofs5
> has the default contents from the rpm, namely:
>
> TIMEOUT=300
> BROWSE_MODE="no"
>
> then the issue does not manifest itself. If BROWSE_MODE is commented out or the
> sysconfig file is missing, then the issue is 100% reproducible. I'm guessing
> BROWSE_MODE defaults to something other than "no"?...

Exactly so, BROWSE_MODE defaults to "yes" internally.
The uncommented entry is present to maintain compatibility with
the behaviour of v4 as well as to prevent a potential performance
issue with the expire of very large indirect maps that use the
browse option.

What you're seeing has been present for years and we simply don't
have a way around it. For mounts that use the browse option
there is just no way for the kernel module to know when it will
cause a mount storm by triggering a mount for system calls such
as stat(2) (and statfs(2) looks exactly the same to the kernel
module) and when it won't. For example, a colour ls of an autofs
mount point would mount every directory contained in the map. Not
good if you have several hundred entries in the map (or more).

There are two ways to work around this. The first is to not use
the browse option (obviously). The second, if you do want to use
browse mounts, is to add a "/" to the end of the path, which should
trigger a mount when using stat(2) or statfs(2).

It's not a good situation and I've thought about it many times but,
given that the VFS folks want to see even less use of lookup flags
passed down in file system callbacks, I can't see any hope for the
future.

I guess we could change the internal default but that also makes
me nervous because it may also catch people unaware, as it has
been this way since v5 was released.
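The default described above (browsing is on unless the sysconfig file explicitly turns it off) can be checked on a host with a few lines of Python. This is a rough sketch, not how the daemon itself parses its configuration: effective_browse_mode is a hypothetical helper, and it only understands the simple NAME="value" form shown in this report.

```python
import re

# Matches BROWSE_MODE="no" or DEFAULT_BROWSE_MODE=yes at the start of a line.
_BROWSE_RE = re.compile(r'^(?:DEFAULT_)?BROWSE_MODE="?([A-Za-z]+)"?')


def effective_browse_mode(sysconfig_text, internal_default="yes"):
    """Return the browse mode in effect for the given sysconfig contents.

    A missing or commented-out setting falls back to internal_default,
    which this report says is "yes" for autofs v5.
    """
    value = internal_default
    for line in sysconfig_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # commented-out entries do not count
        match = _BROWSE_RE.match(line)
        if match:
            value = match.group(1).lower()
    return value
```

Passing the contents of /etc/sysconfig/autofs5 (or an empty string when the file is missing) gives "yes" in exactly the two cases where the reporter saw the all-zero statfs results.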
(In reply to comment #4)
> (In reply to comment #3)
> > I'm guessing
> > BROWSE_MODE defaults to something other than "no"?...
>
> Exactly so, BROWSE_MODE defaults to "yes" internally.
>
> The uncommented entry is present to maintain compatibility with
> the behaviour of v4 as well as prevent a potential performance
> issue with the expire of very large indirect maps that use the
> browse option.
>
> What you're seeing has been present for years and we simply don't
> have a way around it. For mounts that use the browse option
> there is just no way for the kernel module to know when it will
> cause a mount storm by triggering a mount for system calls such
> as stat(2) (and statfs(2) looks exactly the same to the kernel
> module) and when it won't. For example, a colour ls of an autofs
> mount point would mount every directory contained in the map. Not
> good if you have several hundred entries in the map (or more).

Yeah, mount storms are not good. I ended up having one caused by updatedb after
this switch to v5, which was not particularly pleasant.

> There are two ways to work around this, the first is to not use
> the browse option (obviously) and, if you do want to use browse
> mounts, then adding a "/" to the end of the path should trigger
> a mount when using stat(2) or statfs(2).

Actually the ending "/" doesn't help (tried it with an strace of stat -f to
avoid the other crap that df does):

statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0,
f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096})
= 0

and the mount doesn't get triggered.

But disabling browsing is a perfectly acceptable solution for us, so we'll use
that. We just need to restart the autofs5 service everywhere... (ouch).

Any luck making the autofs mounts restartable? We talked about this a long time
ago (on the am-utils mailing list) but not much came out of it.
> I guess we could change the internal default but that also makes
> me nervous because it may also catch people unaware, as it has
> been this way since v5 was released.

No, I agree, changing established defaults is bad. Maybe just adding a more
prominent warning in some visible place (the sysconfig file maybe?) that
enabling browsing causes [l]stat() and statfs() to no longer trigger a mount.

I guess you can close the bug for now, since we do have a good workaround.

Thanks,
-Ion
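The f_type=0x187 in the strace output above is the autofs superblock magic (AUTOFS_SUPER_MAGIC in <linux/magic.h>), so the call was answered by the autofs placeholder rather than by NFS. A small sketch of how a script could recognise that case from an strace line; the function names are made up for illustration, and the parser assumes strace prints numeric values as in the line quoted here.

```python
import re

AUTOFS_SUPER_MAGIC = 0x0187  # value from <linux/magic.h>


def parse_statfs_fields(strace_line):
    """Extract the numeric f_* fields from an strace statfs() line.

    Non-numeric fields such as f_fsid={0, 0} are skipped.
    """
    fields = {}
    pattern = r"(f_\w+)=(0x[0-9a-fA-F]+|\d+)"
    for key, value in re.findall(pattern, strace_line):
        fields[key] = int(value, 0)  # base 0 accepts both 0x187 and 4096
    return fields


def is_autofs_placeholder(fields):
    """True when statfs was answered by autofs itself, with no real data."""
    return (fields.get("f_type") == AUTOFS_SUPER_MAGIC
            and fields.get("f_blocks") == 0)
```

With GNU coreutils the same check can be made without strace, since stat -f -c %t prints the filesystem type in hex.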
(In reply to comment #5)
> > There are two ways to work around this, the first is to not use
> > the browse option (obviously) and, if you do want to use browse
> > mounts, then adding a "/" to the end of the path should trigger
> > a mount when using stat(2) or statfs(2).
>
> Actually the ending "/" doesn't help (tried it with an strace of stat -f to
> avoid the other crap that df does):
>
> statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0,
> f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096})
> = 0
>
> and the mount doesn't get triggered.

Mmmm ... it looked like it would.
I'll have another look at that and see if I can work out why that
didn't work how I thought it would.

> But disabling browsing is a perfectly acceptable solution for us, so we'll use
> that. We just need to restart the autofs5 service everywhere... (ouch).

Be careful, that can still be a problem if there are processes that
have an automounted directory as their pwd at the time of the restart.

> Any luck making the autofs mounts restartable? We talked about this a long time
> ago (on the am-utils mailing list) but not much came out of it.

Not sure we're talking about the same thing but if you mean making
autofs able to restart and re-connect to existing busy mounts then,
yes, but it is a huge change so I don't think we'll see it in RHEL-4
autofs5. It will be in RHEL-5 update 4 autofs.

> > I guess we could change the internal default but that also makes
> > me nervous because it may also catch people unaware, as it has
> > been this way since v5 was released.
>
> No, I agree, changing established defaults is bad. Maybe just adding a more
> prominent warning in some visible place (the sysconfig file maybe?) that
> enabling browsing causes [l]stat() and statfs() to no longer trigger a mount.
>
> I guess you can close the bug for now, since we do have a good workaround.

OK.
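Regarding the warning above about processes whose pwd is inside an automounted directory: before restarting the service it may be worth listing them. A minimal sketch, assuming a Linux /proc layout; processes_with_cwd_under is a hypothetical helper, and it only sees processes the caller is permitted to inspect.

```python
import os


def processes_with_cwd_under(mount_root, proc_dir="/proc"):
    """Return sorted (pid, cwd) pairs for processes whose cwd is under mount_root."""
    root = os.path.normpath(mount_root)
    prefix = root.rstrip(os.sep) + os.sep
    found = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            cwd = os.readlink(os.path.join(proc_dir, entry, "cwd"))
        except OSError:
            continue  # process exited, or we may not inspect it
        if cwd == root or cwd.startswith(prefix):
            found.append((int(entry), cwd))
    return sorted(found)
```

For the map in this report the call would be processes_with_cwd_under("/apps"), run as root so that every process's cwd link is readable.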
(In reply to comment #6)
> (In reply to comment #5)
> > > There are two ways to work around this, the first is to not use
> > > the browse option (obviously) and, if you do want to use browse
> > > mounts, then adding a "/" to the end of the path should trigger
> > > a mount when using stat(2) or statfs(2).
> >
> > Actually the ending "/" doesn't help (tried it with an strace of stat -f to
> > avoid the other crap that df does):
> >
> > statfs("/apps/arca/", {f_type=0x187, f_bsize=4096, f_blocks=0, f_bfree=0,
> > f_bavail=0, f_files=0, f_ffree=0, f_fsid={0, 0}, f_namelen=255, f_frsize=4096})
> > = 0
> >
> > and the mount doesn't get triggered.
>
> Mmmm ... it looked like it would.
> I'll have another look at that and see if I can work out why that
> didn't work how I thought it would.

Oops, I was wrong.
The flags setting in that case is internal to
fs/namei.c:__link_path_walk() so the lookup flag doesn't get
passed down after all. Sorry about that.