Bug 1489542 - Behavior change in autofs expiry timer when a path walk is done following commit from BZ 1413523
Summary: Behavior change in autofs expiry timer when a path walk is done following com...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.4
Hardware: All
OS: Linux
Priority: urgent
Severity: medium
Target Milestone: rc
Assignee: Ian Kent
QA Contact: xiaoli feng
URL:
Whiteboard:
Depends On:
Blocks: 1420851 1469559 1525994 1535760
 
Reported: 2017-09-07 16:29 UTC by smazul
Modified: 2021-08-30 12:36 UTC
CC List: 17 users

Fixed In Version: kernel-3.10.0-822.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1525994 1535760
Environment:
Last Closed: 2018-04-10 22:00:45 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch - revert: take more care to not update last_used on path walk (2.23 KB, patch), 2017-09-19 23:47 UTC, Ian Kent
Trivial testcase showing the regression and fix. (1.09 KB, application/x-shellscript), 2017-10-26 14:09 UTC, Dave Wysochanski


Links
Red Hat Knowledge Base (Solution) 3225981, last updated 2017-10-26 22:19:48 UTC
Red Hat Knowledge Base (Solution) 3262391, last updated 2017-12-05 19:32:24 UTC
Red Hat Product Errata RHSA-2018:1062, last updated 2018-04-10 22:02:19 UTC

Description smazul 2017-09-07 16:29:40 UTC
Description of problem:
Autofs will unmount once the specified timeout expires, regardless of queries/stats of the mount point. Prior to RHEL 7.4, if a path was traversed this would renew the timeout value.

Additionally, the hope for this BZ is to gain additional details on the new behavior and address some of the concerns in case 01909702


Version-Release number of selected component (if applicable):
Kernel: 3.10.0-693.1.1.el7.x86_64

nfs-utils-1.3.0-0.48.el7.x86_64                             Tue Aug 15 11:59:31 2017
rpcbind-0.2.0-42.el7.x86_64                                 Tue Aug 15 11:55:40 2017
nfs4-acl-tools-0.3.3-15.el7.x86_64                          Tue Aug 15 12:03:08 2017

autofs-5.0.7-69.el7.x86_64                                  Tue Aug 15 12:01:39 2017
libsss_autofs-1.15.2-50.el7.x86_64                          Tue Aug 15 11:58:45 2017


How reproducible:
Always, with the 'ls' and 'df' commands; it additionally occurs with the 'noac' mount option.

Steps to Reproduce:
[1] Fresh install of RHEL7.4
[2] Installed generic versions of autofs, nfs and rpcbind packages
[3] Create autofs mount (ideally with some low timeout for testing):

EX:
# grep nfs /etc/auto.master
/-    /etc/auto.nfs   --timeout=60

# grep nfs /etc/auto.nfs 
/nfs	-noac	10.13.153.156:/mnt/export1/

[4] Bump automount debugging all the way up

# grep OPTION /etc/sysconfig/autofs
OPTIONS="--debug"

[5] Start autofs service

[6] cd into the automounted directory

[7] cd back out of it

[8] Start some query of the mount point:

EX:
# while true; do df /nfs/ >/dev/null; sleep 10; done

[9] Allow some time (2-3x timeout period) to pass then kill bash command

[10] Check /var/log/messages; observe that the automount is unmounted and remounted several times during the test (typically once per timeout period)
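
For convenience, the steps above can be driven by a single script. The following is only a rough sketch of the procedure; it assumes the auto.master/auto.nfs entries from step [3] and OPTIONS="--debug" from step [4] are already in place, and it uses the example /nfs mount point and 60 second timeout shown above.

#!/bin/bash
# Sketch of the reproduction steps above, not an official test case.
systemctl restart autofs

# Steps [6] and [7]: trigger the initial mount, then leave the mount point.
cd /nfs && cd /

# Steps [8] and [9]: stat the mount point every 10 seconds for ~3 timeout periods.
end=$(( $(date +%s) + 180 ))
while [ "$(date +%s)" -lt "$end" ]; do
    df /nfs/ >/dev/null
    sleep 10
done

# Step [10]: inspect the automount debug output for repeated expire/umount
# events (the exact message text depends on the autofs version).
grep automount /var/log/messages | grep -Ei 'expir|umount' | tail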


Actual results:
The mount point has an unmount event once the timeout hits, even though the network traffic and debug show that the mount point is being traversed.



Expected results:
The desire is to return to the prior RHEL 7.3 behavior in which path traversal results in a timeout refresh.


Additional info:
The new behavior seems to have been implemented in the below BZ and commit:

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1413523
Upstream commit: linux.git 092a53452b
Depends: https://bugzilla.redhat.com/show_bug.cgi?id=1247935

Specifically related to sfdc case 01909702, customer usage and concerns are as follows:

In our case, we have thousands of clients, and thousands of automounts configured via LDAP. There are a few main areas of concern for me:

1) Caching. This applies to all of the below concerns, in that wouldn't unmount of the NFS file system cause the cache to be invalidated, and wouldn't this cause performance issues during each of the use cases I am about to describe?

2) /home directories. All of our /home are automounts. For regular users, this one is likely safe due to held open files including set working directories that would prevent the user's /home from being unmounted. The expiration logic probably doesn't trigger in this case, because it cannot be unmounted. However, for build systems that may have their working directory set to another location, they may still access content from /home like .ssh/id_rsa or .bashrc, and these would trigger the automounts. These automounts would be re-established at the automount expiration interval.

3) Tools warehouse. We have a number of NFS paths that store versions of tools. These tools are accessed without setting a current working directory or holding any file open for an extended period of time. Think of tools like particular versions of "gcc" or "make", but imagine thousands of these. Thousands of machines access these NFS paths, and these automounts will be re-established at the automount expiration interval.

4) Source code. These are NFS project spaces that build systems such as Bamboo, Jenkins, Team City, or other systems we have may access, without setting a current working directory or holding any file open for an extended period of time.

5) Build output. These are NFS project spaces that build systems such as Bamboo, Jenkins, Team City, or other systems we have may write build output to, often for sharing with other users or build systems, without setting a current working directory or holding any file open for an extended period of time.

6) Application data. Our application servers frequently use NFS for scalable storage. In many cases, I have strongly encouraged the use of fixed mounts, as I have found automounts to be unnecessarily brittle in the past. However, the default configuration for application owners in our organization who may not be aware of this best practice, still use automounts. Think of applications like JIRA or Confluence, that store attachments on NFS. The attachments are accessed only for the duration required to read the file, and then they are closed. These automounts will be re-established at the automount expiration interval.

The reason we use automounts in the above cases, is that with the exception of 6), it can be much more overhead to publish fixed mounts on dozens or more machines that may need the resource. It is much easier to allow autofs to discover which machines need which mounts. Any NFS mounts that might be used by a dozen or more machines, benefit from autofs. Any NFS mounts used by just one or two machines, can use fixed mounts instead. For our build systems, which have concerns 1) through 5) above, we might have 200 or more build machines, that have 20 or more automounts, each being re-established every 10 minutes. 200 x 20 = 4,000 mounts per 10 minutes. And this is just one of the scenarios.

I understand the dilemma. There are UI systems that have file dialogs or file system monitoring systems that automatically discover new content and scan these file systems just because they exist. But, there are also legitimate use cases such as I describe above, that we would like to update "last_used". Ian Kent documented this dilemma here:

Comment 3 Ian Kent 2017-09-08 08:38:43 UTC
(In reply to smazul from comment #0)
> Description of problem:
> Autofs will unmount once the specified timeout expires, regardless of queries/stats
> of the mount point. Prior to RHEL 7.4, if a path was traversed this would
> renew the timeout value.
> 
> Additionally, the hope for this BZ is to gain additional details on the new
> behavior and address some of the concerns in case 01909702

Reading through the discussion below, I must say I'm not
unsympathetic to the adverse effects of this change.

So that means there are two things to discuss, first the
reasons for the change and secondly what should be done
about it.

snip ....

I don't think I need a reproducer, it's very likely this
behaviour change is due to the fix for the original upstream
regression regarding the last_used update.

Specifically, as mentioned in the case:
- [fs] autofs: take more care to not update last_used on path walk (Ian Kent) [1413523]

> 
> Actual results:
> The mount point has an unmount event once the timeout hits, even though the
> network traffic and debug show that the mount point is being traversed.
> 
> Expected results:
> The desire is to return to the prior RHEL 7.3 behavior in which path traversal
> results in a timeout refresh.

That's almost how autofs is supposed to work.

A long time ago autofs adopted the strategy of only preventing
expiry of mounts that are really, really in use which meant that
the last_used was mostly not updated on path walks specifically
to avoid user space utilities, monitoring systems etc. from
preventing mount expiry.

But, on each expire event the last_used field is updated if the
mount is really in use, which means at least one process has a
working directory in the file system or there are open files
in the file system.
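
As an aside, whether a mount is "really in use" in this sense can be
checked from user space with standard tools (illustrative commands
only; /nfs is the example mount point from this report, and this is
not how the kernel itself makes the decision):

# Show processes with open files or a working directory on the mount.
fuser -vm /nfs

# Roughly equivalent view with lsof (lists open files on that filesystem).
lsof /nfs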

But due to upstream changes, probably by me, that regressed.

In recent times this became a significant problem for me, to
the extent that virtually all mounts that were *not* in use
were being instantly re-mounted after being expired.

This is what led me to discover the regression (at least
a regression from my POV anyway).

> 
> 
> Additional info:
> The new behavior seems to have been implemented in the below BZ and commit:
> 
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1413523

Yes, this is the one.

> Upstream commit: linux.git 092a53452b
> Depends: https://bugzilla.redhat.com/show_bug.cgi?id=1247935

This dependency was needed for a later patch series related to
namespace handling (related to undesirable behaviours when using
containers) so doesn't really affect whether the problem patch
is reverted or not.

It's pretty much an isolated change so reverting it should
not have side effects.

> 
> Specifically related to sfdc case 01909702, customer usage and concerns are
> as follows:
> 
> In our case, we have thousands of clients, and thousands of automounts
> configured via LDAP. There are a few main areas of concern for me:
> 
> 1) Caching. This applies to all of the below concerns, in that wouldn't
> unmount of the NFS file system cause the cache to be invalidated, and
> wouldn't this cause performance issues during each of the use cases I am
> about to describe?

The NFS attribute cache, or buffer cache, or CacheFS cache?

I think all of these would be invalidated.

I don't think invalidating the attribute cache would be a
significant problem.

The buffer cache, perhaps, the CacheFS cache more so.

> 
> 2) /home directories. All of our /home are automounts. For regular users,
> this one is likely safe due to held open files including set working
> directories that would prevent the user's /home from being unmounted. The
> expiration logic probably doesn't trigger in this case, because it cannot be
> unmounted. However, for build systems that may have their working directory
> set to another location, they may still access content from /home like
> .ssh/id_rsa or .bashrc, and these would trigger the automounts. These
> automounts would be re-established at the automount expiration interval.

That's right, working directories or open files will cause the
last_used field to be updated on expire events.

I'm not sure this is a terribly big issue as the umount to mount
turnover shouldn't be that significant.

Is it actually a big problem in your environment?

> 
> 3) Tools warehouse. We have a number of NFS paths that store versions of
> tools. These tools are accessed without setting a current working directory
> or holding any file open for an extended period of time. Think of tools like
> particular versions of "gcc" or "make", but imagine thousands of these.
> Thousands of machines access these NFS paths, and these automounts will be
> re-established at the automount expiration interval.

Sure, common and sensible use of automounting.

This would be a significant problem in large environments like
yours, point well taken.

> 
> 4) Source code. These are NFS project spaces that build systems such as
> Bamboo, Jenkins, Team City, or other systems we have may access, without
> setting a current working directory or holding any file
> open for an extended period of time.
> 
> 5) Build output. These are NFS project spaces that build systems such as
> Bamboo, Jenkins, Team City, or other systems we have may write build output
> to, often for sharing with other users or build systems, without setting a
> current working directory or holding any file open for an extended period of
> time.

Same as point 3) I think, also point well taken.

> 
> 6) Application data. Our application servers frequently use NFS for scalable
> storage. In many cases, I have strongly encouraged the use of fixed
> mounts, as I have found automounts to be unnecessarily brittle in the past.
> However, the default configuration for application owners in our
> organization who may not be aware of this best practice, still use
> automounts. Think of applications like JIRA or Confluence, that store
> attachments on NFS. The attachments are accessed only for the duration
> required to read the file, and then they are closed. These automounts will
> be re-established at the automount expiration interval.
> 
> The reason we use automounts in the above cases, is that with the exception
> of 6), it can be much more overhead to publish fixed mounts on dozens or
> more machines that may need the resource. It is much easier to allow autofs
> to discover which machines need which mounts. Any NFS mounts that
> might be used by a dozen or more machines, benefit from autofs. Any NFS
> mounts used by just one or two machines, can use fixed mounts instead. For
> our build systems, which have concerns 1) through 5) above, we might have
> 200 or more build machines, that have 20 or more automounts, each being
> re-established every 10 minutes. 200 x 20 = 4,000 mounts per 10 minutes. And
> this is just one of the scenarios.

Sure, I understand this.

Other use cases of these types are render farms and Geophysics
data processing sites. These types of environments tend to be
smaller but the problem is the same as yours.

Are the access patterns for the point 6) case that bad in terms
of umount to mount turnover?

> 
> I understand the dilemma. There are UI systems that have file dialogs or
> file system monitoring systems that automatically discover new content and
> scan these file systems just because they exist. But, there are also
> legitimate use cases such as I describe above, that we would like to update
> "last_used". Ian Kent documented this dilemma here:

Seems my quote is missing, ;)

But you are correct, I didn't consider very large environments
in my anger at the user space changes that caused me so much
recent pain, all I can do is apologize and try and work out
some way to improve the large use problem.

In my defence, what I thought I was doing was resolving a
regression to a behaviour that I established some years ago
that I thought was the right way to do things, when autofs
version 5 was being developed.

What makes this problem hard is not deciding whether to revert
this particular change, I can do that if we agree it needs to
be done for RHEL-7 (and it sounds like we do need to).

The deeper problem is more long term in that it will just come
back again in RHEL-8 if I can't find a smarter way to do this
than just not updating the last_used field in certain cases, as
is done now.

After all, if I revert this change upstream I'm pretty sure
I'll get a bunch of bugs when RHEL-8 is released about mounts
never expiring which could easily affect a quite large number
of customers with large and small environments.

Ian

Comment 4 Ian Kent 2017-09-08 09:45:34 UTC
For the purposes of the immediate next steps for this bug we
need to verify what has caused the change in behaviour.

I will check through all the patches in the series of bug
1413523 and review any that touched the last_used field.

Once I have done that I'll produce a test kernel with reverted
changes to check if the change in behaviour has in fact been
reverted.

Once this is done we can focus our attention on what should
be done to fix the problem.
 
Ian

Comment 7 Ian Kent 2017-09-11 23:43:42 UTC
I think I have made the bug public.

If the customer still can't see the comments we can add a suitable
email address to the bug cc list which should do the trick.

Comment 8 Mark Mielke 2017-09-14 03:50:35 UTC
One of the things I was looking into when I discovered this concern in RHEL 7.4, was high LDAP query volumes, many of which were coming from autofs. When the automount needs to be established this not only has the overhead of performing the NFS protocol auto-negotiation (NFS v3 or NFS v4? ...) on the autofs side, and then on the Linux side, but prior to this it requires LDAP queries, and prior to this, it often needs to establish a TLS session with LDAP. All of this means that even if the underlying "mount" operation is fast, the end-to-end process here can take 2 or more seconds to complete. I don't want to get distracted by these details, as the details here are not fully relevant. But, just to provide some context in terms of what the cost of automount expiration could be in a worst case scenario.

> > 1) Caching. This applies to all of the below concerns, in that wouldn't
> > unmount of the NFS file system cause the cache to be invalidated, and
> > wouldn't this cause performance issues during each of the use cases I am
> > about to describe?
> > The NFS attribute cache, or buffer cache, or CacheFS cache?
>
> I think all of these would be invalidated.
>
> I don't think invalidating the attribute cache would be a
> significant problem.
>
> The buffer cache, perhaps, the CacheFS cache more so.

I am mostly thinking about the buffer cache.

When users or build services perform builds in NFS, this includes:

1) Establish the source workspace. This may be thousands to millions of source files or binary files. In some cases this can be incremental, however, it is a fairly common practice for the small to medium workspaces to be clean extracts to ensure safe builds.

2) Build the source. This requires reading back many of these source files a little while later (often later than the 5 or so seconds that the NFS attribute cache is most effective for). Some files like .h files may be repeatedly read back.

3) Publish the build output. This requires reading back the generated files a little while later (often later than the 5 or so seconds that the NFS attribute cache is most effective for).

I believe you are correct that the attribute cache is not really applicable here, as it already expires frequently. The buffer cache, however, allows the most commonly used objects to be stored in RAM, and made accessible after a relatively short attribute check to confirm that the buffer cache is still valid.

Similarly, for tools warehouse scenarios - there may be a large number of files required to publish a cross compiler tool chain to NFS, and these may be called repeatedly. While they remain in buffer cache, there is still the attribute query overhead but not the file read overhead. If the mounts expire frequently, even when the mount is in regular use, then these would need to be re-read into buffer cache.

We don't use CacheFS today. I have wondered if we should. I think this would be a separate consideration and outside the scope of this report.

> I'm not sure this is a terribly big issue as the umount to mount
> turnover shouldn't be that significant.

For /home, I don't think it is much of an issue. But, for the "time" question, I want to add some context that my concern isn't really for the "time to mount", but the amount of extra overhead on the thousands of NFS clients, and dozens of NFS servers, that this issue potentially results in.

> Are the access patterns for the point 6) case that bad in terms
> of umount to mount turnover?

For point 6) specifically, I only mentioned it because I wanted to be complete about our use cases. I don't think point 6) would have that much impact in real life, as I expect the applications likely hold at least one file open (perhaps just a lock file!), that would prevent unmount, and cause last_used to get updated.

> But you are correct, I didn't consider very large environments
> in my anger at the user space changes that caused me so much
> recent pain, all I can do is apologize and try and work out
> some way to improve the large use problem.
>
> In my defence, what I thought I was doing was resolving a
> regression to a behaviour that I established some years ago
> that I thought was the right way to do things, when autofs
> version 5 was being developed.

Understood.

> The deeper problem is more long term in that it will just come
> back again in RHEL-8 if I can't find a smarter way to do this
> than just not updating the last_used field in certain cases, as
> is done now.
>
> After all, if I revert this change upstream I'm pretty sure
> I'll get a bunch of bugs when RHEL-8 is released about mounts
> never expiring which could easily affect a quite large number
> of customers with large and small environments.

Yep, agree.

I think some aspect of this comes down to:

1) How to define "use"?
2) What abilities exist to detect "use"?

Having read about the issue you were trying to solve, I have a new appreciation for 1).

Having recently reviewed some of the autofs and kernel code for the first time, 
I have an appreciation for 2).

I think there are at least three scenarios here to consider:

A) Applications which are overly aggressive about monitoring all file systems, even file systems which should not be their concern.
B) Applications which are correctly aggressive about monitoring file systems, that should be their concern.
C) Applications which are actively using the file systems to perform a significant amount of work in the file system.

In RHEL 7.2 and RHEL 7.3, "use" seems to include at least:

- File handle open.
- Working directory set.
- Automount path traversed at least once.

In RHEL 7.4, "use" seems to be reduced to:

- File handle open.
- Working directory set.

I think C) from above is pretty easy to detect. I think repeated and persistent use of the file system easily differentiates C). In the "build" and "tools warehouse" cases, we would probably traverse the path thousands or millions of times during a 10 minute interval. This makes me think that you could choose a compromise in here:

- File handle open.
- Working directory set.
- Automount path traversed at least N times.

I wonder if you were to keep a "last_used" and a "last_used_count", would it be sufficient to confirm that "last_used_count > 100" or "last_used_count > 1000", and this would result in a decision that was often correct in terms of identifying passive monitoring use vs persistent active use?

As the expiration interval could have a large range depending upon use case, perhaps it would make sense to make N be proportional to the expiration interval? For example:

- File handle open.
- Working directory set.
- Automount path traversed at least 10 times per minute.
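
To put rough numbers on that idea, the decision an expire check could
make might look like the following. This is a purely hypothetical
sketch of the proposed policy, not autofs code; the counter and
variable names are invented for illustration.

# Hypothetical expire-time decision based on a per-mount walk counter.
TIMEOUT=600              # --timeout value in seconds (example)
WALKS_SINCE_EXPIRE=4200  # hypothetical path-walk count since the last expire check
PER_MIN_THRESHOLD=10     # "traversed at least 10 times per minute"

if [ "$WALKS_SINCE_EXPIRE" -ge $(( TIMEOUT / 60 * PER_MIN_THRESHOLD )) ]; then
    echo "active use: update last_used and keep the mount"
else
    echo "passive/monitoring use: allow the mount to expire"
fi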

Back to the three scenarios:

A) Applications which are overly aggressive about monitoring all file systems, even file systems which should not be their concern.
B) Applications which are correctly aggressive about monitoring file systems, that should be their concern.
C) Applications which are actively using the file systems to perform a significant amount of work in the file system.

I think something like the above would help to more accurately identify C). Then, I am thinking that B) may not be important to distinguish from A) in real life, as the overhead should be low, and the use case may not be valid.

Anyways - just some ideas to continue the conversation.

Thanks for looking into this, Ian. It is much appreciated.

Comment 9 Ian Kent 2017-09-14 06:19:09 UTC
(In reply to Mark Mielke from comment #8)
> One of the things I was looking into when I discovered this concern in RHEL
> 7.4, was high LDAP query volumes, many of which were coming from autofs.
> When the automount needs to be established this not only has the overhead of
> performing the NFS protocol auto-negotiation (NFS v3 or NFS v4? ...) on the
> autofs side, and then on the Linux side, but prior to this it requires LDAP
> queries, and prior to this, it often needs to establish a TLS session with
> LDAP. All of this means that even if the underlying "mount" operation is
> fast, the end-to-end process here can take 2 or more seconds to complete. I
> don't want to get distracted by these details, as the details here are not
> fully relevant. But, just to provide some context in terms of what the cost
> of automount expiration could be in a worst case scenario.

Mmm ... yes, I'm aware of that.

This behaviour came about due to complaints about autofs maps
needing to be always up to date without the need to issue a HUP
signal (so it only applies to indirect mount maps). For remote
autofs maps a query needs to be done on "every" lookup to try
and work out if the map has changed. It isn't a problem for file
maps as the modified date is easily checked.

It does introduce additional overhead I would rather avoid.

I have considered doing something like the periodic server
monitoring done by the am-utils automounter (amd) instead of on every
mount lookup but that would make the proximity/availability
calculation unreliable so it's not high on the priorities
list. Still it might be adequate and would reduce the traffic
somewhat.
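
For context, the two map-source styles being contrasted look roughly
like this in auto.master. These are illustrative entries only; the
paths, DN and timeout are made up, and the trailing notes are
annotations rather than literal map-file syntax.

/home  /etc/auto.home                        --timeout=600   <- file map: changes detected cheaply via the file's mtime
/proj  ldap:ou=auto.proj,dc=example,dc=com   --timeout=600   <- LDAP map: no cheap change check, so lookups query the server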

> 
> > > 1) Caching. This applies to all of the below concerns, in that wouldn't
> > > unmount of the NFS file system cause the cache to be invalidated, and
> > > wouldn't this cause performance issues during each of the use cases I am
> > > about to describe?
> > > The NFS attribute cache, or buffer cache, or CacheFS cache?
> >
> > I think all of these would be invalidated.
> >
> > I don't think invalidating the attribute cache would be a
> > significant problem.
> >
> > The buffer cache, perhaps, the CacheFS cache more so.
> 
> I am mostly thinking about the buffer cache.
> 
> When users or build services perform builds in NFS, this includes:
> 
> 1) Establish the source workspace. This may be thousands to millions of
> source files or binary files. In some cases this can be incremental,
> however, it is a fairly common practice for the small to medium workspaces
> to be clean extracts to ensure safe builds.
> 
> 2) Build the source. This requires reading back many of these source files a
> little while later (often later than the 5 or so seconds that the NFS
> attribute cache is most effective for). Some files like .h files may be
> repeatedly read back.
> 
> 3) Publish the build output. This requires reading back the generated files
> a little while later (often later than the 5 or so seconds that the NFS
> attribute cache is most effective for).
> 
> I believe you are correct that the attribute cache is not really applicable
> here, as it already expires frequently. The buffer cache, however, allows
> the most commonly used objects to be stored in RAM, and made accessible
> after a relatively short attribute check to confirm that the buffer cache is
> still valid.

Yes, the NFS attributes are included in most NFS RPC replies
so they are always being updated regardless of mounted status.

> 
> Similarly, for tools warehouse scenarios - there may be a large number of
> files required to publish a cross compiler tool chain to NFS, and these may
> be called repeatedly. While they remain in buffer cache, there is still the
> attribute query overhead but not the file read overhead. If the mounts
> expire frequently, even when the mount is in regular use, then these would
> need to be re-read into buffer cache.

Indeed, the buffer cache can make a big difference to IO.
TBH I don't know much about the buffer cache other than the
obvious significant effect it can have on reducing IOs. 

> 
> We don't use CacheFS today. I have wondered if we should. I think this would
> be a separate consideration and outside the scope of this report.

I also don't have much information about CacheFS but I don't
need to as the maintainer is part of our file systems group.

Maybe worth you following up with a discovery query at some
point as we do have specialist knowledge here at Red Hat.

> 
> > I'm not sure this is a terribly big issue as the umount to mount
> > turnover shouldn't be that significant.
> 
> For /home, I don't think it is much of an issue. But, for the "time"
> question, I want to add some context that my concern isn't really for the
> "time to mount", but the amount of extra overhead on the thousands of NFS
> clients, and dozens of NFS servers, that this issue potentially results in.
> 
> > Are the access patterns for the point 6) case that bad in terms
> > of umount to mount turnover?
> 
> For point 6) specifically, I only mentioned it because I wanted to be
> complete about our use cases. I don't think point 6) would have that much
> impact in real life, as I expect the applications likely hold at least one
> file open (perhaps just a lock file!), that would prevent unmount, and cause
> last_used to get updated.

Right, it sounds like I'm sufficiently up with this usage pattern.

> 
> > But you are correct, I didn't consider very large environments
> > in my anger at the user space changes that caused me so much
> > recent pain, all I can do is apologize and try and work out
> > some way to improve the large use problem.
> >
> > In my defence, what I thought I was doing was resolving a
> > regression to a behaviour that I established some years ago
> > that I thought was the right way to do things, when autofs
> > version 5 was being developed.
> 
> Understood.
> 
> > The deeper problem is more long term in that it will just come
> > back again in RHEL-8 if I can't find a smarter way to do this
> > than just not updating the last_used field in certain cases, as
> > is done now.
> >
> > After all, if I revert this change upstream I'm pretty sure
> > I'll get a bunch of bugs when RHEL-8 is released about mounts
> > never expiring which could easily affect a quite large number
> of customers with large and small environments.
> 
> Yep, agree.
> 
> I think some aspect of this comes down to:
> 
> 1) How to define "use"?
> 2) What abilities exist to detect "use"?

I think it's time to modify the policy of "in use" that guides
changes like this.

It is hard for me to keep large site use cases in mind as I
moved out of the data centre and into development and support
of autofs many years ago now.

Nevertheless, the larger the environment the more beneficial
autofs can be and, while a number of autofs users don't quite
get the large site case, it is and should be the target use
case IMHO.

> 
> Having read about the issue you were trying to solve, I have a new
> appreciation for 1).
> 
> Having recently reviewed some of the autofs and kernel code for the first
> time, 
> I have an appreciation for 2).
> 
> I think there are at least three scenarios here to consider:
> 
> A) Applications which are overly aggressive about monitoring all file
> systems, even file systems which should not be their concern.
> B) Applications which are correctly aggressive about monitoring file
> systems, that should be their concern.
> C) Applications which are actively using the file systems to perform a
> significant amount of work in the file system.
> 
> In RHEL 7.2 and RHEL 7.3, "use" seems to include at least:
> 
> - File handle open.
> - Working directory set.
> - Automount path traversed at least once.
> 
> In RHEL 7.4, "use" seems to be reduced to:
> 
> - File handle open.
> - Working directory set.
> 
> I think C) from above is pretty easy to detect. I think repeated and
> persistent use of the file system easily differentiates C). In the "build"
> and "tools warehouse" cases, we would probably traverse the path thousands
> or millions of times during a 10 minute interval. This makes me think that
> you could choose a compromise in here:
> 
> - File handle open.
> - Working directory set.
> - Automount path traversed at least N times.
> 
> I wonder if you were to keep a "last_used" and a "last_used_count", would it
> be sufficient to confirm that "last_used_count > 100" or "last_used_count >
> 1000", and this would result in a decision that was often correct in terms
> of identifying passive monitoring use vs persistent active use?
> 
> As the expiration interval could have a large range depending upon use case,
> perhaps it would make sense to make N be proportional to the expiration
> interval? For example:
> 
> - File handle open.
> - Working directory set.
> - Automount path traversed at least 10 times per minute.
> 
> Back to the three scenarios:
> 
> A) Applications which are overly aggressive about monitoring all file
> systems, even file systems which should not be their concern.
> B) Applications which are correctly aggressive about monitoring file
> systems, that should be their concern.
> C) Applications which are actively using the file systems to perform a
> significant amount of work in the file system.
> 
> I think something like the above would help to more accurately identify C).
> Then, I am thinking that B) may not be important to distinguish from A) in
> real life, as the overhead should be low, and the use case may not be valid.

Yes, I'm thinking along the lines of some sort of frequency
calculation. It would need to be very simple so as to not add
too much overhead to path lookups and ideally independent of
the expire time setting. Still not sure yet.

The reality is it was like this at RHEL-7 GA and we largely
don't have the problem in RHEL-7 (although it is there to a
degree), so there shouldn't be problems with reverting this
one change, assuming it really is the change I think it is.

There were two patches in the RHEL-7 sync with upstream that
touch last_used; one was part of an upstream path walk
series that tries to optimize path walks for certain work loads
(one work load example was application building). I don't think
that patch has any effect on what we are seeing here because it
only moved the update out of a frequently called function to a
location that updates the last used once at completion of the
procedure. Only a small part of the overall optimization but
still part of it.

Anyway, I'm just getting back to this to build a test kernel
with the patch I think changed this reverted so hopefully I'll
have something for you to test (your) tomorrow, LOL or perhaps
your today depending on when you see this.

> 
> Anyways - just some ideas to continue the conversation.
> 
> Thanks for looking into this, Ian. It is much appreciated.

I appreciate your effort in evaluating and describing your
use case. It's very important in keeping me focused on what's
most important for autofs and TBH I really don't get enough
of it which can lead to not so good decisions on my part.

Ian

Comment 10 Ian Kent 2017-09-15 06:23:41 UTC
Mmm ... that took a lot longer than it should have.

I have a test build with the recent last_used modifier patch
reverted.

It can be found at:
http://people.redhat.com/~ikent/kernel-3.10.0-693.2.2.el7.bz1489542.1/

This build isn't a release build and so it is not signed; it's
for testing only.

Please check if this kernel resolves the expiration problem we
have discussed here in this bug.

Ian
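
For anyone wanting to try the build: one way to install such a scratch
kernel for a quick test is roughly the following. The exact RPM file
name must be taken from the directory listing at the URL above, and
--nogpgcheck is needed because the build is unsigned.

# Download the kernel RPM from the URL above, then:
yum localinstall --nogpgcheck ./kernel-*.bz1489542.1.x86_64.rpm
# Boot into the test kernel (select it in GRUB or make it the default),
# then confirm which kernel is running:
reboot
uname -r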

Comment 11 Mark Mielke 2017-09-19 05:55:08 UTC
I tried out your test kernel, and I am unable to reproduce the problem that I reported. Absolute path accesses are holding the mount as "used" as expected, and the same as RHEL 7.3 and prior.

Comment 12 Ian Kent 2017-09-19 07:55:53 UTC
(In reply to Mark Mielke from comment #11)
> I tried out your test kernel, and I am unable to reproduce the problem that
> I reported. Absolute path accesses are holding the mount as "used" as
> expected, and the same as RHEL 7.3 and prior.

Ok, thought that would be the case.
I'll go ahead with the process to revert the change.

Comment 13 Ian Kent 2017-09-19 23:47:14 UTC
Created attachment 1328172 [details]
Patch - revert: take more care to not update last_used on path walk

Comment 20 Dave Wysochanski 2017-10-26 14:09:39 UTC
Created attachment 1343778 [details]
Trivial testcase showing the regression and fix.

Passes on RHEL7.3 kernel

# ./test.sh 
Thu Oct 26 09:52:03 EDT 2017: System Configuration
3.10.0-514.6.1.el7.x86_64
autofs-5.0.7-56.el7.x86_64
/net    -hosts  --timeout=60
Thu Oct 26 09:52:03 EDT 2017: first access - initiates mount request
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=12,pgrp=23643,timeout=60,minproto=5,maxproto=5,offset 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.6,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:52:03 EDT 2017: sleep 30 seconds == 1/2 of expiry timer
Thu Oct 26 09:52:33 EDT 2017: check for presence of mount
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=12,pgrp=23643,timeout=60,minproto=5,maxproto=5,offset 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.6,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:52:33 EDT 2017: second access - initiates path walk but do not open any files
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=12,pgrp=23643,timeout=60,minproto=5,maxproto=5,offset 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.6,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:52:33 EDT 2017: sleep 45 seconds == 1/2 of expiry timer + 15 seconds
Thu Oct 26 09:53:18 EDT 2017: check for presence of mount - will be present if path walk via ls command reset timer
TEST PASS: autofs expiry timer reset on path walk via ls command


Fails with RHEL7.4 kernel

# ./test.sh 
Thu Oct 26 09:54:46 EDT 2017: System Configuration
3.10.0-693.el7.x86_64
autofs-5.0.7-69.el7.x86_64
/net    -hosts   --timeout=60
Thu Oct 26 09:54:46 EDT 2017: first access - initiates mount request
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=23857,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=128489 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:54:46 EDT 2017: sleep 30 seconds == 1/2 of expiry timer
Thu Oct 26 09:55:16 EDT 2017: check for presence of mount
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=23857,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=128489 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:55:17 EDT 2017: second access - initiates path walk but do not open any files
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=23857,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=128489 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 09:55:17 EDT 2017: sleep 45 seconds == 1/2 of expiry timer + 15 seconds
Thu Oct 26 09:56:02 EDT 2017: check for presence of mount - will be present if path walk via ls command reset timer
TEST FAIL: autofs expiry timer not reset on path walk via ls command



Passes again with the test kernel from https://bugzilla.redhat.com/show_bug.cgi?id=1489542#c10

# ./test.sh 
Thu Oct 26 10:01:28 EDT 2017: System Configuration
3.10.0-693.2.2.el7.bz1489542.1.x86_64
autofs-5.0.7-69.el7.x86_64
/net    -hosts   --timeout=60
Thu Oct 26 10:01:28 EDT 2017: first access - initiates mount request
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=999,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=17809 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 10:01:29 EDT 2017: sleep 30 seconds == 1/2 of expiry timer
Thu Oct 26 10:01:59 EDT 2017: check for presence of mount
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=999,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=17809 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 10:01:59 EDT 2017: second access - initiates path walk but do not open any files
grep rhel7u4-node2 /proc/mounts
-hosts /net/rhel7u4-node2/exports autofs rw,relatime,fd=13,pgrp=999,timeout=60,minproto=5,maxproto=5,offset,pipe_ino=17809 0 0
rhel7u4-node2:/exports /net/rhel7u4-node2/exports nfs4 rw,nosuid,nodev,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.32,local_lock=none,addr=192.168.122.8 0 0
Thu Oct 26 10:01:59 EDT 2017: sleep 45 seconds == 1/2 of expiry timer + 15 seconds
Thu Oct 26 10:02:44 EDT 2017: check for presence of mount - will be present if path walk via ls command reset timer
TEST PASS: autofs expiry timer reset on path walk via ls command
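
Reconstructed from the logged output above, the attached test.sh is
essentially doing the following. This is a sketch based on the log,
not the attachment itself; the host name and the 60 second /net
timeout are the values used in the runs above.

#!/bin/bash
# Sketch: a path walk at half the expiry timeout should keep the mount
# alive past the point where it would otherwise have expired.
MNT=/net/rhel7u4-node2/exports
TIMEOUT=60

echo "$(date): System Configuration"
uname -r
rpm -q autofs
grep /net /etc/auto.master

echo "$(date): first access - initiates mount request"
ls "$MNT" >/dev/null
grep rhel7u4-node2 /proc/mounts

echo "$(date): sleep $((TIMEOUT / 2)) seconds == 1/2 of expiry timer"
sleep $((TIMEOUT / 2))
echo "$(date): check for presence of mount"
grep rhel7u4-node2 /proc/mounts

echo "$(date): second access - initiates path walk but do not open any files"
ls "$MNT" >/dev/null
grep rhel7u4-node2 /proc/mounts

echo "$(date): sleep $((TIMEOUT / 2 + 15)) seconds == 1/2 of expiry timer + 15 seconds"
sleep $((TIMEOUT / 2 + 15))
echo "$(date): check for presence of mount - will be present if path walk via ls command reset timer"

if grep -q rhel7u4-node2 /proc/mounts; then
    echo "TEST PASS: autofs expiry timer reset on path walk via ls command"
else
    echo "TEST FAIL: autofs expiry timer not reset on path walk via ls command"
fi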

Comment 22 Klaas Demter 2017-10-27 06:59:59 UTC
is the kbase article protected on purpose? even when logged in with my user I can't view it

Comment 23 Dave Wysochanski 2017-10-27 18:14:22 UTC
(In reply to Klaas Demter from comment #22)
> is the kbase article protected on purpose? even when logged in with my user
> I can't view it

Sorry, it is incomplete so it is not published yet. I will publish it soon.

Comment 26 Rafael Aquini 2017-12-13 23:22:55 UTC
Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 28 Rafael Aquini 2017-12-14 10:14:16 UTC
Patch(es) available on kernel-3.10.0-822.el7

Comment 32 errata-xmlrpc 2018-04-10 22:00:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1062

