Bug 2213267 - filesystems mount and expire immediately
Summary: filesystems mount and expire immediately
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: autofs
Version: 8.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Assignee: Ian Kent
QA Contact: Kun Wang
URL:
Whiteboard:
Depends On:
Blocks: 2223252 2223506
 
Reported: 2023-06-07 16:56 UTC by Frank Sorenson
Modified: 2023-11-14 18:01 UTC
CC List: 4 users

Fixed In Version: autofs-5.1.4-109.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 2223252 2223506 (view as bug list)
Environment:
Last Closed: 2023-11-14 15:48:36 UTC
Type: Bug
Target Upstream Version:
Embargoed:
pm-rhel: mirror+


Attachments
Patch - fix expire retry looping (3.15 KB, patch)
2023-07-03 08:25 UTC, Ian Kent


Links
Red Hat Issue Tracker RHELPLAN-159208 (last updated 2023-06-07 17:02:23 UTC)
Red Hat Knowledge Base (Solution) 7028551 (last updated 2023-08-14 08:48:43 UTC)
Red Hat Product Errata RHBA-2023:7098 (last updated 2023-11-14 15:48:55 UTC)

Description Frank Sorenson 2023-06-07 16:56:49 UTC
Description of problem:

automount is continually mounting filesystems, then expiring and unmounting them immediately afterward



Version-Release number of selected component (if applicable):

autofs-5.1.4-83.el8


How reproducible:

repeats frequently on many of the customer's systems


Steps to Reproduce:

unknown


Actual results:

Jun  6 10:58:50 hostname automount[9853]: [daemon.info] attempting to mount entry /u/user1
Jun  6 10:58:50 hostname nfsrahead[794032]: [daemon.info] setting /u/user1 readahead to 128
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] mounted /u/user1
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expiring path /u/user2
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expired /u/user2
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] attempting to mount entry /u/user2
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expiring path /u/user1
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] mounted /u/user2
Jun  6 10:58:50 hostname systemd[1]: [daemon.info] u-user1.mount: Succeeded.
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expired /u/user1
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] attempting to mount entry /u/user1
Jun  6 10:58:50 hostname nfsrahead[794068]: [daemon.info] setting /u/user1 readahead to 128
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] mounted /u/user1
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expiring path /u/user2
Jun  6 10:58:50 hostname systemd[1]: [daemon.info] u-user2.mount: Succeeded.
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] expired /u/user2
Jun  6 10:58:50 hostname automount[9853]: [daemon.info] attempting to mount entry /u/user2


Expected results:

filesystems remain mounted until timeout, then expire and unmount if unused


Additional info:

Comment 5 Ian Kent 2023-06-08 02:34:29 UTC
It looks like all or most of the autofs-managed mounts have a timeout of 3600:
/etc/autofs/auto_u on /u type autofs (rw,relatime,fd=180,pgrp=9853,timeout=3600,minproto=5,maxproto=5,indirect,pipe_ino=77207)

I can't find anywhere that sets that timeout; how is it set?

Comment 6 Frank Sorenson 2023-06-08 15:40:37 UTC
It looks like the timeout is set to an hour in /etc/sysconfig/autofs:

    TIMEOUT=3600

We'll get debug-level logging from the customer.
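As a quick cross-check, the sysconfig value and the timeout= mount option quoted in comment 5 should carry the same number. A minimal shell sketch of that comparison; the here-strings stand in for reading the live file and mount output:

```shell
# Sample text copied from this report; on a live system these would come
# from /etc/sysconfig/autofs and the output of "mount -t autofs".
sysconfig_line='TIMEOUT=3600'
mount_line='/etc/autofs/auto_u on /u type autofs (rw,relatime,fd=180,pgrp=9853,timeout=3600,minproto=5,maxproto=5,indirect,pipe_ino=77207)'

# Strip the key from the sysconfig form.
t_sysconfig=${sysconfig_line#TIMEOUT=}
# Pull the timeout= option out of the mount-option list.
t_mount=$(printf '%s\n' "$mount_line" | sed -n 's/.*[(,]timeout=\([0-9]*\).*/\1/p')

echo "$t_sysconfig $t_mount"    # both values should agree: 3600 3600
```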

Comment 7 Ian Kent 2023-06-09 01:32:04 UTC
(In reply to Frank Sorenson from comment #6)
> It looks like the timeout is set to an hour in /etc/sysconfig/autofs:
> 
>     TIMEOUT=3600

Oh, yes, thought I looked there ...

> 
> We'll get debug-level logging from the customer.

Thanks.

Comment 8 Frank Sorenson 2023-06-19 15:36:38 UTC
Okay, so the immediate expire appears to be because the customer is sending a USR1 to automount immediately after triggering the mount.  What a difference having all the information makes...  <sigh>

In this case, sending USR1 does appear to cause a problem where automount gets into a loop, constantly attempting to expire but failing to unmount (-EBUSY). This expire loop only occurs after getting USR1 when a mount is accessed (and still in use) from a mount namespace with propagation=slave.


/etc/auto.master:

    /home1 /etc/auto.home1


/etc/auto.home1:
    user1    -rw    server:/homes/user1


    # systemctl restart autofs.service

trigger the mount and keep it busy inside a mount namespace with propagation=slave:

    # unshare -m --propagation=slave /bin/bash -c "cd /home1/user1 ; sleep 999"

After each timeout period, autofs will attempt to unmount, failing with EBUSY:

Jun 17 14:15:02 vm23 automount[1770128]: st_expire: state 1 path /home1
Jun 17 14:15:02 vm23 automount[1770128]: expire_proc: exp_proc = 140602511918848 path /home1
Jun 17 14:15:02 vm23 automount[1770128]: expire_proc_indirect: expire /home1/user1
Jun 17 14:15:02 vm23 automount[1770128]: handle_packet: type = 4
Jun 17 14:15:02 vm23 automount[1770128]: handle_packet_expire_indirect: token 51582, name user1
Jun 17 14:15:02 vm23 automount[1770128]: expiring path /home1/user1
Jun 17 14:15:02 vm23 automount[1770128]: umount_multi: path /home1/user1 incl 1
Jun 17 14:15:02 vm23 automount[1770128]: umount_subtree_mounts: unmounting dir = /home1/user1
Jun 17 14:15:02 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:15:02 vm23 automount[1770128]: spawn_umount: umount failed with error code 16, retrying with the -f option
Jun 17 14:15:02 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:15:02 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:15:02 vm23 automount[1770128]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Jun 17 14:15:02 vm23 automount[1770128]: could not umount dir /home1/user1
Jun 17 14:15:02 vm23 automount[1770128]: couldn't complete expire of /home1/user1
Jun 17 14:15:02 vm23 automount[1770128]: dev_ioctl_send_fail: token = 51582
Jun 17 14:15:02 vm23 automount[1770128]: expire_proc_indirect: 1 remaining in /home1
Jun 17 14:15:02 vm23 automount[1770128]: expire_cleanup: got thid 140602511918848 path /home1 stat 1
Jun 17 14:15:02 vm23 automount[1770128]: expire_cleanup: sigchld: exp 140602511918848 finished, switching from 2 to 1
Jun 17 14:15:02 vm23 automount[1770128]: st_ready: st_ready(): state = 2 path /home1

I presume that the mount is seen as idle because the automount process is running in the default mount namespace, and the mount is kept busy by a process in a separate mount namespace with propagation=slave, rather than shared.  As a result, the mount is seen as both idle and in-use.

The real kicker occurs when automount gets USR1:

    # pkill -USR1 automount

autofs gets into a loop, trying to expire & unmount repeatedly:

Jun 17 14:39:59 vm23 automount[1770128]: do_notify_state: signal 10
Jun 17 14:39:59 vm23 automount[1770128]: master_notify_state_change: sig 10 switching /home1 from 1 to 3
Jun 17 14:39:59 vm23 automount[1770128]: st_prune: state 1 path /home1
Jun 17 14:39:59 vm23 automount[1770128]: expire_proc: exp_proc = 140602511918848 path /home1
Jun 17 14:39:59 vm23 automount[1770128]: expire_proc_indirect: expire /home1/user1

Jun 17 14:39:59 vm23 automount[1770128]: handle_packet: type = 4
Jun 17 14:39:59 vm23 automount[1770128]: handle_packet_expire_indirect: token 51593, name user1
Jun 17 14:39:59 vm23 automount[1770128]: expiring path /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: umount_multi: path /home1/user1 incl 1
Jun 17 14:39:59 vm23 automount[1770128]: umount_subtree_mounts: unmounting dir = /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: spawn_umount: umount failed with error code 16, retrying with the -f option
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Jun 17 14:39:59 vm23 automount[1770128]: could not umount dir /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: couldn't complete expire of /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: dev_ioctl_send_fail: token = 51593

Jun 17 14:39:59 vm23 automount[1770128]: handle_packet: type = 4
Jun 17 14:39:59 vm23 automount[1770128]: handle_packet_expire_indirect: token 51594, name user1
Jun 17 14:39:59 vm23 automount[1770128]: expiring path /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: umount_multi: path /home1/user1 incl 1
Jun 17 14:39:59 vm23 automount[1770128]: umount_subtree_mounts: unmounting dir = /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: spawn_umount: umount failed with error code 16, retrying with the -f option
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Jun 17 14:39:59 vm23 automount[1770128]: could not umount dir /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: couldn't complete expire of /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: dev_ioctl_send_fail: token = 51594

Jun 17 14:39:59 vm23 automount[1770128]: handle_packet: type = 4
Jun 17 14:39:59 vm23 automount[1770128]: handle_packet_expire_indirect: token 51595, name user1
Jun 17 14:39:59 vm23 automount[1770128]: expiring path /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: umount_multi: path /home1/user1 incl 1
Jun 17 14:39:59 vm23 automount[1770128]: umount_subtree_mounts: unmounting dir = /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: spawn_umount: umount failed with error code 16, retrying with the -f option
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Jun 17 14:39:59 vm23 automount[1770128]: could not umount dir /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: couldn't complete expire of /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: dev_ioctl_send_fail: token = 51595

Jun 17 14:39:59 vm23 automount[1770128]: handle_packet: type = 4
Jun 17 14:39:59 vm23 automount[1770128]: handle_packet_expire_indirect: token 51596, name user1
Jun 17 14:39:59 vm23 automount[1770128]: expiring path /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: umount_multi: path /home1/user1 incl 1
Jun 17 14:39:59 vm23 automount[1770128]: umount_subtree_mounts: unmounting dir = /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: spawn_umount: umount failed with error code 16, retrying with the -f option
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: >> umount.nfs4: /home1/user1: device is busy
Jun 17 14:39:59 vm23 automount[1770128]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Jun 17 14:39:59 vm23 automount[1770128]: could not umount dir /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: couldn't complete expire of /home1/user1
Jun 17 14:39:59 vm23 automount[1770128]: dev_ioctl_send_fail: token = 51596
...

repeating until the filesystem can actually be unmounted
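The do_notify_state / master_notify_state_change lines above are automount's USR1 handler switching state. A toy, self-contained analogue of that signal-driven behaviour (not automount code; the "prune"/"shutdown" actions are stand-ins for what automount does on USR1 and USR2):

```shell
# A background "daemon" that logs an action whenever it is signalled,
# mirroring automount's prune-on-USR1 / shutdown-on-USR2 semantics.
log=$(mktemp)
(
  trap 'echo prune >>"$log"' USR1
  trap 'echo shutdown >>"$log"; exit 0' USR2
  while :; do sleep 0.1; done
) &
pid=$!
sleep 0.3                  # give the subshell time to install its traps
kill -USR1 "$pid"          # cf. "pkill -USR1 automount" above
sleep 0.3
kill -USR2 "$pid"
wait "$pid" 2>/dev/null
cat "$log"                 # prune, then shutdown
```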


So I suppose there may be two issues (though neither is the immediate expire for which this BZ was opened):

1) the kernel doesn't detect that the mount is still in-use in a mount namespace other than the one in which automount runs

2) after getting SIGUSR1, automount enters a loop where it repeatedly tries to expire & unmount a busy filesystem (should it try to unmount just once?)
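For issue 1, a rough diagnostic sketch (my own, not an automount tool; it assumes root so the /proc/PID/ns links are readable, and paths without spaces): list PIDs living outside the initial mount namespace that still have the automounted path in their mount table, i.e. exactly the users the expire check cannot see.

```shell
# Hypothetical helper: find processes in other mount namespaces that
# still see $path mounted.
path=/home1/user1
root_ns=$(readlink /proc/1/ns/mnt)
pinned=$(for p in /proc/[0-9]*; do
    # skip processes sharing init's mount namespace
    [ "$(readlink "$p/ns/mnt" 2>/dev/null)" = "$root_ns" ] && continue
    # field 5 of mountinfo is the mount point, space-delimited
    grep -qs " $path " "$p/mountinfo" && echo "${p#/proc/}"
done)
echo "${pinned:-none}"
```

On the reproducer above this should print the PID of the sleep held in the slave namespace; when nothing is pinned it prints none.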

Comment 9 Ian Kent 2023-06-25 08:22:31 UTC
(In reply to Frank Sorenson from comment #8)
> Okay, so the immediate expire appears to be because the customer is sending
> a USR1 to automount immediately after triggering the mount.  What a
> difference having all the information makes...  <sigh>
> 
> In this case, sending USR1 does appear to cause a problem where automount
> gets into a loop constantly attempting to expire, but failing to unmount
> (-EBUSY).  This expire loop only occurs after getting USR1 when a mount is
> accessed (and still in-use) from a mount namespace with propagation=slave
> 
> 
> /etc/auto.master:
> 
>     /home1 /etc/auto.home1
> 
> 
> /etc/auto.home1:
>     user1    -rw    server:/homes/user1
> 
> 
>     # systemctl restart autofs.service
> 
> trigger the mount and keep it busy inside a mount namespace with
> propagation=slave:
> 
>     # unshare -m --propagation=slave /bin/bash -c "cd /home1/user1 ; sleep
> 999"
> 
> After each timeout period, autofs will attempt to unmount, failing with
> EBUSY:
> 
> [expire log snipped; duplicated from comment 8 above]
> 
> I presume that the mount is seen as idle because the automount process is
> running in the default mount namespace, and the mount is kept busy by a
> process in a separate mount namespace with propagation=slave, rather than
> shared.  As a result, the mount is seen as both idle and in-use.

Your presumption is accurate, except that we never want the propagation to be
shared: specifically, we don't want mounts to propagate from the mount namespace
back to the init (root) mount namespace, as that ends badly.

I'm pretty sure this is because the kernel function may_umount_tree(), used by
the autofs expire, is not able to check usage of propagated mounts. So if the
mount isn't in use in the root namespace it will be seen as not busy and will
be selected for expire.

The problem with fixing this is that a brute-force traversal of "all" the mount
namespace trees isn't acceptable for inclusion upstream; that approach was
rejected some time ago.

However, I have been working on this recently (though I stopped due to other
demands). I have code that I believe is acceptable upstream and appears to
function correctly, but I suspect it doesn't work exactly as we need, so more
testing needs to be done before I propose it upstream.

> 
> The real kicker occurs when automount gets USR1; 
> 
>     # pkill -USR1 automount
> 
> autofs gets into a loop, trying to expire & unmount repeatedly:
> 
> [looping expire log snipped; duplicated from comment 8 above]
> 
> repeating until the filesystem can actually be unmounted

This is unexpected, I'll need to reproduce it and work out what's going on.

> 
> 
> So I suppose there may be two issues (though neither is the immediate expire
> for which this BZ was opened):
> 
> 1) the kernel doesn't detect that the mount is still in-use in a mount
> namespace other than the one in which automount runs

Correct, but more than that: the kernel doesn't know how to check those
namespaces at all, so it can't check the last-used timestamp either.

> 
> 2) after getting SIGUSR1, automount enters a loop where it repeatedly tries
> to expire & unmount a busy filesystem (should it try to unmount just once?)

Yes, but it is expected that if the mount remains unused for a further timeout
it will try to umount it again ... unfortunately ...

Ian

Comment 10 Ian Kent 2023-06-26 06:12:59 UTC
(In reply to Ian Kent from comment #9)
> 
> > 
> > 2) after getting SIGUSR1, automount enters a loop where it repeatedly tries
> > to expire & unmount a busy filesystem (should it try to unmount just once?)
> 
> Yes, but it is expected that if the mount remains unused for a further
> timeout
> it will try and umount it again ... unfortunately ...

Ok, I'm very much tempted to say let's just fix the kernel expire namespace
check problem.

It has needed fixing for ages, and I have put quite a bit of effort into it
over time, which has got us something that's close. We also have a customer
that needs it, so it's worth spending a bit more time on it and trying to
get it merged upstream.

The thing is, once the expire check is fixed, automount behaves as it should
because the mount doesn't get selected for expire. That looping is due to an
optimisation done a while back under the assumption that the kernel expire
check functions properly, so strictly speaking it's not a bug.

This behaviour might also occur during a forced shutdown (SIGUSR2), but in
that case mounts should always be umounted, either normally or lazily, so
that would actually be a different problem.

If we have trouble getting this change accepted upstream we could
add a workaround (which I also have tested) while we wait for me
to do whatever is needed for the change upstream.

I have to say, there is one patch, sent to me by Al as a basis for what I
needed to do, that makes this hugely simpler. But it is a fundamental change
to the mount point reference counting, so it's cause to pause and consider
the implications. OTOH, I have used it a lot during testing without any side
effects, so maybe I'm just being paranoid.

Ian

Comment 11 Ian Kent 2023-07-03 08:25:45 UTC
Created attachment 1973843 [details]
Patch - fix expire retry looping

I think I'll go with this; the expire improvement needs to go upstream, but
that's likely to take a while.

Comment 18 errata-xmlrpc 2023-11-14 15:48:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7098

