2052122 – autofs attempts unmount on directory in use

RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.

Bug 2052122 - autofs attempts unmount on directory in use

Summary: autofs attempts unmount on directory in use

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 8
Classification:	Red Hat
Component:	autofs
Sub Component:
Version:	8.5
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	rc
Target Release:	---
Assignee:	Ian Kent
QA Contact:	Kun Wang
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	2056321
TreeView+	depends on / blocked

Reported:	2022-02-08 18:29 UTC by jas
Modified:	2022-05-10 16:52 UTC (History)
CC List:	1 user (show)
Fixed In Version:	autofs-5.1.4-82.el8
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Clones:	2056321 (view as bug list)
Environment:
Last Closed:	2022-05-10 15:29:06 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
autofs rev 80 test build (3.25 MB, application/gzip) 2022-02-09 05:46 UTC, Ian Kent	no flags	Details
autofs-5.1.4-48.el8 packages (3.33 MB, application/gzip) 2022-02-10 08:33 UTC, Ian Kent	no flags	Details
autofs rev 82 test build (3.25 MB, application/gzip) 2022-02-11 07:20 UTC, Ian Kent	no flags	Details
autofs rev 82 test build (3.25 MB, application/gzip) 2022-02-11 07:22 UTC, Ian Kent	no flags	Details
View All

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Issue Tracker	RHELPLAN-111509	0	None	None	None	2022-02-08 18:49:22 UTC
Red Hat Product Errata	RHBA-2022:2084	0	None	None	None	2022-05-10 15:29:18 UTC

Description jas 2022-02-08 18:29:58 UTC

In my /etc/auto.direct I have the following:

/eecs/local             -bg,proto=tcp,acl \
        /                                               nfsserv:/obj/local-intel64 \
        /share                                          nfsserv:/obj/local-share \
        /src                                            nfsserv:/obj/local-src

When the system boots, it starts up a self-compiled httpd from a local directory, such as - /usr/local/pkg/httpd/bin/httpd.  However, one of the Apache modules (mod_wsgi) refers to a python library on an NFS share - /eecs/local/lib64.

When the machine starts, the web server starts fine, and I can ssh to the system and access /eecs/local, or /eecs/local/share, or /eecs/local/src.  In fact, just by logging in, my home directory (also on an NFS share) sources a file from /eecs/local/share, so /eecs/local/share is mounted on login.  An lsof on /eecs/local shows that the web module is using /eecs/local/lib64.

After logging out of the system, when I SSH to the system again later, I get a message that /eecs/local/share/cshrc.common is not readable.  I find that /eecs is mounted, but I cannot access /eecs/local/src or /eecs/local/share.

It turns out this is because autofs decided that /eecs is not in use - it tried to unmount everything, and found that /eecs/local was busy ("device busy") so it seems to have "given up" on it:

Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy
Feb  7 16:12:24 toor8 automount[1256]: spawn_umount: umount failed with error code 16, retrying with the -f option
Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy
Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy
Feb  7 16:12:24 toor8 automount[1256]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share
Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/
Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src

If I don't load the mod_wsgi module which uses /eecs/local, then everything gets unmounted properly by autofs, and works.  Is this a bug?

I've also tried to compile the latest autofs from kernel.org, and this still exhibits the same problem.

If lsof knows that /eecs/local is in use, then I assume that autofs should know as well.

Comment 1 Ian Kent 2022-02-09 02:10:32 UTC

(In reply to jas from comment #0)
> In my /etc/auto.direct I have the following:
> 
> /eecs/local             -bg,proto=tcp,acl \
>         /                                              
> nfsserv:/obj/local-intel64 \
>         /share                                         
> nfsserv:/obj/local-share \
>         /src                                           
> nfsserv:/obj/local-src
> 
> When the system boots, it starts up a self-compiled httpd from a local
> directory, such as - /usr/local/pkg/httpd/bin/httpd.  However, one of the
> Apache modules (mod_wsgi) refers to a python library on an NFS share -
> /eecs/local/lib64.
> 
> When the machine starts, the web server starts fine, and I can ssh to the
> system and access /eecs/local, or /eecs/local/share, or /eecs/local/src.  In
> fact, just by logging in, my home directory (also on an NFS share) sources a
> file from /eecs/local/share, so /eecs/local/share is mounted on login.  An
> lsof on /eecs/local shows that the web module is using /eecs/local/lib64.
> 
> After logging out of the system, when I SSH to the system again later, I get
> a message that /eecs/local/share/cshrc.common is not readable.  I find that
> /eecs is mounted, but I cannot access /eecs/local/src or /eecs/local/share.
> 
> It turns out this is because autofs decided that /eecs is not in use - it
> tried to unmount everything, and found that /eecs/local was busy ("device
> busy") so it seems to have "given up" on it:
> 
> Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  7 16:12:24 toor8 automount[1256]: spawn_umount: umount failed with
> error code 16, retrying with the -f option
> Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  7 16:12:24 toor8 automount[1256]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/share
> Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/
> Feb  7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/src
> 
> If I don't load the mod_wsgi module which uses /eecs/local, then everything
> gets unmounted properly by autofs, and works.  Is this a bug?

One way or another, yes, I think so.

> 
> I've also tried to compile the latest autofs from kernel.org, and this still
> exhibits the same problem.

Yes, despite the difference in versions, RHEL is quite close to upstream.

There was a recent re-write of the offset handling, very much needed, and despite
significant testing and a number of problems found I'm getting some bug reports.

This however is not quite as simple as most because of a limitation of the kernel
autofs ioctl related to identifying if a mount tree is busy. Because of that I
it's difficult to get this right in user space.

However, I think there is a bug in that when an error like this is encountered
the mount tree should be restored and expire stopped.

I'll investigate.

> 
> If lsof knows that /eecs/local is in use, then I assume that autofs should
> know as well.

Yes, but believe it or not it isn't quite that simple from user space.

Ian

Comment 2 Ian Kent 2022-02-09 05:30:50 UTC

(In reply to Ian Kent from comment #1)
> > 
> > If I don't load the mod_wsgi module which uses /eecs/local, then everything
> > gets unmounted properly by autofs, and works.  Is this a bug?
> 
> One way or another, yes, I think so.

I wonder how the module is keeping the mount busy ...

I've tried to duplicate this behaviour but I don't seem to be able to.

Granted I'm using an upstream release with a number of bug fixes so this might
already be fixed but there may be something odd with the way the module is used
that isn't accounted for.

Probably the best thing to do will be to provide a RHEL build with most of the
current upstream changes (to be included in the next RHEL release) and see if
you can still reproduce the problem.

I'll get back to you on that.

Ian

Comment 3 Ian Kent 2022-02-09 05:46:04 UTC

Created attachment 1859982 [details]
autofs rev 80 test build

Comment 4 Ian Kent 2022-02-09 05:46:54 UTC

(In reply to Ian Kent from comment #3)
> Created attachment 1859982 [details]
> autofs rev 80 test build

Can you try this build and see if it improves matters please.

Comment 5 jas 2022-02-09 17:51:48 UTC

Hi Ian.
First off, thank you so much for investigating this so quickly.
After days of being able to login to the host fine, I re-enabled the mod_wsgi module, and was able to replicate the problem immediately.
I assume that you're setting up a similar mount on your system.
Since mod_wsgi is the Python module, lsof is showing this output:

COMMAND    PID USER  FD   TYPE DEVICE SIZE/OFF     NODE NAME
httpd   299692 root mem    REG   0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
httpd   299693  www mem    REG   0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
httpd   299694  www mem    REG   0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
httpd   299695  www mem    REG   0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
httpd   299696  www mem    REG   0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)

Would I have not tried the upstream changes by downloading the latest autofs from kernel.org, and compiling that?  

Here's the output from /var/log/messages again if this reveals anything else:

Feb  9 12:31:20 toor8 automount[1281]: st_expire: state 1 path /-
Feb  9 12:31:20 toor8 automount[1281]: expire_proc: exp_proc = 140113313212160 path /-
Feb  9 12:31:20 toor8 automount[1281]: expire_proc_direct: send expire to trigger /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: handle_packet: type = 6
Feb  9 12:31:20 toor8 automount[1281]: handle_packet_expire_direct: token 160, name /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: expiring path /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: umount_multi: path /eecs/local incl 1
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount offset /eecs/local/share
Feb  9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/share
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount offset /eecs/local/src
Feb  9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/src
Feb  9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src
Feb  9 12:31:20 toor8 automount[1281]: expired /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: dev_ioctl_send_ready: token = 160
Feb  9 12:31:20 toor8 automount[1281]: expire_cleanup: got thid 140113313212160 path /- stat 0
Feb  9 12:31:20 toor8 automount[1281]: expire_cleanup: sigchld: exp 140113313212160 finished, switching from 2 to 1
Feb  9 12:31:20 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /-

Thank you again.

Comment 6 Ian Kent 2022-02-10 05:07:41 UTC

(In reply to jas from comment #5)
> Hi Ian.
> First off, thank you so much for investigating this so quickly.

np.

> After days of being able to login to the host fine, I re-enabled the
> mod_wsgi module, and was able to replicate the problem immediately.
> I assume that you're setting up a similar mount on your system.
> Since mod_wsgi is the Python module, lsof is showing this output:

Only in so much as I'm simulating the mount being busy by changing
directory to it in another process.

That should be enough to simulate the business since the cd will
cause a reference to be taken on the mount in the same way an open
file will.

> 
> COMMAND    PID USER  FD   TYPE DEVICE SIZE/OFF     NODE NAME
> httpd   299692 root mem    REG   0,57 22036176 69564543
> /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
> httpd   299693  www mem    REG   0,57 22036176 69564543
> /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
> httpd   299694  www mem    REG   0,57 22036176 69564543
> /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
> httpd   299695  www mem    REG   0,57 22036176 69564543
> /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)
> httpd   299696  www mem    REG   0,57 22036176 69564543
> /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64)

This implies these are open, that should be enough for the containing
mount to appear busy unless the process that's done the open is running
within a different mount namespace.

Is the application running in some sort of isolated environment?

Something like systemd-nspawn I think could do this, but there are other
ways it can be done. Certainly the systemd service PrivateTmp directive
can have similar side effects in some cases too.

This would probably trigger that kernel bug ...

> 
> Would I have not tried the upstream changes by downloading the latest autofs
> from kernel.org, and compiling that?  

It won't, this is very recent work and I'm not quite ready to commit
it and push it to the kernel.org repository.

> 
> Here's the output from /var/log/messages again if this reveals anything else:
> 
> Feb  9 12:31:20 toor8 automount[1281]: st_expire: state 1 path /-
> Feb  9 12:31:20 toor8 automount[1281]: expire_proc: exp_proc =
> 140113313212160 path /-
> Feb  9 12:31:20 toor8 automount[1281]: expire_proc_direct: send expire to
> trigger /eecs/local
> Feb  9 12:31:20 toor8 automount[1281]: handle_packet: type = 6
> Feb  9 12:31:20 toor8 automount[1281]: handle_packet_expire_direct: token
> 160, name /eecs/local
> Feb  9 12:31:20 toor8 automount[1281]: expiring path /eecs/local
> Feb  9 12:31:20 toor8 automount[1281]: umount_multi: path /eecs/local incl 1
> Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount
> offset /eecs/local/share
> Feb  9 12:31:20 toor8 automount[1281]: umounted offset mount
> /eecs/local/share
> Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount
> offset /eecs/local/src
> Feb  9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/src
> Feb  9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local
> Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with
> error code 16, retrying with the -f option
> Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb  9 12:31:20 toor8 automount[1281]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/share
> Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/
> Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree:
> deleting offset key /eecs/local/src
> Feb  9 12:31:20 toor8 automount[1281]: expired /eecs/local
> Feb  9 12:31:20 toor8 automount[1281]: dev_ioctl_send_ready: token = 160
> Feb  9 12:31:20 toor8 automount[1281]: expire_cleanup: got thid
> 140113313212160 path /- stat 0
> Feb  9 12:31:20 toor8 automount[1281]: expire_cleanup: sigchld: exp
> 140113313212160 finished, switching from 2 to 1
> Feb  9 12:31:20 toor8 automount[1281]: st_ready: st_ready(): state = 2 path
> /-

Yes, it's definitely in use and it's not appearing that way to automount
in the init mount namespace ... there's also the question of why I'm not
handling this case properly since a umount can fail ... I will need to
duplicate the behaviour somehow ...

Ian

Comment 7 Ian Kent 2022-02-10 05:15:42 UTC

I wonder how autofs-5.1.4-48.el8 behaves?

If don't have access to this I can provide it.

Comment 8 jas 2022-02-10 05:48:49 UTC

Hi Ian.
While I'm not using any isolating environment per se, httpd.service file is definately using PrivateTmp=true. I didn't know this would impact autofs. This is the default behavior for httpd.service.  Does this change anything?

I don't see  autofs-5.1.4-48.el8 I'm running 5.1.4-70.el8. If you can make 48 available I can try it.

Comment 9 Ian Kent 2022-02-10 06:27:01 UTC

(In reply to jas from comment #8)
> Hi Ian.
> While I'm not using any isolating environment per se, httpd.service file is
> definately using PrivateTmp=true. I didn't know this would impact autofs.
> This is the default behavior for httpd.service.  Does this change anything?

Well, maybe, I have seen problems with this in the past but it usually only
shows up after restarting the service that uses it. I believe it's because
mounts are propagated to the new mount namespace (including autofs mounts)
that should really be umounted as part of the namespace creation since they
aren't needed and shouldn't be a part of the resulting mount namespace but
TBH I never did get to the bottom of it.

> 
> I don't see  autofs-5.1.4-48.el8 I'm running 5.1.4-70.el8. If you can make
> 48 available I can try it.

I was putting together a lengthy response asking for a more detailed log until
I realised that /eecs/local/lib is contained within the NFS mount at /eecs/local.

I see what's happening, I do need to handle this by re-instating the offsets
on that umount fail (a function call to do that has disappeared from the code
at some point) and one of the function returns may be incorrect too.

Maybe I will be able to reproduce it now, ;)

Let me look at this a bit more before we try rev. 48.
I'm pretty sure rev. 48 will handle this case correctly.

Ian

Comment 10 Ian Kent 2022-02-10 08:30:18 UTC

(In reply to Ian Kent from comment #9)
> 
> I see what's happening, I do need to handle this by re-instating the offsets
> on that umount fail (a function call to do that has disappeared from the code
> at some point) and one of the function returns may be incorrect too.
> 
> Maybe I will be able to reproduce it now, ;)
> 
> Let me look at this a bit more before we try rev. 48.
> I'm pretty sure rev. 48 will handle this case correctly.

I still can't reproduce this.

This:
Feb  9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb  9 12:31:20 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/
Feb  9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src

is wrong.

The code called there hasn't been changed and AFAICS it isn't seeing a failure from the
umount and it should be. Even if the ultimate return is wrong if it doesn't see a failure
there's no chance the return will be correct anyway. Also, the call to resurrect those
offsets on a fail is in fact still present so it's down to the nfs umount failure returning
success when it has in fact failed.

Maybe I have had a logic error there for a while, I'll need to go back over the older code
changes to see what I was fixing there when I added the mounted check to guard against
incorrect returns from the umount sub-process, maybe it was never quite enough.

This is a bit odd!

I would like to know if autofs rev 48 does in fact get this right though.
I'll post a tar.

Ian

Comment 11 Ian Kent 2022-02-10 08:33:31 UTC

Created attachment 1860288 [details]
autofs-5.1.4-48.el8 packages

I don't think these packages are signed, sorry.

They come direct from the build system which I think is before
they go to QE and are tested and ultimately signed.

Comment 12 jas 2022-02-10 14:12:29 UTC

Hi Ian,
I downloaded the package and did:
# dnf localinstall autofs-5.1.4-48.el8.x86_64.rpm 
Last metadata expiration check: 3:16:31 ago on Thu 10 Feb 2022 05:03:07 AM EST.
Dependencies resolved.
================================================================================
 Package       Architecture  Version                  Repository           Size
================================================================================
Downgrading:
 autofs        x86_64        1:5.1.4-48.el8           @commandline        782 k

Transaction Summary
================================================================================
Downgrade  1 Package

Total size: 782 k
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1 
  Running scriptlet: autofs-1:5.1.4-48.el8.x86_64                           1/1 
  Downgrading      : autofs-1:5.1.4-48.el8.x86_64                           1/2 
warning: /etc/sysconfig/autofs created as /etc/sysconfig/autofs.rpmnew

  Running scriptlet: autofs-1:5.1.4-48.el8.x86_64                           1/2 
  Running scriptlet: autofs-1:5.1.4-74.el8.x86_64                           2/2 
  Cleanup          : autofs-1:5.1.4-74.el8.x86_64                           2/2 
  Running scriptlet: autofs-1:5.1.4-74.el8.x86_64                           2/2 
  Verifying        : autofs-1:5.1.4-48.el8.x86_64                           1/2 
  Verifying        : autofs-1:5.1.4-74.el8.x86_64                           2/2 

Downgraded:
  autofs-1:5.1.4-48.el8.x86_64     

I rebooted the system just in case.  It's a test system anyway.

Verify that it's running correct autofs..

# /usr/sbin/automount -V

Linux automount version 5.1.4-48.el8

Directories:
	config dir:	/etc/sysconfig
	maps dir:	/etc
	modules dir:	/usr/lib64/autofs

Compile options:
  DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_SYSTEMD 
  WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC 

----

lsof shows module is in use from /eecs/local/lib..

I logged into the system with ssh.  /eecs/local/share was mounted, /eecs/local/src was not mounted (because not required as part of login)
ls /eecs/local/src mounts /eecs/local/src
logged out of system for a bit.... waited for local-share and local-src to be unmounted...
sshed to system again...
IT WORKED.  No "device is busy". 
But before declaring that -48 was the solution, I tried again... I logged out, got busy with things, and came back a half hour later, logged in, and problem is back...

Feb 10 08:44:34 toor8 automount[1281]: st_expire: state 1 path /-
Feb 10 08:44:34 toor8 automount[1281]: expire_proc: exp_proc = 139988105643776 path /-
Feb 10 08:44:34 toor8 automount[1281]: expire_proc_direct: send expire to trigger /eecs/local
Feb 10 08:44:34 toor8 automount[1281]: handle_packet: type = 6
Feb 10 08:44:34 toor8 automount[1281]: handle_packet_expire_direct: token 23, name /eecs/local
Feb 10 08:44:34 toor8 automount[1281]: expiring path /eecs/local
Feb 10 08:44:34 toor8 automount[1281]: umount_multi: path /eecs/local incl 1
Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset /eecs/local/share
Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/share
Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset /eecs/local/src
Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/src
Feb 10 08:44:34 toor8 automount[1281]: unmounting dir = /eecs/local
Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb 10 08:44:34 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option
Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy
Feb 10 08:44:34 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/
Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/share
Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/src
Feb 10 08:44:34 toor8 automount[1281]: expired /eecs/local
Feb 10 08:44:34 toor8 automount[1281]: dev_ioctl_send_ready: token = 23
Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: got thid 139988105643776 path /- stat 0
Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: sigchld: exp 139988105643776 finished, switching from 2 to 1
Feb 10 08:44:34 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /-
Feb 10 08:44:48 toor8 automount[1281]: st_expire: state 1 path /eecs/research
Feb 10 08:44:48 toor8 automount[1281]: expire_proc: exp_proc = 139988105643776 path /eecs/research
Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: got thid 139988105643776 path /eecs/research stat 0
Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: sigchld: exp 139988105643776 finished, switching from 2 to 1
Feb 10 08:44:48 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /eecs/research

of course ls of /eecs/local works... but not share or src...

Back to the drawing board...

Comment 13 Ian Kent 2022-02-11 01:08:15 UTC

(In reply to jas from comment #12)
> 
> lsof shows module is in use from /eecs/local/lib..
> 
> I logged into the system with ssh.  /eecs/local/share was mounted,
> /eecs/local/src was not mounted (because not required as part of login)
> ls /eecs/local/src mounts /eecs/local/src
> logged out of system for a bit.... waited for local-share and local-src to
> be unmounted...
> sshed to system again...
> IT WORKED.  No "device is busy". 
> But before declaring that -48 was the solution, I tried again... I logged
> out, got busy with things, and came back a half hour later, logged in, and
> problem is back...
> 
> Feb 10 08:44:34 toor8 automount[1281]: st_expire: state 1 path /-
> Feb 10 08:44:34 toor8 automount[1281]: expire_proc: exp_proc =
> 139988105643776 path /-
> Feb 10 08:44:34 toor8 automount[1281]: expire_proc_direct: send expire to
> trigger /eecs/local
> Feb 10 08:44:34 toor8 automount[1281]: handle_packet: type = 6
> Feb 10 08:44:34 toor8 automount[1281]: handle_packet_expire_direct: token
> 23, name /eecs/local
> Feb 10 08:44:34 toor8 automount[1281]: expiring path /eecs/local
> Feb 10 08:44:34 toor8 automount[1281]: umount_multi: path /eecs/local incl 1
> Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset
> /eecs/local/share
> Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount
> /eecs/local/share
> Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset
> /eecs/local/src
> Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/src
> Feb 10 08:44:34 toor8 automount[1281]: unmounting dir = /eecs/local
> Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb 10 08:44:34 toor8 automount[1281]: spawn_umount: umount failed with
> error code 16, retrying with the -f option
> Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device
> is busy
> Feb 10 08:44:34 toor8 automount[1281]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting
> offset key /eecs/local/
> Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting
> offset key /eecs/local/share
> Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting
> offset key /eecs/local/src
> Feb 10 08:44:34 toor8 automount[1281]: expired /eecs/local
> Feb 10 08:44:34 toor8 automount[1281]: dev_ioctl_send_ready: token = 23
> Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: got thid
> 139988105643776 path /- stat 0
> Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: sigchld: exp
> 139988105643776 finished, switching from 2 to 1
> Feb 10 08:44:34 toor8 automount[1281]: st_ready: st_ready(): state = 2 path
> /-
> Feb 10 08:44:48 toor8 automount[1281]: st_expire: state 1 path /eecs/research
> Feb 10 08:44:48 toor8 automount[1281]: expire_proc: exp_proc =
> 139988105643776 path /eecs/research
> Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: got thid
> 139988105643776 path /eecs/research stat 0
> Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: sigchld: exp
> 139988105643776 finished, switching from 2 to 1
> Feb 10 08:44:48 toor8 automount[1281]: st_ready: st_ready(): state = 2 path
> /eecs/research
> 
> of course ls of /eecs/local works... but not share or src...
> 
> Back to the drawing board...

Well, no, that's actually sensible based on what I said in comment #10.

While I can't work out how the mount is being kept busy that umount fail
being seen in a release that doesn't have a significant amount of change
means it's a bug that's been there for quite a while.

I just need to decide how to handle it which shouldn't be too hard.

I'll get back to you with a test package when I've done that.

Ian

Comment 14 Ian Kent 2022-02-11 07:20:32 UTC

Created attachment 1860535 [details]
autofs rev 82 test build

Here is a test build to try and resolve the incorrect umount fail handling.
Please test it and let me know how it goes.

Comment 15 Ian Kent 2022-02-11 07:22:53 UTC

Created attachment 1860536 [details]
autofs rev 82 test build

Here is a test build to try and resolve the incorrect umount fail handling, please test it out.

Comment 16 jas 2022-02-12 15:56:48 UTC

Hi Ian.
I installed rev 82.  I believe this has fixed the problem.  I have tried multiple times after the local-share and local-src mounts were unmounted, and it seems to be holding!  I will continue to try over the weekend, waiting a much longer period after the unmount, and see if anything happens.

# /usr/sbin/automount -V

Linux automount version 5.1.4-82.el8

Directories:
	config dir:	/etc/sysconfig
	maps dir:	/etc
	modules dir:	/usr/lib64/autofs

Compile options:
  DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_SYSTEMD 
  WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC

Comment 17 jas 2022-02-14 18:09:33 UTC

Still working well today... I think you've fixed it! :)

Comment 18 Ian Kent 2022-02-15 01:15:58 UTC

(In reply to jas from comment #17)
> Still working well today... I think you've fixed it! :)

Sounds good then.
I'll go ahead with the fix for RHEL-8.6.

Comment 19 jas 2022-02-15 13:14:09 UTC

Thanks a bunch, Ian! Greatly appreciated...

Comment 25 errata-xmlrpc 2022-05-10 15:29:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (autofs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2084

Note You need to log in before you can comment on or make changes to this bug.