Bug 2052122
| Summary: | autofs attempts unmount on directory in use | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | jas | ||||||||||
| Component: | autofs | Assignee: | Ian Kent <ikent> | ||||||||||
| Status: | CLOSED ERRATA | QA Contact: | Kun Wang <kunwan> | ||||||||||
| Severity: | medium | Docs Contact: | |||||||||||
| Priority: | unspecified | ||||||||||||
| Version: | 8.5 | CC: | xzhou | ||||||||||
| Target Milestone: | rc | Keywords: | Triaged | ||||||||||
| Target Release: | --- | Flags: | pm-rhel:
mirror+
|
||||||||||
| Hardware: | Unspecified | ||||||||||||
| OS: | Unspecified | ||||||||||||
| Whiteboard: | |||||||||||||
| Fixed In Version: | autofs-5.1.4-82.el8 | Doc Type: | If docs needed, set a value | ||||||||||
| Doc Text: | Story Points: | --- | |||||||||||
| Clone Of: | |||||||||||||
| : | 2056321 (view as bug list) | Environment: | |||||||||||
| Last Closed: | 2022-05-10 15:29:06 UTC | Type: | Bug | ||||||||||
| Regression: | --- | Mount Type: | --- | ||||||||||
| Documentation: | --- | CRM: | |||||||||||
| Verified Versions: | Category: | --- | |||||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||||
| Embargoed: | |||||||||||||
| Bug Depends On: | |||||||||||||
| Bug Blocks: | 2056321 | ||||||||||||
| Attachments: |
|
||||||||||||
(In reply to jas from comment #0) > In my /etc/auto.direct I have the following: > > /eecs/local -bg,proto=tcp,acl \ > / > nfsserv:/obj/local-intel64 \ > /share > nfsserv:/obj/local-share \ > /src > nfsserv:/obj/local-src > > When the system boots, it starts up a self-compiled httpd from a local > directory, such as - /usr/local/pkg/httpd/bin/httpd. However, one of the > Apache modules (mod_wsgi) refers to a python library on an NFS share - > /eecs/local/lib64. > > When the machine starts, the web server starts fine, and I can ssh to the > system and access /eecs/local, or /eecs/local/share, or /eecs/local/src. In > fact, just by logging in, my home directory (also on an NFS share) sources a > file from /eecs/local/share, so /eecs/local/share is mounted on login. An > lsof on /eecs/local shows that the web module is using /eecs/local/lib64. > > After logging out of the system, when I SSH to the system again later, I get > a message that /eecs/local/share/cshrc.common is not readable. I find that > /eecs is mounted, but I cannot access /eecs/local/src or /eecs/local/share. > > It turns out this is because autofs decided that /eecs is not in use - it > tried to unmount everything, and found that /eecs/local was busy ("device > busy") so it seems to have "given up" on it: > > Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device > is busy > Feb 7 16:12:24 toor8 automount[1256]: spawn_umount: umount failed with > error code 16, retrying with the -f option > Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device > is busy > Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device > is busy > Feb 7 16:12:24 toor8 automount[1256]: Unable to update the mtab file, > /proc/mounts and /etc/mtab will differ > Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/share > Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/ > Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/src > > If I don't load the mod_wsgi module which uses /eecs/local, then everything > gets unmounted properly by autofs, and works. Is this a bug? One way or another, yes, I think so. > > I've also tried to compile the latest autofs from kernel.org, and this still > exhibits the same problem. Yes, despite the difference in versions, RHEL is quite close to upstream. There was a recent re-write of the offset handling, very much needed, and despite significant testing and a number of problems found I'm getting some bug reports. This however is not quite as simple as most because of a limitation of the kernel autofs ioctl related to identifying if a mount tree is busy. Because of that I it's difficult to get this right in user space. However, I think there is a bug in that when an error like this is encountered the mount tree should be restored and expire stopped. I'll investigate. > > If lsof knows that /eecs/local is in use, then I assume that autofs should > know as well. Yes, but believe it or not it isn't quite that simple from user space. Ian (In reply to Ian Kent from comment #1) > > > > If I don't load the mod_wsgi module which uses /eecs/local, then everything > > gets unmounted properly by autofs, and works. Is this a bug? > > One way or another, yes, I think so. I wonder how the module is keeping the mount busy ... I've tried to duplicate this behaviour but I don't seem to be able to. Granted I'm using an upstream release with a number of bug fixes so this might already be fixed but there may be something odd with the way the module is used that isn't accounted for. Probably the best thing to do will be to provide a RHEL build with most of the current upstream changes (to be included in the next RHEL release) and see if you can still reproduce the problem. I'll get back to you on that. Ian Created attachment 1859982 [details]
autofs rev 80 test build
(In reply to Ian Kent from comment #3) > Created attachment 1859982 [details] > autofs rev 80 test build Can you try this build and see if it improves matters please. Hi Ian. First off, thank you so much for investigating this so quickly. After days of being able to login to the host fine, I re-enabled the mod_wsgi module, and was able to replicate the problem immediately. I assume that you're setting up a similar mount on your system. Since mod_wsgi is the Python module, lsof is showing this output: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME httpd 299692 root mem REG 0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) httpd 299693 www mem REG 0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) httpd 299694 www mem REG 0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) httpd 299695 www mem REG 0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) httpd 299696 www mem REG 0,57 22036176 69564543 /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) Would I have not tried the upstream changes by downloading the latest autofs from kernel.org, and compiling that? Here's the output from /var/log/messages again if this reveals anything else: Feb 9 12:31:20 toor8 automount[1281]: st_expire: state 1 path /- Feb 9 12:31:20 toor8 automount[1281]: expire_proc: exp_proc = 140113313212160 path /- Feb 9 12:31:20 toor8 automount[1281]: expire_proc_direct: send expire to trigger /eecs/local Feb 9 12:31:20 toor8 automount[1281]: handle_packet: type = 6 Feb 9 12:31:20 toor8 automount[1281]: handle_packet_expire_direct: token 160, name /eecs/local Feb 9 12:31:20 toor8 automount[1281]: expiring path /eecs/local Feb 9 12:31:20 toor8 automount[1281]: umount_multi: path /eecs/local incl 1 Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount offset /eecs/local/share Feb 9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/share Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount offset /eecs/local/src Feb 9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/src Feb 9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/ Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src Feb 9 12:31:20 toor8 automount[1281]: expired /eecs/local Feb 9 12:31:20 toor8 automount[1281]: dev_ioctl_send_ready: token = 160 Feb 9 12:31:20 toor8 automount[1281]: expire_cleanup: got thid 140113313212160 path /- stat 0 Feb 9 12:31:20 toor8 automount[1281]: expire_cleanup: sigchld: exp 140113313212160 finished, switching from 2 to 1 Feb 9 12:31:20 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /- Thank you again. (In reply to jas from comment #5) > Hi Ian. > First off, thank you so much for investigating this so quickly. np. > After days of being able to login to the host fine, I re-enabled the > mod_wsgi module, and was able to replicate the problem immediately. > I assume that you're setting up a similar mount on your system. > Since mod_wsgi is the Python module, lsof is showing this output: Only in so much as I'm simulating the mount being busy by changing directory to it in another process. That should be enough to simulate the business since the cd will cause a reference to be taken on the mount in the same way an open file will. > > COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME > httpd 299692 root mem REG 0,57 22036176 69564543 > /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) > httpd 299693 www mem REG 0,57 22036176 69564543 > /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) > httpd 299694 www mem REG 0,57 22036176 69564543 > /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) > httpd 299695 www mem REG 0,57 22036176 69564543 > /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) > httpd 299696 www mem REG 0,57 22036176 69564543 > /eecs/local/lib/libpython3.10.so.1.0 (nfsserv:/obj/local-intel64) This implies these are open, that should be enough for the containing mount to appear busy unless the process that's done the open is running within a different mount namespace. Is the application running in some sort of isolated environment? Something like systemd-nspawn I think could do this, but there are other ways it can be done. Certainly the systemd service PrivateTmp directive can have similar side effects in some cases too. This would probably trigger that kernel bug ... > > Would I have not tried the upstream changes by downloading the latest autofs > from kernel.org, and compiling that? It won't, this is very recent work and I'm not quite ready to commit it and push it to the kernel.org repository. > > Here's the output from /var/log/messages again if this reveals anything else: > > Feb 9 12:31:20 toor8 automount[1281]: st_expire: state 1 path /- > Feb 9 12:31:20 toor8 automount[1281]: expire_proc: exp_proc = > 140113313212160 path /- > Feb 9 12:31:20 toor8 automount[1281]: expire_proc_direct: send expire to > trigger /eecs/local > Feb 9 12:31:20 toor8 automount[1281]: handle_packet: type = 6 > Feb 9 12:31:20 toor8 automount[1281]: handle_packet_expire_direct: token > 160, name /eecs/local > Feb 9 12:31:20 toor8 automount[1281]: expiring path /eecs/local > Feb 9 12:31:20 toor8 automount[1281]: umount_multi: path /eecs/local incl 1 > Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount > offset /eecs/local/share > Feb 9 12:31:20 toor8 automount[1281]: umounted offset mount > /eecs/local/share > Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_umount_offset: umount > offset /eecs/local/src > Feb 9 12:31:20 toor8 automount[1281]: umounted offset mount /eecs/local/src > Feb 9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local > Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with > error code 16, retrying with the -f option > Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 9 12:31:20 toor8 automount[1281]: Unable to update the mtab file, > /proc/mounts and /etc/mtab will differ > Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/share > Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/ > Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: > deleting offset key /eecs/local/src > Feb 9 12:31:20 toor8 automount[1281]: expired /eecs/local > Feb 9 12:31:20 toor8 automount[1281]: dev_ioctl_send_ready: token = 160 > Feb 9 12:31:20 toor8 automount[1281]: expire_cleanup: got thid > 140113313212160 path /- stat 0 > Feb 9 12:31:20 toor8 automount[1281]: expire_cleanup: sigchld: exp > 140113313212160 finished, switching from 2 to 1 > Feb 9 12:31:20 toor8 automount[1281]: st_ready: st_ready(): state = 2 path > /- Yes, it's definitely in use and it's not appearing that way to automount in the init mount namespace ... there's also the question of why I'm not handling this case properly since a umount can fail ... I will need to duplicate the behaviour somehow ... Ian I wonder how autofs-5.1.4-48.el8 behaves? If don't have access to this I can provide it. Hi Ian. While I'm not using any isolating environment per se, httpd.service file is definately using PrivateTmp=true. I didn't know this would impact autofs. This is the default behavior for httpd.service. Does this change anything? I don't see autofs-5.1.4-48.el8 I'm running 5.1.4-70.el8. If you can make 48 available I can try it. (In reply to jas from comment #8) > Hi Ian. > While I'm not using any isolating environment per se, httpd.service file is > definately using PrivateTmp=true. I didn't know this would impact autofs. > This is the default behavior for httpd.service. Does this change anything? Well, maybe, I have seen problems with this in the past but it usually only shows up after restarting the service that uses it. I believe it's because mounts are propagated to the new mount namespace (including autofs mounts) that should really be umounted as part of the namespace creation since they aren't needed and shouldn't be a part of the resulting mount namespace but TBH I never did get to the bottom of it. > > I don't see autofs-5.1.4-48.el8 I'm running 5.1.4-70.el8. If you can make > 48 available I can try it. I was putting together a lengthy response asking for a more detailed log until I realised that /eecs/local/lib is contained within the NFS mount at /eecs/local. I see what's happening, I do need to handle this by re-instating the offsets on that umount fail (a function call to do that has disappeared from the code at some point) and one of the function returns may be incorrect too. Maybe I will be able to reproduce it now, ;) Let me look at this a bit more before we try rev. 48. I'm pretty sure rev. 48 will handle this case correctly. Ian (In reply to Ian Kent from comment #9) > > I see what's happening, I do need to handle this by re-instating the offsets > on that umount fail (a function call to do that has disappeared from the code > at some point) and one of the function returns may be incorrect too. > > Maybe I will be able to reproduce it now, ;) > > Let me look at this a bit more before we try rev. 48. > I'm pretty sure rev. 48 will handle this case correctly. I still can't reproduce this. This: Feb 9 12:31:20 toor8 automount[1281]: unmounting dir = /eecs/local Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 9 12:31:20 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/ Feb 9 12:31:20 toor8 automount[1281]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src is wrong. The code called there hasn't been changed and AFAICS it isn't seeing a failure from the umount and it should be. Even if the ultimate return is wrong if it doesn't see a failure there's no chance the return will be correct anyway. Also, the call to resurrect those offsets on a fail is in fact still present so it's down to the nfs umount failure returning success when it has in fact failed. Maybe I have had a logic error there for a while, I'll need to go back over the older code changes to see what I was fixing there when I added the mounted check to guard against incorrect returns from the umount sub-process, maybe it was never quite enough. This is a bit odd! I would like to know if autofs rev 48 does in fact get this right though. I'll post a tar. Ian Created attachment 1860288 [details]
autofs-5.1.4-48.el8 packages
I don't think these packages are signed, sorry.
They come direct from the build system which I think is before
they go to QE and are tested and ultimately signed.
Hi Ian, I downloaded the package and did: # dnf localinstall autofs-5.1.4-48.el8.x86_64.rpm Last metadata expiration check: 3:16:31 ago on Thu 10 Feb 2022 05:03:07 AM EST. Dependencies resolved. ================================================================================ Package Architecture Version Repository Size ================================================================================ Downgrading: autofs x86_64 1:5.1.4-48.el8 @commandline 782 k Transaction Summary ================================================================================ Downgrade 1 Package Total size: 782 k Is this ok [y/N]: y Downloading Packages: Running transaction check Transaction check succeeded. Running transaction test Transaction test succeeded. Running transaction Preparing : 1/1 Running scriptlet: autofs-1:5.1.4-48.el8.x86_64 1/1 Downgrading : autofs-1:5.1.4-48.el8.x86_64 1/2 warning: /etc/sysconfig/autofs created as /etc/sysconfig/autofs.rpmnew Running scriptlet: autofs-1:5.1.4-48.el8.x86_64 1/2 Running scriptlet: autofs-1:5.1.4-74.el8.x86_64 2/2 Cleanup : autofs-1:5.1.4-74.el8.x86_64 2/2 Running scriptlet: autofs-1:5.1.4-74.el8.x86_64 2/2 Verifying : autofs-1:5.1.4-48.el8.x86_64 1/2 Verifying : autofs-1:5.1.4-74.el8.x86_64 2/2 Downgraded: autofs-1:5.1.4-48.el8.x86_64 I rebooted the system just in case. It's a test system anyway. Verify that it's running correct autofs.. # /usr/sbin/automount -V Linux automount version 5.1.4-48.el8 Directories: config dir: /etc/sysconfig maps dir: /etc modules dir: /usr/lib64/autofs Compile options: DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_SYSTEMD WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC ---- lsof shows module is in use from /eecs/local/lib.. I logged into the system with ssh. /eecs/local/share was mounted, /eecs/local/src was not mounted (because not required as part of login) ls /eecs/local/src mounts /eecs/local/src logged out of system for a bit.... waited for local-share and local-src to be unmounted... sshed to system again... IT WORKED. No "device is busy". But before declaring that -48 was the solution, I tried again... I logged out, got busy with things, and came back a half hour later, logged in, and problem is back... Feb 10 08:44:34 toor8 automount[1281]: st_expire: state 1 path /- Feb 10 08:44:34 toor8 automount[1281]: expire_proc: exp_proc = 139988105643776 path /- Feb 10 08:44:34 toor8 automount[1281]: expire_proc_direct: send expire to trigger /eecs/local Feb 10 08:44:34 toor8 automount[1281]: handle_packet: type = 6 Feb 10 08:44:34 toor8 automount[1281]: handle_packet_expire_direct: token 23, name /eecs/local Feb 10 08:44:34 toor8 automount[1281]: expiring path /eecs/local Feb 10 08:44:34 toor8 automount[1281]: umount_multi: path /eecs/local incl 1 Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset /eecs/local/share Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/share Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset /eecs/local/src Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/src Feb 10 08:44:34 toor8 automount[1281]: unmounting dir = /eecs/local Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 10 08:44:34 toor8 automount[1281]: spawn_umount: umount failed with error code 16, retrying with the -f option Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device is busy Feb 10 08:44:34 toor8 automount[1281]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/ Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/share Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting offset key /eecs/local/src Feb 10 08:44:34 toor8 automount[1281]: expired /eecs/local Feb 10 08:44:34 toor8 automount[1281]: dev_ioctl_send_ready: token = 23 Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: got thid 139988105643776 path /- stat 0 Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: sigchld: exp 139988105643776 finished, switching from 2 to 1 Feb 10 08:44:34 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /- Feb 10 08:44:48 toor8 automount[1281]: st_expire: state 1 path /eecs/research Feb 10 08:44:48 toor8 automount[1281]: expire_proc: exp_proc = 139988105643776 path /eecs/research Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: got thid 139988105643776 path /eecs/research stat 0 Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: sigchld: exp 139988105643776 finished, switching from 2 to 1 Feb 10 08:44:48 toor8 automount[1281]: st_ready: st_ready(): state = 2 path /eecs/research of course ls of /eecs/local works... but not share or src... Back to the drawing board... (In reply to jas from comment #12) > > lsof shows module is in use from /eecs/local/lib.. > > I logged into the system with ssh. /eecs/local/share was mounted, > /eecs/local/src was not mounted (because not required as part of login) > ls /eecs/local/src mounts /eecs/local/src > logged out of system for a bit.... waited for local-share and local-src to > be unmounted... > sshed to system again... > IT WORKED. No "device is busy". > But before declaring that -48 was the solution, I tried again... I logged > out, got busy with things, and came back a half hour later, logged in, and > problem is back... > > Feb 10 08:44:34 toor8 automount[1281]: st_expire: state 1 path /- > Feb 10 08:44:34 toor8 automount[1281]: expire_proc: exp_proc = > 139988105643776 path /- > Feb 10 08:44:34 toor8 automount[1281]: expire_proc_direct: send expire to > trigger /eecs/local > Feb 10 08:44:34 toor8 automount[1281]: handle_packet: type = 6 > Feb 10 08:44:34 toor8 automount[1281]: handle_packet_expire_direct: token > 23, name /eecs/local > Feb 10 08:44:34 toor8 automount[1281]: expiring path /eecs/local > Feb 10 08:44:34 toor8 automount[1281]: umount_multi: path /eecs/local incl 1 > Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset > /eecs/local/share > Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount > /eecs/local/share > Feb 10 08:44:34 toor8 automount[1281]: umount_multi_triggers: umount offset > /eecs/local/src > Feb 10 08:44:34 toor8 automount[1281]: umounted offset mount /eecs/local/src > Feb 10 08:44:34 toor8 automount[1281]: unmounting dir = /eecs/local > Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 10 08:44:34 toor8 automount[1281]: spawn_umount: umount failed with > error code 16, retrying with the -f option > Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 10 08:44:34 toor8 automount[1281]: >> umount.nfs4: /eecs/local: device > is busy > Feb 10 08:44:34 toor8 automount[1281]: Unable to update the mtab file, > /proc/mounts and /etc/mtab will differ > Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting > offset key /eecs/local/ > Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting > offset key /eecs/local/share > Feb 10 08:44:34 toor8 automount[1281]: cache_delete_offset_list: deleting > offset key /eecs/local/src > Feb 10 08:44:34 toor8 automount[1281]: expired /eecs/local > Feb 10 08:44:34 toor8 automount[1281]: dev_ioctl_send_ready: token = 23 > Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: got thid > 139988105643776 path /- stat 0 > Feb 10 08:44:34 toor8 automount[1281]: expire_cleanup: sigchld: exp > 139988105643776 finished, switching from 2 to 1 > Feb 10 08:44:34 toor8 automount[1281]: st_ready: st_ready(): state = 2 path > /- > Feb 10 08:44:48 toor8 automount[1281]: st_expire: state 1 path /eecs/research > Feb 10 08:44:48 toor8 automount[1281]: expire_proc: exp_proc = > 139988105643776 path /eecs/research > Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: got thid > 139988105643776 path /eecs/research stat 0 > Feb 10 08:44:48 toor8 automount[1281]: expire_cleanup: sigchld: exp > 139988105643776 finished, switching from 2 to 1 > Feb 10 08:44:48 toor8 automount[1281]: st_ready: st_ready(): state = 2 path > /eecs/research > > of course ls of /eecs/local works... but not share or src... > > Back to the drawing board... Well, no, that's actually sensible based on what I said in comment #10. While I can't work out how the mount is being kept busy that umount fail being seen in a release that doesn't have a significant amount of change means it's a bug that's been there for quite a while. I just need to decide how to handle it which shouldn't be too hard. I'll get back to you with a test package when I've done that. Ian Created attachment 1860535 [details]
autofs rev 82 test build
Here is a test build to try and resolve the incorrect umount fail handling.
Please test it and let me know how it goes.
Created attachment 1860536 [details]
autofs rev 82 test build
Here is a test build to try and resolve the incorrect umount fail handling, please test it out.
Hi Ian. I installed rev 82. I believe this has fixed the problem. I have tried multiple times after the local-share and local-src mounts were unmounted, and it seems to be holding! I will continue to try over the weekend, waiting a much longer period after the unmount, and see if anything happens. # /usr/sbin/automount -V Linux automount version 5.1.4-82.el8 Directories: config dir: /etc/sysconfig maps dir: /etc modules dir: /usr/lib64/autofs Compile options: DISABLE_MOUNT_LOCKING ENABLE_IGNORE_BUSY_MOUNTS WITH_SYSTEMD WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC Still working well today... I think you've fixed it! :) (In reply to jas from comment #17) > Still working well today... I think you've fixed it! :) Sounds good then. I'll go ahead with the fix for RHEL-8.6. Thanks a bunch, Ian! Greatly appreciated... Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (autofs bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2084 |
In my /etc/auto.direct I have the following: /eecs/local -bg,proto=tcp,acl \ / nfsserv:/obj/local-intel64 \ /share nfsserv:/obj/local-share \ /src nfsserv:/obj/local-src When the system boots, it starts up a self-compiled httpd from a local directory, such as - /usr/local/pkg/httpd/bin/httpd. However, one of the Apache modules (mod_wsgi) refers to a python library on an NFS share - /eecs/local/lib64. When the machine starts, the web server starts fine, and I can ssh to the system and access /eecs/local, or /eecs/local/share, or /eecs/local/src. In fact, just by logging in, my home directory (also on an NFS share) sources a file from /eecs/local/share, so /eecs/local/share is mounted on login. An lsof on /eecs/local shows that the web module is using /eecs/local/lib64. After logging out of the system, when I SSH to the system again later, I get a message that /eecs/local/share/cshrc.common is not readable. I find that /eecs is mounted, but I cannot access /eecs/local/src or /eecs/local/share. It turns out this is because autofs decided that /eecs is not in use - it tried to unmount everything, and found that /eecs/local was busy ("device busy") so it seems to have "given up" on it: Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy Feb 7 16:12:24 toor8 automount[1256]: spawn_umount: umount failed with error code 16, retrying with the -f option Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy Feb 7 16:12:24 toor8 automount[1256]: >> umount.nfs4: /eecs/local: device is busy Feb 7 16:12:24 toor8 automount[1256]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/share Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/ Feb 7 16:12:24 toor8 automount[1256]: tree_mapent_delete_offset_tree: deleting offset key /eecs/local/src If I don't load the mod_wsgi module which uses /eecs/local, then everything gets unmounted properly by autofs, and works. Is this a bug? I've also tried to compile the latest autofs from kernel.org, and this still exhibits the same problem. If lsof knows that /eecs/local is in use, then I assume that autofs should know as well.