Description of problem:
/net mounts work as expected for a short while, then the mount stops working and cannot be recovered without forcing an unmount and restarting autofs.

Version-Release number of selected component (if applicable):
Version : 5.0.6
Release : 19.fc17

How reproducible:
Occurring regularly on 10+ machines.

Steps to Reproduce:
1. Mount a /net mount
2. Wait

Actual results:
The mount eventually disappears.

Expected results:
The mount should never disappear.

Additional info:
With debug enabled in autofs, the log at the point of failure is:

Jul 4 15:33:03 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul 4 15:34:18 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul 4 15:35:33 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul 4 15:35:33 fyvie automount[868]: handle_packet_expire_direct: token 16, name /net/host.example.com/export/home
Jul 4 15:35:33 fyvie automount[868]: expiring path /net/host.example.com/export/home
Jul 4 15:35:33 fyvie automount[868]: umount_multi: path /net/host.example.com/export/home incl 1
Jul 4 15:35:33 fyvie automount[868]: umount_subtree_mounts: unmounting dir = /net/host.example.com/export/home
Jul 4 15:35:34 fyvie automount[868]: expired /net/host.example.com/export/home

A directory listing on the mount shows nothing, and autofs logs nothing when trying this:

$ ls /net/host.example.com/export/home

Restarting autofs does not always work; a forced unmount of /net/host.example.com/export/home is sometimes needed.

The servers are running RHEL 6.3 and CentOS 5.8 with NFSv4 and NFSv3 mounts. The failing server/mount combination is random, but once one export fails on a given server, all exports on that server fail.
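For anyone sifting through a longer debug log for the same symptom, here is a minimal Python sketch that pulls out the "expired <path>" messages with their timestamps. It assumes the syslog line layout seen in the fragment above (month, day, time, host, automount[pid]: message); the names LINE_RE and expired_paths are made up for this illustration and are not part of autofs.

```python
import re

# Matches syslog lines of the shape seen in the report, for example:
#   Jul 4 15:35:34 fyvie automount[868]: expired /net/host.example.com/export/home
# (assumed layout; other syslog formats would need a different pattern)
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+automount\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def expired_paths(log_text):
    """Return (timestamp, path) pairs for each 'expired <path>' message."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not an automount line in the assumed format
        msg = match.group("msg")
        # Only the completion message counts; 'expiring path ...' and
        # 'expire_proc_indirect: expire ...' are deliberately skipped.
        if msg.startswith("expired "):
            events.append((match.group("stamp"), msg[len("expired "):].strip()))
    return events
```

Reading the log file into a string and passing it to expired_paths() gives a quick list of when each path was expired, which can then be lined up against the time the mount stopped responding.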
Can you post a full debug log, from autofs start until after the problem occurs, please? A listing of /proc/mounts before the expire above and another after it may also be helpful, as would the export list from showmount -e <server name> for each server.
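If it helps when comparing the two /proc/mounts listings, a rough Python sketch like the following reports which mounts at or under /net appeared or disappeared between the snapshots. It relies only on the standard /proc/mounts field order (device, mount point, filesystem type, ...); the names Mount, parse_mounts and diff_mounts are hypothetical helpers invented for this example.

```python
from typing import NamedTuple


class Mount(NamedTuple):
    device: str
    mountpoint: str
    fstype: str


def parse_mounts(text):
    """Parse /proc/mounts-style text into a set of Mount tuples.

    Note: /proc/mounts escapes spaces in paths as \\040; this sketch
    leaves such escapes untouched.
    """
    mounts = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mounts.add(Mount(fields[0], fields[1], fields[2]))
    return mounts


def diff_mounts(before, after, prefix="/net"):
    """Return (removed, added) lists of mounts at or under prefix."""
    def under(mount):
        return mount.mountpoint == prefix or mount.mountpoint.startswith(prefix + "/")

    old = {m for m in parse_mounts(before) if under(m)}
    new = {m for m in parse_mounts(after) if under(m)}
    return sorted(old - new), sorted(new - old)
```

Saving the "before" and "after" listings to files, reading each into a string and calling diff_mounts(before_text, after_text) shows at a glance whether the autofs trigger mounts are still present after the expire.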
Btw, I'm not sure the log fragment above shows all the debug logging. Can you ensure syslog is recording all levels of the daemon facility to the log? Something like:

daemon.* /var/log/debug

will be sure to get everything.
Created attachment 596373 [details] showmount output
Created attachment 596374 [details] /proc/mounts at start
Created attachment 596375 [details] /proc/mounts at end
Created attachment 596376 [details] Debug log
(In reply to comment #5)
> Created attachment 596375 [details]
> /proc/mounts at end

At this point, where the automount has stopped working, can you stop autofs and then, as root, successfully manually umount /net?
(In reply to comment #7)
> (In reply to comment #5)
> > Created attachment 596375 [details]
> > /proc/mounts at end
>
> At this point, where the automount has stopped working,
> can you stop autofs and then, as root, successfully
> manually umount /net?

I've only done this when restarting autofs doesn't work, and most of the time it seems to. When I have done it, though, 'umount /net' has hung and I've had to use 'umount -fl /net' instead. I'll double-check this the next time I experience a failure.
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > > Created attachment 596375 [details]
> > > /proc/mounts at end
> >
> > At this point, where the automount has stopped working,
> > can you stop autofs and then, as root, successfully
> > manually umount /net?
>
> I've only done this when restarting autofs doesn't work, and most of the
> time it seems to, but when I have done 'umount /net' has hung up and I've
> had to do 'umount -fl /net' instead. I'll double check this the next time I
> experience a failure though.

Is there nothing in the log when that happens? If you can't umount /net then, assuming there are no other mounts underneath it, it should return a failure, not hang. That's strange indeed.
There is nothing in the logs when it happens. I've not had the failure occur since yesterday, so I'm still waiting to check.
It happened this morning, and stopping autofs correctly unmounted /net.

I've attached the failure log from this morning.
Created attachment 597032 [details]
Latest autofs failure log

It appears that this problem may be more frequent shortly after booting a machine.
(In reply to comment #11)
> It happened this morning and stopping autofs correctly unmounted /net
>
> I've attached the failure log from this morning.

That's very good; it makes me think the in-use issue is a temporary condition.

I think I have a patch that might help with the error recovery. Let me have a look at what cherry-picking it out of the series it belongs to would involve. I'll need to check the log first too.
(In reply to comment #12)
> Created attachment 597032 [details]
> Latest autofs failure log
>
> It appears that this problem may be more frequent shortly after booting a
> machine.

It still worries me that the directory removal failed; it really shouldn't. That might be a symptom of another issue I've been trying to track down.

It is true that the directory removal failure will prevent the /net mount from triggering again, because /net is no longer an empty directory, so the autofs trigger mounts need to be put back when the directory removal fails.

The problem is that the directory removal should never fail. Something outside of autofs is messing with the mount point directories, and I can't work out what it is. At least, in this case, I should be able to add some recovery code.
After having this happen some more, it certainly now seems to be happening within a short while of rebooting, and once autofs is restarted it doesn't seem to fail again.
(In reply to comment #15)
> After having this happen some more it certainly now seems to be happening
> within a short while of rebooting and once autofs is restarted it doesn't
> seem to fail again.

Could you grab the Rawhide autofs srpm, build it, and see if that helps with the problem? Some recent changes should have added the ability to recover from the failed directory removal.

Even if it does help, the failed directory removal should not happen, so something else is probably going on. If you are able to test this rpm, please keep track of what's happening in /proc/mounts too.
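One possible way to keep track of /proc/mounts while testing is a small poller that writes a timestamped snapshot of the /net entries only when they change. This is a rough Python sketch, not anything shipped with autofs; net_mount_lines and record_changes are hypothetical names, and the 5-second default interval is an arbitrary assumption.

```python
import time


def net_mount_lines(mounts_path="/proc/mounts", prefix="/net"):
    """Return sorted mount-table lines whose mount point is at or under prefix."""
    selected = []
    with open(mounts_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            mountpoint = fields[1]
            if mountpoint == prefix or mountpoint.startswith(prefix + "/"):
                selected.append(line.rstrip("\n"))
    return sorted(selected)


def record_changes(out, mounts_path="/proc/mounts", prefix="/net",
                   interval=5.0, samples=None):
    """Write a timestamped snapshot to out whenever the selected lines change.

    samples limits the number of polls; None keeps polling until interrupted.
    """
    previous = None
    taken = 0
    while samples is None or taken < samples:
        current = net_mount_lines(mounts_path, prefix)
        if current != previous:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            out.write(stamp + " mounts under " + prefix + ":\n")
            for line in current:
                out.write("  " + line + "\n")
            out.flush()
            previous = current
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling record_changes(open("/tmp/net-mounts.log", "a")) (the log path is only an example) and leaving it running from boot would give a timeline that can be matched against the autofs debug log timestamps.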
This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.