Bug 837649 - /net mount being unmounted and never mounted again
Status: CLOSED WONTFIX
Product: Fedora
Classification: Fedora
Component: autofs
Version: 17
Hardware: x86_64, OS: Linux
Priority: unspecified, Severity: high
Assigned To: Ian Kent
QA Contact: Fedora Extras Quality Assurance
Reported: 2012-07-04 11:02 EDT by Martin Donnelly
Modified: 2013-08-01 13:08 EDT (History)
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2013-08-01 13:08:07 EDT
Type: Bug


Attachments
showmount output (377 bytes, text/plain), 2012-07-05 06:55 EDT, Martin Donnelly
/proc/mounts at start (3.14 KB, text/plain), 2012-07-05 06:56 EDT, Martin Donnelly
/proc/mounts at end (2.60 KB, text/plain), 2012-07-05 06:56 EDT, Martin Donnelly
Debug log (1.57 MB, text/x-log), 2012-07-05 06:57 EDT, Martin Donnelly
Latest autofs failure log (28.44 KB, text/x-log), 2012-07-09 06:02 EDT, Martin Donnelly

Description Martin Donnelly 2012-07-04 11:02:15 EDT
Description of problem:

/net mounts work as expected for a short while then the mount fails to work and cannot be recovered without forcing an unmount and restarting autofs.

Version-Release number of selected component (if applicable):

Version     : 5.0.6
Release     : 19.fc17

How reproducible:

Occurring regularly on 10+ machines.


Steps to Reproduce:
1. Mount a /net mount.
2. Wait.

Actual results:

The mount eventually disappears. 

Expected results:

The mount should never disappear.

Additional info:

With autofs debug logging enabled, the log at the point of failure is:

Jul  4 15:33:03 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul  4 15:34:18 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul  4 15:35:33 fyvie automount[868]: expire_proc_indirect: expire /net/host.example.com/export/home
Jul  4 15:35:33 fyvie automount[868]: handle_packet_expire_direct: token 16, name /net/host.example.com/export/home
Jul  4 15:35:33 fyvie automount[868]: expiring path /net/host.example.com/export/home
Jul  4 15:35:33 fyvie automount[868]: umount_multi: path /net/host.example.com/export/home incl 1
Jul  4 15:35:33 fyvie automount[868]: umount_subtree_mounts: unmounting dir = /net/host.example.com/export/home
Jul  4 15:35:34 fyvie automount[868]: expired /net/host.example.com/export/home


After this, a directory listing on the mount shows nothing, and autofs logs nothing when trying it:

$ ls /net/host.example.com/export/home


Restarting autofs does not always work; sometimes /net/host.example.com/export/home must also be force-unmounted.

The servers run RHEL 6.3 and CentOS 5.8, with both NFSv4 and NFSv3 mounts. The failure hits a random server/mount combination, but once one export on a given server fails, all exports from that server fail.
Comment 1 Ian Kent 2012-07-04 11:40:41 EDT
Can you post a full debug log from autofs start until after
the problem occurs please.

Also, a listing of /proc/mounts from before the expire above
and from after it may be helpful, as would the export list from
showmount -e <server name> for each server.
Comment 2 Ian Kent 2012-07-04 11:45:51 EDT
Btw, I'm not sure the log fragment above is showing all the
debug logging. Can you make sure syslog is recording all
levels of the daemon facility to the log?

Something like:
daemon.*                       /var/log/debug

should be sure to capture everything.
Comment 3 Martin Donnelly 2012-07-05 06:55:37 EDT
Created attachment 596373
showmount output
Comment 4 Martin Donnelly 2012-07-05 06:56:27 EDT
Created attachment 596374
/proc/mounts at start
Comment 5 Martin Donnelly 2012-07-05 06:56:54 EDT
Created attachment 596375
/proc/mounts at end
Comment 6 Martin Donnelly 2012-07-05 06:57:46 EDT
Created attachment 596376
Debug log
Comment 7 Ian Kent 2012-07-05 10:12:12 EDT
(In reply to comment #5)
> Created attachment 596375
> /proc/mounts at end

At this point, where the automount has stopped working,
can you stop autofs and then, as root, successfully
manually umount /net?
Comment 8 Martin Donnelly 2012-07-05 11:17:36 EDT
(In reply to comment #7)
> (In reply to comment #5)
> > Created attachment 596375
> > /proc/mounts at end
> 
> At this point, where the automount has stopped working,
> can you stop autofs and then, as root, successfully
> manually umount /net?

I've only done this when restarting autofs doesn't work, which is rare; most of the time restarting does work. When I have done it, 'umount /net' has hung and I've had to use 'umount -fl /net' instead. I'll double-check this the next time I see a failure, though.
Comment 9 Ian Kent 2012-07-06 08:16:15 EDT
(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #5)
> > > Created attachment 596375
> > > /proc/mounts at end
> > 
> > At this point, where the automount has stopped working,
> > can you stop autofs and then, as root, successfully
> > manually umount /net?
> 
> I've only done this when restarting autofs doesn't work, and most of the
> time it seems to, but when I have done 'umount /net' has hung up and I've
> had to do 'umount -fl /net' instead. I'll double check this the next time I
> experience a failure though.

Is there nothing in the log when that happens?

If you can't umount /net then, assuming there are no other
mounts underneath it, it should return a failure, not a
hang. That's strange indeed.
Comment 10 Martin Donnelly 2012-07-06 08:57:35 EDT
There is nothing in the logs when it happens. The failure hasn't occurred since yesterday, so I'm still waiting to check.
Comment 11 Martin Donnelly 2012-07-09 06:01:29 EDT
It happened this morning, and stopping autofs correctly unmounted /net.

I've attached the failure log from this morning.
Comment 12 Martin Donnelly 2012-07-09 06:02:42 EDT
Created attachment 597032
Latest autofs failure log

It appears that this problem may be more frequent shortly after booting a machine.
Comment 13 Ian Kent 2012-07-09 08:49:59 EDT
(In reply to comment #11)
> It happened this morning and stopping autofs correctly unmounted /net
> 
> I've attached the failure log from this morning.

That's very good; it makes me think the in-use issue is
a temporary condition. I think I have a patch that might
help with the error recovery. Let me look at what it would
take to cherry-pick it out of the series it belongs to.

I'll need to check the log first too.
Comment 14 Ian Kent 2012-07-09 09:00:30 EDT
(In reply to comment #12)
> Created attachment 597032
> Latest autofs failure log
> 
> It appears that this problem may be more frequent shortly after booting a
> machine.

It still worries me that the directory removal failed; it
really shouldn't. That might be a symptom of another issue
I've been trying to track down.

It is true that the directory removal failure will prevent
the /net mount from triggering again because /net is no
longer an empty directory, so it needs those autofs trigger
mounts to be put back when the directory removal fails.

The problem is that the directory removal should never fail.
Something outside of autofs is messing with the mount point
directories, and I can't work out what it is.

At least, in this case, I should be able to add some recovery
code.
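Because a directory left behind by a failed removal stops /net from triggering, one rough way to spot such leftovers is to compare the directories under /net with the mount points listed in /proc/mounts. The Python sketch below is a hypothetical diagnostic, not the recovery code discussed here: the function names are invented for illustration, it reports the topmost directories that have nothing mounted at or beneath them, and it does not descend into live mounts. Walking a live autofs tree may itself trigger mounts, so treat it as illustrative only.

```python
import os
import sys
from typing import Iterable, List


def stale_branches(root: str, mounted: Iterable[str]) -> List[str]:
    """Return the topmost directories under root with no mount at or below them."""
    mounts = set(mounted)

    def has_mount_below(path: str) -> bool:
        return any(m == path or m.startswith(path + os.sep) for m in mounts)

    stale: List[str] = []
    for dirpath, dirnames, _files in os.walk(root):
        keep = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if full in mounts:
                continue  # a live mount: do not descend into it
            if has_mount_below(full):
                keep.append(name)  # intermediate directory leading to a mount
            else:
                stale.append(full)  # nothing mounted here or underneath
        dirnames[:] = keep  # prune os.walk to the directories we kept
    return sorted(stale)


def mount_points(proc_mounts: str = "/proc/mounts") -> List[str]:
    """Mount points listed in /proc/mounts (octal escapes such as \\040 left as-is)."""
    with open(proc_mounts) as f:
        return [line.split()[1] for line in f if len(line.split()) >= 2]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 stale.py /net
    for path in stale_branches(sys.argv[1], mount_points()):
        print("no mount at or below:", path)
```

Only the topmost stale directory of each branch is reported, since everything beneath it is stale by definition.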
Comment 15 Martin Donnelly 2012-08-02 11:01:58 EDT
Having seen this happen a few more times, it now certainly seems to happen within a short while of rebooting, and once autofs is restarted it doesn't seem to fail again.
Comment 16 Ian Kent 2012-08-02 11:51:33 EDT
(In reply to comment #15)
> After having this happen some more it certainly now seems to be happening
> within a short while of rebooting and once autofs is restarted it doesn't
> seem to fail again.

Could you grab the Rawhide autofs srpm, build it and see if
that helps with the problem. Some recent changes should have
added the ability to recover from the failed directory removal.
Even if it does help, the directory removal should not fail
in the first place, so something else is probably going on.

If you are able to test this rpm, please keep track of what's
happening in /proc/mounts too.
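One simple way to keep track of /proc/mounts over time is to save timestamped copies at a fixed interval, so the state just before and after a failure can be compared later. The Python sketch below is a hypothetical helper (the function name, output naming scheme, and 60-second default are assumptions, not anything autofs provides). It reads the file by content because /proc files report a size of 0.

```python
import os
import sys
import time
from datetime import datetime
from typing import List


def snapshot_mounts(out_dir: str, count: int, interval: float = 60.0,
                    source: str = "/proc/mounts") -> List[str]:
    """Copy source into out_dir count times, interval seconds apart.

    Returns the paths written. File names embed a timestamp and a sequence
    number, so a plain sort lists them in the order they were taken.
    """
    os.makedirs(out_dir, exist_ok=True)
    written: List[str] = []
    for i in range(count):
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        path = os.path.join(out_dir, f"mounts-{stamp}-{i:04d}.txt")
        # Read the whole file: /proc files report a size of 0, so copy by content.
        with open(source) as src, open(path, "w") as dst:
            dst.write(src.read())
        written.append(path)
        if i + 1 < count:
            time.sleep(interval)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 snapmounts.py /var/tmp/mounts-log 120
    snapshot_mounts(sys.argv[1], int(sys.argv[2]))
```

Saving full copies rather than diffs keeps the raw evidence intact; the copies are small, so even a day of one-minute snapshots stays modest in size.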
Comment 17 Fedora End Of Life 2013-07-04 01:51:51 EDT
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter:  Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.
Comment 18 Fedora End Of Life 2013-08-01 13:08:13 EDT
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.
