Bug 222872

Summary: auto-umount of network shares seems to have problems
Product: [Fedora] Fedora
Reporter: Piergiorgio Sartor <piergiorgio.sartor>
Component: autofs
Assignee: Ian Kent <ikent>
Status: CLOSED CURRENTRELEASE
QA Contact: Brock Organ <borgan>
Severity: medium
Priority: medium
Version: 6
CC: deerfieldtech, jmoyer
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: autofs-5.0.1-0.rc3.10
Doc Type: Bug Fix
Last Closed: 2007-01-24 01:59:51 UTC
Attachments:
Description                                                  Flags
Correction to mount status check in expire                   none
autofs debug output                                          none
Correct check for busy offset mount prior to offset umount   none

Description Piergiorgio Sartor 2007-01-16 17:48:50 UTC
Description of problem:
Since a couple of "autofs" updates, the timed-out un-mount of network shares
(using nfs) does not succeed properly, causing some sort of lock and successive
mount failures.

Specifically, after the 5 mins (300 secs) timeout, the following is reported in
/var/log/messages:

Jan 16 10:10:31 localhost automount[2112]: umount_autofs_offset: attempt to
umount busy offset /net/myserver/usr/local/cell

After another 5 mins (second try, I guess):

Jan 16 10:15:31 localhost automount[2112]: umount_autofs_offset: ioctl
failed: Inappropriate ioctl for device

After trying to access (and so automount) the network share (under /net),
with something like:

[root@localhost ~]# ls /net/myserver/usr/local/data
ls: /net/myserver/usr/local/data: No such file or directory

and /var/log/messages error:

Jan 16 11:10:34 localhost automount[2112]: umount_autofs_indirect: ask umount
returned busy /net
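
The three messages above can be pulled out of a syslog file with a short filter. The following is a minimal Python sketch, not part of the original report. The names umount_errors and summarize are hypothetical, and it assumes each message occupies a single line in the real log (the lines above are wrapped only by the bug tracker).

```python
import re
from collections import Counter

# Matches syslog lines written by automount's umount routines, e.g.
# "Jan 16 10:10:31 localhost automount[2112]: umount_autofs_offset: ..."
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"automount\[(?P<pid>\d+)\]:\s(?P<func>umount_autofs_\w+):\s(?P<msg>.*)$"
)


def umount_errors(lines):
    """Yield (timestamp, function, message) for each automount umount message."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            yield match.group("stamp"), match.group("func"), match.group("msg")


def summarize(lines):
    """Count the matching messages per reporting function."""
    return Counter(func for _, func, _ in umount_errors(lines))
```

To use it, pass an open file object, for example summarize(open("/var/log/messages", errors="replace")), which usually requires root privileges to read.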

Version-Release number of selected component (if applicable):
5.0.1-0.rc3.2

How reproducible:
Always

Steps to Reproduce:
1. Set up an NFS network share and autofs
2. Automount a network directory (with ls or cd)
3. Wait (let the timeout expire once, then twice...)
4. Try to re-automount the network directory
  
Actual results:
An error occurs and the directory is not mounted

Expected results:
Everything should be auto-mounted, as usual.

Additional info:
Restarting automount with "service autofs restart" fixes things.
Of course, setting the timeout to zero (no unmount) also avoids the issue.

This occurs under both the very latest Fedora kernel and a vanilla 2.6.19.2
kernel; all systems are up to date with the latest FC6 updates (15 Jan).

Comment 1 Ian Kent 2007-01-17 02:12:47 UTC
(In reply to comment #0)
> Description of problem:
> Since a couple of "autofs" updates, the timed-out un-mount of network shares
> (using nfs) does not succeed properly, causing some sort of lock and successive
> mount failures.

Can we have a debug log please.
You can find instructions to do this at
http://people.redhat.com/jmoyer/

Ian


Comment 2 Ian Kent 2007-01-17 02:53:06 UTC
(In reply to comment #1)
> (In reply to comment #0)
> > Description of problem:
> > Since a couple of "autofs" updates, the timed-out un-mount of network shares
> > (using nfs) does not succeed properly, causing some sort of lock and successive
> > mount failures.
> 
> Can we have a debug log please.
> You can find instructions to do this at
> http://people.redhat.com/jmoyer/

And could you post the output of
showmount -e "host name"
please.

Ian


Comment 3 Ian Kent 2007-01-17 04:57:14 UTC
(In reply to comment #0)
> Description of problem:
> Since a couple of "autofs" updates, the timed-out un-mount of network shares
> (using nfs) does not succeed properly, causing some sort of lock and successive
> mount failures.

When this happens, can you also check that /etc/mtab
is consistent with /proc/mounts with respect to what is
really mounted? Don't worry about the autofs mounts you see;
just check the actual mounts you expect to be mounted.

Ian
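
The consistency check requested in comment 3 can be scripted. The following is a minimal Python sketch, not something posted in the original thread. The function names parse_mounts and compare_mount_tables are hypothetical, and it assumes both files use the usual "device mountpoint fstype options dump pass" line layout. Only NFS entries are compared, so the autofs entries are ignored as suggested.

```python
def parse_mounts(text, fstypes=("nfs", "nfs4")):
    """Return the set of (device, mount point) pairs with the given fs types.

    Each line is expected to look like
    "device mountpoint fstype options dump pass"; other lines are skipped.
    """
    mounts = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in fstypes:
            mounts.add((fields[0], fields[1]))
    return mounts


def compare_mount_tables(mtab_text, proc_text):
    """Return (only_in_mtab, only_in_proc) for the NFS mounts in both tables."""
    mtab = parse_mounts(mtab_text)
    proc = parse_mounts(proc_text)
    return mtab - proc, proc - mtab
```

To use it, read both files and pass their contents, for example compare_mount_tables(open("/etc/mtab").read(), open("/proc/mounts").read()). Two empty sets mean the tables agree on the NFS mounts.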


Comment 4 Ian Kent 2007-01-17 12:40:54 UTC
Created attachment 145799 [details]
Correction to mount status check in expire

I have found what appears to be a regression in a recent
change.

Could you try out this patch, please?

Comment 5 Piergiorgio Sartor 2007-01-17 19:58:31 UTC
Well, wow, you flooded me with emails; maybe I should update my spam filter :-)
Joking aside, here is some information, though I doubt it is useful.
First of all, the reported line:

Jan 16 10:10:31 localhost automount[2112]: umount_autofs_offset: attempt to
umount busy offset /net/myserver/usr/local/cell

has the wrong mount point; it should read /net/myserver/usr/local/data. That
was a copy-and-paste mistake on my side; in any case, there is no wrong mount point...

About comment #1: I was able to generate the log but, I don't know why, with
"--debug" I was not able to reproduce the problem. The system worked fine, so
fine that I was almost sure the issue had gone away after today's updates
(namely avahi, which has nothing to do with automount, I guess).

About comment #2:

[root@localhost ~]# showmount -e myserver
Export list for myserver:
/usr/local/data      43.196.x.y

Hope you do not need the real IP addresses...

About comment #3: I checked /etc/mtab and /proc/mounts under different conditions
and they always seem to be consistent, i.e. the network mount points are always
present (or absent) in both places.

About comment #4, if you provide somewhere (web site?) a test rpm package for
autofs with the patch built in, I'll be happy to test it. The plain C patch I
cannot test, sorry.

Comment 6 Ian Kent 2007-01-18 02:42:59 UTC
(In reply to comment #5)
> Well, wow, you flooded me with emails; maybe I should update my spam filter :-)
> Joking aside, here is some information, though I doubt it is useful.

Ha .. sorry about that.

> First of all, the reported line:
> 
> Jan 16 10:10:31 localhost automount[2112]: umount_autofs_offset: attempt to
> umount busy offset /net/myserver/usr/local/cell
> 
> has the wrong mount point; it should read /net/myserver/usr/local/data. That
> was a copy-and-paste mistake on my side; in any case, there is no wrong mount point...

But the message itself still indicates a problem.

> 
> About comment #1: I was able to generate the log but, I don't know why, with
> "--debug" I was not able to reproduce the problem. The system worked fine, so
> fine that I was almost sure the issue had gone away after today's updates
> (namely avahi, which has nothing to do with automount, I guess).

I get that a lot.
The debug logging often changes the timing so as to prevent
bugs from being triggered.

Indeed, the problem I discovered was a little hard to reproduce.
That's probably why I missed the regression in the patch that
introduced this (about 2 weeks ago).

In our test suite, with more than 50 mount entries that could
be open to this bug, I saw it only once or twice in two or three
runs.

> 
> About comment #2:
> 
> [root@localhost ~]# showmount -e myserver
> Export list for myserver:
> /usr/local/data      43.196.x.y
> 
> Hope you do not need the real IP addresses...

No, just the form of access for the export.
I assume this allows the system in question to mount the
export.

> 
> About comment #3: I checked /etc/mtab and /proc/mounts under different conditions
> and they always seem to be consistent, i.e. the network mount points are always
> present (or absent) in both places.

Had to check.

> 
> About comment #4, if you provide somewhere (web site?) a test rpm package for
> autofs with the patch built in, I'll be happy to test it. The plain C patch I
> cannot test, sorry.

Update autofs-5.0.1-0.rc3.8 should be in Fedora updates testing now.
Please look for it and test it out.

Ian


Comment 7 Piergiorgio Sartor 2007-01-18 17:49:39 UTC
Created attachment 145934 [details]
autofs debug output

Comment 8 Piergiorgio Sartor 2007-01-18 17:52:42 UTC
OK, I updated to rc3.8 and I have some news, some bad and some (maybe) good.
First of all, this is the PC with the vanilla 2.6.19.2 kernel and this one
mounts two remote nfs shares: /usr/local/data and /usr/local/sequences.

The bad news is that, with rc3.8, the problem is still there.
The (maybe) good news is that I was able to capture the issue in debug mode.
Please see the attachment around time 11:04, where a failure to umount is
reported.

Last thing: I have the impression that, with this new rc3.8, one of the mounts is
never unmounted or, rather, is unmounted and then re-mounted.
I did not investigate, but this should not be a system issue (i.e. nobody is
remounting the share, I suppose).

Hope this helps.

Comment 9 Ian Kent 2007-01-19 07:52:56 UTC
(In reply to comment #8)
> The bad news is that, with rc3.8, the problem is still there.
> The (maybe) good news is that I was able to capture the issue in debug mode.
> Please see the attachment around time 11:04, where a failure to umount is
> reported.

Yes, that log was really helpful, thanks.

It got me out of my thinking about what I thought the
problem was and set me on track to look for the real
problem.

I was eventually able to duplicate the problem (at least it
looked like the problem) and identify the mistake. It was
introduced on about the 27th Dec in a patch to correct another
bug. I hope this is the only problem but we will see.

fyi, the patch is attached.
autofs-5.0.1-0.rc3.10 should appear in Fedora testing soon.
Please try that one out.

> 
> Last thing: I have the impression that, with this new rc3.8, one of the mounts is
> never unmounted or, rather, is unmounted and then re-mounted.
> I did not investigate, but this should not be a system issue (i.e. nobody is
> remounting the share, I suppose).

That can happen with system utilities scanning the filesystem.
autofs should be able to handle that anyway.

Ian

Comment 10 Ian Kent 2007-01-19 07:54:55 UTC
Created attachment 145970 [details]
Correct check for busy offset mount prior to offset umount

Comment 11 Piergiorgio Sartor 2007-01-23 17:56:00 UTC
Hi, I tried rc3.10 today and did not have any problems anymore.
IMHO you can close the bug; in case of further problems we can re-open it.
Thanks.

Comment 12 Ian Kent 2007-01-24 01:59:51 UTC
(In reply to comment #11)
> Hi, I tried rc3.10 today and did not have any problems anymore.
> IMHO you can close the bug; in case of further problems we can re-open it.
> Thanks.

Thanks.
That's good news.