193340 – /net mount does not work

Bug 193340 - /net mount does not work

Summary: /net mount does not work

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	autofs
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Ian Kent
QA Contact:	Brock Organ
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-05-27 17:46 UTC by Alexandre Oliva
Modified:	2007-11-30 22:11 UTC
CC List:	2 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-06-17 18:07:54 UTC
Type:	---
Embargoed:
Dependent Products:


Description Alexandre Oliva 2006-05-27 17:46:40 UTC

Description of problem:
After upgrading to autofs-5.0.0_beta3-2, /net/<host>/<mountpoint> stopped
working.  /var/log/messages says:

May 26 17:37:11 free automount[2780]: lookup_mount: >> /usr/sbin/showmount: can't get address for /net/free/l/aoliva
May 26 17:37:11 free automount[2780]: lookup_mount: lookup(program): lookup for /net/free/l/aoliva failed


Version-Release number of selected component (if applicable):
autofs-5.0.0_beta3-2

How reproducible:
Every time

Steps to Reproduce:
1. Upgrade to the latest autofs in rawhide
2. Try to use /net mount points

Actual results:
It fails even before attempting to mount filesystems, so access fails

Expected results:
Functional /net

Additional info:

Comment 1 Ian Kent 2006-05-29 06:04:31 UTC

(In reply to comment #0)
> Description of problem:
> After upgrading to autofs-5.0.0_beta3-2, /net/<host>/<mountpoint> stopped
> working.  /var/log/messages says:
> 
> May 26 17:37:11 free automount[2780]: lookup_mount: >> /usr/sbin/showmount: can't get address for /net/free/l/aoliva
> May 26 17:37:11 free automount[2780]: lookup_mount: lookup(program): lookup for /net/free/l/aoliva failed

I'm not able to reproduce the error you describe here, where the
lookup tries to resolve the extended path instead of only the host
name component.

Can you give more information, such as the kernel version and the
actions which led to the log entry included here?

Jeff Moyer's page, http://people.redhat.com/jmoyer, gives an excellent
description of the information needed for us to be able to work out
what's going on.

Ian

Comment 2 Alexandre Oliva 2006-05-29 06:30:29 UTC

2.6.16-1.2224_FC6 triggers the problem, but initially I was running rawhide as
of the day of the bug report.  This is broken on both x86_64 and i686.

On what OS are you trying this?  I remember having got similar problems a while
ago, when coreutils tried to discontinue POSIX-conflicting options, and that was
quickly taken back because it broke a lot of stuff in addition to autofs.  I
think autofs was fixed then, but maybe the upgrade breaks it again?

Comment 3 Ian Kent 2006-05-29 06:52:58 UTC

(In reply to comment #2)
> 2.6.16-1.2224_FC6 triggers the problem, but I was running rawhide as of the bug
> report day initially.  This is broken on both x86_64 and i686.

OK.

> 
> On what OS are you trying this?  I remember having got similar problems a while
> ago, when coreutils tried to discontinue POSIX-conflicting options, and that was
> quickly taken back because it broke a lot of stuff in addition to autofs.  I
> think autofs was fixed then, but maybe the upgrade breaks it again?

I can't use the Rawhide kernels, my Radeon card doesn't work.
However, my FC5 kernel contains all my autofs development patches.

I've tried this with the kernel module at the same patch level
as the kernel above and with a patch that's not yet present in the
Rawhide kernel. For an invalid host both show a request with a
single path component going to auto.net. That would be "free" in
the case above.
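
To illustrate the expected behaviour, here is a minimal Python sketch
(not autofs code) of reducing a path under /net to the single component
that should reach the map lookup. The function name net_map_key is
hypothetical and exists only for this illustration.

```python
from pathlib import PurePosixPath


def net_map_key(path: str, mount_point: str = "/net") -> str:
    """Return the first path component below mount_point (the host name).

    Illustration only: this is not how automount itself is implemented.
    """
    # relative_to() raises ValueError if path is not under mount_point.
    rel = PurePosixPath(path).relative_to(mount_point)
    if not rel.parts:
        raise ValueError(f"{path!r} has no component below {mount_point!r}")
    return rel.parts[0]


# The path from the log in comment #0 should reduce to the host name only.
print(net_map_key("/net/free/l/aoliva"))  # prints "free"
```

With the failing version, the whole path /net/free/l/aoliva was handed
to showmount instead of a key like the one returned here.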

I should be able to reproduce it.
What actions do you take to make it happen?

Ian

Comment 4 Ian Kent 2006-05-29 08:34:52 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > 2.6.16-1.2224_FC6 triggers the problem, but I was running rawhide as of the bug
> > report day initially.  This is broken on both x86_64 and i686.
> 
> OK.
> 
> > 
> > On what OS are you trying this?  I remember having got similar problems a while
> > ago, when coreutils tried to discontinue POSIX-conflicting options, and that was
> > quickly taken back because it broke a lot of stuff in addition to autofs.  I
> > think autofs was fixed then, but maybe the upgrade breaks it again?
> 
> I can't use the Rawhide kernels, my Radeon card doesn't work.
> However, my FC5 kernel contains all my autofs development patches.
> 
> I've tried this with the kernel module at the same patch level
> as the kernel above and with a patch that's not yet present in the
> Rawhide kernel. For an invalid host both show a request with a
> single path component going to auto.net. That would be "free" in
> the case above.
> 
> I should be able to reproduce it.

Got it.

I'll sort this out.

Thanks.

Comment 5 Michal Jaegermann 2006-05-30 18:47:13 UTC

The latest update has the following, somewhat mysterious, log entry for
kernel-2.6.16-1.2230_FC6
------------------------
* Mon May 29 2006 Dave Jones <davej>
- 2.6.17rc5-git5
- autofs4: spoof negative dentries from mount fails on browseable
  indirect map mount points
....

I am afraid that it does not help at all with these failed lookups
whatever that was supposed to do.

Comment 6 Ian Kent 2006-05-31 04:27:50 UTC

(In reply to comment #5)
> The latest update has the following, somewhat mysterious, log entry for
> kernel-2.6.16-1.2230_FC6
> ------------------------
> * Mon May 29 2006 Dave Jones <davej>
> - 2.6.17rc5-git5
> - autofs4: spoof negative dentries from mount fails on browseable
>   indirect map mount points
> ....
> 
> I am afraid that it does not help at all with these failed lookups
> whatever that was supposed to do.

Yes. That is the patch I spoke about above (actually a completely
different one to do the same job).

I need it to be well tested so I've asked for it to be included
in the Rawhide kernel.

You're right about it not addressing the issue you are seeing.
I was able to duplicate the issue here and I'm fairly sure I've
found the cause of the problem. I pushed an update last night
so it should be available on the mirrors fairly soon.

Look for autofs-5.0.0_beta3-6 and you'll see a similarly mysterious
statement in the changelog.

Looking forward to hearing back.
Ian

Comment 7 Michal Jaegermann 2006-05-31 19:23:08 UTC

After the latest series of updates, with autofs-5.0.0_beta3-6 and running
kernel 2.6.16-1.2232_FC6, I can so far mount all filesystems exported on
remotes.  Subtrees do not actually mount until explicitly referenced
which, I think, is actually nicer than the behaviour of older autofs versions.

With a line like

/net /etc/auto.net --timeout=60

in /etc/auto.master these mounts actually do go away after a while.

There is one thing which I cannot figure out.  With some remote filesystems
mounted via autofs, if I run 'umount -t nfs -a' as root, I see
"device is busy" for every such filesystem.  Neither 'fuser' nor 'lsof'
reveals why this "busy" shows up.  OTOH these mounts do go
away eventually.  Is this expected?  BTW, stopping hald does not change
anything here.

Comment 8 Michal Jaegermann 2006-05-31 20:56:26 UTC

I think that I have some clue why 'umount -t nfs -a' causes "device is busy".
To make this specific: server 'zeno' exports /, /home and /home/spare.
When trying to unmount zeno:/home then, regardless of whether zeno:/home/spare
was mounted or not, I get the following in the logs from the automount process:

 umount_autofs_offset: couldn't get ioctl fd for offset /net/zeno/home/spare
 umount_offsets: failed to umount offset /net/zeno/home/spare
 umount_multi: could not umount some offsets under /net/zeno/home

and /net/zeno/home is busy which makes /net/zeno busy too.

Two worse troubles: first, when I looked at /net/zeno/home/spare
and all three filesystems got mounted, 'umount -t nfs -a' did
unmount zeno:/home/spare but the other two "got stuck" and did not go
away, even after a long wait, until I ran 'service autofs restart'.
Without 'umount -t nfs -a' all three filesystems got unmounted after
some timeout, even though messages like the above were still logged.

The other trouble is that I found in logs:

 automount[2905]: segfault at 0000000000000080 rip 000055555556da6d rsp 0000000040433fd0 error 6

apparently when stopping the current version of automount with some
remote filesystems mounted, but I could not reproduce that in a few tries.

Comment 9 Ian Kent 2006-06-01 01:00:54 UTC

(In reply to comment #8)
> 
> The other trouble is that I found in logs:
> 
>  automount[2905]: segfault at 0000000000000080 rip 000055555556da6d rsp 0000000040433fd0 error 6
> 
> apparently when stopping the current version of automount with some
> remote filesystems mounted, but I could not reproduce that in a few tries.

Yes. I see this occasionally but can't reproduce it reliably.
autofs is configured to produce a core at the moment so a gdb
backtrace would be good. You need to install the debuginfo package
of course.

Ian

Comment 10 Ian Kent 2006-06-01 02:25:54 UTC

(In reply to comment #7)
> After the latest series of updates with autofs-5.0.0_beta3-6 and running
> kernel 2.6.16-1.2232_FC6 so far I can mount all filesystems exported on
> remotes.  Subtrees do not actually mount until explicitly referenced
> which, I think, is actually nicer than the behaviour of older autofs versions.

The "mount only on demand" behaviour is meant to prevent resource exhaustion
for people with servers that have a large number of exports. Hopefully
it will help.

Ian

Comment 11 Ian Kent 2006-06-01 03:40:42 UTC

(In reply to comment #7)
> 
> With a line like
> 
> /net /etc/auto.net --timeout=60
> 
> in /etc/auto.master these mounts actually do get away after a while.
> 
> There is one thing which I cannot figure out.  With some remote mounts
> via autofs if I will do, as root, 'umount -t nfs -a' then I will see
> "device is busy" for every such filesystem.  Neither 'fuser' nor 'lsof'
> want to reveal why this "busy" shows up.  OTOH these mounts will go
> away; eventually.  Expected?  BTW - stopping hald does not change here
> anything.

If you look at /proc/mounts you will see that there are autofs 
"offset" mount triggers for the exports. The mount on demand works
by mounting triggers for each mount within a nesting point (point at
which we cross a filesystem) in the mount hierarchy so any mounts
above them will be busy. When one of these triggers causes a mount
a file handle is opened to perform operations such as expires. So
there's some consistency problem with this which I haven't yet seen.
I'll need more info (see below).
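
As a rough aid for inspecting this, the following Python sketch picks
the autofs entries out of text in the /proc/mounts format (device,
mount point, filesystem type, options, ...). The function name
autofs_mount_points and the SAMPLE lines are made up for illustration
and are not output from a real system.

```python
# Hypothetical sample in /proc/mounts layout; real entries carry more options.
SAMPLE = """\
/etc/auto.net /net autofs rw 0 0
zeno:/ /net/zeno nfs rw 0 0
/etc/auto.net /net/zeno/home autofs rw 0 0
/etc/auto.net /net/zeno/home/spare autofs rw 0 0
"""


def autofs_mount_points(mounts_text: str) -> list[str]:
    """Return the mount points whose filesystem type field is 'autofs'."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 2 (0-based) is the filesystem type, field 1 the mount point.
        if len(fields) >= 3 and fields[2] == "autofs":
            points.append(fields[1])
    return points


print(autofs_mount_points(SAMPLE))
```

On a live system the same function could be given the contents of
/proc/mounts instead of SAMPLE. It lists every autofs mount, so the
top-level /net mount appears alongside any triggers below it.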

The other thing to consider is that nested mounts, as we see when
mounting exports from a host, will not expire all at once. They
expire from the deepest nesting level up. So mounts above will be
"busy" until mounts below are umounted. This can make it look like
they take longer than expected to go away.
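
The ordering described here can be sketched in Python as follows. This
only illustrates "deepest first" using the zeno mount points from
comment #8; it is not the expire code in autofs, and the function name
expiry_order is hypothetical.

```python
def expiry_order(mount_points: list[str]) -> list[str]:
    """Order mount points so that the most deeply nested come first."""

    def depth(path: str) -> int:
        # Count non-empty path components, e.g. "/net/zeno/home" -> 3.
        return len([part for part in path.split("/") if part])

    return sorted(mount_points, key=depth, reverse=True)


mounts = ["/net/zeno", "/net/zeno/home/spare", "/net/zeno/home"]
for path in expiry_order(mounts):
    print(path)
```

In this ordering /net/zeno/home cannot go away before
/net/zeno/home/spare, and /net/zeno cannot go away before
/net/zeno/home, which is why the upper mounts appear "busy" for longer.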

Doing the "umount -t nfs -a" will undoubtedly have interesting
results. So I need some logging info so I can duplicate what
you're seeing.

I noticed the config option "DEFAULT_LOGGING" is not working
correctly so to get the logging you'll need to run autofs from
the command line using "automount -d" which should be essentially
the same as using the init script.

Ian

Comment 12 Michal Jaegermann 2006-06-01 19:58:57 UTC

In reply to comment #9.

So far I have one core from autofs-5.0.0_beta3-6 (and another from the
previous version).  With autofs-debuginfo loaded gdb gives the following
backtrace:

Core was generated by `automount'.
Program terminated with signal 11, Segmentation fault.
....
#0  0x000055555556da6d in master_mount_mounts () from /usr/sbin/automount
#1  0x000055555556e112 in master_read_master () from /usr/sbin/automount
#2  0x000055555555bb26 in main () from /usr/sbin/automount

but so far that is it.

When I am trying to cause segfault "on demand" then, of course,
I cannot do that. :-)

Comment 13 Ian Kent 2006-06-02 04:54:00 UTC

(In reply to comment #12)
> In reply to comment #9.
> 
> So far I have one core from autofs-5.0.0_beta3-6 (and another from the
> previous version).  With autofs-debuginfo loaded gdb gives the following
> backtrace:
> 
> Core was generated by `automount'.
> Program terminated with signal 11, Segmentation fault.
> ....
> #0  0x000055555556da6d in master_mount_mounts () from /usr/sbin/automount
> #1  0x000055555556e112 in master_read_master () from /usr/sbin/automount
> #2  0x000055555555bb26 in main () from /usr/sbin/automount
> 
> but so far that is it.

Aaagh .. there are no line numbers.
I must have something wrong in the spec file as well!

Ian

Comment 14 Ian Kent 2006-06-02 04:58:36 UTC

(In reply to comment #12)
> In reply to comment #9.
> 
> So far I have one core from autofs-5.0.0_beta3-6 (and another from the
> previous version).  With autofs-debuginfo loaded gdb gives the following
> backtrace:

btw what gdb command line are you using to get this?

gdb -c /core.nnnn /usr/sbin/automount

always gives line numbers for me.

Ian

Comment 15 Michal Jaegermann 2006-06-02 16:39:55 UTC

> btw what gdb command line are you using to get this?
The same as yours, i.e. 'gdb -c /core.nnnn /usr/sbin/automount' and
after that 'bt' at the (gdb) prompt.   I was also surprised that nothing
else showed up.  Maybe because the core was produced before autofs-debuginfo
was installed?  This does not seem likely.

Looking at the disassembler output from gdb, the segfault seems to happen at
master.c:924, which is the 'source->stale = 1;' assignment.  Somewhat weird.
This happens only sometimes so maybe 'source' gets garbled?

Comment 16 Ian Kent 2006-06-02 17:09:08 UTC

(In reply to comment #15)
> > btw what gdb command line are you using to get this?
> The same as yours.  I.e. 'gdb -c /core.nnnn /usr/sbin/automount' and
> after there 'bt' at (gdb) prompt.   I was also surprised that nothing
> else showed up.  Maybe because core happened before autofs-debuginfo
> was installed?  This does not seem likely.
> 
> Looking at disassembler output from gdb segfault seems to happen at
> master.c:924 which is 'source->stale = 1;' assignment.  Somewhat weird.
> This happens only sometimes so maybe 'source' gets garbled?

Has to be a race between the map lookup and re-reading the map.
That's where I'm going to be looking for the moment.

Thanks for this.

Ian

Comment 17 Michal Jaegermann 2006-06-07 18:20:06 UTC

About the segfaults discussed in comment #8 and subsequent ones:
see bug #193718.  Attempts to reload edited maps at least help to
trigger them.

Comment 18 Ian Kent 2006-06-16 00:49:29 UTC

Hi Alexandre,

I believe these problems have been resolved in autofs-5.0.0_beta4-11.
Could you try this out and let me know, please?

Ian

Comment 19 Alexandre Oliva 2006-06-17 18:07:54 UTC

/net mounts have long been fixed for me, indeed.  I didn't close the bug because
people were re-using it for other crashes that I've never experienced myself. 
If that's still unresolved, I suggest that someone who sees it file a new bug.
