Bug 441794

Summary: intermittant mount failures
Product: Red Hat Enterprise Linux 4 Reporter: Andrew Ryan <aryan>
Component: kernelAssignee: Ian Kent <ikent>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: low Docs Contact:
Priority: low    
Version: 4.6CC: cward, ikent, jmoyer, tjb
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-05-18 19:36:23 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 460083    
Bug Blocks:    
Attachments:
Description Flags
autofs log w/ debug verbosity
none
/etc/sysconfig/autofs
none
trigger a crashdump when the key redhat13/data1 is encountered
none
updated patch to trigger a crash dump none

Description David Robinson 2008-04-10 06:45:16 UTC
Description of problem:
Since applying the latest autofs errata autofs mounts have been failing
intermittantly. The key passed to the lookup module consists of the server name
and directory instead of just the server name.

Version-Release number of selected component (if applicable):
autofs-4.1.3-231
kernel-smp-2.6.9-67.EL

How reproducible:
Intermittant

Steps to Reproduce:
Unknown.
  
Actual results:

When attempting to walk/enter an automount managed mountpoint the export isn't
mounted and the entry below appears in the logs. Note the key is redhat13/data1
whereas it should be just redhat13.

Feb 13 11:54:38 L76639 automount[6184]: handle_packet: type = 0 
Feb 13 11:54:38 L76639 automount[6184]: handle_packet_missing: token 4276, name
redhat13/data1 
Feb 13 11:54:38 L76639 automount[6184]: attempting to mount entry
/net/redhat13/data1
Feb 13 11:54:38 L76639 automount[7624]: lookup(program): looking up redhat13/data1
Feb 13 11:54:38 L76639 automount[6184]: mt->key set to redhat13/data1 
Feb 13 11:54:38 L76639 automount[7624]: >> /usr/sbin/showmount: can't get
address for redhat13/data1
Feb 13 11:54:38 L76639 automount[7624]: lookup(program): lookup for
redhat13/data1 failed
Feb 13 11:54:38 L76639 automount[7624]: failed to mount /net/redhat13/data1
Feb 13 11:54:38 L76639 automount[7624]: umount_multi: path=/net/redhat13/data1
incl=1 
Feb 13 11:54:38 L76639 automount[6184]: handle_child: got pid 7624, sig 0 (0),
stat 1
Feb 13 11:54:38 L76639 automount[6184]: sig_child: found pending iop pid 7624:
signalled 0 (sig 0), exit status 1
Feb 13 11:54:38 L76639 automount[6184]: send_fail: token=4276

Expected results:

The export is mounted.

Feb 13 11:53:55 L76639 automount[6184]: handle_packet: type = 0 
Feb 13 11:53:55 L76639 automount[6184]: handle_packet_missing: token 4275, name
redhat13 
Feb 13 11:53:55 L76639 automount[6184]: attempting to mount entry /net/redhat13
Feb 13 11:53:55 L76639 automount[7564]: lookup(program): looking up redhat13
Feb 13 11:53:55 L76639 automount[6184]: mt->key set to redhat13 
Feb 13 11:53:55 L76639 automount[7564]: lookup(program): redhat13 ->
-fstype=nfs,hard,intr,nodev,nosuid         /data1 redhat13:/data1         
/data2 redhat
13:/data2
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): expanded entry:
-fstype=nfs,hard,intr,nodev,nosuid          /data1 redhat13:/data1         
/data2 redhat
13:/data2
Feb 13 11:53:55 L76639 automount[7564]: parse(sun):
dequote("fstype=nfs,hard,intr,nodev,nosuid") -> fstype=nfs,hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): gathered options:
fstype=nfs,hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("/data1") -> /data1
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("redhat13:/data1   
") -> redhat13:/data1
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("/data2") -> /data2
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): dequote("redhat13:/data2")
-> redhat13:/data2
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): multimount: redhat13:/data2
on /data2 with options fstype=nfs,hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): mounting root /net/redhat13,
mountpoint data2, what redhat13:/data2, fstype nfs, options hard,intr,nodev,
nosuid 
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs):  root=/net/redhat13
name=data2 what=redhat13:/data2, fstype=nfs, options=hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): nfs
options="hard,intr,nodev,nosuid", nosymlink=0
Feb 13 11:53:55 L76639 automount[7564]: prune_host_list: added host redhat13,
cost 292 to list 
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mkdir_path
/net/redhat13/data2
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mount -t nfs -s -o
hard,intr,nodev,nosuid redhat13:/data2 /net/redhat13/data2
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): mounted redhat13:/data2 on
/net/redhat13/data2
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): multimount: redhat13:/data1
on /data1 with options fstype=nfs,hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: parse(sun): mounting root /net/redhat13,
mountpoint data1, what redhat13:/data1, fstype nfs, options hard,intr,nodev,
nosuid 
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs):  root=/net/redhat13
name=data1 what=redhat13:/data1, fstype=nfs, options=hard,intr,nodev,nosuid
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): nfs
options="hard,intr,nodev,nosuid", nosymlink=0
Feb 13 11:53:55 L76639 automount[7564]: prune_host_list: added host redhat13,
cost 359 to list 
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mkdir_path
/net/redhat13/data1
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): calling mount -t nfs -s -o
hard,intr,nodev,nosuid redhat13:/data1 /net/redhat13/data1
Feb 13 11:53:55 L76639 automount[7564]: mount(nfs): mounted redhat13:/data1 on
/net/redhat13/data1
Feb 13 11:53:55 L76639 automount[6184]: handle_child: got pid 7564, sig 0 (0),
stat 0
Feb 13 11:53:55 L76639 automount[6184]: sig_child: found pending iop pid 7564:
signalled 0 (sig 0), exit status 0

Additional info:

Attached are logs captured after the problem occurred. The mount points have an
timeout/expiry time of 60 seconds. Notice in the logs (at 11:53:55) that
redhat13 hadn't expired and therefore hasn't been unmounted before the lookup of
redhat13/data1.

$ grep looking 10-automount.problem.log 
Feb 13 11:42:42 L76639 automount[6488]: lookup(program): looking up redhat13
Feb 13 11:43:56 L76639 automount[6627]: lookup(program): looking up redhat13
Feb 13 11:43:56 L76639 automount[6636]: lookup(program): looking up biscuit
Feb 13 11:45:53 L76639 automount[6722]: lookup(program): looking up redhat13
Feb 13 11:48:41 L76639 automount[7071]: lookup(program): looking up redhat13
Feb 13 11:51:07 L76639 automount[7175]: lookup(program): looking up redhat13
Feb 13 11:53:55 L76639 automount[7564]: lookup(program): looking up redhat13
Feb 13 11:54:38 L76639 automount[7624]: lookup(program): looking up redhat13/data1
Feb 13 11:54:38 L76639 automount[7629]: lookup(program): looking up redhat13/data1
Feb 13 11:54:45 L76639 automount[7637]: lookup(program): looking up redhat13/data2
Feb 13 11:54:45 L76639 automount[7642]: lookup(program): looking up redhat13
Feb 13 11:54:51 L76639 automount[7657]: lookup(program): looking up biscuit
$ grep expired 10-automount.problem.log 
Feb 13 11:43:46 L76639 automount[6620]: expired /net/redhat13
Feb 13 11:45:01 L76639 automount[6688]: expired /net/biscuit
Feb 13 11:45:01 L76639 automount[6691]: expired /net/redhat13
Feb 13 11:47:02 L76639 automount[6774]: expired /net/redhat13
Feb 13 11:49:47 L76639 automount[7121]: expired /net/redhat13
Feb 13 11:52:17 L76639 automount[7506]: expired /net/redhat13

Comment 1 David Robinson 2008-04-10 06:45:16 UTC
Created attachment 301934 [details]
autofs log w/ debug verbosity

Comment 2 David Robinson 2008-04-10 06:46:24 UTC
Created attachment 301936 [details]
/etc/sysconfig/autofs

Comment 3 Ian Kent 2008-04-10 07:32:10 UTC
(In reply to comment #1)
> Created an attachment (id=301934) [edit]
> autofs log w/ debug verbosity
> 

As you point out, this bit is really wierd, since host redhat13
was mounted less than a minute before and hasn't been umounted.
This should not be able to happen!
The VFS should not call any of the file system callbacks when
something is mounted on the directory.

automount[6184]: handle_packet_missing: token 4276, name redhat13/data1 
automount[6184]: attempting to mount entry /net/redhat13/data1
automount[7624]: lookup(program): looking up redhat13/data1
automount[6184]: mt->key set to redhat13/data1 
automount[7624]: >> /usr/sbin/showmount: can't get address for redhat13/data1
automount[7624]: lookup(program): lookup for redhat13/data1 failed
automount[7624]: failed to mount /net/redhat13/data1
automount[7624]: umount_multi: path=/net/redhat13/data1 incl=1 
automount[6184]: handle_child: got pid 7624, sig 0 (0), stat 1
automount[6184]: sig_child: found pending iop pid 7624: signalled 0 (sig 0),
exit status 1
automount[6184]: send_fail: token=4276 



Comment 4 Jeff Moyer 2008-04-10 16:27:23 UTC
David,

I know this is a lot to ask, but if we are unable to reproduce this issue, we'll
need some more information to go on.  To that end, we'd like to get a crashdump
from your system when this happens.  If this is your workstation, I'm sure you
may not like that!  But, if you're willing to give it a try, I'll attach a new
auto.net script for you to use which will trigger a crash when this condition is
found.  Once you get a crash, you can then go back to the old init script.

Comment 5 Jeff Moyer 2008-04-10 16:28:40 UTC
Created attachment 302014 [details]
trigger a crashdump when the key redhat13/data1 is encountered

You can apply the patch like so:

cd /etc
patch -p1 < <path-to-patch-file>

You won't need to restart autofs.  You will need to make sure that your system
is configured to capture crashdumps, though.

Comment 6 Jeff Moyer 2008-04-10 16:35:15 UTC
Created attachment 302017 [details]
updated patch to trigger a crash dump

I should not have added the echo line, as it will be processed by automount.

Comment 7 Jeff Moyer 2008-04-10 16:36:23 UTC
(In reply to comment #5)
> Created an attachment (id=302014) [edit]
> trigger a crashdump when the key redhat13/data1 is encountered
> 
> You can apply the patch like so:
> 
> cd /etc
> patch -p1 < <path-to-patch-file>

Ooops!  That should be patch -p0.  Sorry for the confusion.


Comment 9 Jeff Moyer 2008-05-21 19:43:15 UTC
After inspecting the log again, this looks to be the same problem as in bug
447420.  Unfortunately, that's a private bugzilla, I think.  The basic idea is this:

a lookup occurs for a multimount entry (that is, an automount entry with
multiple offset mounts, all of which are mounted as a single unit).

Then, while the mounting is ongoing, another process performs a lookup of a
subdirectory that is not yet mounted.  Instead of blocking on the pending mount,
the process is allowed to walk down into the mount tree and we get a request as
you see for a path element under the top.

So, we're curently testing a fix for this.  When we have a patch prepared we
will post it here.

Thanks for your patience on this matter!

Comment 12 RHEL Program Management 2008-09-05 17:06:41 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 Ian Kent 2009-01-22 01:51:14 UTC
This bug should be fixed by the patch series committed in
bug 460083. Making this bug as MODIFIED as those patches
have been committed for RHEL-4.8.

Comment 16 Chris Ward 2009-05-05 13:56:10 UTC
Any updates here? Has this issue been resolved in the RHEL 4.8 Beta? later kernel?

Comment 17 Ian Kent 2009-05-05 14:05:32 UTC
(In reply to comment #16)
> Any updates here? Has this issue been resolved in the RHEL 4.8 Beta? later
> kernel?  

You should have the customer use the RHEL-4.8 kernel.

Comment 18 Chris Ward 2009-05-05 14:24:59 UTC
I guess at this point, it's too late. Though, i'll see what i can do. Thanks for the info.

Comment 19 Ian Kent 2009-05-05 14:41:11 UTC
(In reply to comment #18)
> I guess at this point, it's too late. Though, i'll see what i can do. Thanks
> for the info.  

It's worth asking them to test.

This looks like the only one we didn't get something out for testing.
The RHEL-4.8 kernel update resolved almost all of our reported autofs
kernel issues and this doesn't look like it's anything new.

Comment 21 errata-xmlrpc 2009-05-18 19:36:23 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1024.html