Bug 161875 - autofs doesn't remount if nfs server is unreachable at expire time
Summary: autofs doesn't remount if nfs server is unreachable at expire time
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks: 168424
Reported: 2005-06-27 23:55 UTC by Greg Marsden
Modified: 2007-11-30 22:07 UTC
CC List: 3 users

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-03-15 16:09:16 UTC
Target Upstream Version:
Embargoed:


Attachments
fix failed lookup when nfs server is back online (4.26 KB, patch)
2005-10-19 20:49 UTC, Jeff Moyer
Patch to fix autofs4 against RHEL3 U6 kernel (1.15 KB, patch)
2005-10-21 00:29 UTC, Greg Marsden


Links
Red Hat Product Errata RHSA-2006:0144
  Priority: qe-ready   Status: SHIPPED_LIVE   Last Updated: 2006-03-15 05:00:00 UTC
  Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7

Description Greg Marsden 2005-06-27 23:55:46 UTC
Issue 68758 

We upgraded our environment to U4, after which we are seeing automounter
mount points failing. Also, sometimes it gives the effect of
directories not being found.

Error Message:
==============

Jan 18 22:24:24 stajf16 automount[7647]: >> nfs server reported service unavailable: Connection timed out
Jan 18 22:24:24 stajf16 automount[7647]: mount(nfs): nfs: mount failure stlinma3.us.oracle.com:/vol/ade_linux on /ade_autofs/ade_linux
Jan 18 22:24:24 stajf16 automount[7647]: failed to mount /ade_autofs/ade_linux

System Info:
============
bash-2.05# uname -a
Linux stajf16 2.4.21-27.ELsmp #1 SMP Wed Dec 1 21:59:02 EST 2004 i686 i686 i386 GNU/Linux

bash-2.05# rpm -qa | grep autofs
autofs-4.1.3-47

bash-2.05# cat /etc/redhat-release
Red Hat Enterprise Linux AS release 3 (Taroon Update 4)

Comment 1 Greg Marsden 2005-06-27 23:56:15 UTC
This test case involved two Red Hat Enterprise Linux 3 U4 machines.

Setup for the test case:
========================

1) Take two Linux machines, <automountclient> and <nfsserver> (both should
have Red Hat Enterprise Linux 3 U4 installed).
2) Log in to <automountclient> as root.

  a) Make sure the autofs version is autofs-4.1.3-47.
  b) Make the following entry in the /etc/auto.master

    /test /etc/test_autofs tcp,retrans=5 --ghost --debug
 
  c) Make the following entry in the /etc/test_autofs

     autotest    -ro,intr,timeo=600,actimeo=1200,rsize=32768,wsize=32768 \
                 <nfsserver>:/testdir

  d) cd /
  e) umount -a -t nfs;/sbin/service autofs stop
  f) /sbin/service autofs start

3) Log in to <nfsserver> as root.

  a) Make the following entry in the /etc/exports

     /testdir *(rw)

  b) mkdir /testdir
  c) mkdir -p /testdir/test1/test2
  d) /sbin/service nfs stop
  e) /sbin/service nfs start

How to reproduce the case?
==========================

1) Open two connections, one each to <automountclient> and <nfsserver>, and log
in as root.
2) Let's assume "Window 1" is connected to <automountclient> and "Window 2" is
connected to <nfsserver>.
3) On "Window 1" do the following.

  ls -l /test/autotest/test1

  Output will show ls -l info of test2

4) Using the mount command, check whether /test/autotest has been automatically unmounted.
5) On "Window 2" do the following

  /sbin/service nfs stop

6) On "Window 1" do the following

 ls -l /test/autotest/test1

 Output will show "directory not found."

7) On "Window 2" do the following

  /sbin/service nfs start

8) On "Window 1" do the following.

  ls -l /test/autotest/test1

 Output will show "directory not found." (This is the case even after the nfs
server is up and running.)

Comment 2 Greg Marsden 2005-06-27 23:57:37 UTC
The test case listed here only reproduces the problem if you do not unmount the
autofs-mounted partition by hand, but instead shut off the target nfsd and wait
for autofs to time out the volume by itself. The failure seems to be in the
cleanup after an unsuccessful unmount.

Comment 3 Jeff Moyer 2005-06-28 14:11:49 UTC
I'll take a look.

Comment 5 Jeff Moyer 2005-07-05 18:06:44 UTC
I can't reproduce the problem with U5, so please test with U5.  If the problem
persists, then provide all of the information requested under the "Filing bug
reports" section of the following URL:

  http://people.redhat.com/jmoyer/

Thanks.

Comment 6 Greg Marsden 2005-07-05 18:25:53 UTC
bash-2.05# cat /etc/auto.master
------------------------------------------------------
# $Id: auto.master,v 1.2 1997/10/06 21:52:03 hpa Exp $
# Sample auto.master file
# Format of this file:
# mountpoint map options
# For details of the format look at autofs(8).
# /misc /etc/auto.misc  --timeout=60
#
# ST Specific mount maps
#

##########################################################################
#  WARNING: Each field must be separated by exactly ONE SPACE
#           and ONE SPACE ONLY or a restart/reload of autofs
#           will create multiple automount processes.
##########################################################################

### Master auto_master map - currently not used
###  +auto_master

### Home directory mappings
/home yp:auto_home_adc tcp,intr,timeo=600,rsize=8192,wsize=8192,retrans=5 --ghost

### Mapping for /net - work around
/net /etc/auto.net --ghost

### Does not apply (We use DNS - we do - really)
#/xfn -xfn

# Auto Direct mappings
/usr/local/redhat /etc/auto_redhat tcp,retrans=5 --timeout=0 --ghost
/usr/local/solaris /etc/auto_solaris tcp,retrans=5 --ghost
/usr/local/remote /etc/auto_remote tcp,retrans=5 --timeout=0 --ghost

### ADE - label map
/ade_autofs /etc/ade_autofs tcp,retrans=5 --ghost 

Comment 7 Greg Marsden 2005-07-05 18:26:22 UTC
Jan 18 19:49:33 stajf13 automount[23763]: >> nfs server reported service unavailable: Connection timed out
Jan 18 19:49:33 stajf13 automount[23763]: mount(nfs): nfs: mount failure stdmlina4:/vol/home1/aime on /home/aime
Jan 18 19:49:33 stajf13 automount[23763]: failed to mount /home/aime

Comment 8 Greg Marsden 2005-07-05 18:28:30 UTC
Note that the mount point DOES recover if you cd to the directory and do an
'ls'; however, until that ls command is issued, the directory is reported as
empty, which breaks things like symlinked executables.

Comment 10 Greg Marsden 2005-07-05 19:14:43 UTC
*** JAPATEL  06/20/05 05:35 pm ***
autofs 4.1.3-130 does not seem to have a fix for the autofs problem.
The problem still persists in the reproducible case above.

Comment 11 Jeff Moyer 2005-07-05 19:53:22 UTC
I've managed to reproduce on a U5 install.  To be clear, the sequence of events
necessary to trigger the bug is as follows:

1) cause mount point to be automounted
2) stop the nfs service on the server
3) wait until the mount point is expired
4) access *a path element within the automount point* (i.e. not the root)
   At this point, you will receive a No such file or directory error.
5) start the nfs service on the server
6) access a non-root directory or file within the automount point
   At this point, you will receive a No such file or directory error.

This appears to be a kernel bug, and only affects the browsable map case. 
Please file this through the proper support channels so this can be scheduled
for an update release.
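The six steps above can be driven from <automountclient> with a small script. This is a minimal sketch, not something from the report: it assumes the /test/autotest map from Comment 1, passwordless root ssh to the server (named by the hypothetical NFS_SERVER variable), and an expire wait of 360 seconds, which assumes the autofs default 300-second timeout applies (that map sets no --timeout) plus some margin. By default it only prints the commands; set DRY_RUN to an empty string to actually run them.

```shell
#!/bin/sh
# Sketch of the Comment 11 reproduction, run on <automountclient>.
# Assumptions (not from the report): passwordless root ssh to the server,
# and a 360 s expire wait (assumed 300 s autofs default plus margin).
NFS_SERVER=${NFS_SERVER:-nfsserver}   # hypothetical server name
EXPIRE_WAIT=${EXPIRE_WAIT:-360}       # seconds to wait for the expire
TARGET=/test/autotest/test1           # non-root path inside the automount
DRY_RUN=${DRY_RUN-1}                  # unset: print only; empty: execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    run ls -l "$TARGET"                                 # 1) trigger the mount
    run ssh "root@$NFS_SERVER" /sbin/service nfs stop   # 2) stop nfs on the server
    run sleep "$EXPIRE_WAIT"                            # 3) wait for the expire
    run ls -l "$TARGET"                                 # 4) expect ENOENT
    run ssh "root@$NFS_SERVER" /sbin/service nfs start  # 5) start nfs again
    run ls -l "$TARGET"                                 # 6) unfixed kernel: still ENOENT
}

reproduce
```

For a real run, use something like `DRY_RUN= NFS_SERVER=<nfsserver> sh repro.sh` (the file name is arbitrary). On an affected kernel the final `ls` still reports No such file or directory even though the server is back up.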


Comment 16 Jeff Moyer 2005-07-06 18:44:00 UTC
autofs version 4.0.0pre10 does not support ghosting.  This problem only occurs
when ghosting is enabled.  As such, why is this issue holding up deployment of
U4?  If you disable ghosting (i.e. don't specify --ghost), then you get the same
behaviour you had before.

Comment 21 Greg Marsden 2005-08-16 01:14:44 UTC
--------------------------------------
After disabling the ghost option we do not
see the autofs problem.

What is missed if we disable ghosting?
-------------------------------------

To answer this question clearly, here is
an example.

1) The nfs server server1 is sharing
   /scratch,
  which contains the following file path:
  /scratch/test1/test1/myfile

2) The autofs client on server2 has the following
   entry in the /etc/auto.master
   /jay /etc/ade_jay tcp,retrans=5 --debug

   and /etc/ade_jay has the following entry.

   jay             -ro,intr,timeo=600,actimeo=1200,rsize=32768,wsize=32768 \
                     server1:/scratch

3) cd /jay
   ls

   There is no output from the above ls command.

4) cd jay
   ls

   The ls command does show output in this case.

We have a strong requirement that the ls in 3) should show
the output. Hence the autofs problem should be fixed
with the ghosting option enabled.


Comment 34 Jeff Moyer 2005-09-26 15:17:06 UTC
I posted this issue upstream, and got a response from the autofs maintainer. 
His solution would reintroduce a problem with ghosted direct maps.  I've pointed
this out to him, and am awaiting a response.

Comment 37 Jeff Moyer 2005-10-19 20:48:06 UTC
Ian Kent posted a patch while I was out of the country.  I tested his patch, and
it fixes the problem in my environment.  I want to spend some additional time
verifying that the patch does not introduce regressions.

I'll post the patch that Ian posted in this bugzilla.  It may or may not apply
cleanly to a RHEL 3 kernel tree.

Comment 38 Jeff Moyer 2005-10-19 20:49:16 UTC
Created attachment 120173
fix failed lookup when nfs server is back online

Comment 40 Greg Marsden 2005-10-21 00:29:46 UTC
Created attachment 120231
Patch to fix autofs4 against RHEL3 U6 kernel

This patch cleans up our autofs issues! Same as previous patch, but against
RHEL3u6.

Comment 41 Jeff Moyer 2005-10-21 12:11:18 UTC
Great.  Thanks for testing this.  This patch went into the rawhide kernel build
last night.  It's up to you, but I'd like to give it some soak time there before
releasing it as a hotfix.  The hope is that if it introduces any regressions,
we will hear about it in short order.  Is that acceptable for you guys?

Comment 44 Greg Marsden 2005-10-21 17:57:00 UTC
We've deployed the patch on a testing basis to some racks in the farm, and will
let you know how it goes with wider deployment.

Comment 45 Jeff Moyer 2005-10-28 16:10:59 UTC
Hi, Greg,

Any news?  Does it work?  Does it fail in new and exciting ways?

Comment 46 Jeff Moyer 2005-10-31 03:27:18 UTC
The upstream maintainer found a regression introduced by this patch, though he
was scant on the details.  For now, I would not deploy this in a production
environment.

Comment 47 Greg Marsden 2005-11-02 23:06:04 UTC
Let me know when you have more information about the regression. We're seeing no
problems with the patch deployed in our environment.

Comment 48 Ernie Petrides 2005-11-04 19:55:09 UTC
Hi, Greg.  Do you wish for this bugzilla to remain confidential to Oracle?

If not, please uncheck the "Oracle Confidential Group" box below to make the
bug public.  If so, just let me know so that I can add appropriate Red Hat
accessibility.  Thanks in advance.  -ernie


Comment 49 Jeff Moyer 2005-11-09 17:32:07 UTC
OK, Ian claims that the problem was specific to his environment (he was running
a patched autofs4 module).  So, I'll do some further testing, and will likely
propose this patch, as-is, for inclusion in the next update.

Comment 51 Greg Marsden 2005-11-11 18:03:40 UTC
Likewise here, we're not seeing any problems (and it definitely looks fixed)
with the new patch. Let me know what update this will go into.

Comment 52 Jeff Moyer 2005-11-11 19:52:32 UTC
Greg,

I just looked through your backported patch to RHEL 3 U6, and it misses a hunk.
Here is the hunk from the original patch that you are missing:

@@ -269,7 +269,7 @@ static struct dentry *autofs4_expire(str
 			goto next;
 		}
 
-		if ( simple_empty(dentry) )
+		if (simple_empty(dentry))
 			goto next;
 
 		/* Case 2: tree mount, expire iff entire tree is not busy */


All of my testing has been with a full patch.  I'm going to roll a kernel for
you to test.

-Jeff

Comment 53 Jeff Moyer 2005-11-11 19:57:01 UTC
Boy, can you tell it's Friday?  Disregard that last comment.  I don't think the
white-space change is critical.  =P

Comment 54 Ernie Petrides 2005-11-28 23:01:11 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.11.EL).


Comment 59 Ernie Petrides 2006-02-01 21:28:12 UTC
Jason Willeford, please create a new bugzilla for the problem you are
investigating and then relink IT 83498 to that one (and remove it from
this BZ 161875).

Thanks in advance.

Comment 61 Red Hat Bugzilla 2006-03-15 16:09:16 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html


