Bug 238534 - Fails to mount filesystems via /net
Summary: Fails to mount filesystems via /net
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: autofs
Version: 5.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ian Kent
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On: 203277
Blocks:
 
Reported: 2007-05-01 08:14 UTC by Ian Kent
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHBA-2007-0621
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 17:30:38 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2007:0621
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: autofs bug fix update
Last Updated: 2007-10-30 16:17:01 UTC

Description Ian Kent 2007-05-01 08:14:08 UTC
+++ This bug was initially created as a clone of Bug #203277 +++

Description of problem:
Filesystems that should be mounted under /net are often not accessible until
after autofs is restarted.  It then works OK for a while.

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc1.15

How reproducible:
Consistently.

Steps to Reproduce:
1. Boot rawhide system
2. Wait a few minutes
3. Attempt to access NFS filesystem via /net
  
Actual results:
Access fails.

Expected results:
Filesystem automounted.

Additional info:

Looks a lot like BZ 20516 but that is closed, so starting a new one.

Example:
[root@ping0 ~]# ls /net/tabb1/share
ls: /net/tabb1/share: No such file or directory
You have new mail in /var/spool/mail/root
[root@ping0 ~]# ls /net/tabb1
home  share
[root@ping0 ~]# service autofs restart
Stopping automount:                                        [  OK  ]
Starting automount:                                        [  OK  ]
[root@ping0 ~]# ls /net/tabb1/share
Avast_tabb2.reg  CentOS  Download  Fedora  jeremy  Kubuntu  lost+found  Mandriva
 Music  prs  root  ssh  tabb1  tabb2  tabb3  vmware
[root@ping0 ~]# tail -20 /var/log/messages
Aug 20 07:03:46 ping0 syslogd 1.4.1: restart.
Aug 20 07:06:25 ping0 automount[4080]: umount_autofs_indirect: ask umount returned busy /net
Aug 20 07:06:57 ping0 automount[30566]: lookup_read_master: lookup(nisplus): couldn't locat nis+ table auto.master
Aug 20 07:06:57 ping0 kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
Aug 20 07:06:59 ping0 last message repeated 3 times
Aug 20 07:06:59 ping0 kernel: SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts
Aug 20 07:20:10 ping0 init: Trying to re-exec init
Aug 20 07:27:27 ping0 automount[4427]: lookup_read_master: lookup(nisplus): couldn't locat nis+ table auto.master
Aug 20 07:27:27 ping0 kernel: SELinux: initialized (dev autofs, type autofs), uses genfs_contexts
Aug 20 07:27:30 ping0 last message repeated 3 times
Aug 20 07:27:31 ping0 kernel: SELinux: initialized (dev 0:19, type nfs), uses genfs_contexts

Another attempt to access a file after a few minutes fails again.  Restarting
autofs fixes it again, temporarily.

-- Additional comment from Philip.R.Schaffner on 2006-08-20 07:30 EST --
Typo on the BZ reference.  Should have been 202516.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202516



-- Additional comment from ikent on 2006-08-20 09:44 EST --
(In reply to comment #1)
> Typo on the BZ reference.  Should have been 202516.
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202516
> 

No, I don't think this is the same as 202516, I'll investigate.

Ian



-- Additional comment from ikent on 2006-08-20 09:54 EST --
(In reply to comment #2)
> (In reply to comment #1)
> > Typo on the BZ reference.  Should have been 202516.
> > 
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=202516
> > 
> 
> No, I don't think this is the same as 202516, I'll investigate.
> 

I can't seem to duplicate this.
Can you post the output of "showmount -e" for the server, please?

Ian


-- Additional comment from redhat-bugzilla-f on 2006-08-20 17:43 EST --
This happens to me as well: automounted NFS file systems become inaccessible
after a few minutes. Example, server behemoth exports the following filesystems:

$ showmount -e behemoth
/opt             192.168.1.0/255.255.255.0
/usr             192.168.0.0/255.255.0.0
/var             192.168.0.0/255.255.0.0
/mnt/ext2/4      192.168.0.0/255.255.0.0
/mnt/ext2/1      192.168.0.0/255.255.0.0
/mnt/iso9660/3   192.168.0.0/255.255.0.0
/mnt/iso9660/2   192.168.0.0/255.255.0.0
/mnt/iso9660/1   192.168.0.0/255.255.0.0
/var/share/media 192.168.0.0/255.255.0.0

$ ls /net/behemoth
mnt  opt  usr  var

$ ls /net/behemoth/var
ls: /net/behemoth/var: No such file or directory

$ s /etc/init.d/autofs restart
Stopping automount:                                        [  OK  ]
Starting automount:                                        [  OK  ]

$ ls /net/behemoth/var
account   fiction   local       net-snmp  scrollkeeper  state     www
arpwatch  gdm       lock        nis       share         tmp       yp
cache     home      log         opt       shm           tomcat4
db        kerberos  lost+found  preserve  spool         tpm
empty     lib       mail        run       ssl           ucd-snmp

This has been happening for at least the last 3 days (I yum upgrade just about
every day on this (test-)system)

-- Additional comment from ikent on 2006-08-20 23:05 EST --
Created an attachment (id=134547)
Prevent autofs4 follow_link method returning false negative


Could someone try this kernel patch and see if it resolves
the issue, please?

Ian


-- Additional comment from Philip.R.Schaffner on 2006-08-22 07:25 EST --
For the record...

[root@tabb1 ~]# showmount -e
Export list for tabb1.tabb:
/home  192.168.1.255/24
/share 192.168.1.255/24

Workaround seems to be commenting out the last line of /etc/auto.master
"+auto.master" and/or getting rid of the nisplus entry for automount in
/etc/nsswitch.conf.
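
The workaround described above can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical and not part of autofs, and both helpers read the file passed as $1 and print to standard output rather than editing /etc/auto.master or /etc/nsswitch.conf in place.

```shell
# Hypothetical helpers for illustration; not part of autofs.

# Print the sources configured for "automount" in an
# nsswitch.conf-style file given as $1.
automount_sources() {
    sed -n 's/^automount:[[:space:]]*//p' "$1"
}

# Print the master map given as $1 with the "+auto.master"
# include line commented out; every other line is unchanged.
disable_master_include() {
    sed 's/^+auto\.master[[:space:]]*$/#+auto.master/' "$1"
}
```

For example, running automount_sources on /etc/nsswitch.conf shows whether nisplus is listed, and redirecting the output of disable_master_include to a scratch file gives a candidate master map to review before replacing the original and restarting autofs.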


-- Additional comment from ikent on 2006-08-22 08:08 EST --
(In reply to comment #6)
> For the record...
> 
> [root@tabb1 ~]# showmount -e
> Export list for tabb1.tabb:
> /home  192.168.1.255/24
> /share 192.168.1.255/24
> 
> Workaround seems to be commenting out the last line of /etc/auto.master
> "+auto.master" and/or getting rid of the nisplus entry for automount in
> /etc/nsswitch.conf.
> 

Aaha.
An obvious problem with my nsswitch processing.
Or maybe that's the way nsswitch is supposed to work.
I'll review that bit of the code.

Thanks.
Ian


-- Additional comment from ikent on 2007-03-14 06:45 EST --
(In reply to comment #7)
> (In reply to comment #6)
> > For the record...
> > 
> > [root@tabb1 ~]# showmount -e
> > Export list for tabb1.tabb:
> > /home  192.168.1.255/24
> > /share 192.168.1.255/24
> > 
> > Workaround seems to be commenting out the last line of /etc/auto.master
> > "+auto.master" and/or getting rid of the nisplus entry for automount in
> > /etc/nsswitch.conf.
> > 
> 
> Aaha.
> An obvious problem with my nsswitch processing.
> Or maybe that's the way nsswitch is supposed to work.
> I'll review that bit of the code.

This bug seems to have fallen through the cracks, sorry.
I know a lot of work has been done in this area, so
could you check whether this is still a problem with the
current package, please?

Ian


-- Additional comment from Philip.R.Schaffner on 2007-03-14 09:59 EST --
For EL5 Beta - had to make the following changes to /etc/auto.master to get
automount to work:

diff -u auto.master~ auto.master
--- auto.master~        2007-01-07 17:14:35.000000000 -0500
+++ auto.master 2007-03-14 04:56:49.000000000 -0400
@@ -7,7 +7,8 @@
 # For details of the format look at autofs(5).
 #
 /misc  /etc/auto.misc
-/net   -hosts
+#/net  -host
+/net   /etc/auto.net
 #
 # Include central master map if it can be found using
 # nsswitch sources.
@@ -17,4 +18,4 @@
 # same will not be seen as the first read key seen takes
 # precedence.
 #
-+auto.master
+#+auto.master
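
The /net part of the diff above could be scripted roughly as follows. This is a sketch only: the function name use_auto_net is hypothetical, the helper prints the rewritten map to standard output instead of modifying the file, and it assumes /etc/auto.net exists and is usable as a map on the system in question.

```shell
# Hypothetical helper for illustration; not part of autofs.
# Print the master map given as $1, commenting out a "/net -hosts"
# entry and adding a "/net /etc/auto.net" entry right after it.
use_auto_net() {
    awk '
        $1 == "/net" && $2 == "-hosts" {
            print "#" $0                  # keep the old entry as a comment
            print "/net\t/etc/auto.net"   # map /net through auto.net instead
            next
        }
        { print }                         # pass every other line through
    ' "$1"
}
```

Writing to standard output keeps the original file untouched, so the result can be compared with diff -u before it is installed and autofs is restarted.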


-- Additional comment from ikent on 2007-03-14 11:39 EST --
(In reply to comment #9)
> For EL5 Beta - had to make the following changes to /etc/auto.master to get
> automount to work:
> 
> diff -u auto.master~ auto.master
> --- auto.master~        2007-01-07 17:14:35.000000000 -0500
> +++ auto.master 2007-03-14 04:56:49.000000000 -0400
> @@ -7,7 +7,8 @@
>  # For details of the format look at autofs(5).
>  #
>  /misc  /etc/auto.misc
> -/net   -hosts
> +#/net  -host
> +/net   /etc/auto.net
>  #
>  # Include central master map if it can be found using
>  # nsswitch sources.
> @@ -17,4 +18,4 @@
>  # same will not be seen as the first read key seen takes
>  # precedence.
>  #
> -+auto.master
> +#+auto.master
> 

Are the NFS servers you have problems with Solaris
based?

Ian


-- Additional comment from Philip.R.Schaffner on 2007-03-14 13:07 EST --
No - CentOS 4.4

-- Additional comment from ikent on 2007-03-14 13:18 EST --
(In reply to comment #11)
> No - CentOS 4.4

Thanks.
I'll see if I can reproduce this.

Ian


-- Additional comment from ikent on 2007-03-14 13:25 EST --
(In reply to comment #12)
> (In reply to comment #11)
> > No - CentOS 4.4
> 
> Thanks.
> I'll see if I can reproduce this.

Sorry to bug you again, but what is the revision of autofs
that you're using?

Ian


-- Additional comment from Philip.R.Schaffner on 2007-03-14 14:32 EST --
Should have said:

autofs-5.0.1-0.rc2.15


-- Additional comment from ikent on 2007-03-15 01:02 EST --
(In reply to comment #12)
> (In reply to comment #11)
> > No - CentOS 4.4
> 
> Thanks.
> I'll see if I can reproduce this.

I've tried to duplicate this without success.
I tested revision 0.rc2.15 and the current RHEL5 revision
0.rc2.43.0.2.

I don't have a CentOS server but I tried with Solaris9,
an old Debian server and an FC6 machine and they worked
OK.

I also tested using the network broadcast address in the
export instead of the network address, as you have in your
exports above.

So, we need more information to take this further.

You will need to update to the current RHEL5 release
revision and check that it is still a problem as quite
a few updates have been applied. We would need to change
the Product in this bug to RHEL5 also or log a new bug.

Ian


-- Additional comment from Philip.R.Schaffner on 2007-03-20 11:13 EST --
Don't have a RHEL5 release install to test; however, this is still a problem on
FC6 with all current updates.  With the default auto.master nfs directories fail
to mount.  With the patch shown above everything works fine.  Changed version to
fc6.


-- Additional comment from ikent on 2007-03-20 13:02 EST --
(In reply to comment #16)
> Don't have a RHEL5 release install to test; however, this is still a problem on
> FC6 with all current updates.  With the default auto.master nfs directories fail
> to mount.  With the patch shown above everything works fine.  Changed version to
> fc6.
> 

What revision of autofs?

-- Additional comment from Philip.R.Schaffner on 2007-03-20 13:29 EST --
autofs-5.0.1-0.rc3.26

-- Additional comment from ikent on 2007-03-20 21:59 EST --
(In reply to comment #18)
> autofs-5.0.1-0.rc3.26

Yes, that's the latest revision.

As I wasn't able to reproduce this, could you provide a
debug log of this happening, please? Information on how
to do this can be found at http://people.redhat.com/jmoyer.
Clearly there is some difference between how my test
environment and your system are set up, which we need to
work out.

Also, is SELinux in enforcing mode?
If so, could you disable it and try to reproduce the problem?

Ian


-- Additional comment from Philip.R.Schaffner on 2007-04-11 10:28 EST --
OK - here's a summary of recent tests.

1.  Install FC6.  Disable selinux during firstboot, configure networking for
DHCP, add local servers (including wx1) to /etc/hosts.

2.  Update to latest autofs.

3.  Attempt to use autofs to mount NFS share /home on server wx1:
[ggg@fc6 ~]$ ls /net/wx1/home
ls: /net/wx1/home: No such file or directory

4. Change one line in /etc/auto.master and restart autofs:
[root@fc6 etc]# diff auto.master.orig auto.master
10c10,11
< /net  -hosts
---
> #/net -hosts
> /net  /etc/auto.net

5. Try again:
[ggg@fc6 ~]$ ls /net/wx1/home
CentOS5beta  ggg     LARC    LiveCDtools  lost+found  prs  tsd
gewet        gustaf  LiveCD  LiveCD_v1    phil        rtn
[ggg@fc6 ~]$ 

This is the easiest problem to reproduce.  Will attach debug log.  Last entry
with failure before change to modified auto.master is:

Apr 11 10:20:52 fc6 automount[3590]: failed to mount /net/wx1



-- Additional comment from Philip.R.Schaffner on 2007-04-11 10:34 EST --
Created an attachment (id=152277)
/var/log/debug with and without /net problem

The attached debug log shows the failures with the out-of-the-box FC6 files,
followed by correct automount in /net after one-line change to auto.master. 
Only other change was enabling logging as requested.  Testing done in a VMware
WorkStation 5.5 VM.  Only update to original FC6 was autofs.  Will run all FC6
updates and repeat.


-- Additional comment from Philip.R.Schaffner on 2007-04-11 17:11 EST --
The problem with autofs consistently failing to mount via /net with default
auto.master file persists with all current FC6 updates installed -
autofs-5.0.1-0.rc3.26 kernel-2.6.20-1.2933.fc6

Have not yet reproduced the intermittent problem with the /net mounts
disappearing once they are active with the "+auto.master" entry present in
/etc/auto.master and logging enabled.  Will report again if I can capture that
behavior.

Should have noted - not using NIS. No changes to /etc/nsswitch.conf


-- Additional comment from ikent on 2007-04-12 00:59 EST --
(In reply to comment #21)

I must be missing something really simple, but what?

> Created an attachment (id=152277)
> /var/log/debug with and without /net problem
> 
> The attached debug log shows the failures with the out-of-the-box FC6 files,
> followed by correct automount in /net after one-line change to auto.master. 
> Only other change was enabling logging as requested.  Testing done in a VMware
> WorkStation 5.5 VM.  Only update to original FC6 was autofs.  Will run all FC6
> updates and repeat.
> 

Does the client machine match either of these network addresses?
Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 146.165.204.0 pmask 24
Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 198.119.136.0 pmask 24

Are these the entries you expect to see in the export list of
host wx1?

Ian


-- Additional comment from ikent on 2007-04-12 01:56 EST --
(In reply to comment #23)
> 
> Does the client machine match either of these network addresses?
> Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 146.165.204.0 pmask 24
> Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 198.119.136.0 pmask 24
> 
> Are these the entries you expect to see in the export list of
> host wx1?

And could you post the output of ifconfig for the matching
interface, please?

Ian


-- Additional comment from Philip.R.Schaffner on 2007-04-12 10:19 EST --
> Does the client machine match either of these network addresses?
> Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 146.165.204.0 pmask 24
> Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 198.119.136.0 pmask 24

The client in this case is on a VMware NAT subnet, so it appears to hosts as
being on the 146.165.204.0 network.  

> Are these the entries you expect to see in the export list of
> host wx1?

Yes...
[root@wx1 ~]# cat /etc/exports
/home 198.119.136.0/24(rw,no_root_squash,insecure,async) 146.165.204.0/24(rw,no_root_squash,insecure,async)

146.165.204.0 - Building Ethernet Subnet
198.119.136.0 - Wifi Subnet

> And could you post the output of ifconfig for the matching
> interface please.

On the Host OS:

[root@wx1 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:E0:81:2C:B7:56  
          inet addr:146.165.204.75  Bcast:146.165.204.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe2c:b756/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:193238408 errors:0 dropped:0 overruns:0 frame:0
          TX packets:364018867 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1433591889 (1.3 GiB)  TX bytes:2246610319 (2.0 GiB)
          Interrupt:193 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3227098 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3227098 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:1343272733 (1.2 GiB)  TX bytes:1343272733 (1.2 GiB)

vmnet1    Link encap:Ethernet  HWaddr 00:50:56:C0:00:01  
          inet addr:192.168.3.1  Bcast:192.168.3.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08  
          inet addr:192.168.2.1  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:339 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

On the VMware Guest OS:

[root@fc6 ~]# ifconfig 
eth0      Link encap:Ethernet  HWaddr 00:0C:29:1B:21:FD  
          inet addr:192.168.2.108  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe1b:21fd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2230 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1907 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:920987 (899.4 KiB)  TX bytes:509890 (497.9 KiB)
          Interrupt:18 Base address:0x1424 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:91 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:24732 (24.1 KiB)  TX bytes:24732 (24.1 KiB)

NAT (although not necessarily VMware) does seem to be relevant as I can no
longer duplicate the problem on a physical FC6 machine on the same 146.165.204.0
network.  That one now works with the default auto.master file restored,
although it did not previously.
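
Why NAT matters here can be illustrated with a small subnet check. This is a sketch, not autofs code: it assumes the hosts module compared the client's own interface address against the networks in the export list (the match_network log lines suggest this, but the thread does not show the implementation), and the helper names ip_to_int and in_network are hypothetical. The addresses are the ones from the ifconfig and /etc/exports output in this report.

```shell
# Hypothetical helpers for illustration; not taken from autofs.

# Pack a dotted-quad IPv4 address into a single integer.
# The body runs in a subshell so the IFS change stays local.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies in the network $2 with prefix length $3.
# Both sides are masked, so "192.168.1.255/24" is treated like "192.168.1.0/24".
in_network() {
    mask=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Guest eth0 address first, then the host's eth0 address.
for addr in 192.168.2.108 146.165.204.75; do
    for net in 146.165.204.0 198.119.136.0; do
        if in_network "$addr" "$net" 24; then
            echo "$addr matches $net/24"
        else
            echo "$addr does not match $net/24"
        fi
    done
done
```

The guest's own address, 192.168.2.108, matches neither exported network, so a client-side check of this kind would reject the export even though the server, which sees the NATed traffic as coming from the 146.165.204.0 network, would accept the mount.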



-- Additional comment from ikent on 2007-04-12 21:54 EST --
(In reply to comment #25)
> > Does the client machine match either of these network addresses?
> > Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 146.165.204.0 pmask 24
> > Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 198.119.136.0 pmask 24
> 
> The client in this case is on a VMware NAT subnet, so it appears to hosts as
> being on the 146.165.204.0 network.  

I can see the NAT being a problem for sure.
I expect I'll be able to reproduce this problem now.

This is the first valid reason I've had so far to drop the
exports access validation from the hosts module and just
deal with the mount fail instead. I'll need to do a fair
bit of testing before I actually do that though.

Ian


-- Additional comment from ikent on 2007-04-19 04:06 EST --
(In reply to comment #26)
> (In reply to comment #25)
> > > Does the client machine match either of these network addresses?
> > > Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 146.165.204.0 pmask 24
> > > Apr 11 10:20:49 fc6 automount[3590]: match_network: pcnet 198.119.136.0 pmask 24
> > 
> > The client in this case is on a VMware NAT subnet, so it appears to hosts as
> > being on the 146.165.204.0 network.  
> 
> I can see the NAT being a problem for sure.
> I expect I'll be able to reproduce this problem now.
> 
> This is the first valid reason I've had so far to drop the
> exports access validation from the hosts module and just
> deal with the mount fail instead. I'll need to do a fair
> bit of testing before I actually do that though.

I've removed the exports access control check from
autofs-5.0.1-0.rc3.29 which is in updates/testing.

Could you try this out and see if this update resolves
the problem you're seeing, please?

Ian

-- Additional comment from Philip.R.Schaffner on 2007-04-20 14:27 EST --
Updated to autofs-5.0.1-0.rc3.29 on a fully up to date FC6 system and could not
replicate the problem.  Seems to be fixed for FC6 by the test version.  The
problem still exists in CentOS5 with autofs-5.0.1-0.rc2.43.0.2 and thus very
likely in RHEL5.


-- Additional comment from ikent on 2007-04-23 04:28 EST --
(In reply to comment #28)
> Updated to autofs-5.0.1-0.rc3.29 on a fully up to date FC6 system and could not
> replicate the problem.  Seems to be fixed for FC6 by the test version.  The
> problem still exists in CentOS5 with autofs-5.0.1-0.rc2.43.0.2 and thus very
> likely in RHEL5.
> 

Yes, there's no doubt of that.
I'm going to actually remove the code used for the
checking (instead of just disabling it) and then
clone this bug so I can fix it in RHEL 5.1. That's
about all I can do for the moment.

Ian

Comment 1 RHEL Program Management 2007-05-01 08:24:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 errata-xmlrpc 2007-11-07 17:30:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0621.html


