Bug 170548

Summary: autofs mounts fail when using dhcp, ok with static ip
Product: [Fedora] Fedora
Reporter: John Ellson <john.ellson>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Brock Organ <borgan>
Severity: medium
Docs Contact:
Priority: medium
Version: rawhide
CC: cfeist, steved, triage
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: bzcl34nup
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-06 20:14:48 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Attachments:
Description                                                           Flags
/etc/auto.master                                                      none
/etc/auto.net                                                         none
/var/log/debug                                                        none
result of: "tethereal -i eth0 -w /tmp/capture host 192.168.0.6"       none
same test with static ip                                              none
output of: rpcinfo -p barrel                                          none

Description John Ellson 2005-10-12 15:14:52 EDT
Description of problem:
In the last two weeks or so, autofs has failed for me on the only machine I have
that uses dhcp.  Providing static IP information, changing BOOTPROTO=dhcp to
BOOTPROTO=none in /etc/sysconfig/network-scripts/ifcfg-eth0, and running "service
network restart" followed by "service autofs restart" reliably fixes the problem.
Just changing back to BOOTPROTO=dhcp and restarting the services brings the problem back.

The NFS server is on the local network and its name resolution is via /etc/hosts.

Version-Release number of selected component (if applicable):
autofs-4.1.4-9
dhclient-3.0.3-7
bind-9.3.1-18
nfs-utils-1.0.7-18.FC5

How reproducible:
100%

Steps to Reproduce:
1. Change to BOOTPROTO=dhcp
2. service network restart
3. service autofs restart
4. cd /net/barrel/usr/export/....
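
For anyone repeating the test, step 1 can be scripted. The following is a minimal Python sketch, not part of the original report: the helper name `set_bootproto` and the sample file contents are hypothetical, and it only rewrites the text of an ifcfg file. Steps 2 to 4 still have to be run by hand as root.

```python
import re

def set_bootproto(ifcfg_text, value):
    """Return ifcfg_text with BOOTPROTO set to value.

    If no BOOTPROTO line exists, one is appended at the end.
    """
    pattern = re.compile(r"^BOOTPROTO=.*$", flags=re.MULTILINE)
    new_line = "BOOTPROTO=" + value
    if pattern.search(ifcfg_text):
        # Replace the existing line(s); a function avoids escape handling in the value.
        return pattern.sub(lambda m: new_line, ifcfg_text)
    sep = "" if (not ifcfg_text or ifcfg_text.endswith("\n")) else "\n"
    return ifcfg_text + sep + new_line + "\n"

# Hypothetical ifcfg-eth0 contents for illustration only; ONBOOT is an assumption.
SAMPLE_IFCFG = "DEVICE=eth0\nBOOTPROTO=none\nONBOOT=yes\nIPADDR=192.168.0.10\n"

if __name__ == "__main__":
    print(set_bootproto(SAMPLE_IFCFG, "dhcp"), end="")
```

Writing the result back to /etc/sysconfig/network-scripts/ifcfg-eth0 is deliberately left out, so the sketch can be tried without touching a live configuration.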
  
Actual results:
cd attempt fails after 30 sec or so.  "No such file or directory"

/var/log/messages contains:
Oct 12 15:14:22 localhost automount[5952]: >> /usr/sbin/showmount: can't get address for barrel/usr/export
Oct 12 15:14:22 localhost automount[5952]: lookup(program): lookup for barrel/usr/export failed
Oct 12 15:14:22 localhost automount[5952]: failed to mount /net/barrel/usr/export


Expected results:
NFS automounts available on DHCP client.

Additional info:
Comment 1 Jeffrey Moyer 2005-10-17 13:21:11 EDT
The error message is familiar, but your reproducer is a bit strange!  I use
autofs extensively with dhcp clients...

Anyway, please follow the directions on filing bug reports found on my people page:

  http://people.redhat.com/jmoyer/

Perhaps with some more detailed log information we can figure out what is going on.

Thanks!
Comment 2 John Ellson 2005-10-17 13:49:10 EDT
Created attachment 120067 [details]
/etc/auto.master
Comment 3 John Ellson 2005-10-17 13:50:16 EDT
Created attachment 120068 [details]
/etc/auto.net
Comment 4 John Ellson 2005-10-17 13:51:15 EDT
Created attachment 120069 [details]
/var/log/debug
Comment 5 John Ellson 2005-10-17 13:54:51 EDT
autofs-4.1.4-9
kernel-2.6.13-1.1611_FC5
Comment 6 Jeffrey Moyer 2005-10-17 14:03:55 EDT
Oct 17 13:37:49 tux automount[20983]: >> mount: RPC: Timed out

I think that is the real problem.  It probably exists independently of the
automounter.  Have you tried simply nfs mounting the share by hand?  I'd be
interested to know if that works.

From the logs, the automounter is working as it should.  But, I'd be happy to
help in narrowing this down further, so we know who to bother next.  ;)

Let me know if you can do the mount manually with a dhcp address.

Thanks!
Comment 7 John Ellson 2005-10-17 14:16:31 EDT
No, I get:

root@tux:~# mount barrel:/usr/export /net/barrel/usr/export
mount: RPC: Timed out


Comment 8 John Ellson 2005-10-17 14:21:09 EDT
The other end (barrel) seems to authenticate the rpc request ok.

Oct 17 13:55:23 barrel rpc.mountd: authenticated mount request from 192.168.0.158:627 for /usr/export (/usr/export)
Comment 9 Jeffrey Moyer 2005-10-17 14:29:42 EDT
ok, stupid question: when you configure your IP address statically, do you use
the same address?  I.e. the one that dhcp gives you?

Comment 10 John Ellson 2005-10-17 14:32:17 EDT
No.   DHCP gives me 192.168.0.158
The static IP is: 192.168.0.10
Comment 11 Jeffrey Moyer 2005-10-17 14:40:43 EDT
OK.  Is it possible that there are firewalling rules in effect that would cause
problems?  Can the server do a reverse lookup on the dhcp address?  (I'm not
sure if the latter matters at all, but it certainly can be a source of delays).

Could you get a packet trace of the failed mount?  Something like the following
should do:

tethereal -w /tmp/data.pcp client server

Then we can hopefully figure out where in the chain things are going wrong.

Thanks!
Comment 12 John Ellson 2005-10-17 15:27:37 EDT
Not a firewall problem.  I stopped iptables at both ends.

No, reverse lookups won't work.  Neither machine is in DNS.
Using /etc/hosts 

I'll get back to you with a packet trace when I can work out how to 
make tethereal work.
Comment 13 Jeffrey Moyer 2005-10-17 15:37:29 EDT
Sorry, I made a typo above.  The tethereal line should be:

tethereal -w <capturefile> host <servername>

where capturefile is replaced by the file you want to use to store the packet
capture, and servername is the name of the nfs server.  This should be run on
the client, as root.

Comment 14 John Ellson 2005-10-17 15:56:38 EDT
OK. I'll attach the result of:
    tethereal -i eth0 -w /tmp/capture host 192.168.0.6

when I tried:
    root@tux:~# cd /net/barrel/usr/export
    -bash: cd: /net/barrel/usr/export: No such file or directory
Comment 15 John Ellson 2005-10-17 15:58:36 EDT
Created attachment 120080 [details]
result of: "tethereal -i eth0 -w /tmp/capture host 192.168.0.6"
Comment 16 John Ellson 2005-10-17 16:15:30 EDT
Created attachment 120083 [details]
same test with static ip
Comment 17 John Ellson 2005-10-17 16:28:24 EDT
In the failed transaction, message #65 is a RST.
In the successful transaction, the corresponding message #67 is an ACK.
Comment 18 Steve Dickson 2005-10-17 16:46:14 EDT
The ethereal trace in Comment #15 (capture) shows the protocol being used
for the mount is TCP but the server does not seem to support that
protocol (Note: Packet 26 and its replay Packet 28).

The ethereal trace in Comment #16 (capture2) shows the protocol being
used is UDP, and since the server supports UDP, that's why the mount
is working...

So to prove this theory, I would like you to add '-o udp' to
the mount command in Comment #7 so it would be:

mount -o udp barrel:/usr/export /net/barrel/usr/export

Also, please post the output of 'rpcinfo -p <server>', which
will show what the server supports and what it does not.

Question: Is this a Linux server?
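
To read the 'rpcinfo -p' output requested above, one option is to group the listed transports by service name. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, and the sample text only follows the usual 'rpcinfo -p' column layout (program, vers, proto, port, service). It is not the content of the attachment posted to this bug.

```python
from collections import defaultdict

# Hypothetical sample in the usual `rpcinfo -p` layout; NOT the attachment from this bug.
SAMPLE = """\
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100005    1   udp    635  mountd
    100005    3   udp    635  mountd
"""

def transports_by_service(rpcinfo_text):
    """Map each service name to the set of protocols it is registered on."""
    services = defaultdict(set)
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a data row.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        proto = fields[2]
        # Fall back to the program number when no service name is printed.
        name = fields[4] if len(fields) > 4 else fields[0]
        services[name].add(proto)
    return dict(services)

def udp_only(rpcinfo_text, names=("nfs", "mountd")):
    """Return the services in `names` that are registered on UDP but not TCP."""
    services = transports_by_service(rpcinfo_text)
    return [n for n in names if services.get(n) == {"udp"}]

if __name__ == "__main__":
    print(udp_only(SAMPLE))
```

If nfs or mountd shows up in the `udp_only` result for the real server output, a TCP mount attempt against that server would be expected to fail while 'mount -o udp' could still work, which is the theory being tested here.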
Comment 19 John Ellson 2005-10-17 17:04:42 EDT
Yes, only udp is supported.  

Using the working static IP configuration, "mount -o udp ...." succeeds and
"mount -o tcp ..." fails

I'll attach the output of 'rpcinfo -p <server>' next.

The server is Redhat 9:
    kernel-smp-2.4.20-31.9
    nfs-utils-1.0.1-3.9

So why doesn't udp work if the client is started with dhcp ?
Comment 20 John Ellson 2005-10-17 17:07:58 EDT
Created attachment 120086 [details]
output of: rpcinfo -p barrel
Comment 21 Jeffrey Moyer 2005-10-18 12:58:19 EDT
One more question: what version of util-linux are you running?
Comment 22 John Ellson 2005-10-18 13:02:24 EDT
On the client (tux):    util-linux-2.13-0.5.pre4
On the server (barrel): util-linux-2.11y-9
Comment 23 Jeffrey Moyer 2005-10-21 11:06:59 EDT
Steve, did you see this last update?  The util-linux version (on the client) is
2.13-0.5.pre4.  Do you know which patches are in that one?
Comment 24 Jeffrey Moyer 2006-01-09 18:10:01 EST
Steve, I'm reassigning this to you so it doesn't get lost.
Comment 25 Steve Dickson 2007-03-09 12:37:42 EST
Is this still happening with a more recent nfs-utils rpm,
i.e. a version that has the mounting code included?
Comment 26 Bug Zapper 2008-04-03 12:29:47 EDT
Based on the date this bug was created, it appears to have been reported
against rawhide during the development of a Fedora release that is no
longer maintained. In order to refocus our efforts as a project we are
flagging all of the open bugs for releases which are no longer
maintained. If this bug remains in NEEDINFO thirty (30) days from now,
we will automatically close it.

If you can reproduce this bug in a maintained Fedora version (7, 8, or
rawhide), please change this bug to the respective version and change
the status to ASSIGNED. (If you're unable to change the bug's version
or status, add a comment to the bug and someone will change it for you.)

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.
Comment 27 Bug Zapper 2008-05-06 20:14:46 EDT
This bug has been in NEEDINFO for more than 30 days since feedback was
first requested. As a result we are closing it.

If you can reproduce this bug in the future against a maintained Fedora
version please feel free to reopen it against that version.

The process we're following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp