Bug 868584

Summary: mount.nfs: Cannot allocate memory -- F17 <--> F17 on some but not all mounts
Product: [Fedora] Fedora Reporter: Wendell Baker <wendellcraigbaker>
Component: nfs-utilsAssignee: Steve Dickson <steved>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 17CC: bfields, jlayton, sta040, steved
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-28 15:08:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
on frequented: nfs between frequented and drippie, experiment #1; edited; compressed text
none
on frequented: nfs between frequented and drippie, experiment #1; edited; compressed text
none
on frequented: nfs between frequented and drippie, experiment #2; edited; compressed text
none
on drippie: nfs between frequented and drippie, experiment #2; edited; compressed text
none
on drippie: nfs between frequented and drippie, experiment #1; edited; compressed text
none
experiment #1, raw pcap
none
experiment #2, raw pcap
none
experiment #1; raw pcap (client side)
none
experiment #2; raw pcap (client side)
none
trials across nfs4, nfs3, nfs2
none
for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done (after the experiment)
none
testing and repair; restarting nfs; and other output
none
(after restart nfs-server.service) sudo strace -p$(pidof rpc.mountd) -s4096 -eopen,read,write -oo.on_drippie_strace_rpc_mountd_open_read_write.out
none
(after restart nfs-server.service) for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done >& o.on_drippie_after_restart_nfs_proc_net_rpc.out none

Description Wendell Baker 2012-10-21 00:09:42 UTC
Description of problem:

nfs4 mount fails with

mount.nfs: Cannot allocate memory

Yet on the same server on a different HDD (but same type hardware), nfsv4 works to this same client.
Yet on the same server, on the boot hdd, nfsv4 works to this same client.
This should rule out quirky issues with iptables, ip6tables, exports, misconfiguration of nfs, etc.)

NFS "works" ... just not on that one mount point.

This is, at least, an F17<-->F17(server) issue.
I also exhibit F14<-->F17(server) fails similarly, which is unfortunate, but illustrative.

In this scenario

nfs4 fails (exhibited)
nfs3 fails (sketched)
nfs2 succeeds ... the workaround is to force nfsv2 which will use IPv4

Version-Release number of selected component (if applicable):

client(eggplants)
$ cat /etc/fedora-release ; uname -a
Fedora release 17 (Beefy Miracle)
Linux eggplants 3.6.1-1.fc17.i686.PAE #1 SMP Wed Oct 10 12:32:58 UTC 2012 i686 i686 i386 GNU/Linux
$ rpm -q -a | grep nfs | sort
libnfsidmap-0.25-3.fc17.i686
nfs-utils-1.2.6-3.fc17.i686



server(drippie)
$ cat /etc/fedora-release ; uname -a
Fedora release 17 (Beefy Miracle)
Linux drippie 3.6.1-1.fc17.i686.PAE #1 SMP Wed Oct 10 12:32:58 UTC 2012 i686 i686 i386 GNU/Linux
$ rpm -q -a | grep nfs | sort
libnfsidmap-0.25-3.fc17.i686
nfs-utils-1.2.6-3.fc17.i686


How reproducible:

very (can't make it go away)

Steps to Reproduce:
1. on server(drippie)

systemctl start nfs-server.service
exportfs -va

$ cat /etc/exports
/opt   *.example.com(rw,no_all_squash,no_root_squash)
/local *.example.com(rw,no_all_squash,no_root_squash)
/mnt/bisquick      *.example.com(rw,no_all_squash,no_root_squash)
/mnt/clarification *.example.com(rw,no_all_squash,no_root_squash)

2. on client(eggplants)
for i in /mnt/bisquick /mnt/clarification /local /opt ; do b=$(basename $i) ; echo $i ; mkdir -p /tmp/drippie/$b ; sudo mount -t nfs drippie:$i /tmp/drippie/$b ; done
(see error message for drippie:/mnt/bisquick


/etc/idmapd.conf is configured appropriately for "the old days"
Domain is set appropriately.
- F14 clients, F13 server; NFSv3
- this pair "used to work"

On the server(drippie)
$ systemctl is-active nfs-idmap.service
active
$ systemctl is-active rpcidmapd.service
unknown

On the server(drippie)
$ find /lib/systemd /etc/systemd -name '*idmap*'
/lib/systemd/system/nfs-idmap.service
/lib/systemd/system/rpcidmapd.service
/etc/systemd/system/nfs.target.wants/nfs-idmap.service

  
Actual results:

on the client (eggplants)
$ for i in /mnt/bisquick /mnt/clarification /local /opt ; do b=$(basename $i) ; echo $i ; mkdir -p /tmp/drippie/$b ; sudo mount -t nfs drippie:$i /tmp/drippie/$b ; done
/mnt/bisquick
mount.nfs: Cannot allocate memory <----------- fail on /mnt/bisquick
/mnt/clarification <-------------------------- success on /mnt/clarification
/local <---------------- success
/opt <------------------ success

(retry manually)
$ sudo mount.nfs -v  drippie:/mnt/bisquick /tmp/drippie/bisquick
mount.nfs: timeout set for Sat Oct 20 16:28:46 2012
mount.nfs: trying text-based options 'vers=4,addr=clientaddr=fd12:3456:789a:0:212:34ff:fe56:789a,local_lock=none,addr=fd12:3456:789a:0:298:76ff:fe54:3210'
mount.nfs: mount(2): Cannot allocate memory
mount.nfs: Cannot allocate memory

(yet /tmp/drippie/clarification functions just fine as an nfs mount point)

on client (eggplants)
$ df /tmp/drippie/clarification/.
Filesystem                   1K-blocks       Used  Available Use% Mounted on
drippie:/mnt/clarification/ 5768225280 3498904448 1976311808  64% /tmp/drippie/clarification

on client (eggplants) ... lotsa verbiage but IPv6 and all mounts "took" except for /mnt/bisquick
$ grep drippie /proc/mounts
$ grep drippie /proc/mounts | sed -f edit_away_ip_addresses.sed
drippie:/mnt/clarification/ /tmp/drippie/clarification nfs4 rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp6,port=0,timeo=600,retrans=2,sec=sys,clientaddr=fd12:3456:789a:0:212:34ff:fe56:789a,local_lock=none,addr=fd12:3456:789a:0:298:76ff:fe54:3210 0 0
drippie:/local/ /tmp/drippie/local nfs4 rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp6,port=0,timeo=600,retrans=2,sec=sys,clientaddr=fd12:3456:789a:0:212:34ff:fe56:789a,local_lock=none,addr=fd12:3456:789a:0:298:76ff:fe54:3210 0 0
drippie:/opt/ /tmp/drippie/opt nfs4 rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp6,port=0,timeo=600,retrans=2,sec=sys,clientaddr=fd12:3456:789a:0:212:34ff:fe56:789a,local_lock=none,addr=fd12:3456:789a:0:298:76ff:fe54:3210 0 0



Expected results:

the drives /mnt/bisquick and /mnt/clarification are the same hardware.

/mnt/bisquick "works"
/mnt/clarification "works"

Additional info:


Found this https://lkml.org/lkml/2012/6/12/629
From	"Myklebust, Trond" <>
Subject	Re: mount.nfs: cannot allocate memory
Date	Tue, 12 Jun 2012 23:28:08 +0000

Doesn't seem relevant or entirely helpful.
I'm on kernel 3.6.1-1.fc17.i686.PAE



For context:

The server (bisquick)
$ df /opt /local /mnt/bisquick /mnt/clarification
Filesystem                                             1K-blocks       Used  Available Use% Mounted on
/dev/mapper/vg_drippie-lv_root                          51606140    3829148   45155552   8% /
/dev/mapper/vg_drippie-lv_local                        370016256   54265508  296955024  16% /local
/dev/mapper/luks-4c8d49ad-a06f-49be-9e65-789b41721af8 5768225196 4667780304  807435900  86% /mnt/bisquick
/dev/mapper/luks-547d2d1a-2896-4fa5-bd05-c26597e7eff0 5768225196 3498706304 1976509900  64% /mnt/clarification

The server (bisquick)
$ ls -ld /mnt/bisquick
drwxrwxrwt. 15 root root 4096 Jul 21 10:12 /mnt/bisquick

The server (bisquick)
$ ls -ld /mnt/clarification
drwxrwsrwt. 5 wbaker source 4096 Apr 17  2012 /mnt/clarification


For other context this same situation occurs on client(F14) <--> server(F17)
I understand that F14 isn't supported any more; but it implicates the server, not the client since the failure is common across the two scenarios.

client(baggie, F14) <--> server(drippie, F17)
$ for i in /mnt/bisquick /mnt/clarification /local /opt ; do b=$(basename $i) ; echo $i ; mkdir -p /tmp/drippie/$b ; sudo mount -t nfs drippie:$i /tmp/drippie/$b ; done
/mnt/bisquick
mount.nfs: Connection timed out
/mnt/clarification
/local
/opt

client(baggie, F14) <--> server(drippie, F17)
$ sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -v -v
mount.nfs: timeout set for Sat Oct 20 17:02:35 2012
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=192.168.0.95,clientaddr=192.168.0.44'
mount.nfs: mount(2): Stale NFS file handle

Comment 1 Steve Dickson 2012-10-22 10:24:17 UTC
(In reply to comment #0)
> (retry manually)
> $ sudo mount.nfs -v  drippie:/mnt/bisquick /tmp/drippie/bisquick
> mount.nfs: timeout set for Sat Oct 20 16:28:46 2012
> mount.nfs: trying text-based options
> 'vers=4,addr=clientaddr=fd12:3456:789a:0:212:34ff:fe56:789a,local_lock=none,
> addr=fd12:3456:789a:0:298:76ff:fe54:3210'
> mount.nfs: mount(2): Cannot allocate memory
This means the actual mount(2) system call is failing. Could you
please turn on kernel debugging in the mounting code by do
    rpcdebug -m nfs -s mount
Then please post the contains of /var/log/message after the failure. 

Also a network trace could be handy as well. On the client:
    tshark -o /tmp/data.pcap host <server-name>
    bzip2 /tmp/data.pcap

Comment 2 J. Bruce Fields 2012-10-22 12:07:17 UTC
Also: is there anything different about the filesystem at /mnt/bisquick?

(Could you just paste the output of "mount" run on the server?)

Comment 3 Wendell Baker 2012-10-22 18:40:18 UTC
Both devices (clarification & bisquick) are the same.  Mounted the same.

(last two lines)

[wbaker:wbaker@drippie (F17 Beefy_Miracle, nfs server)
$ mount
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=1019204k,nr_inodes=214592,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
tmpfs on /run type tmpfs (rw,nosuid,nodev,seclabel,mode=755)
/dev/mapper/vg_drippie-lv_root on / type ext4 (rw,relatime,seclabel,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=31,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime,seclabel)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,seclabel)
tmpfs on /media type tmpfs (rw,nosuid,nodev,noexec,relatime,seclabel,mode=755)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
sunrpc on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sda1 on /boot type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/vg_drippie-lv_home on /home type ext4 (rw,relatime,seclabel,data=ordered)
/dev/mapper/vg_drippie-lv_local on /local type ext4 (rw,relatime,seclabel,data=ordered)
/etc/auto/direct on /srv type autofs (rw,relatime,fd=6,pgrp=929,timeout=300,minproto=5,maxproto=5,direct)
/etc/auto/direct on /work type autofs (rw,relatime,fd=6,pgrp=929,timeout=300,minproto=5,maxproto=5,direct)
/etc/auto/backup on /backup type autofs (rw,relatime,fd=11,pgrp=929,timeout=300,minproto=5,maxproto=5,indirect)
/etc/auto/borg on /borg type autofs (rw,relatime,fd=17,pgrp=929,timeout=300,minproto=5,maxproto=5,indirect)
/etc/auto/net on /net type autofs (rw,relatime,fd=23,pgrp=929,timeout=60,minproto=5,maxproto=5,indirect)
/etc/auto/dlna on /dlna type autofs (rw,relatime,fd=29,pgrp=929,timeout=300,minproto=5,maxproto=5,indirect)
/etc/auto/vol on /vol type autofs (rw,relatime,fd=35,pgrp=929,timeout=300,minproto=5,maxproto=5,indirect)
/dev/mapper/luks-4c8d49ad-a06f-49be-9e65-789b41721af8 on /mnt/bisquick type ext4 (rw,nosuid,nodev,relatime,seclabel,data=ordered)
/dev/mapper/luks-547d2d1a-2896-4fa5-bd05-c26597e7eff0 on /mnt/clarification type ext4 (rw,nosuid,nodev,relatime,seclabel,data=ordered)



They are both happy as local filesystems on drippie

$ ls -al /mnt/bisquick
total 76
drwxrwxrwt. 15 root   root    4096 Jul 21 10:12 .
drwxr-xr-x.  4 root   root    4096 Oct 14 17:37 ..
drwxrwsr-t.  5 wbaker source  4096 Jan 18  2012 archive
drwxrwsr-t. 11 wbaker source  4096 Oct 22 10:00 areas
drwxrwxr-x.  4 wbaker bakers  4096 Feb  6  2012 deprecated
drwxrwsr-x.  4 wbaker bakers  4096 Dec 23  2011 devices
drwxrwsr-t. 75 wbaker bakers  4096 Feb  5  2012 disks
drwxrwsr-t.  7 wbaker source  4096 Dec 20  2011 foreign
drwxrwxr-x.  4 wbaker wbaker  4096 Jul 21 10:06 j
drwxrwsr-t.  9 wbaker source  4096 Feb  4  2012 local
drwx------.  2 root   root   16384 Jan 15  2012 lost+found
-rw-rw-r--.  1 wbaker wbaker   177 Jul 21 10:12 PROVENANCE
drwxrwsr-x.  2 wbaker source  4096 Jul 21 10:08 self
drwx------.  4 wbaker wbaker  4096 Jan 29  2011 .Trash-500
drwxrwxr-x.  6 wbaker bakers  4096 Jan  6  2012 users
drwxrwsr-x. 17 wbaker source  4096 Oct 15 10:09 whole

$ ls -al /mnt/clarification
total 36
drwxrwsrwt. 5 wbaker source  4096 Apr 17  2012 .
drwxr-xr-x. 4 root   root    4096 Oct 14 17:37 ..
drwxrwsr-x. 2 wbaker source  4096 Jan  6  2012 logs
drwx------. 2 root   root   16384 Dec 17  2011 lost+found
-rwxrwxr-x. 1 wbaker source  1684 Apr 17  2012 sync
drwxrwsrwt. 9 wbaker source  4096 Mar  9  2012 volumes


working on the tcpdump exhibition.

Comment 4 Wendell Baker 2012-10-22 19:13:39 UTC
eggplants is busy doing stuff now, so I had to use another F17 host for the client.

eggplants is on a more modern kernel
$ uname -a
Linux eggplants 3.6.1-1.fc17.i686.PAE #1 SMP Wed Oct 10 12:32:58 UTC 2012 i686 i686 i386 GNU/Linux

frequented exhibits the issue (similar to F14 oddly enough)



frequented is the client; 

$ uname -a; cat /etc/fedora-release ; rpm -q -a | grep nfs | sort
Linux frequented 3.4.6-2.fc17.i686 #1 SMP Thu Jul 19 22:15:33 UTC 2012 i686 i686 i386 GNU/Linux
Fedora release 17 (Beefy Miracle)
libnfsidmap-0.25-3.fc17.i686
nfs-utils-1.2.6-3.fc17.i686

pertaining to the nearby pcap txt files

Experiment #1

$ for i in /mnt/bisquick /mnt/clarification /local /opt ; do b=$(basename $i) ; echo $i ; mkdir -p /tmp/drippie/$b ; sudo mount -t nfs drippie:$i /tmp/drippie/$b ; done
/mnt/bisquick
mount.nfs: Connection timed out
/mnt/clarification
/local
/opt


Experiment #2

$ sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -v -v
mount.nfs: timeout set for Mon Oct 22 11:49:03 2012
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: trying text-based options 'vers=4,addr=fd12:3456:789a:0:298:76ff:fe54:3210,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
mount.nfs: mount(2): Stale NFS file handle
mount.nfs: Connection timed out


Surrounding both experiments on client & server ...

on frequented for exp in "exp1" "exp2"
sudo tcpdump -i em1 -w on_frequented_nfs_between_frequented_and_drippie_$exp.pcap -e 'host drippie || host frequented'

on drippie for exp in "exp1" "exp2"
sudo tcpdump -i em1 -w on_drippie_nfs_between_frequented_and_drippie_$exp.pcap -e 'host drippie || host frequented'






for i in *.pcap ; do tcpdump -r $i >& o.$i.txt; done
for i in *.txt ; do grep -Ee '(frequented.*drippie|drippie.*frequented)' $i > $i.only; done
perl -i -p -e "s/$(dnsdomainname)/example.com/g" *.only
for i in *.only ; do bzip2 < $i > $i.bz2; done


*.bz2 submitted nearby

Comment 5 Wendell Baker 2012-10-22 19:14:50 UTC
Created attachment 631699 [details]
on frequented: nfs between frequented and drippie, experiment #1; edited; compressed text

Comment 6 Wendell Baker 2012-10-22 19:15:21 UTC
Created attachment 631700 [details]
on frequented: nfs between frequented and drippie, experiment #1; edited; compressed text

Comment 7 Wendell Baker 2012-10-22 19:15:52 UTC
Created attachment 631701 [details]
on frequented: nfs between frequented and drippie, experiment #2; edited; compressed text

Comment 8 Wendell Baker 2012-10-22 19:16:27 UTC
Created attachment 631702 [details]
on drippie: nfs between frequented and drippie, experiment #2; edited; compressed text

Comment 9 Wendell Baker 2012-10-22 19:16:57 UTC
Created attachment 631703 [details]
on drippie: nfs between frequented and drippie, experiment #1; edited; compressed text

Comment 10 Wendell Baker 2012-10-22 19:27:48 UTC
exhibition of rpcdebug from the client

wbaker:wbaker@frequented
$ sudo rpcdebug -m nfs -s mount
nfs        mount

$ sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -v -v
<output previously exhibited>

$ sudo rpcdebug -m nfs -c mount
nfs       <no flags set>

$ tail /var/log/messages
Oct 22 12:18:09 frequented kernel: [7341782.538068] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:09 frequented kernel: [7341782.538089] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:09 frequented kernel: [7341782.538102] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:09 frequented kernel: [7341782.538119] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:09 frequented kernel: [7341782.538132] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:09 frequented kernel: [7341782.538136] --> nfs4_try_mount()
Oct 22 12:18:09 frequented kernel: [7341782.569053] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:09 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:10 frequented kernel: [7341783.570130] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:10 frequented kernel: [7341783.570161] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:10 frequented kernel: [7341783.570182] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:10 frequented kernel: [7341783.570208] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:10 frequented kernel: [7341783.570229] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:10 frequented kernel: [7341783.570236] --> nfs4_try_mount()
Oct 22 12:18:10 frequented kernel: [7341783.584068] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:12 frequented kernel: [7341785.584985] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:12 frequented kernel: [7341785.585057] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:12 frequented kernel: [7341785.585077] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:12 frequented kernel: [7341785.585103] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:12 frequented kernel: [7341785.585124] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:12 frequented kernel: [7341785.585130] --> nfs4_try_mount()
Oct 22 12:18:12 frequented kernel: [7341785.603054] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:12 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:16 frequented kernel: [7341789.603964] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:16 frequented kernel: [7341789.603997] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:16 frequented kernel: [7341789.604063] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:16 frequented kernel: [7341789.604089] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:16 frequented kernel: [7341789.604110] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:16 frequented kernel: [7341789.604117] --> nfs4_try_mount()
Oct 22 12:18:16 frequented kernel: [7341789.621067] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:16 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:24 frequented kernel: [7341797.621881] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:24 frequented kernel: [7341797.621914] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:24 frequented kernel: [7341797.621935] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:24 frequented kernel: [7341797.621961] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:24 frequented kernel: [7341797.621982] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:24 frequented kernel: [7341797.621988] --> nfs4_try_mount()
Oct 22 12:18:24 frequented kernel: [7341797.643090] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:24 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:27 frequented named[11830]: client 74.53.87.138#52069 (d.8.3.0.7.6.0.0.0.7.4.0.1.0.0.2.ip6.arpa): view edge-clients: query: d.8.3.0.7.6.0.0.0.7.4.0.1.0.0.2.ip6.arpa IN SOA -E (173.228.29.167)
Oct 22 12:18:34 frequented kernel: [7341807.644068] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:34 frequented kernel: [7341807.644101] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:34 frequented kernel: [7341807.644121] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:34 frequented kernel: [7341807.644148] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:34 frequented kernel: [7341807.644169] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:34 frequented kernel: [7341807.644175] --> nfs4_try_mount()
Oct 22 12:18:34 frequented kernel: [7341807.662073] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:34 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:44 frequented kernel: [7341817.663162] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:44 frequented kernel: [7341817.663196] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:44 frequented kernel: [7341817.663216] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:44 frequented kernel: [7341817.663242] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:44 frequented kernel: [7341817.663264] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:44 frequented kernel: [7341817.663270] --> nfs4_try_mount()
Oct 22 12:18:44 frequented kernel: [7341817.683088] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:44 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:18:54 frequented kernel: [7341827.684163] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:54 frequented kernel: [7341827.684193] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:18:54 frequented kernel: [7341827.684213] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:18:54 frequented kernel: [7341827.684239] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:18:54 frequented kernel: [7341827.684260] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:18:54 frequented kernel: [7341827.684267] --> nfs4_try_mount()
Oct 22 12:18:54 frequented kernel: [7341827.704088] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:18:54 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:04 frequented kernel: [7341837.705307] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:04 frequented kernel: [7341837.705339] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:04 frequented kernel: [7341837.705360] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:04 frequented kernel: [7341837.705386] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:04 frequented kernel: [7341837.705407] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:04 frequented kernel: [7341837.705413] --> nfs4_try_mount()
Oct 22 12:19:04 frequented kernel: [7341837.723045] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:04 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:14 frequented kernel: [7341847.222609] NFS: nfs mount opts='sloppy,vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:14 frequented kernel: [7341847.222632] NFS:   parsing nfs mount option 'sloppy'
Oct 22 12:19:14 frequented kernel: [7341847.222638] NFS:   relaxing parsing rules
Oct 22 12:19:14 frequented kernel: [7341847.222641] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:14 frequented kernel: [7341847.222653] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:14 frequented kernel: [7341847.222669] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:14 frequented kernel: [7341847.222683] NFS: MNTPATH: '/mnt/clarification'
Oct 22 12:19:14 frequented kernel: [7341847.222687] --> nfs4_try_mount()
Oct 22 12:19:14 frequented kernel: [7341847.243053] <-- nfs4_try_mount() = 0
Oct 22 12:19:14 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:14 frequented kernel: [7341847.724590] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:14 frequented kernel: [7341847.724623] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:14 frequented kernel: [7341847.724643] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:14 frequented kernel: [7341847.724670] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:14 frequented kernel: [7341847.724691] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:14 frequented kernel: [7341847.724697] --> nfs4_try_mount()
Oct 22 12:19:14 frequented kernel: [7341847.738055] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:24 frequented kernel: [7341857.738991] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:24 frequented kernel: [7341857.739068] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:24 frequented kernel: [7341857.739089] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:24 frequented kernel: [7341857.739115] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:24 frequented kernel: [7341857.739136] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:24 frequented kernel: [7341857.739142] --> nfs4_try_mount()
Oct 22 12:19:24 frequented kernel: [7341857.759065] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:25 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:34 frequented kernel: [7341867.760007] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:34 frequented kernel: [7341867.760083] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:34 frequented kernel: [7341867.760104] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:34 frequented kernel: [7341867.760130] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:34 frequented kernel: [7341867.760151] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:34 frequented kernel: [7341867.760158] --> nfs4_try_mount()
Oct 22 12:19:35 frequented kernel: [7341867.778060] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:35 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:45 frequented kernel: [7341877.778860] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:45 frequented kernel: [7341877.778892] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:45 frequented kernel: [7341877.778913] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:45 frequented kernel: [7341877.778939] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:45 frequented kernel: [7341877.778960] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:45 frequented kernel: [7341877.778966] --> nfs4_try_mount()
Oct 22 12:19:45 frequented kernel: [7341877.799069] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:45 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:19:49 frequented named[11830]: client 74.53.87.138#63953 (0.a.f.9.4.0.0.0.8.a.5.0.1.0.0.2.ip6.arpa): view edge-clients: query: 0.a.f.9.4.0.0.0.8.a.5.0.1.0.0.2.ip6.arpa IN SOA -E (173.228.29.167)
Oct 22 12:19:55 frequented kernel: [7341887.800150] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:55 frequented kernel: [7341887.800183] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:19:55 frequented kernel: [7341887.800203] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:19:55 frequented kernel: [7341887.800229] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:19:55 frequented kernel: [7341887.800250] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:19:55 frequented kernel: [7341887.800256] --> nfs4_try_mount()
Oct 22 12:19:55 frequented kernel: [7341887.820089] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:19:55 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:20:05 frequented kernel: [7341897.820923] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:20:05 frequented kernel: [7341897.820954] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:20:05 frequented kernel: [7341897.820974] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:20:05 frequented kernel: [7341897.821000] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:20:05 frequented kernel: [7341897.821063] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:20:05 frequented kernel: [7341897.821070] --> nfs4_try_mount()
Oct 22 12:20:05 frequented kernel: [7341897.835051] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:20:05 frequented udevd[350]: specified group 'plugdev' unknown
Oct 22 12:20:15 frequented kernel: [7341907.835965] NFS: nfs mount opts='vers=4,addr=fd12:3456:789a:0:2ab:cdff:feab:cdef,clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:20:15 frequented kernel: [7341907.835999] NFS:   parsing nfs mount option 'vers=4'
Oct 22 12:20:15 frequented kernel: [7341907.836063] NFS:   parsing nfs mount option 'addr=fd12:3456:789a:0:2ab:cdff:feab:cdef'
Oct 22 12:20:15 frequented kernel: [7341907.836089] NFS:   parsing nfs mount option 'clientaddr=fd12:3456:789a:0:2ab:cdff:feef:abcd'
Oct 22 12:20:15 frequented kernel: [7341907.836110] NFS: MNTPATH: '/mnt/bisquick'
Oct 22 12:20:15 frequented kernel: [7341907.836117] --> nfs4_try_mount()
Oct 22 12:20:15 frequented kernel: [7341907.856066] <-- nfs4_try_mount() = -116 [error]
Oct 22 12:20:15 frequented udevd[350]: specified group 'plugdev' unknown

Comment 11 Wendell Baker 2012-10-22 19:36:08 UTC
Reminder ... 
NFS "works" between frequented and drippie for /mnt/bisquick, one just has to use NFSv2.
NFS "works" between frequented and drippie for /mnt/clarification in nfs4



[on frequented]
$ sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -o vers=2 -v -v
mount.nfs: timeout set for Mon Oct 22 12:29:22 2012
mount.nfs: trying text-based options 'vers=2,addr=<drippie'sfd::/8>'
mount.nfs: prog 100003, trying vers=2, prot=6
mount.nfs: trying <drippie'sfd::/8> prog 100003 vers 2 prot TCP port 2049
mount.nfs: prog 100005, trying vers=1, prot=17
mount.nfs: trying <drippie'sfd::/8> prog 100005 vers 1 prot UDP port 20048
(OK)

[on frequented]
$ ls -l /tmp/drippie/clarification
total 28
drwxrwsr-x. 2 wbaker source  4096 Jan  6  2012 logs
drwx------. 2 root   root   16384 Dec 17  2011 lost+found
-rwxrwxr-x. 1 wbaker source  1684 Apr 17  2012 sync
drwxrwsrwt. 9 wbaker source  4096 Mar  9  2012 volumes

$ ls -l /tmp/drippie/bisquick
total 64
drwxrwsr-t.  5 wbaker source  4096 Jan 18  2012 archive
drwxrwsr-t. 11 wbaker source  4096 Oct 22 10:00 areas
drwxrwxr-x.  4 wbaker bakers  4096 Feb  6  2012 deprecated
drwxrwsr-x.  4 wbaker bakers  4096 Dec 23  2011 devices
drwxrwsr-t. 75 wbaker bakers  4096 Feb  5  2012 disks
drwxrwsr-t.  7 wbaker source  4096 Dec 20  2011 foreign
drwxrwxr-x.  4 wbaker wbaker  4096 Jul 21 10:06 j
drwxrwsr-t.  9 wbaker source  4096 Feb  4  2012 local
drwx------.  2 root   root   16384 Jan 15  2012 lost+found
-rw-rw-r--.  1 wbaker wbaker   177 Jul 21 10:12 PROVENANCE
drwxrwsr-x.  2 wbaker source  4096 Jul 21 10:08 self
drwxrwxr-x.  6 wbaker bakers  4096 Jan  6  2012 users
drwxrwsr-x. 17 wbaker source  4096 Oct 15 10:09 whole





$ grep -Ee '(clarification|bisquick)' /proc/mounts
drippie:/mnt/clarification/ /tmp/drippie/clarification nfs4 rw,relatime,vers=4.0,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp6,port=0,timeo=600,retrans=2,sec=sys,clientaddr=<frequented'sfd::/8>,local_lock=none,addr=<drippie'sfd::/8> 0 0
drippie:/mnt/bisquick/ /tmp/drippie/bisquick nfs rw,relatime,vers=2,rsize=8192,wsize=8192,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=sys,mountaddr=<drippie'sfd::/8>,mountvers=1,mountport=20048,mountproto=udp6,local_lock=none,addr=<drippie'sfd::/8> 0 0

Comment 12 J. Bruce Fields 2012-10-22 20:26:55 UTC
tcpdump output isn't very useful for NFSv4, which is why Steve asked for the full binary data.

In any case, I don't see any obvious reason why we should be getting ESTALE's.

One of the two exported directories has sgid bit set, not the other, and they have different ownership, but I can't see why either of those should matter.

Do I understand correctly that NFSv3 mounts also fail?  That's a bit surprising--I would have expected v2 and v3 to behave the same way.

Comment 13 Wendell Baker 2012-10-24 17:30:47 UTC
Created attachment 632900 [details]
experiment #1, raw pcap

Comment 14 Wendell Baker 2012-10-24 17:31:21 UTC
Created attachment 632901 [details]
experiment #2, raw pcap

Comment 15 Wendell Baker 2012-10-24 17:31:46 UTC
Created attachment 632902 [details]
experiment #1; raw pcap (client side)

Comment 16 Wendell Baker 2012-10-24 17:32:25 UTC
Created attachment 632903 [details]
experiment #2; raw pcap (client side)

Comment 17 Wendell Baker 2012-10-24 17:37:37 UTC
Created attachment 632906 [details]
trials across nfs4, nfs3, nfs2

Exhibition

for i in 4 3 2 ; do 
  echo trying vers=$i ... 
  sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -o vers=$i -v -v
done

Comment 18 J. Bruce Fields 2012-10-24 22:08:40 UTC
Just looking at attachment 632900 [details]....

In frames 319 and 320 there's a putrootfh/getfh, to get filehandle with hash 0x62d40c52.

Then a lookup of "mnt" against that filehandle in frames 339/340 gives filehandle with hash 0xc0ebafb6.

Then a lookup of "bisquick" against that filehandle in frames 353 and 354 gives filehandle with hash 0xd93120a5.

Then an attempt to use that filehandle gives ESTALE.

Also slightly odd: both of those lookups are duplicated.  Why would the client be sending identical lookups one right after the other?  (For example, frames 339/340 are and 337/338 are different rpc's with different xid's, but do the same lookup.)

The contents of /proc/net/rpc/*/content after this test might be interesting.

Is it possible that those two filesystems could have the same uuid?  For example, is one some kind of clone or snapshot of the other?

That might explain the problem as the v2 filehandle might be too small to contain uuid and might fall back on dev major, minor (I don't remember the details).  Though I'm not sure why that would cause this ESTALE.

Comment 19 Wendell Baker 2012-10-25 17:45:58 UTC
The filesystems of bisquick and clarification were made in different weeks.  they are not replicas.  They are maintained by rsync.

The hardware on each is a 6GB Western Digital MyBook formatted as a single LUKS container (thus there are two partitions on the device).  Both were created by pushing the buttons on the palimpsest of the F13 era.  Seems like 2011-02 was when palimpest became useable enough that it was easier to go graphical than run the fdisk & cryptsetup recipe from the command line (these devices would have had to use parted anyway)

The history here is that drippie was F13 until a few weeks ago when I upgraded it to F17.  Until that time drippie was working fine as a server (and using nfs2 or nfs3 ... never bothered to study it much ... it "just worked").  There's been some (ahem) teething pains getting nfs to work on F17 with the identity mapping technologies changing across F14, F16 & F17 and nfs2, nfs3, nfs4.

History:
clarification created 2011-12-17
bisquick created 2011-12-30

Neither is a replica of the other.  Both are used for backup type activity, but the structure of the ext4 filesystems on them is completely different.



on drippie (the server)

$ sudo parted -l

[deleted /dev/sda and the boot disk lvms -- WCB]

Model: WD My Book (scsi)
Disk /dev/sdb: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      20.5kB  210MB   210MB   fat32        EFI System Partition  boot
 2      210MB   6001GB  6001GB               bisquick              lvm


Model: WD My Book (scsi)
Disk /dev/sdc: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name                  Flags
 1      20.5kB  210MB   210MB   fat32        EFI System Partition  boot
 2      210MB   6001GB  6001GB               clarification         lvm


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/luks-547d2d1a-2896-4fa5-bd05-c26597e7eff0: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  6001GB  6001GB  ext4


Model: Linux device-mapper (crypt) (dm)
Disk /dev/mapper/luks-4c8d49ad-a06f-49be-9e65-789b41721af8: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  6001GB  6001GB  ext4

Comment 20 Wendell Baker 2012-10-25 17:58:36 UTC
Created attachment 633487 [details]
for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done (after the experiment)



[on frequented, the client]
$ sudo umount /tmp/drippie/bisquick

[on frequented, the client]
$ df
Filesystem                        1K-blocks       Used  Available Use% Mounted on
<snip/>
drippie:/mnt/clarification/      5768225280 3528027776 1947188480  65% /tmp/drippie/clarification
drippie:/local/                   370016256   54267904  296952704  16% /tmp/drippie/local
drippie:/opt/                      51606144    3910400   45074304   8% /tmp/drippie/opt
drippie:/mnt/bisquick/           5768225200 4697623096  777593112  86% /tmp/drippie/bisquick

(the point being that bisquick isn't mounted any more)
(run the experiment)

[on frequented, the client]
$ for i in 4 3 2 ; do echo trying vers=$i ... ; sudo mount.nfs drippie:/mnt/bisquick /tmp/drippie/bisquick -o vers=$i -v -v; done >& o.ignore.out

(we already saw the output)


[on frequented, the client]$ for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done >& o.on_frequented_nfs_proc_net_rpc.out 

(is uninteresting)

[on frequented, the client]
$ cat o.on_frequented_nfs_proc_net_rpc.out 
file /proc/net/rpc/auth.unix.gid/content
#uid cnt: gids...
file /proc/net/rpc/auth.unix.ip/content
#class IP domain






[on drippie, the server]
$ for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done > o.on_drippie_nfs_proc_net_rpc_again.out 2>&1


attached nearby.

Comment 21 J. Bruce Fields 2012-10-26 20:33:16 UTC
This line:

  # *.sanguine.emerson.baker.org 6 0x8e958c4ccee53bfa0000000000000000

in /proc/net/rpc/nfsd.fh/content says that the kernel asked mountd whether there was a filesystem with uuid 0x8e95..., and something went wrong--mountd didn't answer, I think, I'm not sure.

Despite that, we can see that there's a filesystem with that uuid:

  /mnt/bisquick *.sanguine.emerson.baker.org(rw,no_root_squash,sync,wdelay,no_subtree_check,uuid=4c8c958e:fa3be5ce:00000000:00000000)

(note that uuid's byte-swapped, but it's the same one.)

That suggests some bug or misconfiguration affecting mountd or libblkid.

The output of the "blkid" command might be interesting.

Also possibly useful: restart the server, then

  strace -p($pidof rpc.mountd) -s4096 -eopen,read,write -oOUTPUT

then try the mounts, then kill the strace and attach OUTPUT.

Comment 22 J. Bruce Fields 2012-10-26 20:35:24 UTC
I'm also not completely sure why there is a line:
*.sanguine.emerson.baker.org 5 0x8e958c4ccee53bfa /mnt/bisquick

Probably that's from a successful v2 mount, as v2 likely needs the smaller uuid to fit v2 filehandle limits.

Comment 23 Wendell Baker 2012-10-27 23:53:01 UTC
Created attachment 634369 [details]
testing and repair; restarting nfs; and other output

The problem went away with the restart of nfs-server.service

Diagnosis: mountd got confused somehow (how?)
Treatment: restart nfs (restart mountd)


The commands are roughly in the order shown.
nfs2, nfs3, nfs4 behave correctly now.

frequented behaves correctly.
eggplants behaves correctly.

the problem was on drippie (the server).

One for the record books.  Unless you guys can figure out what went wrong, there is no problem to exhibit any more.

Comment 24 Wendell Baker 2012-10-27 23:54:31 UTC
Created attachment 634370 [details]
(after restart nfs-server.service) sudo strace -p$(pidof rpc.mountd) -s4096 -eopen,read,write -oo.on_drippie_strace_rpc_mountd_open_read_write.out

Comment 25 Wendell Baker 2012-10-27 23:55:02 UTC
Created attachment 634371 [details]
(after restart nfs-server.service) for i in /proc/net/rpc/*/content; do echo "file $i" ; sudo cat $i; done >& o.on_drippie_after_restart_nfs_proc_net_rpc.out

Comment 26 J. Bruce Fields 2012-10-28 15:08:09 UTC
OK, thanks--reopen if you figure out a reproducer.

Comment 27 Wendell Baker 2012-11-23 23:03:33 UTC
It happened again.  I'm not going to reopen the ticket unless/until I have more hard data.  I didn't do a lot of detail work debugging or reproducing.  There's a "there" here somehow, but there's an easy workaround.

For reasons related here or not, drippie needed a reboot.  It was rebooted.

1. Reboot drippie.
2. on drippie mounted clarification with palimpest (supply the password)
3. on drippie mounted bisquick with palimpest (supply the password)

From remote machines on F14 & F16, via nfs autofs, try to 

ls /vol/clarification
ls /vol/bisquick

The ls of /vol/clarification worked. The ls of /vol/bisquick hung.   The tcpdump in both cases showed the funny Stale NFS File Handle behavior.

remediation: 
on drippie
systemctl restart nfs-server.service

... and all is well.


For example:

14:50:52.442547 IP6 wantowen.905 > drippie.nfs: Flags [.], ack 1069, win 56, options [nop,nop,TS val 523126768 ecr 736806], length 0
14:50:52.442592 IP6 wantowen.4106132903 > drippie.nfs: 140 fsinfo fh Unknown/010006004C8C958EFA3BE5CE0000000000000000000000000000000000000000
14:50:52.442889 IP6 drippie.nfs > wantowen.4106132903: reply ok 32 fsinfo ERROR: Stale NFS file handle
14:50:52.442936 IP6 wantowen.4122910119 > drippie.nfs: 140 fsinfo fh Unknown/010006004C8C958EFA3BE5CE000000000000000000000014010006000CF9586E
14:50:52.443222 IP6 drippie.nfs > wantowen.4122910119: reply ok 32 fsinfo ERROR: Stale NFS file handle

Comment 28 Tethys 2012-12-02 21:44:49 UTC
I know this is closed, and I probably can't add much to it, but I just
thought it worth mentioning that I've been seeing exactly the same
problem. The box had plenty of memory available, but v4 mounts where
failing with ENOMEM. v3 mounts were hanging indefinitely. v2 mounts
were working.

leto:~% rpm -q rpcbind nfs-utils
rpcbind-0.2.0-19.fc17.x86_64
nfs-utils-1.2.6-6.fc17.x86_64

Restarting the NFS server (CentOS 6.3) seemed to fix the problem,
at least for now.

Comment 29 Wendell Baker 2012-12-03 01:03:52 UTC
To clarify #27 ... (in case it's reopened)

It's deterministic and repeatable

- boot the server (clients need not cold start)
- start up nfs-server.service
- be at runlevel 5 and operating
- bring multiple new storage units online (e.g. hotplug, supply password)
- $XYZABC
- nfs mount multiple storage from the clients (autofs, manually, whatever)
- observe the problem

The remediation is to insert
$XYZABC = systemctl restart nfs-server.service

One can show this with
a laptop running F17 as server
a laptop running F17 as client
2x usb ports on the server (3x or more is fine too)
2x outboard HDD on the server (same)