Bug 1058940
| Summary: | Autofs mounted directories going stale & reporting not a directory | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Justin McCormick <justin.mccormick> | ||||||
| Component: | autofs | Assignee: | Ian Kent <ikent> | ||||||
| Status: | CLOSED NOTABUG | QA Contact: | Filesystem QE <fs-qe> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 6.4 | CC: | cmarcant, ctatman, ikent, justin.mccormick, steved | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2015-10-12 14:19:38 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Attachments: |
|
||||||||
Once a mount is mounted autofs (certainly as far as the kernel module goes) has little to do with the mount. Perhaps this has been seen by the nfs folks? Can you provide a full autofs debug log please. You have provided information about the contents of /etc/mtab but haven't also checked /proc/mounts. Can you also check /proc/mounts. (In reply to Ian Kent from comment #3) > Can you provide a full autofs debug log please. > > You have provided information about the contents of /etc/mtab > but haven't also checked /proc/mounts. Can you also check > /proc/mounts. I'd like to try and help with this but I've looked over the report again and checked what I can again. The next thing to check is an autofs full debug log before I start looking at the NFS kernel code. Can you provide one please. (In reply to Ian Kent from comment #4) > (In reply to Ian Kent from comment #3) > > Can you provide a full autofs debug log please. > > > > You have provided information about the contents of /etc/mtab > > but haven't also checked /proc/mounts. Can you also check > > /proc/mounts. > > I'd like to try and help with this but I've looked over > the report again and checked what I can again. The next > thing to check is an autofs full debug log before I start > looking at the NFS kernel code. > > Can you provide one please. And an example of the autofs maps in use please. There was a report recently about a problem with mount locations that have intermediate path elements that are themselves automounted directories. Ian Also, there was an autofs kernel change in 2.6.32-358.21.1.el6 but the symptoms don't sound like what your seeing. Nevertheless, it's probably worth updating the kernel to the latest errata kernel, if that's possible on any of the systems this has been seen on. NIS map of the automount:
[09:12 jlmccorm@cycrh6rtp09 ~]: ypmatch -k p4_root_tools auto.x.eng
p4_root_tools rtpetc01.rtp.netapp.com:/vol/p4_root_tools
==============================================
Shows it's currently mounted, but reporting not a directory:
[09:04 jlmccorm@cycrh6rtp09 ~]: mount | grep p4_root_tools
rtpetc01.rtp.netapp.com:/vol/p4_root_tools on /x/eng/p4_root_tools type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16)
[09:04 jlmccorm@cycrh6rtp09 ~]: cd /x/eng/p4_root_tools
bash-4.2: cd: /x/eng/p4_root_tools: Not a directory
[09:04 jlmccorm@cycrh6rtp09 ~]: ls -ald /x/eng/p4_root_tools
drwxrwxrwx 14 root root 4096 Jan 10 16:52 /x/eng/p4_root_tools
[09:04 jlmccorm@cycrh6rtp09 ~]: stat /x/eng/p4_root_tools
File: `/x/eng/p4_root_tools'
Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 2fh/47d Inode: 64 Links: 14
Access: (0777/drwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-03-26 06:32:26.151933000 -0400
Modify: 2014-01-10 16:52:48.209660000 -0500
Change: 2014-01-10 16:52:48.209660000 -0500
Contents of /proc/mount and /etc/mtab:
[09:04 jlmccorm@cycrh6rtp09 ~]: grep p4_root_tools /proc/mounts
rtpetc01.rtp.netapp.com:/vol/p4_root_tools /x/eng/p4_root_tools nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0
[09:04 jlmccorm@cycrh6rtp09 ~]: grep p4_root_tools /etc/mtab
rtpetc01.rtp.netapp.com:/vol/p4_root_tools /x/eng/p4_root_tools nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0
==============================================
On another identical client, that is currently able to access the NFS share via the same autofs path:
[09:17 jlmccorm@cycrh6rtp20 ~]: mount | grep p4_root_tools
rtpetc01.rtp.netapp.com:/vol/p4_root_tools on /x/eng/p4_root_tools type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16)
[09:17 jlmccorm@cycrh6rtp20 ~]: grep p4_root_tools /proc/mounts
rtpetc01.rtp.netapp.com:/vol/p4_root_tools /x/eng/p4_root_tools nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0
[09:18 jlmccorm@cycrh6rtp20 ~]: grep p4_root_tools /etc/mtab
rtpetc01.rtp.netapp.com:/vol/p4_root_tools /x/eng/p4_root_tools nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0
[09:18 jlmccorm@cycrh6rtp20 ~]: cd /x/eng/p4_root_tools
[09:18 jlmccorm@cycrh6rtp20 /x/eng/p4_root_tools]: cd ~
[09:18 jlmccorm@cycrh6rtp20 ~]: ls -ald /x/eng/p4_root_tools
drwxrwxrwx 14 root root 4096 Jan 10 16:52 /x/eng/p4_root_tools
[09:18 jlmccorm@cycrh6rtp20 ~]: stat /x/eng/p4_root_tools
File: `/x/eng/p4_root_tools'
Size: 4096 Blocks: 8 IO Block: 65536 directory
Device: 3ah/58d Inode: 64 Links: 14
Access: (0777/drwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2014-03-26 06:32:26.151933000 -0400
Modify: 2014-01-10 16:52:48.209660000 -0500
Change: 2014-01-10 16:52:48.209660000 -0500
==============================================
We enabled debug logging in autofs, and I am attaching a logfile showing when this autofs path ('/x/eng/p4_root_tools') was mounted. I see it was expired, unmounted, and the directory removed at Mar 25 23:14:03. And then it was mounted again at Mar 25 23:14:04. I then see it called for expire_proc_indirect, but is never unmounted.
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: handle_packet_expire_indirect: token 54452, name p4_root_tools
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: expiring path /x/eng/p4_root_tools
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: umount_multi: path /x/eng/p4_root_tools incl 1
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: unmounting dir = /x/eng/p4_root_tools
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: spawn_umount: mtab link detected, passing -n to mount
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: rm_unwanted_fn: removing directory /x/eng/p4_root_tools
Mar 25 23:14:03 cycrh6rtp09 automount[5925]: expired /x/eng/p4_root_tools
...
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: handle_packet_missing_indirect: token 54457, name p4_root_tools, request pid 11724
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: attempting to mount entry /x/eng/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: lookup_mount: lookup(program): p4_root_tools -> rtpetc01.rtp.netapp.com:/vol/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: lookup_mount: lookup(program): looking up p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: lookup_mount: lookup(program): p4_root_tools -> rtpetc01.rtp.netapp.com:/vol/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: parse_mount: parse(sun): expanded entry: rtpetc01.rtp.netapp.com:/vol/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: parse_mount: parse(sun): gathered options:
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: parse_mount: parse(sun): dequote("rtpetc01.rtp.netapp.com:/vol/p4_root_tools") -> rtpetc01.rtp.netapp.com:/vol/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: parse_mount: parse(sun): core of entry: options=, loc=rtpetc01.rtp.netapp.com:/vol/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: sun_mount: parse(sun): mounting root /x/eng, mountpoint p4_root_tools, what rtpetc01.rtp.netapp.com:/vol/p4_root_tools, fstype nfs, options (null)
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: mount_mount: mount(nfs): root=/x/eng name=p4_root_tools what=rtpetc01.rtp.netapp.com:/vol/p4_root_tools, fstype=nfs, options=(null)
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: mount_mount: mount(nfs): calling mkdir_path /x/eng/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: mount_mount: mount(nfs): calling mount -t nfs rtpetc01.rtp.netapp.com:/vol/p4_root_tools /x/eng/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: spawn_mount: mtab link detected, passing -n to mount
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: mount(nfs): mounted rtpetc01.rtp.netapp.com:/vol/p4_root_tools on /x/eng/p4_root_tools
Mar 25 23:14:04 cycrh6rtp09 automount[5925]: dev_ioctl_send_ready: token = 54457
...
Mar 25 23:16:33 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 25 23:17:48 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 25 23:19:03 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 25 23:20:18 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
...
Mar 26 09:00:24 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 26 09:01:39 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Mar 26 09:02:54 cycrh6rtp09 automount[5925]: expire_proc_indirect: expire /x/eng/p4_root_tools
Created attachment 879001 [details]
automount debug log from cycrh6rtp09 for p4_root_tools mount
I normally don't find strace output useful but, in this case, I'd like to know which system call is returning the failure so I can track the execution path through the kernel. I'm still not sure if the problem is with the automount code or if I should be looking at NFS. Maybe strace output will help with that. The debug log wasn't complete so, even though it looks like there are a couple of anomalies, I can't tell if that's so. Do you have two maps that have entries for /x/eng (odd), since there appears to be two different expires for similarly named mounts? Ian (In reply to Ian Kent from comment #9) > > The debug log wasn't complete so, even though it looks like > there are a couple of anomalies, I can't tell if that's so. > Do you have two maps that have entries for /x/eng (odd), since > there appears to be two different expires for similarly named > mounts? Umm, OK, that's just some other stuff going on during the expire that actually does look OK. > > Ian Since what's going on here isn't obvious (not withstanding checking system calls from an strace) I guess the other thing we'll have to do in schedule a reboot for a machine that's showing the problem and get a crash dump so I can have a look at that too. (In reply to Ian Kent from comment #9) > I normally don't find strace output useful but, in this case, > I'd like to know which system call is returning the failure > so I can track the execution path through the kernel. > > I'm still not sure if the problem is with the automount code > or if I should be looking at NFS. Maybe strace output will > help with that. > > The debug log wasn't complete so, even though it looks like > there are a couple of anomalies, I can't tell if that's so. > Do you have two maps that have entries for /x/eng (odd), since > there appears to be two different expires for similarly named > mounts? > > Ian What would you like an strace of? I originally provided an strace of attempting to ls one of the mounts that is reporting not a directory (see https://bugzilla.redhat.com/attachment.cgi?id=856791). From the strace, it looks like the open() system call after the stat fails, and returns ENOTDIR. 13:34:25 stat("/x/eng/rlse63", {st_dev=makedev(0, 278), st_ino=64, st_mode=S_IFDIR|0755, st_nlink=82, st_uid=20041, st_gid=30, st_blksize=65536, st_blocks=16, st_size=8192, st_atime=2014/01/23-20:27:39, st_mtime=2013/12/09-17:11:13, st_ctime=2014/01/22-13:04:17}) = 0 13:34:25 open("/x/eng/rlse63", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 ENOTDIR (Not a directory) Also, you mentioned getting a crash dump. These clients are all running on an nfsroot image, so enabling kdump will be difficult. It appears that the mkdumprd portion of mkdumprd does not support a NFS rootfs. Would there be any other way to obtain a core dump without enabling the kdump service? bash-4.1# service kdump start No kdump initial ramdisk found. [WARNING] Rebuilding /boot/initrd-2.6.32-358.11.1.el6.x86_64kdump.img device node not found /sbin/mkdumprd: line 1469: test: /sbin/fsck.rootfs: binary operator expected /etc/kdump.conf: Unsupported type rootfs nfs Failed to run mkdumprd bash-4.1# (In reply to Justin McCormick from comment #12) > (In reply to Ian Kent from comment #9) > > I normally don't find strace output useful but, in this case, > > I'd like to know which system call is returning the failure > > so I can track the execution path through the kernel. > > > > I'm still not sure if the problem is with the automount code > > or if I should be looking at NFS. Maybe strace output will > > help with that. > > > > The debug log wasn't complete so, even though it looks like > > there are a couple of anomalies, I can't tell if that's so. > > Do you have two maps that have entries for /x/eng (odd), since > > there appears to be two different expires for similarly named > > mounts? > > > > Ian > > What would you like an strace of? I originally provided an strace of > attempting to ls one of the mounts that is reporting not a directory (see > https://bugzilla.redhat.com/attachment.cgi?id=856791). From the strace, it > looks like the open() system call after the stat fails, and returns ENOTDIR. > > 13:34:25 stat("/x/eng/rlse63", {st_dev=makedev(0, 278), st_ino=64, > st_mode=S_IFDIR|0755, st_nlink=82, st_uid=20041, st_gid=30, > st_blksize=65536, st_blocks=16, st_size=8192, st_atime=2014/01/23-20:27:39, > st_mtime=2013/12/09-17:11:13, st_ctime=2014/01/22-13:04:17}) = 0 > 13:34:25 open("/x/eng/rlse63", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = > -1 ENOTDIR (Not a directory) Right, that is all I'm after. I'll work through that call and see if I can work out where the ENOTDIR is coming from. > > > > Also, you mentioned getting a crash dump. These clients are all running on > an nfsroot image, so enabling kdump will be difficult. It appears that the > mkdumprd portion of mkdumprd does not support a NFS rootfs. Would there be > any other way to obtain a core dump without enabling the kdump service? > Don't know, I'll need to consult with others on that. I doubt this error return is coming from the automount kernel code since it uses ENOTDIR for another purpose within the kernel. Ian Hi, Is this still a problem with rhel6.7? If not, can this bug be closed? --Chris I don't know if this issue exists with rhel6.7 or not. I don't think we're seeing this issue anymore in our 6.4 environment, so I can't be of any additional assistance in providing log information or troubleshooting. So we might as well close this bug. |
Created attachment 856791 [details] strace output from attempting to ls /x/eng/rlse63 Description of problem: Some automounted directories become inaccessible and return 'not a directory' when trying to access them. This is a recurring problem across many of our RHEL 6.4 hosts, it is not limited to this single path/volume. When the path is inaccessible on one of our hosts, it is still accessible from other hosts. In the output I have provided below, the path /x/eng/rlse63 is inacessible. There are (currently) 51 other mounts on this host to 'rtpetc01' which is a NetApp FlexCache filer. I'm able to access the other 51 mount points without a problem. I'm also able to access rtpetc01.rtp.netapp.com:/vol/rlse63 via other hosts without a problem. We also ran a tcpdump from the host, checking for traffic to rtpetc01 when accessing this path. However, no packets were sent to (or received from) the filer whenever we tried to change into the directory or list it's contents. I have attached a strace.out file showing a strace of attempting to ls the directory. Version-Release number of selected component (if applicable): [13:06 jlmccorm@cycrh6rtp37 ~]: cat /etc/redhat-release Red Hat Enterprise Linux Server release 6.4 (Santiago) [13:06 jlmccorm@cycrh6rtp37 ~]: uname -a Linux cycrh6rtp14.rtp.netapp.com 2.6.32-358.11.1.el6.x86_64 #1 SMP Wed May 15 10:48:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux [13:07 jlmccorm@cycrh6rtp37 ~]: rpm -qa kernel autofs nfs-utils kernel-2.6.32-358.el6.x86_64 kernel-2.6.32-358.11.1.el6.x86_64 autofs-5.0.5-73.ntap1.el6.x86_64 nfs-utils-1.2.3-36.el6.x86_64 How reproducible: Steps to Reproduce: 1. Unknown 2. 3. Actual results: autofs mounts /x/eng/rlse63 cd /x/eng/rlse63 returns 'Not a Directory' Expected results: autofs mounts /x/eng/rlse63 cd /x/eng/rlse63 works or autofs times out /x/eng/rlse63 after inactivity Additional info: [13:22 jlmccorm@cycrh6rtp37 ~]: grep -v '^#' /etc/sysconfig/autofs TIMEOUT=300 BROWSE_MODE="no" MOUNT_NFS_DEFAULT_PROTOCOL=4 USE_MISC_DEVICE="yes" [13:22 jlmccorm@cycrh6rtp37 ~]: grep /x/eng/rlse63 /etc/mtab rtpetc01.rtp.netapp.com:/vol/rlse63 /x/eng/rlse63 nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0 [13:22 jlmccorm@cycrh6rtp37 ~]: cd /x/eng/rlse63 bash-4.2: cd: /x/eng/rlse63: Not a directory [13:23 jlmccorm@cycrh6rtp37 ~]: stat /x/eng/rlse63 File: `/x/eng/rlse63' Size: 8192 Blocks: 16 IO Block: 65536 directory Device: 100016h/1048598d Inode: 64 Links: 82 Access: (0755/drwxr-xr-x) Uid: (20041/ build) Gid: ( 30/ engr) Access: 2014-01-23 20:27:39.374348000 -0500 Modify: 2013-12-09 17:11:13.581802000 -0500 Change: 2014-01-22 13:04:17.439315000 -0500 [13:24 jlmccorm@cycrh6rtp37 ~]: mount | grep 'rtpetc01.rtp.netapp.com' | wc -l 52 [13:37 jlmccorm@cycrh6rtp37 ~]: cd /x/eng/rlse63 bash-4.2: cd: /x/eng/rlse63: Not a directory [13:37 jlmccorm@cycrh6rtp37 ~]: mount | grep '/x/eng/rlse63' rtpetc01.rtp.netapp.com:/vol/rlse63 on /x/eng/rlse63 type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16) [13:37 jlmccorm@cycrh6rtp37 ~]: sudo umount /x/eng/rlse63 [13:38 jlmccorm@cycrh6rtp37 ~]: cd /x/eng/rlse63 [13:38 jlmccorm@cycrh6rtp37 /x/eng/rlse63]: df -h . Filesystem Size Used Avail Use% Mounted on rtpetc01.rtp.netapp.com:/vol/rlse63 4.4T 3.5T 972G 79% /x/eng/rlse63 [13:38 jlmccorm@cycrh6rtp37 /x/eng/rlse63]: /bin/ls | wc -l 79 [13:38 jlmccorm@cycrh6rtp37 /x/eng/rlse63]: mount | grep '/x/eng/rlse63' rtpetc01.rtp.netapp.com:/vol/rlse63 on /x/eng/rlse63 type nfs (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16) [13:38 jlmccorm@cycrh6rtp37 /x/eng/rlse63]: grep '/x/eng/rlse63' /etc/mt grep: /etc/mt: No such file or directory [13:38 jlmccorm@cycrh6rtp37 /x/eng/rlse63]: grep '/x/eng/rlse63' /etc/mtab rtpetc01.rtp.netapp.com:/vol/rlse63 /x/eng/rlse63 nfs rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.60.248.16,mountvers=3,mountport=4046,mountproto=udp,local_lock=none,addr=10.60.248.16 0 0