Bug 833535
Summary: | autofs-5.0.6-19.fc17.x86_64 doesn't work with mock | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | H.J. Lu <hongjiu.lu> |
Component: | autofs | Assignee: | Ian Kent <ikent> |
Status: | CLOSED WONTFIX | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 17 | CC: | andrejoh, Bert.Deknuydt, bill, dax, elliott.forney, fedoraproject, gepeng1983, habicht, holm, igeorgex, ikent, imc, info, irlapati, jehan.procaccia, jlayton, Marcin.Dulak, mcarter, mkfischer, saguryev.gnu, vendor-redhat |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2013-08-01 17:07:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Attachments: |
Description
H.J. Lu
2012-06-19 17:15:30 UTC
But this doesn't usually happen, right? We'll need to get a debug log of it happening and see if there's anything in it that gives a clue as to what is happening. You can get a debug log by setting LOGGING="debug" in the configuration file, /etc/sysconfig/autofs, and ensuring that all messages are being logged for facility daemon. I have the line "daemon.* /var/log/debug" in my syslog configuration for this.

Jun 20 09:03:38 gnu-32 automount[23401]: umount_multi: path /net/gnu-4 incl 1
Jun 20 09:03:38 gnu-32 automount[23401]: umount_multi_triggers: umount offset /net/gnu-4/export/server
Jun 20 09:03:38 gnu-32 automount[23401]: umount_autofs_offset: offset /net/gnu-4/export/server not mounted
Jun 20 09:03:38 gnu-32 automount[23401]: umount_multi_triggers: umount offset /net/gnu-4/export
Jun 20 09:03:38 gnu-32 automount[23401]: umounted offset mount /net/gnu-4/export
Jun 20 09:03:38 gnu-32 automount[23401]: failed to remove dir /net/gnu-4/export: Device or resource busy
Jun 20 09:03:38 gnu-32 automount[23401]: cache_delete_offset_list: deleting offset key /net/gnu-4/export
Jun 20 09:03:38 gnu-32 automount[23401]: cache_delete_offset_list: deleting offset key /net/gnu-4/export/server
Jun 20 09:03:38 gnu-32 automount[23401]: rm_unwanted_fn: removing directory /net/gnu-4/export
Jun 20 09:03:38 gnu-32 automount[23401]: unable to remove directory /net/gnu-4/export: Device or resource busy
Jun 20 09:03:38 gnu-32 automount[23401]: rm_unwanted_fn: removing directory /net/gnu-4
Jun 20 09:03:38 gnu-32 automount[23401]: unable to remove directory /net/gnu-4: Directory not empty
Jun 20 09:03:38 gnu-32 automount[23401]: expired /net/gnu-4
Jun 20 09:03:38 gnu-32 automount[23401]: dev_ioctl_send_ready: token = 35
Jun 20 09:03:38 gnu-32 automount[23401]: expire_cleanup: got thid 140737353955072 path /net stat 0
Jun 20 09:03:38 gnu-32 automount[23401]: expire_cleanup: sigchld: exp 140737353955072 finished, switching from 2 to 1
Jun 20 09:03:38 gnu-32 automount[23401]: st_ready: st_ready(): state = 2 path /net

This is a log relating to the symptom you've observed which might not be the whole story. Please post the log from the startup of the daemon to after the problem occurs. Also some more "grep gnu-4 /proc/mounts" outputs for the procedure you described in comment #1 might be useful.

Created attachment 593444 [details]
The complete autofs log
(In reply to comment #3)
> Please post the log from the startup of the daemon to after
> the problem occurs.

Done.

> Also some more "grep gnu-4 /proc/mounts" outputs for the
> procedure you described in comment #1 might be useful.

When it happened, I got

[hjl@gnu-32 ~]$ grep gnu-4 /proc/mounts
[hjl@gnu-32 ~]$

(In reply to comment #0)
> I have
>
> [hjl@gnu-6 glibc-x32]$ showmount --exports gnu-4
> Export list for gnu-4:
> /export/server gnu*
> /export gnu*
> [hjl@gnu-6 glibc-x32]$
>
> With autofs-5.0.6-19.fc17.x86_64, I got
>
> [hjl@gnu-32 glibc]$ ls /net/gnu-4/export
> build gnu home intel linux lost+found redhat server spec suse
> [hjl@gnu-32 glibc]$ grep gnu-4 /proc/mounts
> -hosts /net/gnu-4/export autofs rw,relatime,fd=13,pgrp=1452,timeout=300,minproto=5,maxproto=5,offset 0 0
> gnu-4:/export/ /net/gnu-4/export nfs4 rw,nosuid,nodev,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.3.194.135,minorversion=0,local_lock=none,addr=10.3.194.54 0 0
> -hosts /net/gnu-4/export/server autofs rw,relatime,fd=13,pgrp=1452,timeout=300,minproto=5,maxproto=5,offset 0 0
> [hjl@gnu-32 glibc]$
>
> But /net/gnu-4/export isn't unmounted cleanly. I got

According to the log the cause of that is a failure to umount the autofs trigger at /net/gnu-4/export/server. The mount list looks OK at this point so I'm not sure what caused umount(8) to think it wasn't present.

> [hjl@gnu-32 mock]$ ls /net/gnu-4
> export
> [hjl@gnu-32 mock]$ ls /net/gnu-4/export
> ls: cannot open directory /net/gnu-4/export: Too many levels of symbolic links

Anything else following the umount failure is not reliable. I'm also not sure how to get more information about it unless I can reproduce it, which I can't so far, sorry.

It happened when I was doing

# mock -r fedora-17-i386 --rebuild glibc-2.15-48.fc17.src.rpm

on 64-bit Fedora 17/Core i7 965.
(In reply to comment #6)
> According to the log the cause of that is a failure to umount
> the autofs trigger at /net/gnu-4/export/server. The mount list
> looks OK at this point so I'm not sure what caused umount(8) to
> think it wasn't present.

I saw

Jun 20 08:59:53 gnu-32 automount[23401]: umount_multi_triggers: umount offset /net/gnu-4/export/server
Jun 20 08:59:53 gnu-32 automount[23401]: umounted offset mount /net/gnu-4/export/server
Jun 20 08:59:53 gnu-32 automount[23401]: umount_subtree_mounts: unmounting dir = /net/gnu-4/export
Jun 20 08:59:53 gnu-32 automount[23401]: spawn_umount: mtab link detected, passing -n to mount

Why did spawn_umount call "mount" instead of "umount"?

(In reply to comment #8)
> Why did spawn_umount call "mount" instead of "umount"?

It didn't, that's a mistake in the message.

I tried the current autofs git repo. It has a different problem.
After umounting /net/gnu-4/export/server, automount complained:

Jun 22 09:44:20 gnu-32 automount[12972]: st_expire: state 1 path /net
Jun 22 09:44:20 gnu-32 automount[12972]: expire_proc: exp_proc = 140737353955072 path /net
Jun 22 09:44:20 gnu-32 automount[12972]: expire_proc_indirect: expire /net/gnu-4/export/server
Jun 22 09:44:20 gnu-32 automount[12972]: handle_packet: type = 6
Jun 22 09:44:20 gnu-32 automount[12972]: handle_packet_expire_direct: token 54, name /net/gnu-4/export/server
Jun 22 09:44:20 gnu-32 automount[12972]: expiring path /net/gnu-4/export/server
Jun 22 09:44:20 gnu-32 automount[12972]: umount_multi: path /net/gnu-4/export/server incl 1
Jun 22 09:44:20 gnu-32 automount[12972]: umount_subtree_mounts: unmounting dir = /net/gnu-4/export/server
Jun 22 09:44:20 gnu-32 automount[12972]: spawn_umount: mtab link detected, passing -n to umount
Jun 22 09:44:20 gnu-32 automount[12972]: expired /net/gnu-4/export/server
Jun 22 09:44:20 gnu-32 automount[12972]: dev_ioctl_send_ready: token = 54
Jun 22 09:44:20 gnu-32 automount[12972]: expire_proc_indirect: expire /net/gnu-4/export
Jun 22 09:44:20 gnu-32 automount[12972]: 1 remaining in /net

and never umounted /net/gnu-4/export. I got

# ls /net/gnu-4/export
build gnu home intel linux lost+found redhat server spec suse
# ls /net/gnu-4/export/server
ls: cannot open directory /net/gnu-4/export/server: Too many levels of symbolic links

With the same nfs-utils-1.2.4-3.fc15.x86_64, gnu-4 has kernel 2.6.43.8-1.fc15.x86_64 and gnu-1 has kernel 2.6.43.7-3.fc15.x86_64. df shows

gnu-4:/export/ 721094656 520767488 200323072 73% /net/gnu-4/export
gnu-4:/export/server 721094656 520767488 200323072 73% /net/gnu-4/export/server
gnu-1:/export/ 236515328 180578304 43726848 81% /net/gnu-1/export
gnu-1:/export/server/ 961478656 358574080 602888192 38% /net/gnu-1/export/server

Never mind. gnu-4 exports the same partition twice.

It still happens after I fixed my NFS server. It could be a mock issue, which may hold /net/gnu-4/export.
Jun 22 15:46:00 gnu-32 automount[7950]: handle_packet_expire_indirect: token 17, name gnu-4
Jun 22 15:46:00 gnu-32 automount[7950]: expiring path /net/gnu-4
Jun 22 15:46:00 gnu-32 automount[7950]: umount_multi: path /net/gnu-4 incl 1
Jun 22 15:46:00 gnu-32 automount[7950]: umount_multi_triggers: umount offset /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: umounted offset mount /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: umount_autofs_offset: failed to remove dir /net/gnu-4/export: Device or resource busy
Jun 22 15:46:00 gnu-32 automount[7950]: cache_delete_offset_list: deleting offset key /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: removing directory /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: unable to remove directory /net/gnu-4/export: Device or resource busy
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: removing directory /net/gnu-4
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: unable to remove directory /net/gnu-4: Directory not empty
Jun 22 15:46:00 gnu-32 automount[7950]: umount_multi: path /net/gnu-4 left 0
Jun 22 15:46:00 gnu-32 automount[7950]: expired /net/gnu-4

Were 2 threads trying to remove /net/gnu-4/export at the same time?

spawn_umount has

    ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
    if (ret & MTAB_NOTUPDATED) {

Is this the correct way to check the return value from /bin/umount?

(In reply to comment #16)
> Is this the correct way to check the return value from /bin/umount?

That's been like that for a long time and has worked OK. But it may be worth checking since /etc/mtab is now a symlink to /proc/mounts.

(In reply to comment #13)
> It still happens after I fixed my nfs server.
There are a couple of patches that haven't been committed to the git repo, one for a similar problem, but what you've described so far doesn't look like the problem it fixes.

Remember that when autofs gets confused like this it can remain confused over a restart because it tries to re-establish the mount tree as it was at last shutdown when there are mounts that cannot be umounted. That's a good thing most of the time but can be a pain when the mount tree is in a state where it can't be re-constructed correctly.

Have you stopped autofs and checked that all mounts related to autofs are gone? Make sure automount is not running and, if there are any autofs related mounts remaining, try to umount them manually; if they won't let you umount them you will have to reboot the machine. The point here is you really need to umount the base of the automount tree to be sure you have a clean slate when you start autofs if you have problems like this that won't go away.

Ian

I started with

-hosts /net autofs rw,relatime,fd=13,pgrp=14770,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net/gnu-4/export autofs rw,relatime,fd=13,pgrp=14770,timeout=300,minproto=5,maxproto=5,offset 0 0
gnu-4:/export/ /net/gnu-4/export nfs4 rw,nosuid,nodev,relatime,vers=4,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.3.194.135,minorversion=0,local_lock=none,addr=10.3.194.54 0 0

in /proc/mounts. Then /net/gnu-4/export was umounted and I got

-hosts /net autofs rw,relatime,fd=13,pgrp=12856,timeout=300,minproto=5,maxproto=5,indirect 0 0
-hosts /net/gnu-4/export autofs rw,relatime,fd=13,pgrp=12856,timeout=300,minproto=5,maxproto=5,offset 0 0

Before automount could remove /net/gnu-4/export, /net/gnu-4/export was used by another program.
When automount tried to remove /net/gnu-4/export, it got:

Jun 22 15:46:00 gnu-32 automount[7950]: umount_autofs_offset: failed to remove dir /net/gnu-4/export: Device or resource busy
Jun 22 15:46:00 gnu-32 automount[7950]: cache_delete_offset_list: deleting offset key /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: removing directory /net/gnu-4/export
Jun 22 15:46:00 gnu-32 automount[7950]: rm_unwanted_fn: unable to remove directory /net/gnu-4/export: Device or resource busy

So automount removed /net/gnu-4/export from its key list while it was still mounted. Does this make sense?

umount_autofs_offset has

    if (!rv && me->flags & MOUNT_FLAG_DIR_CREATED) {
        if (rmdir(me->key) == -1) {
            char *estr = strerror_r(errno, buf, MAX_ERR_BUF);
            debug(ap->logopt, "failed to remove dir %s: %s",
                  me->key, estr);
        }
    }
    return rv;

When it fails to remove the dir, shouldn't it set rv to 1?

Even if /net/gnu-4/export isn't used by normal root, it can still be used by a chroot from mock. How does automount deal with rmdir failure?

(In reply to comment #19)
> So automount removed /net/gnu-4/export from its key list while
> it was still mounted. Does this make sense?

That is similar to the bug I mentioned above.
When something like this happens autofs is meant to re-construct the tree so it continues to function, but there are cases where that fails to happen correctly and they need to be found and fixed. I'll have a look and see what's happening here.

(In reply to comment #21)
> Even if /net/gnu-4/export isn't used by normal root,
> it can still be used by chroot from mock. How does
> automount deal with rmdir failure?

The mount (tree) shouldn't be expired if it's in use, meaning an open file or working directory present within the tree. Once selected for expire, path walks into the tree should be blocked within the kernel until the expire is complete. The expire check is done within the kernel, so usage within a chroot should be seen the same as any other usage.

There is a case where the kernel check for busyness of a tree is racy following (relatively) recent kernel changes (not autofs changes though), and I'm trying to work out a way to fix that. The case we're looking at here is not that case, so that should be OK.

(In reply to comment #20)
> When it failed to remove dir, shouldn't it set rv to 1?

Maybe, I'll have a look.

(In reply to comment #19)
> Before automount could remove /net/gnu-4/export, /net/gnu-4/export was
> used by another program.

Right, I don't think that should happen; the process path walk should be blocked at /net/gnu-4 during the expire and then trigger a re-mount of the tree. I'll have a look at that too.
(In reply to comment #24)
> > When it failed to remove dir, shouldn't it set rv to 1?
>
> Maybe, I'll have a look.

Yeah, a failed directory removal is a problem. The problem is not so much the return as that, if it fails, the offset trigger should be mounted back and then a fail returned. Maybe the directory removal should be removed from the function and handled in a callback. I'm looking at doing that now. Of course, once the mount tree becomes broken like this for some reason there's no telling what will happen when we try to mount it back.

Ian

It would be worth trying the Rawhide version of autofs since there are some recent changes that might help with this problem. Grab the source rpm from Rawhide or F18 and build it against your system. You will need autofs-5.0.6-23 or later to get what I'd like to test.

I tried autofs-5.0.7-1 and it didn't solve the problem.

(In reply to comment #28)
> I tried autofs-5.0.7-1 and it didn't solve the problem.

How about a debug log from 5.0.7-1 please.

I am seeing a very similar issue where autofs will start giving the messages about 'Too many levels of symbolic links'. Restarting autofs will clear up the issue. I have seen it on both my NFS server and on various clients. I have enabled syslog debugging as mentioned above and will post logs once it happens again. It is sporadic - roughly every 2 weeks.

(In reply to comment #30)
> I am seeing a very similar issue where autofs will start giving the messages
> about 'Too many levels of symbolic links'.
>
> Restarting autofs will clear up the issue. I have seen it on both my NFS
> server and on various clients.
>
> I have enabled syslog debugging as mentioned above and will post logs once
> it happens again. It is sporadic - roughly every 2 weeks.

And /proc/mounts and kernel version.

Finally happened this morning.

/proc/mounts:

rootfs / rootfs rw 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs rw,seclabel,nosuid,nodev,noexec,relatime 0 0
devtmpfs /dev devtmpfs rw,seclabel,nosuid,size=2007944k,nr_inodes=501986,mode=755 0 0
devpts /dev/pts devpts rw,seclabel,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,seclabel,nosuid,nodev 0 0
tmpfs /run tmpfs rw,seclabel,nosuid,nodev,mode=755 0 0
/dev/mapper/vg_bobafett-lv_root / ext4 rw,seclabel,relatime,data=ordered 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
selinuxfs /sys/fs/selinux selinuxfs rw,relatime 0 0
tmpfs /sys/fs/cgroup tmpfs rw,seclabel,nosuid,nodev,noexec,mode=755 0 0
cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpuacct,cpu 0 0
cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
cgroup /sys/fs/cgroup/net_cls cgroup rw,nosuid,nodev,noexec,relatime,net_cls 0 0
cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=27,pgrp=1,timeout=300,minproto=5,maxproto=5,direct 0 0
tmpfs /media tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,mode=755 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
mqueue /dev/mqueue mqueue rw,seclabel,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,seclabel,relatime 0 0
configfs /sys/kernel/config configfs rw,relatime 0 0
/dev/sda2 /boot ext4 rw,seclabel,relatime,data=ordered 0 0
/dev/mapper/vg_bobafett-lv_vmware /VirtualBox ext4 rw,seclabel,relatime,data=ordered 0 0
/dev/mapper/Backup-Backup /backup ext4 rw,seclabel,relatime,data=ordered 0 0
auto.direct /services autofs rw,relatime,fd=7,pgrp=929,timeout=3600,minproto=5,maxproto=5,direct 0 0
auto.direct /usr/local autofs rw,relatime,fd=7,pgrp=929,timeout=3600,minproto=5,maxproto=5,direct 0 0
auto.direct /mythtv autofs rw,relatime,fd=7,pgrp=929,timeout=3600,minproto=5,maxproto=5,direct 0 0
auto.direct /home autofs rw,relatime,fd=7,pgrp=929,timeout=3600,minproto=5,maxproto=5,direct 0 0

uname -a
Linux bobafett 3.5.2-3.fc17.x86_64 #1 SMP Tue Aug 21 19:06:52 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa | grep autofs
autofs-5.0.6-22.fc17.x86_64

cd /usr/local/bin
bash: cd: /usr/local/bin: Too many levels of symbolic links

ypcat -k auto.master
/- auto.direct --timeout=3600

ypcat -k auto.direct
/home -rw deathstar:/export/users
/mythtv -rw deathstar:/export/services/mythtv/
/usr/local -rw deathstar:/export/services/local/x86_64/local
/services -rw deathstar:/export/services

Created attachment 609140 [details]
automount logs from when the error occurs
I should add that this has only ever occurred for the /usr/local/bin mount point. Never for the home directory or other mount points. I have no idea what makes that mount point more problematic than the others.

I am using autofs-5.0.7-1. The problem still happens, but much less often.

(In reply to comment #34)
> I should add that this has only ever occurred for the /usr/local/bin mount
> point. Never for the home directory or other mount points. I have no idea
> what makes that mount point more problematic than the others

You mean /usr/local, right?

(In reply to comment #33)
> Created attachment 609140 [details]
> automount logs from when the error occurs

Yes, logs from when you see the error, but if there is anything to be learned from the log it would have had to have happened before the time range shown in the log here.

(In reply to comment #35)
> I am using autofs-5.0.7-1. The problem still happens, but much
> less often.

And that's even more puzzling.

It looks like one way for this to happen is when the kernel thinks that /usr/local has submounts present within it. But the mount table says there are no mounts under it. Given that /proc/mounts is what the kernel thinks is mounted, that doesn't appear to be the case.

Another way for this to happen would be a race clearing the flag that causes the automounting to occur (that is, the flag not being cleared when it should). But in Michael's setup that flag does not need to be cleared at mount and set at umount, so there cannot be a race.

I don't think either of these cases can be caused by user space, except possibly automount doing a lazy umount and some sort of race between detaching the mount and a lookup triggering a mount. What's more, that shouldn't be a persistent condition, which is also not the case here.

Perhaps I could put something into a kernel to return ENOTEMPTY in the case where the kernel thinks there are submounts present to see if that is actually where this problem happens.
Would either of you be able to install and test with such a kernel?

(In reply to comment #36)
> You mean /usr/local, right?

Does the problem also occur with other directories under /usr/local?

(In reply to comment #38)
> Perhaps I could put something into a kernel to return
> ENOTEMPTY in the case where the kernel thinks there are
> submounts present to see if that is actually where this
> problem happens. Would either of you be able to install
> and test with such a kernel?

If you put the kernel patch here, I can give it a try.

Yes, I mean /usr/local, not /usr/local/bin. I have noticed the symlink message in two different circumstances: one in trying to run something in /usr/local/bin, and the other trying to read man pages which access /usr/local/man.

Assuming the patches for the kernel yield an rpm kernel, I would be happy to install it on one of the boxes. Should I also try to install the 5.0.7 autofs?

(In reply to comment #41)
> Assuming the patches for the kernel yield an rpm kernel, I would be happy to
> install it on one of the boxes. Should I also try to install the 5.0.7
> autofs?

That's probably not useful since H.J. Lu has already done that without it yielding extra information. I'm not sure why the perception is that the problem occurs less with the latest version so, in that sense, you may want to build and install the Rawhide or F18 rpm. Your choice.

Created attachment 610823 [details]
new syslog output for autofs-5.0.7-2
Luckily, the issue happened again rather quickly, so we have full logs from startup until the issue occurred. I am running autofs-5.0.7-2.fc17.x86_6 rebuilt from Rawhide with kernel-3.5.2-3.fc17.x86_64.

(In reply to comment #44)
> Luckily, the issue happened again rather quickly, so we have full logs from
> startup until the issue occurred.

Just to be absolutely clear: in this case, a simple direct mount /usr/local is automounted once and, after expiring the NFS mount on it at 18:38:57, the daemon never gets a callback again and user space processes that should mount it get an ELOOP error. And the direct mount trigger itself, the autofs fs mount, is still present in /proc/mounts. Is that accurate?

Ian

Yes, that is how I am reading it.

cat /proc/mounts | grep /usr/local
auto.direct /usr/local autofs rw,relatime,fd=19,pgrp=964,timeout=3600,minproto=5,maxproto=5,direct 0 0

Michael

Created attachment 611990 [details]
automount logs from /home
Today, for the first time, I saw the error on a filesystem besides /usr/local.

All previously supplied logs were from an NFS/NIS client mounting from the server. Today, however, the problem filesystem was /home. This filesystem is mounted from /export/users and, in this case, it happened on the server itself.

ypcat -k auto.direct | grep home
/home -rw deathstar:/export/users

This system is running 3.5.3-1.fc17.x86_64 and autofs-5.0.6-22.fc17.x86_64.

Looking at the logs, the symptom/issue looks the same.

Michael

(In reply to comment #48)
> Looking at the logs, the symptom/issue looks the same.

It has to be a race that crept into the kernel at some point. I'm still thinking about where to put the printks and what to print to get some information (in fact I need to get back to it). The problem is that this could print lots of useless info if I don't use some checks to make it more specific.

I have the same problem:

[root@d012-05 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links

/mci/inf comes from an NFS server mounted via autofs through LDAP automount maps. Other maps work fine:

[root@d012-05 ~]# ls /mci/eph
abib_gh boudi denohe ....

[root@d012-05 ~]# automount --dumpmaps
...
Mount point: /- type: ldap map: ldap:ou=direct,ou=automount,dc=int-evry,dc=fr
...
/mci/inf | -rw,intr,soft gizey:/disk19/inf
/mci/eph | -rw,intr,soft gizey:/disk19/eph
...

The automount of /mci/inf works fine on some other stations, though, and on the same station it can work sometimes and not others.
A workaround is to restart autofs:

[root@d012-05 ~]# systemctl restart autofs.service

then

[root@d012-05 ~]# ls /mci/inf
abid_zue berge_re shoman ...

autofs debug logs for the successful automount above:

Oct 5 22:38:53 d012-05 automount[1655]: handle_packet_missing_direct: token 19, name /mci/inf, request pid 1667
Oct 5 22:38:53 d012-05 automount[1655]: attempting to mount entry /mci/inf
Oct 5 22:38:53 d012-05 automount[1655]: lookup_mount: lookup(ldap): /mci/inf -> -rw,intr,soft gizey:/disk19/inf
Oct 5 22:38:53 d012-05 automount[1655]: mount_mount: mount(nfs): root=/mci/inf name=/mci/inf what=gizey:/disk19/inf, fstype=nfs, options=rw,intr,soft
Oct 5 22:38:53 d012-05 automount[1655]: mount_mount: mount(nfs): calling mkdir_path /mci/inf
Oct 5 22:38:53 d012-05 automount[1655]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw,intr,soft gizey:/disk19/inf /mci/inf
Oct 5 22:38:53 d012-05 automount[1655]: spawn_mount: mtab link detected, passing -n to mount
Oct 5 22:38:53 d012-05 automount[1655]: mount_mount: mount(nfs): mounted gizey:/disk19/inf on /mci/inf
Oct 5 22:38:53 d012-05 automount[1655]: dev_ioctl_send_ready: token = 19
Oct 5 22:38:53 d012-05 automount[1655]: mounted /mci/inf

I run Fedora 17, autofs-5.0.6-22.fc17.i686, kernel-PAE-3.5.4-1.fc17.i686. Any advice is greatly appreciated, as we have dozens of Fedora 17 stations for hundreds of users. Thanks.

As suggested by a sysadmin, in one of our labs (12 computers) I changed the way automount fetches LDAP maps. In /etc/nsswitch I now have:

automount: files sss

instead of files ldap, plus an adapted sssd.conf as described in https://fedoraproject.org/wiki/Features/SSSDAutoFSSupport. Maybe it has nothing to do with my problem, but as I am at the point of adding a crontab that restarts autofs.service every 15 minutes (on another 12-computer lab), I am trying different workarounds.
I don't know yet if things will get better, as the problem is sporadic, but for now (after 3 hours on the 12 stations) all autofs mounts still work fine. I also blindly (as I don't really understand what it does) did a "mount --make-private /" on 2 stations to check if it has any effect. Thanks for any other clues or suggestions that could help on this sporadic "Too many levels of symbolic links".

(In reply to comment #51)
> I also blindly, (as I don't really understand what it does) , did a "mount
> --make-private /" on 2 stations to check if it has any incidence .

See Documentation/filesystems/sharedsubtree.txt for an explanation. Basically a shared subtree mount replicates mounts made within a tree to replicas, such as those made by binding a mount to another location. It turns out that systemd making "/" shared causes a problem with autofs expiration of indirect mounts due to the mount point dentry reference count being elevated (so it appears busy) even though the autofs mount is not usable in another mount tree. I'm not sure if the sharedness has any relation to the problem we are seeing here.

Created attachment 625260 [details]
Patch - fix reset pending flag on mount fail
I can't see how this patch would fix the problem we are seeing
here, but it is a newly discovered bug and should be checked
in case it is responsible.
If someone can build a kernel with this patch and test it that
would be very much appreciated.
Hi Ian,

Unfortunately, that bug is restricted and I can't view it. Can you attach the patch or add me to that ticket?

I haven't built my own kernel in a LONG time. I can patch it, and if you point me in the right direction to build my own kernel I will be happy to give it a try. Instructions that generate a custom kernel rpm are preferred.

The sporadic error appeared today on one machine:

[root@d012-04 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links

Is there anything I can check while the problem is happening?
> > here's what I have > [root@d012-04 ~]# uname -a > Linux d012-04.int-evry.fr 3.5.4-1.fc17.i686.PAE #1 SMP Mon Sep 17 15:19:42 > UTC 2012 i686 i686 i386 GNU/Linux > [root@d012-04 ~]# uptime > 11:05:00 up 4 days, 18:11, 2 users, load average: 0.00, 0.01, 0.05 > > [root@d012-04 ~]# ps auwx | grep auto > root 12092 0.0 0.0 4732 508 pts/0 S+ 11:05 0:00 grep > --color=auto auto > root 17424 0.0 0.0 15184 2828 ? S Oct10 0:04 > /usr/libexec/sssd/sssd_autofs --debug-to-files > root 17433 0.0 0.1 49976 5248 ? Ssl Oct10 0:26 > /usr/sbin/automount --pid-file /run/autofs.pid > > Although I read above in the bug report that the pb could arise because the > mount point wasn't correctly umounted, I run those 2 commands > > [root@d012-04 ~]# df | grep /mci/inf > [root@d012-04 ~]# lsof /mci/inf > > But they returned nothing . Well, that may be evidence that the patch in comment #53 will fix the problem. The fact that the mount pending flag remains set after the call to ->d_automount() means that it will just try again, and again... but only if there's a success return when the mount is attempted, even though it didn't mount. But I don't yet understand how we can get a success when the mount fails, which really must be the case. Not only that it would only happen to certain types of mounts, such as multi-mounts (which are used by the internal hosts map, for example). Ian following http://fedoraproject.org/wiki/Building_a_custom_kernel I rebuilt a kernel including patch on comment #53 added in kernel.spec Patch833535: autofs4-bug833535.patch then after real 153m59.916s of building ... [root@d012-04 SPECS]# rpm -Uvh /root/rpmbuild/RPMS/i686/kernel-PAE-3.6.1-1.fc17.i686.rpm I cannot tell right away if autofs will fail again with that sporadic automount failure , but I will keep an eye on it For now, after a fresh reboot on that newly patched kernel, it works fine apparently, let's give it some time though ... 
[root@d012-04 ~]# uname -a
Linux d012-04.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Mon Oct 15 20:34:39 CEST 2012 i686 i686 i386 GNU/Linux
[root@d012-04 ~]# ls /mci/inf
abid_zaz ben_ghue ...

We have dozens of computer labs of 12 stations each; only the 2 labs installed with Fedora 17 have shown this problem, while none of the ~100 stations in the other labs equipped with Fedora 16 have. Is there something relevant that changed from fedora16 to fedora17 regarding autofs?

my fedora16 runs
kernel-PAE-3.4.11-1.fc16.i686
autofs-5.0.6-8.fc16.i686

(In reply to comment #57)
>
> we have dozen of computers labs of 12 stations each, only the 2 labs
> installed on fedora17 showed that problem, none of the ~100 stations in
> others labs equiped with Fedora16 have shown that problem. Is there
> something relevant that changed from fedora16 to fedora17 regarding autofs ?
>
> my fedora16 runs
> kernel-PAE-3.4.11-1.fc16.i686
> autofs-5.0.6-8.fc16.i686

From my changelog entries it looks like the main difference was to allow for changes in mount.nfs, where it now passes options directly to the kernel. This introduced a change to the behavior of rpc for servers that are not available at mount time, and we saw large mount waits in that case. This also means that autofs now always probes server availability before attempting a mount, which is different, and the rpc code tries to detect servers that aren't available early in this process to avoid the lengthy delays.

One thing that you could do to avoid calling the changed code somewhat is to set MOUNT_WAIT to some sensible value for your site, say 15-30 seconds, and see if that makes a difference.

Ian

(In reply to comment #57)
> > my fedora16 runs
> kernel-PAE-3.4.11-1.fc16.i686
> autofs-5.0.6-8.fc16.i686

The other thing you could do is grab the autofs source rpm for f16 and build and install it on f17. That should tell us whether we should actually be looking at user space rather than kernel space.
I recompiled from SRPMS/autofs-5.0.6-2.fc16.src.rpm on F17 so that my F17 stations run the F16 version (after the rebuild on f17 I renamed autofs-5.0.6-2.fc16 to autofs-5.0.6-23_2fc16_itsp.fc17, as on f17 there's an autofs-5.0.6-22, to facilitate install by rpm -Uvh).

Unfortunately this version of autofs doesn't seem to support ldap automount map lookup from sssd; at a systemctl restart autofs.service, I get in debug.log:

Oct 16 10:36:34 d012-07 automount[9801]: ignored unsupported autofs nsswitch source "sss"

Regarding MOUNT_WAIT in /etc/sysconfig/autofs:

# MOUNT_WAIT - time to wait for a response from umount(8).
#              Setting this timeout can cause problems when
#              mount would otherwise wait for a server that
#              is temporarily unavailable, such as when it's
#              restarting. The defailt of waiting for mount(8)
#              usually results in a wait of around 3 minutes.
#
#MOUNT_WAIT=-1
#
# UMOUNT_WAIT - time to wait for a response from umount(8).
#
#UMOUNT_WAIT=12

You proposed to set it to 30s, but the default seems to be 3 minutes! Do you confirm I should test with 30s? When the value is commented out as above, does it default to 3 minutes? Thanks.
Unfortunatly the kernel patch didn't worked Today I found a machine with thet kernel patch (comment 53) having the pb : [root@d012-04 ~]# ls /mci/inf ls: cannot open directory /mci/inf: Too many levels of symbolic links [root@d012-04 ~]# uname -a Linux d012-04.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Mon Oct 15 20:34:39 CEST 2012 i686 i686 i386 GNU/Linux [root@d012-04 ~]# uptime 17:57:01 up 18:36, 2 users, load average: 1.00, 1.01, 1.05 I will continue my tests with the MOUNT_WAIT=30 on the other half of the computer labs (6 machines) . regarding the other half of my stations running the tunning on /etc/sysconfig/autofs : #MOUNT_WAIT=-1 MOUNT_WAIT=30 today, there's also a station runing into the pb again [root@d012-12 ~]# ls /mci/inf ls: cannot open directory /mci/inf: Too many levels of symbolic links [root@d012-12 ~]# ls /mci/eph ls: cannot open directory /mci/eph: Too many levels of symbolic links other autofs FS do work though o the same station : [root@d012-12 ~]# ls /mci/lor abreu_re bernadal cavallo [root@d012-12 ~]# ps auwx | grep autofs root 663 0.0 0.0 24524 1888 ? S Oct23 0:01 /usr/libexec/sssd/sssd_autofs --debug-to-files root 790 0.0 0.0 47952 2532 ? Ssl Oct23 0:11 /usr/sbin/automount --pid-file /run/autofs.pid So neither "MOUNT_WAIT=30" nor home-built kernel-PAE-3.6.1-1.fc17.i686.rpm patched with patch on comment 53 solve the problem . have you an other idea ? what about /etc/sysconfig/autofs # UMOUNT_WAIT - time to wait for a response from umount(8). # #UMOUNT_WAIT=12 could it be tunned to something else, longer ? shorter ? Thanks . (In reply to comment #63) > > have you an other idea ? Tell me again, what happened with machines you set "/" as private? I know there's something wierd going on there because of a bizare expire bug but I don't know why or how it happens. 
In that case the reference count of the mountpoint dentry became elevated so it would never expire, as if there was some hidden kernel user, but the actual umount worked fine and the reference count returned to normal afterward. Setting "/" as private made the problem go away completely.

The expire fix simply removed the troublesome check since it was an optimization, and a later check using the kernel mount struct always returned accurate information.

Being sure about "/" being marked private and whether the problem persists is important. I don't think it should make a difference but, when you make "/" private, stop autofs, check there are no mounts related to autofs, and then start it up.

>
> what about /etc/sysconfig/autofs
>
> # UMOUNT_WAIT - time to wait for a response from umount(8).
> #
> #UMOUNT_WAIT=12
>
> could it be tunned to something else, longer ? shorter ?

I doubt it's related.

I didn't generalize the "mount --make-private /"; I just did it once on 2 stations, so I cannot tell for now, as it didn't last. I'll be glad to generalize it on all the lab stations, but I cannot find how to tell the system at boot time to mount / as private. Is there an /etc/fstab option to do that? For now I have:

/dev/mapper/vg-db_vol / ext4 defaults 1 1

I can run it manually, but as the lab stations reboot regularly I need a way to automate it. How can I check that / is mounted 'private'? Thanks.

We have the same problems here:

~ (habicht@pxe-122) 101 $ ls /opt/envmodules/
ls: cannot access /opt/envmodules/: Too many levels of symbolic links
~ (habicht@pxe-122) 102 $ ls /opt/eod/
Exceed_onDemand_Client_7 Exceed_onDemand_Client_8
~ (habicht@pxe-122) 103 $

All dirs in /opt are mounted via autofs. After boot everything is OK. It started around 10 min after using the dirs the first time (but this is only true for some dirs). We are using F17 and autofs-5.0.6-22.fc17.x86_64.
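On the question above of automating "mount --make-private /" at each boot: one possibility, sketched here only as an idea and not something tested in this thread (the unit name and ordering are my own choice), is a small systemd oneshot unit that runs before autofs.service:

```ini
# /etc/systemd/system/make-root-private.service  (hypothetical name)
[Unit]
Description=Make / a private mount before autofs starts
After=local-fs.target
Before=autofs.service

[Service]
Type=oneshot
ExecStart=/usr/bin/mount --make-private /

[Install]
WantedBy=multi-user.target
```

It would be enabled with `systemctl enable make-root-private.service`. Whether this actually helps depends on whether systemd re-shares "/" later in boot, which is exactly the uncertainty discussed below.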
(In reply to comment #65)
> I didn't generalized the "mount --make-private /" , just did it once on 2
> station, so I cannot tell for now as it didn't last .
> I'll be glad to generelize it on all the lab stations, but I cannot find how
> to tell at boot time to mount / as private .
> is there an /etc/fstab option to do that ?

I don't think that would help even if there was, since systemd does this after / is mounted, I believe.

> for now I have
> /dev/mapper/vg-db_vol / ext4 defaults 1 1
>
> I can run it manually, but as the labs station reboot regularly I need a way
> to automate it.

Good question, I don't know, since this is controlled by systemd and I don't know of an rc.local equivalent.

> How can I check that / is 'private' mounted ?

Look in /proc/self/mountinfo

Ian

(In reply to comment #67)
> (In reply to comment #65)
> > I didn't generalized the "mount --make-private /" , just did it once on 2
> > station, so I cannot tell for now as it didn't last .
> > I'll be glad to generelize it on all the lab stations, but I cannot find how
> > to tell at boot time to mount / as private .
> > is there an /etc/fstab option to do that ?
>
> I don't think that would help if there was since systemd does
> this after / in mounted I believe.
>
> > for now I have
> > /dev/mapper/vg-db_vol / ext4 defaults 1 1
> >
> > I can run it manually, but as the labs station reboot regularly I need a way
> > to automate it.
>
> Good question, I dodn't know since this is controled by systemd
> and I don't know of an rc.local equivalent.

Actually, this is all a bit too hard. I should be able to change autofs to set its mounts to private so we don't need to worry about systemd. I'll get onto that.

How about giving this build a try:
http://people.redhat.com/~ikent/autofs-5.0.6-22.fc17.bz833535.1

It should mount the autofs mounts as private.
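To act on the "Look in /proc/self/mountinfo" advice above: the propagation state appears in the optional-fields column of each mountinfo line ("shared:N" for a shared mount, "master:N" for a slave, nothing for private). A minimal sketch of classifying a line; the sample line below is hypothetical, on a live system you would feed in the real "/" line:

```shell
# Classify a mountinfo line by its propagation flags. The optional
# fields sit between the mount options and the " - " separator.
propagation() {
  case "$1" in
    *" shared:"*) echo shared ;;
    *" master:"*) echo slave ;;
    *) echo private ;;
  esac
}

# On a live system: propagation "$(awk '$5 == "/"' /proc/self/mountinfo)"
# Hypothetical sample line for a shared "/":
line='1 1 253:0 / / rw,relatime shared:1 - ext4 /dev/mapper/vg-db_vol rw'
propagation "$line"    # prints: shared
```

After "mount --make-private /" the "shared:N" field disappears from the "/" line, so the same check reports "private".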
OK, I was on [root@d012-10 ~]# rpm -qa | grep autofs libsss_autofs-1.8.4-14.fc17.i686 autofs-5.0.6-22.fc17.i686 I update to your package: [root@d012-10 ~]# wget http://people.redhat.com/~ikent/autofs-5.0.6-22.fc17.bz833535.1/autofs-5.0.6-22.fc17.bz833535.1.i686.rpm [root@d012-10 ~]# rpm -Uvh autofs-5.0.6-22.fc17.bz833535.1.i686.rpm Préparation... ########################################### [100%] 1:autofs ########################################### [100%] [root@d012-10 ~]# rpm -qa | grep autofs autofs-5.0.6-22.fc17.bz833535.1.i686 I restart my stations and let them live for a while to see if it gets better now . is there a way to "see" that an autofs is "private" mounted ? thanks for the package. My recent experiences seem to suggest at least part of the problem starts in kernel land. Fedora 16 recently rebased to kernel 3.6.2. The very first time I booted F16 into the new kernel I got mount errors. When I boot into the previous 3.4 kernel I don't get this. I have waited a couple of days to report this just to be sure. But it is consistent: each time I boot into 3.6.2, I see automount issues, which I never see when booting in 3.4. The autofs version hasn't changed. At the same time, I'm running some experiments on two F17 machines. On both machines I see the mount issues with kernel 3.6.2, but on one machin I also still have kernel 3.5.5. With that kernel I haven't been able to reproduce any mount issue so far. The other machine still has a 3.5.6 kernel. While playing with this kernel I have had one mount issue over the course of several days, but I can't remember for sure if this was when booted into 3.6.2 or 3.5.6. I'll keep monitoring this system for further information. Maybe this information can help to narrow the search range for potential issues. More info on the machine running 3.5.6. I can reproduce the mount issues easily on this version. I got confused before, because I was working with a simplified automount map, which seemed not to hit the issue. 
The mount map that works without issues: vialila -rw,soft,intr files.kobaltwit.lan:/home/vialila The mount map that fails: vialila -rw,soft,intr / files.kobaltwit.lan:/home/vialila \ Pictures files.kobaltwit.lan:/home/common/pictures Does this say something regarding the suggestion to mount the root as private ? Perhaps also worth mentioning that while the symptoms on F16 and F17 are exactly the same with kernel 3.6.2, the actual error message on the console is different. F16: "Device or resource is busy" F17: "Too many levels of symbolic links" Adding an "Us Too!" comment. F17 autofs-5.0.6-22.fc17.bz833535.1.x86_64 libsss_autofs-1.8.5-2.fc17.x86_64 kernel-3.6.2-4.fc17.x86_64 nsswitch.conf -> automount: files sss We are getting "Too many levels of symbolic links also." It is inconsistent with events happening sometimes once a day, or every six or seven minutes. We are also seeing this in both F16 and F17 now. We only see the problem with the 3.6.2 kernel though and rolling back to the previous kernel seems to have stopped the problems. I wonder if there are multiple issues going on here? The autofs-5.0.6-22.fc17.bz833535.1.x86_64 build does not seem to have made any difference to the problem here. Problem also disappeared when going back to the initial kernel of F17, but using the rest updated: kernel-3.3.4-5.fc17.x86_64 autofs-5.0.6-22.fc17.x86_64 I must inform that apparently the package autofs-5.0.6-22.fc17.bz833535.1.i686 which mounts autofs mounts as private (cf comment 69) fails : [root@d012-04 ~]# uname -a Linux d012-04.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Mon Oct 15 20:34:39 CEST 2012 i686 i686 i386 GNU/Linux [root@d012-04 ~]# rpm -q autofs autofs-5.0.6-22.fc17.bz833535.1.i686 [root@d012-04 ~]# ls /mci/inf ls: cannot open directory /mci/inf: Too many levels of symbolic links [root@d012-04 ~]# date Wed Oct 31 09:53:04 CET 2012 Now I will try to downgrade kernels to 3.3.4-5.fc17.i686.PAE as suggested above . 
For the record (so that I remember myself): stations d012-06, 07, 08 downgraded to kernel 3.3.4-5. I am waiting for potential "Too many levels of symbolic links" on those stations now. To be continued.

Thanks for doing the testing, it is very helpful; it will narrow the search but, unfortunately, the area of code to consider is still quite large.

I've been thinking that, given the symptoms, NFS is setting the automount needed flag on the root dentry of mounts (I really can't see why that only happens sometimes). I had a look at that today but it isn't straightforward to see what's happening.

I'll run some limited tests and, if it seems OK, I'll post a patch for testing. That will only establish whether that really is the problem and won't be a fix. If it is the problem it will be harder to write a patch for NFS to resolve it.

Created attachment 636456 [details]
Patch - check nfs automount flag on root
Can someone build a kernel with this patch to see if there's
a problem with the setting of flags on the root of the NFS
mount?
ok, I am building with the patch in comment 80 just above. As compiling a kernel takes quite some time... I wonder if I still need to apply the patch from comment 53 at the same time as well?

By the way, I realized that the kernel I built in comment 57 might not have included the patch (c-53). Indeed, although I declared the patch in the kernel.spec rpm, I forgot to apply it! For this build I double checked that fs/namei.c does contain the changes before building.

Kernel.spec:
...
Patch22071: 3.6.2-stable-queue.patch
Patch22535: autofs4-bug833535-c80.patch
...
ApplyPatch 3.6.2-stable-queue.patch
#autofs NFS ian Kent https://bugzilla.redhat.com/show_bug.cgi?id=833535
ApplyPatch autofs4-bug833535-c80.patch
...

Let me know if I should apply that latest patch (c-80) only, or also the one from comment 53? Thanks.

I applied the patch (c-80) and deployed the corresponding kernel to 4 stations (d012-01 ... d012-04, for the record ...)

[root@d012-02 ~]# uname -a
Linux d012-02.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Thu Nov 1 18:15:14 CET 2012 i686 i686 i386 GNU/Linux

Let's give them some time running to see if it gets better.

(In reply to comment #81)
> ok I am building with patch on comment 80 just above .
> as compiling a kernel takes quite some times ...
> I wonder if I still need to apply also patch on comment 53 in the same time?
> by the way, I relized that the kernel I built on comment 57 might have not
> included the patch (c-53). Indeed, although I declared the patch in the
> kernel.spec rpm I forgot to apply it !
>
> Now for this build, I double check, the fs/namei.c does contain the changes
> before building.
>
> Kernel.spec:
> ...
> Patch22071: 3.6.2-stable-queue.patch
> Patch22535: autofs4-bug833535-c80.patch
> ...
> ApplyPatch 3.6.2-stable-queue.patch
> #autofs NFS ian Kent https://bugzilla.redhat.com/show_bug.cgi?id=833535
> ApplyPatch autofs4-bug833535-c80.patch
> ...
>
> let me know if I apply that latest patch (c-80) only or also the one from
> comment 53 ?
It's probably better not to also add the comment 53 patch since, if there is a difference, we won't know which patch was responsible. The comment 80 patch is just to find out if the cause is even what I think it is (it might not be).

Also, be aware that it will probably break NFSv4 referrals if you are using them.

I am afraid that the patch from comment 80 fails; I have 1 of my 4 patched machines showing the "Too many levels of symbolic links" today:

[root@d012-01 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links
[root@d012-01 ~]# uname -a
Linux d012-01.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Thu Nov 1 18:15:14 CET 2012 i686 i686 i386 GNU/Linux

That kernel had only patch "c-80". Would it be useful to create a kernel with the patch from comment 53 only, or do you have better plans? Thanks.

(In reply to comment #84)
> I am afraid that the patch from comment 80 fails, I have 1 of my 4 machines
> patched that have today the "Too Many levels of symbolic links":
>
> [root@d012-01 ~]# ls /mci/inf
> ls: cannot open directory /mci/inf: Too many levels of symbolic links
> [root@d012-01 ~]# uname -a
> Linux d012-01.int-evry.fr 3.6.1-1.fc17.i686.PAE #1 SMP Thu Nov 1 18:15:14
> CET 2012 i686 i686 i386 GNU/Linux
>
> That kernel had only patch "c-80", would it be usefull to create a kernel
> with patch from comment-53 only or to you have better plans ?

It's possible this error isn't actually coming from the automount code at all.

I'm joining the thread too. I see that a manual mount works:

# mount -o rsize=8192,wsize=8192,ro,tcp,vers=3 server:/home/test /test1

while the corresponding automount has problems with "Too many levels of symbolic links". This happens with Fedora 17 (up to date) clients, and different RHEL or proprietary IBM servers.

Same problem here with
kernel-3.6.5-1.fc17.x86_64
autofs-5.0.6-22.fc17.x86_64

To me it looks like the problem starts with kernel-3.6.2; before that it was okay.

Created attachment 641908 [details]
Patch - use simple_empty() for empty directory check
Can someone give this patch a try please.
ok, I've rebuilt the kernel with the patch from comment 88 (only, not the other patches from comment-80 & comment-53). I now have 4 machines running that newly patched kernel:

[root@d012-04 ~]# uptime
17:32:05 up 1 min, 2 users, load average: 1.52, 0.53, 0.19
[root@d012-04 ~]# uname -a
Linux d012-04.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Sat Nov 10 13:30:32 CET 2012 i686 i686 i386 GNU/Linux
[root@d012-04 ~]# date
sam. nov. 10 17:32:32 CET 2012
[root@d012-04 ~]# ls /mci/inf
abid_zue berge_re shoman ...

For now that works fine. For the record, stations d012-01, 02, 03, 04 are running the patched kernel-PAE-3.6.6-1.fc17.i686.rpm. Thanks for the patch; let's wait a while and see...

Could bug 851131 be related to some of these problems?

(In reply to comment #90)
> Could bug 851131 be related to some of these problems?

Probably not, since making / private has been tested and a patched autofs that makes its mounts private has also been tested.

Bad news for the latest patch (c-88): the patch I applied in comment 89 seems to fail :-( at least on one station

[root@d012-12 ~]# uptime
10:08:36 up 35 min, 3 users, load average: 0.41, 0.15, 0.11
[root@d012-12 ~]# date
Thu Nov 15 10:08:38 CET 2012
[root@d012-12 ~]# uname -a
Linux d012-12.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Sat Nov 10 13:30:32 CET 2012 i686 i686 i386 GNU/Linux
[root@d012-12 ~]# su - testtsp
su: warning: cannot change directory to /mci/ei1215/testtsp: Too many levels of symbolic links

I still have other stations running this patched kernel.
(In reply to comment #92) > bad news for the latest patch (c-88) > The patch I applied on comment 89 seems to fail :-( > at least on one station > > [root@d012-12 ~]# uptime > 10:08:36 up 35 min, 3 users, load average: 0.41, 0.15, 0.11 > [root@d012-12 ~]# date > Thu Nov 15 10:08:38 CET 2012 > [root@d012-12 ~]# uname -a > Linux d012-12.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Sat Nov 10 13:30:32 > CET 2012 i686 i686 i386 GNU/Linux > [root@d012-12 ~]# su - testtsp > su: warning: cannot change directory to /mci/ei1215/testtsp: Too many levels > of symbolic links What is the map entry corresponding to this path and the corresponding server export? Created attachment 647703 [details]
Patch - don't clear DCACHE_NEED_AUTOMOUNT on rootless mount
Created attachment 647705 [details]
Patch - use simple_empty() for empty directory check
I still can't reproduce this problem so all I can do is try to find a problem with the code. The patch in comment #94 is a result of this, while the patch in comment #95 is the result of work on another bug. The patch of comment #53 is already included in kernel-3.6.6-1.

Can someone test a kernel with these patches please.

Ian

I've compiled a new kernel including the patches from c-94 and c-95. Unfortunately, after only a few minutes of running, the problem arises:

[root@d012-05 ~]# uptime
17:58:40 up 26 min, 3 users, load average: 0.00, 0.04, 0.13
[root@d012-05 ~]# uname -a
Linux d012-05.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Wed Nov 21 12:02:50 CET 2012 i686 i686 i386 GNU/Linux
[root@d012-05 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links

(In reply to comment #97)
> I've compiled a new kernel including patchs from c-94 and c-95
> unfortunatly after only few minutes of run, the pb arises:
>
> [root@d012-05 ~]# uptime
> 17:58:40 up 26 min, 3 users, load average: 0.00, 0.04, 0.13
> [root@d012-05 ~]# uname -a
> Linux d012-05.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Wed Nov 21 12:02:50
> CET 2012 i686 i686 i386 GNU/Linux
> [root@d012-05 ~]# ls /mci/inf
> ls: cannot open directory /mci/inf: Too many levels of symbolic links

There are two things that we haven't done.

One is to try and work out why I can't reproduce the problem. There must be something different between my environment and the ones where this occurs. So what are those environments: server export parameters, server OS and version, and anything else you can think of?

Second, an strace of the ls command, so I can identify exactly which system call returns the ELOOP, might help.

Created attachment 649476 [details]
strace from ls -la
This is a strace of the ls command (Too many levels of symbolic links)
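For reference when reading such a trace: ELOOP is the errno behind "Too many levels of symbolic links", and it can be reproduced outside autofs with a plain symlink cycle, which makes it easy to recognize the failing call in a capture. A minimal sketch (the trace file path is hypothetical):

```shell
# Build a two-link symlink cycle in a scratch directory and show the
# kernel returning ELOOP when the cycle is followed:
dir=$(mktemp -d)
ln -s "$dir/b" "$dir/a"
ln -s "$dir/a" "$dir/b"
cat "$dir/a" 2>&1 | grep -o 'Too many levels of symbolic links'
# prints: Too many levels of symbolic links
rm -rf "$dir"

# In the autofs case, grepping a captured trace for the same errno
# pinpoints the failing syscall, e.g.:
#   strace -f -o /tmp/ls.trace ls -la /mci/inf
#   grep ELOOP /tmp/ls.trace
```

In the autofs bug the path is not actually a symlink cycle; the kernel's automount code returns ELOOP from the path walk, so the failing call in the trace is a stat/open on the mount point rather than a symlink resolution.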
Server OS:
Scientific Linux release 6.3 (Carbon), Kernel: 2.6.32-279.14.1.el6.x86_64

Client OS:
Fedora 17, Kernel: 3.6.6-1.fc17.x86_64

Export parameters:
/export *(rw,root_squash,no_subtree_check,async,insecure,fsid=0)
/export/home *(rw,root_squash,no_subtree_check,async,insecure,nohide)

Mount parameters:
* -fstype=nfs,nfsvers=4,rw,nosuid,intr,noatime,acdirmin=2 nfs:/home/&

Hope this helps…

The mount parameters are not correct; here are the correct parameters:

* -fstype=nfs4,port=2049,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/&

Is anyone seeing anything in syslog like "VFS: ...." at all?

(In reply to comment #100)
> Server OS:
> Scientific Linux release 6.3 (Carbon), Kernel: 2.6.32-279.14.1.el6.x86_64
>
> Client OS:
> Fedora 17, Kernel: 3.6.6-1.fc17.x86_64
>
> Export parameters:
> /export *(rw,root_squash,no_subtree_check,async,insecure,fsid=0)
> /export/home *(rw,root_squash,no_subtree_check,async,insecure,nohide)

Do you still see this if you don't use the nfs4 global root?
ie. remove the fsid=0 from /export and the nohide from /export/home

> > Mount parameters:
> * -fstype=nfs,nfsvers=4,rw,nosuid,intr,noatime,acdirmin=2 nfs:/home/&
>
> Hope this helps…
> > > > > Mount parameters: > > * -fstype=nfs,nfsvers=4,rw,nosuid,intr,noatime,acdirmin=2 nfs:/home/& > > > > Hope this helps… (In reply to comment #105) > (In reply to comment #103) > > (In reply to comment #100) > > > Server OS: > > > Scientific Linux release 6.3 (Carbon), Kernel: 2.6.32-279.14.1.el6.x86_64 > > > > > > Client OS: > > > Fedora 17, Kernel: 3.6.6-1.fc17.x86_64 > > > > > > Export parameters: > > > /export *(rw,root_squash,no_subtree_check,async,insecure,fsid=0) > > > /export/home *(rw,root_squash,no_subtree_check,async,insecure,nohide) > > > > Do you still see this if you don't use the nfs4 global root? > > ie. remove the fsid=0 form /export and the nohide from /export/home > > Oh, hang on, how does, what looks like an automount map entry > below relate to /export on the server at all? Hmmm… I think I don't understand the question, the automount map entry is direct from /etc/auto.home * -fstype=nfs4,port=2049,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/& and for /home/jm (for example) it looks then like this jm -fstype=nfs4,port=2049,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/jm mount shows then this: nfs:/home/jm on /home/jm type nfs4 (rw,nosuid,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,acdirmin=2,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.140.38,local_lock=none,addr=192.168.140.78) > > > > > > > > Mount parameters: > > > * -fstype=nfs,nfsvers=4,rw,nosuid,intr,noatime,acdirmin=2 nfs:/home/& > > > > > > Hope this helps… A second server has this as export: /srv/export gss/krb5p(rw,root_squash,no_subtree_check,async,insecure,fsid=0) /srv/export/home gss/krb5p(rw,root_squash,no_subtree_check,async,insecure,nohide) and this is the automount map entry for the client: * -fstype=nfs4,port=2049,sec=krb5p,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/& I have the same problem (Too many levels of symbolic links) on the second server as well. 
(In reply to comment #106) > (In reply to comment #105) > > (In reply to comment #103) > > > (In reply to comment #100) > > > > Server OS: > > > > Scientific Linux release 6.3 (Carbon), Kernel: 2.6.32-279.14.1.el6.x86_64 > > > > > > > > Client OS: > > > > Fedora 17, Kernel: 3.6.6-1.fc17.x86_64 > > > > > > > > Export parameters: > > > > /export *(rw,root_squash,no_subtree_check,async,insecure,fsid=0) > > > > /export/home *(rw,root_squash,no_subtree_check,async,insecure,nohide) > > > > > > Do you still see this if you don't use the nfs4 global root? > > > ie. remove the fsid=0 form /export and the nohide from /export/home > > > > Oh, hang on, how does, what looks like an automount map entry > > below relate to /export on the server at all? > > Hmmm… I think I don't understand the question, the automount map entry is > direct from /etc/auto.home > > * -fstype=nfs4,port=2049,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/& > > and for /home/jm (for example) it looks then like this > > jm -fstype=nfs4,port=2049,rw,nosuid,intr,relatime,acdirmin=2 nfs:/home/jm > > mount shows then this: > nfs:/home/jm on /home/jm type nfs4 > (rw,nosuid,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,acdirmin=2, > hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.140.38, > local_lock=none,addr=192.168.140.78) > > > > > > > > > > > > Mount parameters: > > > > * -fstype=nfs,nfsvers=4,rw,nosuid,intr,noatime,acdirmin=2 nfs:/home/& > > > > > > > > Hope this helps… > > A second server has this as export: > /srv/export > gss/krb5p(rw,root_squash,no_subtree_check,async,insecure,fsid=0) > /srv/export/home > gss/krb5p(rw,root_squash,no_subtree_check,async,insecure,nohide) > > and this is the automount map entry for the client: > * -fstype=nfs4,port=2049,sec=krb5p,rw,nosuid,intr,relatime,acdirmin=2 > nfs:/home/& > > I have the same problem (Too many levels of symbolic links) on the second > server as well. 
The automount map entries don't appear to mount any of the exports from the servers you describe.

(In reply to comment #107)
>
> The automount map entries don't appear to mount any of the exports
> from the servers you describe.

It mounts the entries below /export/home, e.g. /export/home/jm, /export/home/gm, etc. It is not necessary to export every single user directory below /export/home; it is sufficient to export only the main directory /export/home.

(In reply to comment #108)
> (In reply to comment #107)
>
> > The automount map entries don't appear to mount any of the exports
> > from the servers you describe.
>
> It mounts the entries below /export/home, e.g. /export/home/jm,
> /export/home/gm, etc. it is not necessary to export every single
> Userdirectory below /export/home it is sufficient to export only the main
> directory /export/home.

That is an obvious assumption I've made. How does the setup map from /home to /export/home? There is no way for me to even begin to try and duplicate this if I don't know how it is set up!
Regarding my autofs map (from LDAP), here it is (for /mci/inf):

# mci_inf#076, direct, automount, int-evry.fr
dn: cn=mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
automountInformation: -fstype=autofs ldap:ou=direct.mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
cn: mci_inf#076
objectClass: top
objectClass: automount

# /mci/inf#076, direct.mci_inf#076, direct, automount, int-evry.fr
dn: description=/mci/inf#076,ou=direct.mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
automountInformation: -rw,intr,soft gizeh:/disk19/inf
cn: /mci/inf
description: /mci/inf#076
objectClass: top
objectClass: automount

[root@d012-05 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links

Regarding the VFS string in the logs:

[root@d012-05 log]# grep VFS messages
Nov 19 13:38:18 d012-05 kernel: [ 0.498067] VFS: Disk quotas dquot_6.5.2
Nov 20 09:18:09 d012-05 kernel: [ 0.499029] VFS: Disk quotas dquot_6.5.2
Nov 20 09:21:13 d012-05 kernel: [ 0.497768] VFS: Disk quotas dquot_6.5.2
Nov 21 10:15:43 d012-05 kernel: [ 0.498080] VFS: Disk quotas dquot_6.5.2
Nov 21 17:32:14 d012-05 kernel: [ 0.506066] VFS: Disk quotas dquot_6.5.2

And finally an strace of the faulty /mci/inf, quite long ... hope this helps?
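The map dump above looks like `ldapsearch` LDIF output. For anyone trying to reproduce the lookup, a query along these lines should produce it — a sketch only: the base DN is taken from the dump above, while the bind options (`-x` for simple anonymous bind, plus any `-H`/`-D` your directory needs) are site-specific assumptions.

```shell
# Hypothetical reproduction of the automount map dump above; adjust
# bind options for your LDAP server before use.
ldapsearch -x -LLL \
    -b 'ou=direct,ou=automount,dc=int-evry,dc=fr' \
    '(objectClass=automount)' automountInformation
```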
Thanks [root@d012-05 ~]# strace ls -la /mci/inf execve("/bin/ls", ["ls", "-la", "/mci/inf"], [/* 31 vars */]) = 0 brk(0) = 0x85e0000 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7768000 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=112294, ...}) = 0 mmap2(NULL, 112294, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb774c000 close(3) = 0 open("/lib/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\340(SM4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=130708, ...}) = 0 mmap2(0x4d52e000, 138376, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4d52e000 mmap2(0x4d54d000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e) = 0x4d54d000 mmap2(0x4d54f000, 3208, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4d54f000 close(3) = 0 open("/lib/librt.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 \311PM4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=42224, ...}) = 0 mmap2(0x4d50b000, 33324, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4d50b000 mmap2(0x4d512000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6) = 0x4d512000 close(3) = 0 open("/lib/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\216[N4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=16900, ...}) = 0 mmap2(0x4e5b8000, 18096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4e5b8000 mmap2(0x4e5bc000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x4e5bc000 close(3) = 0 open("/lib/libacl.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\3206\371N4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=36124, ...}) = 0 
mmap2(0x4ef92000, 33116, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4ef92000 mmap2(0x4ef99000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x7) = 0x4ef99000 close(3) = 0 open("/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\220\0072M4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=2011672, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774b000 mmap2(0x4d307000, 1776316, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4d307000 mprotect(0x4d4b2000, 4096, PROT_NONE) = 0 mmap2(0x4d4b3000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ab) = 0x4d4b3000 mmap2(0x4d4b6000, 10940, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4d4b6000 close(3) = 0 open("/lib/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\320JPM4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=19780, ...}) = 0 mmap2(0x4d504000, 16496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4d504000 mmap2(0x4d507000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2) = 0x4d507000 close(3) = 0 open("/lib/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0P\331NM4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=131156, ...}) = 0 mmap2(0x4d4e8000, 102908, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4d4e8000 mmap2(0x4d4fe000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15) = 0x4d4fe000 mmap2(0x4d500000, 4604, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4d500000 close(3) = 0 open("/lib/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0p~>N4\0\0\0"..., 512) = 512 fstat64(3, {st_mode=S_IFREG|0755, st_size=19460, ...}) = 0 mmap2(0x4e3e7000, 20652, 
PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x4e3e7000 mmap2(0x4e3eb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3) = 0x4e3eb000 close(3) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb774a000 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7749000 set_thread_area({entry_number:-1 -> 6, base_addr:0xb7749740, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 mprotect(0x8064000, 4096, PROT_READ) = 0 mprotect(0x4d54d000, 4096, PROT_READ) = 0 mprotect(0x4d512000, 4096, PROT_READ) = 0 mprotect(0x4ef99000, 4096, PROT_READ) = 0 mprotect(0x4d4b3000, 8192, PROT_READ) = 0 mprotect(0x4d507000, 4096, PROT_READ) = 0 mprotect(0x4d303000, 4096, PROT_READ) = 0 mprotect(0x4d4fe000, 4096, PROT_READ) = 0 mprotect(0x4e3eb000, 4096, PROT_READ) = 0 munmap(0xb774c000, 112294) = 0 set_tid_address(0xb77497a8) = 15515 set_robust_list(0xb77497b0, 12) = 0 rt_sigaction(SIGRTMIN, {0x4d4ed3f0, [], SA_SIGINFO}, NULL, 8) = 0 rt_sigaction(SIGRT_1, {0x4d4ed480, [], SA_RESTART|SA_SIGINFO}, NULL, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0 uname({sys="Linux", node="d012-05.int-evry.fr", ...}) = 0 statfs64("/sys/fs/selinux", 84, 0xbfbc47fc) = -1 ENOENT (No such file or directory) statfs64("/selinux", 84, 0xbfbc47fc) = -1 ENOENT (No such file or directory) brk(0) = 0x85e0000 brk(0x8601000) = 0x8601000 open("/proc/filesystems", O_RDONLY|O_LARGEFILE) = 3 fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000 read(3, "nodev\tsysfs\nnodev\trootfs\nnodev\tb"..., 1024) = 370 read(3, "", 1024) = 0 close(3) = 0 munmap(0xb7767000, 4096) = 0 open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=105038208, 
...}) = 0 mmap2(NULL, 2097152, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7549000 close(3) = 0 open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000 read(3, "# Locale name alias data base.\n#"..., 4096) = 2512 read(3, "", 4096) = 0 close(3) = 0 munmap(0xb7767000, 4096) = 0 open("/usr/lib/locale/US/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 ioctl(1, TIOCGWINSZ, {ws_row=41, ws_col=131, ws_xpixel=0, ws_ypixel=0}) = 0 lstat64("/mci/inf", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 lgetxattr("/mci/inf", "security.selinux", 0x85e3ea8, 255) = -1 EOPNOTSUPP (Operation not supported) getxattr("/mci/inf", "system.posix_acl_access", 0x0, 0) = -1 EOPNOTSUPP (Operation not supported) socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3 connect(3, {sa_family=AF_FILE, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(3) = 0 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3 connect(3, {sa_family=AF_FILE, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(3) = 0 open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=1717, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000 read(3, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1717 read(3, "", 4096) = 0 close(3) = 0 munmap(0xb7767000, 4096) = 0 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=112294, ...}) = 0 mmap2(NULL, 112294, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb774c000 close(3) = 0 open("/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 3 read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\32\0\0004\0\0\0"..., 512) = 
512 fstat64(3, {st_mode=S_IFREG|0755, st_size=55108, ...}) = 0 mmap2(NULL, 50144, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb753c000 mmap2(0xb7547000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xa) = 0xb7547000 close(3) = 0 mprotect(0xb7547000, 4096, PROT_READ) = 0 munmap(0xb774c000, 112294) = 0 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=2072, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 2072 close(3) = 0 munmap(0xb7767000, 4096) = 0 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3 connect(3, {sa_family=AF_FILE, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(3) = 0 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3 connect(3, {sa_family=AF_FILE, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) close(3) = 0 open("/etc/group", O_RDONLY|O_CLOEXEC) = 3 fstat64(3, {st_mode=S_IFREG|0644, st_size=757, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7767000 read(3, "root:x:0:\nbin:x:1:\ndaemon:x:2:\ns"..., 4096) = 757 close(3) = 0 munmap(0xb7767000, 4096) = 0 openat(AT_FDCWD, "/mci/inf", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = -1 ELOOP (Too many levels of symbolic links) write(2, "ls: ", 4ls: ) = 4 write(2, "cannot open directory /mci/inf", 30cannot open directory /mci/inf) = 30 write(2, ": Too many levels of symbolic li"..., 35: Too many levels of symbolic links) = 35 write(2, "\n", 1 ) = 1 close(1) = 0 close(2) = 0 exit_group(2) = ? +++ exited with 2 +++ (In reply to comment #109) > (In reply to comment #108) > > (In reply to comment #107) > > > > > The automount map entries don't appear to mount any of the exports > > > from the servers you describe. > > > > It mounts the entries below /export/home, e.g. 
/export/home/jm,
> > /export/home/gm, etc. it is not necessary to export every single
> > Userdirectory below /export/home it is sufficient to export only the main
> > directory /export/home.
>
> That is an obvious assumption I've made.
>
> How does the setup map from /home to /export/home?

It's a simple bind:

/mnt/home /export/home none bind 0 0

> There is no way for me to even begin to try and duplicate this
> if I don't know how it is setup!

I think the problem is part of the kernel on the client system (the problem disappears when I switch back to the initial kernel of F17)… maybe it's not a problem to mount the directory; maybe autofs fails to unmount the directory correctly? I know, I'm just guessing; I have no clue what exactly triggers the error. I tried a lot of different mount options and still got the same result, so far the only workaround is to switch back to an old kernel version.

(In reply to comment #110)
> regarding my autofs map (from ldap) here it is (for /mci/inf)
>
> # mci_inf#076, direct, automount, int-evry.fr
> dn: cn=mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
> automountInformation: -fstype=autofs ldap:ou=direct.mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
> cn: mci_inf#076
> objectClass: top
> objectClass: automount
>
> # /mci/inf#076, direct.mci_inf#076, direct, automount, int-evry.fr
> dn: description=/mci/inf#076,ou=direct.mci_inf#076,ou=direct,ou=automount,dc=int-evry,dc=fr
> automountInformation: -rw,intr,soft gizeh:/disk19/inf
> cn: /mci/inf
> description: /mci/inf#076
> objectClass: top
> objectClass: automount

I'll assume the #076 isn't significant; the key /mci/inf does get a lookup hit after all.
> [root@d012-05 ~]# ls /mci/inf
> ls: cannot open directory /mci/inf: Too many levels of symbolic links
>
> regarding VFS string in the logs:
>
> [root@d012-05 log]# grep VFS messages
> Nov 19 13:38:18 d012-05 kernel: [ 0.498067] VFS: Disk quotas dquot_6.5.2
> Nov 20 09:18:09 d012-05 kernel: [ 0.499029] VFS: Disk quotas dquot_6.5.2
> Nov 20 09:21:13 d012-05 kernel: [ 0.497768] VFS: Disk quotas dquot_6.5.2
> Nov 21 10:15:43 d012-05 kernel: [ 0.498080] VFS: Disk quotas dquot_6.5.2
> Nov 21 17:32:14 d012-05 kernel: [ 0.506066] VFS: Disk quotas dquot_6.5.2

Right, so the spot I was wondering about isn't throwing a kernel warning, thanks for that.

> and finally strace on the faulty /mci/inf, quite long ...
> hope this help ?

This is what I wanted to confirm. I've been looking at the open call. Like I said, the problem may not be in the automounting code but somewhere else, triggered by the act of automounting.

> openat(AT_FDCWD, "/mci/inf",
> O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = -1 ELOOP (Too many
> levels of symbolic links)

Thanks for that too.

One other thing: does the server export your nfs mounts under a global root? Can I see the server export entries (again, please, if you've posted them before)?

And the last thing: the last patch(es) I posted seemed to fail much more quickly than without them, but they shouldn't have. Is that the case, or have you not done any further testing?

Ian

(In reply to comment #112)
> (In reply to comment #111)
> > (In reply to comment #109)
> > > (In reply to comment #108)
> > > > (In reply to comment #107)
> > > >
> > > > > The automount map entries don't appear to mount any of the exports
> > > > > from the servers you describe.
> > > >
> > > > It mounts the entries below /export/home, e.g. /export/home/jm,
> > > > /export/home/gm, etc. it is not necessary to export every single
> > > > Userdirectory below /export/home it is sufficient to export only the main
> > > > directory /export/home.
> > >
> > > That is an obvious assumption I've made.
> > > How does the setup map from /home to /export/home?
>
> It's a simple bind
>
> /mnt/home /export/home none bind 0 0

I still can't see how this fits together, i.e. what the /home of the nfs:/home is on the server.

How is your automounting setup meant to work?

> > There is no way for me to even begin to try and duplicate this
> > if I don't know how it is setup!
>
> I think the problem is part of the kernel on the client system (The problem
> disappears when I switch back to the initial kernel of F17)… maybe it's not
> a problem to mount the directory, maybe autofs fails to unmount the
> directory correctly? I know, I'm just guessing, I have no clue what exactly
> triggers the error, I tried a lot of differnt mount options and still the
> same result, so far the only workaround is to switch back to an old kernel
> version.

All that really says is that it probably isn't the kernel or autofs mounting code, since that has seen very little change since very early 3.0 kernels. The path walking code in the VFS, OTOH, has been virtually re-written.

(In reply to comment #113)
> I still can't see how this fits together, ie. what the /home
> of the nfs:/home is on the server.

You can use autofs on the server as well (with the nosymlink option of autofs), or a bind mount, or nothing at all. The users don't log onto the server, so a /home/<user> is not really necessary there. Btw. /mnt/home is "/dev/mapper/vg1-lvhome /mnt/home ext4 defaults 1 2". That's everything you need if you really want to recreate the config.

> How is your automounting setup meant to work?

It mounts the home directory of the user when he or she logs in. It has worked for many years without any problems, just trust me :-). You are obsessed with my configuration :-), so please take the config from procaccia.

> Al that really says is that it probably isn't the kernel or
> autofs mounting code since that has seen very little change
> since very early 3.0 kernels. The path walking code in the
> VFS OTOH has been virtually re-written.

Something changed between kernel 3.3.4-5.fc17.x86_64 and 3.6.6-1.fc17.x86_64. I changed nothing on the server side, and it looks like it is not a problem with autofs itself, because the same autofs version that creates problems with kernel 3.6.6-1.fc17.x86_64 works with kernel 3.3.4-5.fc17.x86_64. I really want to help, but I have no clue what exactly fails: the kernel code, autofs, or whatever… I can't trigger the error on purpose (I tried :-)), and so I can't create a configuration which triggers the error 100%; it worked for a while without problems, but yesterday I had the problem again. That's the reason I answered your question from comment #98.

My /etc/exports file regarding /mci/inf => /disk19 on the NFS server is:

/disk19 @s2ia(rw,async) @serveur(rw,async) @stpfix(rw,async)

[root@gizeh /etc] $ grep disk19 /proc/fs/nfs/exports
/disk19 @stpfix(rw,root_squash,async,wdelay,no_subtree_check)
/disk19 @serveur(rw,root_squash,async,wdelay,no_subtree_check)

On the client side, on station d012-12 (kernel from comment 88, not comment 97!), I have the problem also:

[root@d012-12 ~]# uptime
18:18:16 up 1 day, 3:50, 2 users, load average: 0.16, 0.12, 0.38
[root@d012-12 ~]# uname -a
Linux d012-12.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Sat Nov 10 13:30:32 CET 2012 i686 i686 i386 GNU/Linux
[root@d012-12 ~]# ls /mci/inf
ls: cannot open directory /mci/inf: Too many levels of symbolic links

But what is interesting is that from the same server export (/disk19) it works fine for another subdirectory (other sub map):

[root@d012-12 ~]# ls /mci/eph
abib_ghi .....
[root@d012-12 ~]# df -H /mci/eph
Filesystem         Size Used Avail Use% Mounted on
gizeh:/disk19/eph/ 212G  87G  115G  44% /mci/eph

Concerning the fact that it failed more quickly on d012-05 (latest kernel with the patches from comments 94 & 95: Linux d012-05.int-evry.fr 3.6.6-1.fc17.i686.PAE #1 SMP Wed Nov 21 12:02:50 CET 2012), I cannot tell; it has happened just once for now, and only on that machine.

Of course, I have had the same problem as everybody here for approximately a month: same release (F17), same kernels updated by yum.

When I tried to access "/infres/s3", an automount point, I observed this in /var/log/messages:

Nov 22 18:15:48 mesa automount[2780]: umount_autofs_indirect: ask umount returned busy /stud
Nov 22 18:15:50 mesa automount[2780]: umount_autofs_indirect: ask umount returned busy /infres
Nov 22 18:15:52 mesa automount[5265]: do_reconnect: lookup(ldap): failed to find available server

# these 4 mount points are not unmounted
[root@mesa: 61] ls /infres
bd/ ic2/ s3/ stag/

# and are inaccessible
[root@mesa: 62] ls /infres/s3
/bin/ls: cannot open directory /infres/s3: Too many levels of symbolic links
[root@mesa: 63] ls /infres/bd
/bin/ls: cannot open directory /infres/bd: Too many levels of symbolic links
[root@mesa: 64] ls /infres/ic2
/bin/ls: cannot open directory /infres/ic2: Too many levels of symbolic links
[root@mesa: 65] ls /infres/stag
/bin/ls: cannot open directory /infres/stag: Too many levels of symbolic links

# access to a new mount point works
[root@mesa: 66] ls /infres/sr
ahmed/ diamanti/ jouguet/ makiou/ natouri/ spina/ zhioua/
alleaume/ famulari/ kaplan/ marin/ pappa/ ttnguyen/ zwang/
aranda/ fotue/ kumarps/ markham/ qin/ urien/
benchaib/ hamdane/ labiod/ moalla/ riguidel/ vdang/
dau/ hecker/ leneutre/ msahli/ sohbi/ zhao/

# and we see it
[root@mesa: 67] ls /infres
bd/ ic2/ s3/ sr/ stag/

Maybe this can help?
Philippe

Sorry, I forgot to mention that the logs in /var/log/messages come from restarting the automount daemon (the pid changed) with systemctl restart autofs.service, but without any effect.

(In reply to comment #116)
> Of course, I have the same problem as everybody here since 1 month
> approximatively, same release F17, same kernels updated by yum.
>
> When I tried to access to "/infres/s3" an automount point I observe that :
>
> /var/log/messages
> Nov 22 18:15:48 mesa automount[2780]: umount_autofs_indirect: ask umount
> returned busy /stud
> Nov 22 18:15:50 mesa automount[2780]: umount_autofs_indirect: ask umount
> returned busy /infres
> Nov 22 18:15:52 mesa automount[5265]: do_reconnect: lookup(ldap): failed to
> find available server

Trying to find a workaround for this problem, I've set TIMEOUT=0 in /etc/sysconfig/autofs, and this seems to help. Just as a temporary remedy.

- Sergey

(In reply to comment #118)
> Trying to find a workaround fro this problem I've set TIMEOUT=0 in
> /etc/sysconfig/autofs, and this seems to help. Just as a temporary remedy.

I have applied your tuning, and for the last 2 days it seems effective; everything works fine without any errors. Thanks for your help.

Philippe

--

Adding an additional user: multiple F17 clients and an F17 server which serves up NFS /home, automounted for any client as their /home. I just started getting hit with this when I completed the upgrades this last weekend. I don't believe I was seeing it with an F17 client to an F16 server.

A user can apparently only log in once per system. If they try to log in again they get the symbolic link error. If they go to a system where they haven't logged in, they can do so - just not twice on the same system. Restarting autofs clears the issue - whatever the end problem turns out to be, NFS or the kernel.
Server:

/home x.x.x.x/25(rw,async,insecure,no_root_squash,no_subtree_check,anonuid=65534,anongid=65534,fsid=0)

Clients auto.home:

* -fstype=nfs4,rw,rsize=1048576,wsize=1048576,intr,soft,proto=tcp,port=2049,noatime x.x.x.x:/&

Trying TIMEOUT=0 as a workaround; although it isn't ideal, it is better than being locked out. Hope it works here too.

(In reply to comment #120)
> Adding an additional user - multiple f17 clients and a f17 server which
> serves ups nfs /home automounted for any client as their /home. I just
> started getting hit with this when completed the upgrades this last weekend.
> I don't believe I was seeing it with a F17 client to F16 server.

We see this with F17 clients against a CentOS 5 server (2.6.18-308.16.1.el5).

The TIMEOUT=0 in /etc/sysconfig/autofs trick does not work for me.

(In reply to comment #122)
> The TIMEOUT=0 in /etc/sysconfig/autofs trick does not work for me

I had to make the mounts browsable as well as revert to the 3.3.4 kernel before we were able to function normally.

(In reply to comment #122)
> The TIMEOUT=0 in /etc/sysconfig/autofs trick does not work for me

Confirmed for 3.6.8-2.fc17.x86_64.

Hi!

Reverting to 3.3.4, setting TIMEOUT=0, and making mounts browsable didn't fix this completely. If you umount an automounted directory and then cd to it, you get the "Too many.." message.

@H.J. Lu: I think the bug priority should be set to URGENT!

(In reply to comment #125)
> Hi!
>
> Reverting to 3.3.4, set TIMEOUT=0 and browsable mounts didn't fix this
> completly.
> If you umount an automount directory, an than cd to this directory, you get
> the "Too many.." message.

According to the testing that's been done here, that's a different bug and doesn't fix the original problem. It's fixed by the patches in comments #94 and #95. As I see upstream (today) there's also a correction needed, which I'll post here as well.
Manual umounting isn't supported, especially for offset mounts like these, but I will at least try and fix problems that arise because of it, such as is the case with these patches.

There is an NFS patch which may be related; I'll also post that for those who wish to test it.

Ian

Created attachment 672268 [details]
Patch - Fix sparse warning: context imbalance in autofs4_d_automount() different lock contexts for basic block

This patch is a correction to the patch of comment #95.

Created attachment 672270 [details]
Patch - don't do blind d_drop() in nfs_prime_dcache()
Another possibly related patch for testing.
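To summarise the client-side workarounds reported in this thread (TIMEOUT=0 from comment #118, and browsable mounts): the sketch below shows roughly what they amount to. The `set_autofs_timeout` helper is illustrative, not part of autofs, and the config paths are the Fedora defaults.

```shell
# Sketch of the TIMEOUT=0 workaround: with expiry disabled, automounted
# filesystems are never unmounted, so the failing expire/remount path is
# never exercised. set_autofs_timeout is an illustrative helper.
set_autofs_timeout() {
    conf="$1"    # normally /etc/sysconfig/autofs
    if grep -q '^TIMEOUT=' "$conf"; then
        # Replace the existing setting in place.
        sed -i 's/^TIMEOUT=.*/TIMEOUT=0/' "$conf"
    else
        # No TIMEOUT line yet; append one.
        echo 'TIMEOUT=0' >> "$conf"
    fi
}

# Browsable ("ghosted") mounts, the other workaround mentioned, are a
# config toggle: BROWSE_MODE="yes" in /etc/sysconfig/autofs, or the
# --ghost option on the map line in /etc/auto.master.
#
# Apply as root, then restart the daemon:
#   set_autofs_timeout /etc/sysconfig/autofs
#   systemctl restart autofs.service
```

Note that, per the maintainer's comments above, these are workarounds only; the underlying bug is addressed by the patches in comments #94 and #95.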
Meanwhile I upgraded one of my systems to Fedora 18:

kernel-PAE-3.7.7-201.fc18.i686
autofs-5.0.7-10.fc18.i686

Autofs is still not working properly. I can access my directories straight after startup, but after some time they are not found anymore. I have switched back to kernel 3.5.5.fc17 for now.

This may be tempting fate, but I don't think I have seen this problem occur on any of our lab machines since they applied kernel 3.7.3-101.fc17.x86_64 from Fedora a couple of weeks ago. (In fact, most of them are now running 3.7.6-102.fc17.x86_64.)

Since my last comment I have set TIMEOUT=0 in /etc/sysconfig/autofs. I haven't experienced any timeout problems since. Current kernel is 3.7.9-201.fc18.i686.PAE. Running on 3 PCs now.

With TIMEOUT=300 (the default on Fedora), I got a hanging system frequently. So bad, even, that it DOS'ed my file server (CentOS 5, kernel 2.6.18-308.16.1.el5xen). The server was running at 100% with no way to access it. Only a forced reboot helped (it's a virtual server running on Xen). This may be a totally unrelated problem, though. If you want, I can file a separate bug report for it.

This message is a reminder that Fedora 17 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 17. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 17 is end of life.
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.