Bug 868985
Summary: | "too many symbolic links" error appears on mounted filesystems
---|---
Product: | [Fedora] Fedora
Component: | kernel
Status: | CLOSED ERRATA
Severity: | high
Priority: | unspecified
Version: | 17
Hardware: | x86_64
OS: | Linux
Reporter: | Art Werschulz <agw>
Assignee: | nfs-maint
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | Bert.Deknuydt, gansalmon, ikent, irlapati, itamar, jforbes, jlayton, jonathan, j, jtrutwin, kernel-maint, madhu.chinakonda, mauricio.esguerra, mkfischer, moniot, nneul, w3euu
Doc Type: | Bug Fix
Type: | Bug
Last Closed: | 2013-02-11 20:51:50 UTC

Description
Art Werschulz
2012-10-22 16:26:34 UTC
I am experiencing this problem as well. Some more info:

/etc/auto.net:

    people  -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/usr/people
    home    -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/home
    mail    -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/var/mail
    physics -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/apps/linux/physics
    fortran -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/apps/linux/fortran
    plasma  -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/plasma
    ccd     -fstype=nfs,rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/ccd

/usr/people is symlinked to /net/people, /usr/local/physics is symlinked to /net/physics, and /home is symlinked to /net/home.

When the problem happens, it seems that /usr/people and /usr/local/physics always fail, but /home and /var/spool/mail are OK.

The problem appeared after a full upgrade of my Fedora 17 clients this weekend. Autofs is still version 5.0.6-22, but the kernel is now 3.6.7-4.fc17.x86_64. Sometimes restarting autofs fixes it, but lately it does not and a full reboot is needed.

I've tried:

    service autofs stop
    automount -d -v -f

But whenever I list /usr/people, nothing is displayed by the automount command, just "Too Many Levels of Symlinks" on the ls.

The NFS server is RHEL 6.3, and neither its configuration nor its kernel version changed when this started happening. I'll attach my /etc/exports to the ticket. Please let me know what additional information would be helpful.

Created attachment 652128
NFS server exports
These are the RHEL 6.3 NFS server exports. There is a mix of v3 and v4 exports because issues with the idmapper forced me to return to NFSv3 on the Fedora clients. The file has been unaltered for months, though.
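
The symptom in the report above, ls failing with "Too many levels of symbolic links", corresponds to the ELOOP errno. A minimal Python sketch for checking which paths currently fail that way might look like the following. The function name probe is made up for illustration and the path list is copied from the report; this is only a diagnostic aid, not something that was used in this bug.

```python
import errno
import os


def probe(path):
    """Return 'ok', 'eloop', or the errno name raised when opening a directory."""
    try:
        # os.scandir opens the directory (following symlinks), much as `ls` does.
        with os.scandir(path):
            pass
    except OSError as exc:
        if exc.errno == errno.ELOOP:
            return "eloop"
        return errno.errorcode.get(exc.errno, "error")
    return "ok"


if __name__ == "__main__":
    # Paths taken from the report; adjust to the local automount layout.
    for p in ("/usr/people", "/usr/local/physics", "/home", "/var/spool/mail"):
        print(f"{p}: {probe(p)}")
```

Because it opens each directory rather than only looking up its name, it is expected to trigger the automount on a healthy client as well, so a result of "eloop" should match what ls reports.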
One thing I noticed, likely just a symptom: when I run ls -al /net on my system, the mounts with too many symlinks have different permissions than the mounts that still work:

    # ls -al /net
    dr-xr-xr-x  2 root root    0 Nov 25 01:48 fortran
    drwxr-xr-x 36 root root 4096 Sep 13 08:05 home
    dr-xr-xr-x  2 root root    0 Nov 24 18:36 mail
    drwxr-xr-x 17 root root 4096 Apr 24  2012 people
    dr-xr-xr-x  2 root root    0 Nov 25 01:48 physics
    dr-xr-xr-x  2 root root    0 Nov 24 21:00 plasma

In this case, all the 555 ones throw the error; the 755 ones are fine (home/people).

Also, I noticed that nfs-utils was updated to version 1.2.6-5.fc17.x86_64 over the weekend; not sure if it's to blame.

What is strange is that if I mount manually instead of using the automounter, it works just fine:

    # mount -t nfs -s -o rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid elm:/var/mail /net/mail

I'm going to try to revert to kernel 3.6.6-1 until this is fixed...

Ian, might this be a duplicate of 833535?

Josh, you may want to try pulling in commit 696199f8ccf from upstream kernels and see if it helps.

(In reply to comment #4)
> Ian, might this be a duplicate of 833535?
>
> Josh, you may want to try pulling in commit 696199f8ccf from upstream
> kernels and see if it helps.

This may be a problem that has been around since 3.5, maybe earlier. I haven't seen this exact scenario either; it's never been reported when using only indirect mounts like this, but perhaps the symlink following is a new clue.

I've only ever seen it myself once, and could not reproduce it after changing exports on the server and then changing them back again.

The real problem is I can't reproduce it.

I've thought that bug 833535 might be related but haven't checked the timeline closely for when that NFS change was made.

We definitely need to check if the upstream commit fixes the problem seen here, although it may not solve my existing problem.

See also bug 833535.
Ian

(In reply to comment #3)
> In this case, all the 555 ones throw the error, the 755 ones are fine
> (home/people).

Yeah, maybe a further clue, not sure.

> Also, I noticed that nfs-utils was updated to version 1.2.6-5.fc17.x86_64
> over the weekend, not sure if it's to blame.

I doubt that is related. I think it's a kernel issue.

> What is strange is that if I manually mount instead of using the automounter
> it works just fine:
>
> # mount -t nfs -s -o rsize=8192,wsize=8192,nfsvers=3,hard,intr,nodev,nosuid
> elm:/var/mail /net/mail

Yeah, tell me about it. I've gone over the autofs and vfs code in detail many times looking for this and I just don't see a problem. Assuming, of course, this is my existing problem ...

At this point I believe the issue is an unexpected interaction between the NFS client and server, like bug 833535, but I can't nail down what leads to it.

> I'm going to try to revert to kernel 3.6.6-1 until this is fixed...

That will be interesting because that kernel definitely has the problem I'm struggling with, although it hasn't been seen with indirect mounts before.

Ian

(In reply to comment #5)
> The real problem is I can't reproduce it.

I can get it to happen fairly consistently here; is there anything I can do to help? I've since switched all systems to manual NFS mounts in /etc/fstab; it's only a problem when using the automounter.

Josh

(In reply to comment #7)
> (In reply to comment #5)
>
> > The real problem is I can't reproduce it.
>
> I can get it to happen fairly consistently here, anything I can do to help?

I wish. I really need to work out what is different between my systems and those of the people seeing the problem, so that I can reproduce it and do a bisect. Doing a bisect involves using upstream sources and multiple kernel builds to identify the commit that started the problem.

Right now it's most important to find out if the upstream patch Jeff mentioned makes a difference.
Ian

(In reply to comment #5)
> I've thought that bug 833535 might be related but haven't
> checked the timeline closely for when that NFS change was
> made.

Umm .. that doesn't make sense. That should be "bug 874372 might be related" and the possible duplicate being bug 833535.

Ian

Here is a scratch build of the current F17 kernel which includes the patch referred to in comment #4.

https://koji.fedoraproject.org/koji/taskinfo?taskID=4761802

Please check to see if it makes a difference to the problem.

I have experienced this bug as well. It has happened regularly, but not in any predictable manner, for the past several months -- not sure how long, but at least 3 or 4 -- on each of 4 separate but largely identical systems, all running FC17 with quite current kernels. The current kernel on all 4 systems is 3.6.10-2.fc17.i686. It always happens on the automounts; I have never seen it on a manual mount. The automounter is autofs-5.0.6-23.fc17.i686.

The mounts that are failing are from a data pull that occurs every five minutes, so the directories get remounted at 5-minute intervals. They time out in 60 seconds.

I am able to remediate the failures with the following procedure:

1. Kill the automounter (systemctl stop autofs).
2. The mounts are in /misc/. I check to make sure that /etc/auto.misc has been unmounted: run mount | grep auto.misc, and if it is there, run umount -l.
3. Wait a few minutes for the "stuck" directory to unmount -- it seems to have to "time out".
4. Restart autofs and all is well.

At least there is no need to reboot. However, I cannot reproduce the problem other than by waiting for it to recur. Let me know if I can provide further data.

(In reply to comment #10)
> Here is a scratch build of the current F17 kernel which includes
> the patch referred to in comment #4.
>
> https://koji.fedoraproject.org/koji/taskinfo?taskID=4761802
>
> Please check to see if it makes a difference to the problem.
I wanted to test this, but when I clicked on the link I couldn't find anything to download. Has it expired? If you provide the kernel, I will try it on a system that has been exhibiting the problem.

This bug appears to be fixed as of the 3.7 kernel. Machines that showed the problem with the 3.6 kernel have been running kernel-3.7.3-101.fc17.x86_64 for more than a week with no automount issues. The problem always manifested within a week, so I believe it is cured. To whoever fixed this -- thanks!

Thanks for the update!
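
For clients still on an affected kernel, the four-step workaround reported earlier in this thread (stop autofs, lazily unmount a leftover auto.misc mount, wait, start autofs) could be scripted roughly as below. This is a sketch under assumptions: the function names are hypothetical, the 180-second wait is a guess at "a few minutes", and the match is applied to the source field of each mount line rather than to the whole line as grep would do.

```python
import subprocess
import time


def leftover_mountpoints(mount_output, map_name="auto.misc"):
    """Return mount points whose source field mentions map_name."""
    found = []
    for line in mount_output.splitlines():
        # `mount` prints "<source> on <mountpoint> type <fstype> (<options>)".
        source, sep, rest = line.partition(" on ")
        if sep and map_name in source and " type " in rest:
            found.append(rest.rsplit(" type ", 1)[0])
    return found


def remediate(wait_seconds=180):
    """Run the four steps from the thread in order; needs root."""
    subprocess.run(["systemctl", "stop", "autofs"], check=True)
    mounts = subprocess.run(
        ["mount"], capture_output=True, text=True, check=True
    ).stdout
    for mountpoint in leftover_mountpoints(mounts):
        subprocess.run(["umount", "-l", mountpoint], check=True)
    # The thread only says "a few minutes"; 180 s is an assumed value.
    time.sleep(wait_seconds)
    subprocess.run(["systemctl", "start", "autofs"], check=True)
```

Keeping the parsing in leftover_mountpoints separate from remediate means the matching can be checked against saved mount output without touching a live system.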