Bug 798809
Summary: RHEL5.8 NFSv4 regression - "ls" returns "-ENOTDIR" when listing a subdirectory of exported mount
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.8
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Dave Wysochanski <dwysocha>
Assignee: Ian Kent <ikent>
QA Contact: Petr Beňas <pbenas>
CC: amsterdamos, anshockm, bgollahe, brodbd, ccui, cww, dhoward, dhowells, eguan, erobertstad, ikent, jblaine, jlayton, juanino, kalaklanar, krai, matt.dey, ndevos, nfs-maint, orion, pasteur, pbenas, phalenor, pstehlik, rrajaram, rwheeler, shane.baker, shshaikh, tomryan, toracat
Target Milestone: rc
Keywords: Regression, ZStream
Doc Type: Bug Fix
Doc Text: The vfs-automount infrastructure assumes that the LOOKUP_DIRECTORY flag is included in nameidata flags if a trailing slash character (/) is given on a path being walked. But this flag is private to the __link_path_walk() function so it must be added when looking up the last component. Previously, during a path walk where the path included a trailing slash character, LOOKUP_DIRECTORY was not propagated to path walk functions. Consequently, directories that needed to trigger an automount failed to do so, which resulted in a -ENOTDIR error. This bug has been fixed and the error code is no longer returned in the described scenario.
Last Closed: 2013-01-08 04:45:02 UTC
Bug Blocks: 801726, 814418, 976617, 1112963
Description
Dave Wysochanski
2012-02-29 22:41:12 UTC
*** Bug 796436 has been marked as a duplicate of this bug. ***

(In reply to comment #10)
> Verified!
>
> Reverting commit b78a282 on -308 kernel fixes the issue.

Could you try reverting commits d0f4a676, 592dda3d, ca570e71 and ad7b9d29 and see if the problem still occurs?

That should tell us if ca570e71 is actually the problem.

(In reply to comment #15)
> (In reply to comment #10)
> > Verified!
> >
> > Reverting commit b78a282 on -308 kernel fixes the issue.
>
> Could you try reverting commits d0f4a676,592dda3d,ca570e71 and
> ad7b9d29 and see if the problem still occurs?
>
> That should tell us if ca570e71 is actually the problem.

If that does resolve the problem then an strace of your ls command would save me some time as well.

Created attachment 566908 [details]
strace of failing 'ls' command
This one fails on open() system call:
4466 13:16:54 open("/mnt/nfsimport1/boot", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = -1 ENOTDIR (Not a directory) <0.000060>
I've seen 'stat' and 'lstat' fail in a similar way (returns ENOTDIR).
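
For anyone triaging a similar trace, here is a minimal sketch of pulling every ENOTDIR failure out of an strace log. The function name enotdir_calls is made up for illustration, and the pattern assumes strace's usual line layout, optionally prefixed by a PID and a timestamp as in the excerpt above.

```shell
#!/bin/sh
# enotdir_calls LOGFILE
# Print "syscall path" for every call in an strace log that failed with ENOTDIR.
# Hypothetical helper for illustration; not part of strace or of the reproducer.
# Assumes each call is on one line, optionally prefixed by "PID " and "HH:MM:SS ".
enotdir_calls() {
    sed -n -E \
        's/^([0-9]+ +)?([0-9:.]+ +)?([a-z_0-9]+)\("([^"]*)".*= -1 ENOTDIR.*$/\3 \4/p' \
        "$1"
}
```

A log could be captured with something like `strace -f -t -o /tmp/ls-error.txt ls -l /mnt/nfsimport1/boot/` and then passed to `enotdir_calls /tmp/ls-error.txt`. Only calls whose first argument is a quoted path are reported.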
(In reply to comment #19)
> Created attachment 566908 [details]
> strace of failing 'ls' command
>
> This one fails on open() system call:
>
> 4466 13:16:54 open("/mnt/nfsimport1/boot", O_RDONLY|O_NONBLOCK|O_DIRECTORY) =
> -1 ENOTDIR (Not a directory) <0.000060>
>
> I've seen 'stat' and 'lstat' fail in a similar way (returns ENOTDIR).

This open(2) should not be returning ENOTDIR, basically because of the O_DIRECTORY, which should be causing the mount on follow.

The series I've mentioned essentially stops the kernel LOOKUP_FOLLOW from causing automounts to occur in favour of using the LOOKUP_AUTOMOUNT flag, but only where syscall call sites don't already have one of the other flags that also cause an automount to occur. As you see above, LOOKUP_OPEN, LOOKUP_CREATE and LOOKUP_DIRECTORY on terminal path components and LOOKUP_CONTINUE on intermediate path components are those flags.

The LOOKUP_DIRECTORY flag should be set by the O_DIRECTORY option so that's where I need to start looking.

Could you post the entire strace please.
Maybe there's a case that I didn't properly consider.
>
> How reproducible:
> Easy to reproduce.
>
> Steps to Reproduce:
> 1. RHEL6.2 NFS server, RHEL5.8 NFS client
> 2. Follow "Reproducer" steps in "Diagnostic Steps" section of:
> https://access.redhat.com/knowledge/solutions/75553
I don't have access to this so if you could post it here for
me to have a look at that would also help.
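
The flag rules described in the reply to comment #19 can be restated as a small truth-table model. This is only a sketch of the rule as worded in that comment, not kernel code: would_automount is a hypothetical shell function, the flag names are treated as plain strings, and the "terminal"/"intermediate" position argument is an assumption made for the model.

```shell
#!/bin/sh
# would_automount POSITION FLAG...
# POSITION is "terminal" or "intermediate" (the path component being walked).
# Returns 0 if, per the rule quoted in this bug, the given lookup flags would
# cause an automount at that position, and 1 otherwise.
# Hypothetical model for illustration only; it does not reflect actual kernel code.
would_automount() {
    position=$1
    shift
    for flag in "$@"; do
        case "$position:$flag" in
            # The explicit flag introduced by the series triggers at any position.
            *:LOOKUP_AUTOMOUNT) return 0 ;;
            # These flags trigger only on the terminal component.
            terminal:LOOKUP_OPEN|terminal:LOOKUP_CREATE|terminal:LOOKUP_DIRECTORY) return 0 ;;
            # This flag triggers only on intermediate components.
            intermediate:LOOKUP_CONTINUE) return 0 ;;
        esac
    done
    # LOOKUP_FOLLOW on its own no longer triggers an automount.
    return 1
}
```

Under this model, `would_automount terminal LOOKUP_FOLLOW` fails while `would_automount terminal LOOKUP_FOLLOW LOOKUP_DIRECTORY` succeeds, which is why a lost LOOKUP_DIRECTORY flag matters for the open() call in the strace.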
Full strace is attached to this bz: https://bugzilla.redhat.com/show_bug.cgi?id=798809#c19

Created attachment 566947 [details]
Tarball containing reproducer scripts (NFS server rhel6.2, NFS client rhel5.8)
Here's what the now published article says in the reproducer section:
I created a couple shell files to start/stop the client and server test.
Reproducer details
NFS server (RHEL6.2)
# cat /etc/exports
/export-605860 rhel5.7-node1(rw,fsid=0,crossmnt,insecure,sync,anonuid=4294967294)
/export-605860 rhel5.8-node1(rw,fsid=0,crossmnt,insecure,sync,anonuid=4294967294)
# cat start-test-nfs-server.sh
/bin/mount -o bind,defaults,nosuid,nodev,acl /boot /export-605860/boot
service nfs start
service rpcidmapd start
# cat stop-test-nfs-server.sh
service rpcidmapd stop
service nfs stop
umount /export-605860/boot
NFS client (RHEL5.8)
# cat start-test-nfs-client.sh
service rpcidmapd start
mount -t nfs4 rhel6-nfs-server:/ /mnt/nfsimport1/
# cat stop-test-nfs-client.sh
umount /mnt/nfsimport1/
service rpcidmapd stop
# ./start-test-nfs-client.sh
Starting RPC idmapd: [ OK ]
Warning: rpc.idmapd appears not to be running.
All uids will be mapped to the nobody uid.
# ls -l /mnt/nfsimport1/
total 4
dr-xr-xr-x 5 nobody nobody 3072 Dec 21 16:32 boot
# ls -l /mnt/nfsimport1/boot/
ls: /mnt/nfsimport1/boot/: Not a directory
# strace -o /tmp/ls-error.txt ls -l /mnt/nfsimport1/boot/
ls: /mnt/nfsimport1/boot/: Not a directory
# grep DIR /tmp/ls-error.txt
lstat("/mnt/nfsimport1/boot/", 0xa1226a8) = -1 ENOTDIR (Not a directory)
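
As a quick check alongside the reproducer, a sketch like the following compares how a path behaves with and without the trailing slash. The function names probe_path and compare_trailing_slash are hypothetical, and the check relies only on whether `ls -d` can stat the path, so it does not exercise open() with O_DIRECTORY.

```shell
#!/bin/sh
# probe_path PATH
# Print "ok" if ls can stat PATH, "fail" otherwise.
# Hypothetical helper for illustration; not part of the original reproducer.
probe_path() {
    if ls -d -- "$1" >/dev/null 2>&1; then
        echo ok
    else
        echo fail
    fi
}

# compare_trailing_slash PATH
# Probe PATH once without and once with a trailing slash and report both results.
compare_trailing_slash() {
    dir=${1%/}   # strip one trailing slash, if present
    echo "without-slash=$(probe_path "$dir") with-slash=$(probe_path "$dir/")"
}
```

For example, `compare_trailing_slash /mnt/nfsimport1/boot` on the client. For a real directory both probes should report ok. The lstat() failure shown above is on the trailing-slash form, so that is the probe one would expect to report fail on an affected kernel.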
Created attachment 566988 [details]
Patch vfs - fix d_instantiate_unique
This may be premature but this is what I think is the problem.
I'm building test kernels now.
Created attachment 567091 [details]
Patch vfs - fix LOOKUP_DIRECTORY not propagated to managed_dentry()
Tentative patch.
I think this was actually a flaw in the original implementation
but upstream fs/namei.c has changed so much now it's hard to be
sure.
(In reply to comment #36)
> Created attachment 567091 [details]
> Patch vfs - fix LOOKUP_DIRECTORY not propagated to managed_dentry()
>
> Tentative patch.
> I think this was actually a flaw in the original implementation
> but upstream fs/namei.c has changed so much now it's hard to be
> sure.

fyi, a trailing "/" is meant to be able to override cases where an automount would otherwise not be done during a path walk. It was always meant to be like that, but the fact that it didn't quite work the way it was intended was obscured by the walk not actually returning failures when the follow_link() hack was used.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
The vfs-automount infrastructure assumes that the LOOKUP_DIRECTORY flag is included in nameidata flags if a trailing slash character (/) is given on a path being walked. But this flag is private to the __link_path_walk() function so it must be added when looking up the last component. Previously, during a path walk where the path included a trailing slash character, LOOKUP_DIRECTORY was not propagated to path walk functions. Consequently, directories that needed to trigger an automount failed to do so, which resulted in a -ENOTDIR error. This bug has been fixed and the error code is no longer returned in the described scenario.

FWIW, RH 5.8 running 2.6.18-308.4.1.el5 is also showing this exact problem. It was my understanding that -308.3.1 fixed this for RHEL 5.8, so I would expect -308.4.1 to include that fix. Am I just mistaken?

(In reply to comment #58)
> FWIW, RH 5.8 running 2.6.18-308.4.1.el5 is also showing this exact problem.
> It was my understanding that -308.3.1 fixed this for RHEL 5.8, so I would
> expect -308.4.1 to include that fix. Am I just mistaken?

The change is included in -308.3.1 and so is, of course, included in -308.4.1.
There were a couple of other problems reported, one relating to CIFS and one where a symlink was followed to an automount. What exactly are you seeing?

~:lider2> cd /nas/project/ff/work/html
html:lider2> tail home.html | wc -l
10
html:lider2> tail /nas/project/ff/work/html/home.html
tail: cannot open `/nas/project/ff/work/html/home.html' for reading: Not a directory
html:lider2> uname -a
Linux lider2 2.6.18-308.4.1.el5 #1 SMP Wed Mar 28 01:54:56 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
html:lider2>

Note that /nas/project/ff/work/html is a symlink to /nas/env/web/projects/html

The /etc/mtab information for the filesystem in question is:

fasb2:/vol/mtc_env /nas/env nfs4 rw,addr=xxx.xx.10.42 0 0

It is a NetApp filer.

Both shares (there are 2 involved as I described) are mounted directly in /etc/fstab, not via automount.

(In reply to comment #60)
> ~:lider2> cd /nas/project/ff/work/html
> html:lider2> tail home.html | wc -l
> 10
> html:lider2> tail /nas/project/ff/work/html/home.html
> tail: cannot open `/nas/project/ff/work/html/home.html' for reading: Not a
> directory
> html:lider2> uname -a
> Linux lider2 2.6.18-308.4.1.el5 #1 SMP Wed Mar 28 01:54:56 EDT 2012 x86_64
> x86_64 x86_64 GNU/Linux
> html:lider2>
>
> Note that /nas/project/ff/work/html is a symlink to
> /nas/env/web/projects/html

Yes, that is the symlink follow problem I referred to.

>
> The /etc/mtab information for the filesystem in question is:
>
> fasb2:/vol/mtc_env /nas/env nfs4 rw,addr=xxx.xx.10.42 0 0
>
> It is a NetApp filer.
>
> Both shares (there are 2 involved as I described) are mounted directly in
> /etc/fstab, not via automount.

Right, but the symlink follow problem still applies.

The actual problem is that the change in comment #36 is done in the wrong place, and that causes the lookup flags to be incorrect after the symlink is followed. The changes for this bug were included in an update release before the problem was discovered.
So to get this fixed as quickly as we can, having a customer report would be best. Please report this to support and refer them to my comment here and we'll see how we go.

High priority case 00650244 opened with URL reference here.

Reproduced in 2.6.18-311.el5 and verified in 2.6.18-312.el5.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0006.html