Bug 814418

Summary: NFSv4 symlink regression problem
Product: Red Hat Enterprise Linux 5 Reporter: Jackie Meese <kalaklanar>
Component: kernel Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED ERRATA QA Contact: Eryu Guan <eguan>
Severity: high Docs Contact:
Priority: high    
Version: 5.8 CC: ajadhav, anshockm, ccui, cww, david.jericho, dhoward, dhowells, dsulliva, dwysocha, eguan, ggb, ikent, jblaine, jflack, jiali, jlayton, kalaklanar, ksquizza, nfs-maint, ondrejv, rwheeler, yanwang, ztao
Target Milestone: rc Keywords: Regression, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
If a path followed a symlink that ended with the slash ("/") character, the LOOKUP_DIRECTORY flag could be set earlier than the last path component. This led to an ENOTDIR (Not a directory) error. The LOOKUP_DIRECTORY flag is now propagated only for the last component. For the purpose of possible automounting, the flag is not needed for intermediate path components; the LOOKUP_CONTINUE flag is set in such a case. The ENOTDIR error no longer occurs in this scenario.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-01-08 04:29:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 798809, 976617, 1112963    
Bug Blocks: 830264    
Attachments:
Description Flags
Patch - propagate LOOKUP_DIRECTORY flag only for last component none

Description Jackie Meese 2012-04-19 20:18:20 UTC
Description of problem:
All RHEL 5.8 boxes using the kernel-PAE-2.6.18-308.1.1.el5 kernel work fine with the setup below. After a client is upgraded to kernel-PAE-2.6.18-308.4.1.el5, it no longer has permission to follow symlinks (the server kernel version does not appear to matter).

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.18-308.4.1.el5

How reproducible:
On server:
kernel-PAE-2.6.18-308.4.1.el5 and kernel-PAE-2.6.18-308.1.1.el5 (only rebooted back end when problem started, so it was still running old kernel)
[root@server ~]# cat /etc/exports 
/mnt/exports/home 192.168.1.0/24(rw,fsid=0,insecure,no_subtree_check,async,anonuid=65534,anongid=65534,no_root_squash)
nfs and rpcidmapd started
mkdir -p /mnt/exports/home/httpd
mkdir -p /mnt/exports/home/nool/web/images
cd /mnt/exports/home/httpd
ln -s ../nool/web/ html
cd html
wget http://someserver.com/header.jpg (this is the test file)
On client:
[root@client1 ~]# cat /etc/fstab |grep nfs4
server:/              /home                   nfs4    async,noac	0 0

/etc/passwd, /etc/group and /etc/shadow are identical on clients and server (copied via scp)
SELinux is disabled on all servers via /etc/sysconfig/selinux
All servers connect via a private LAN with no firewalls; all LAN hosts are defined in /etc/hosts


Steps to Reproduce:
1. install kernel-PAE-2.6.18-308.4.1.el5 on client
2. reboot into new kernel
3. [root@client web]# file /home/nool/web/images/header.jpg
/home/nool/web/images/header.jpg: JPEG image data, JFIF standard 1.02
4. [root@thnad web]# file /home/httpd/html/images/header.jpg
/home/httpd/html/images/header.jpg: writable, executable, regular file, no read permission
5. [root@server ~]# file /mnt/exports/home/httpd/html/images/header.jpg 
/mnt/vlad/home/httpd/html/images/header.jpg: JPEG image data, JFIF standard 1.02
6. [root@server ~]# file /mnt/exports/home/nool/web/images/header.jpg 
/mnt/vlad/home/nool/web/images/header.jpg: JPEG image data, JFIF standard 1.02
  
Actual results:
Accessing the file via the symlink on the NFSv4 mount (step 4) gives a permission error; accessing it directly on the client (step 3) or on the server (steps 5 and 6) does not.

Expected results:
Both steps 3 and 4 should allow access and report, as in step 3: JPEG image data, JFIF standard 1.02

Additional info:

Comment 2 Ian Kent 2012-04-26 13:00:17 UTC
Just a quick update so you know that the problem is being
worked on and we're making progress.

I've worked through the reproducer above and I'm able to
reproduce the problem.

I tried reverting a patch but that didn't help which really
only leaves one patch that could be a problem.

I've built a kernel (but not yet tested) without the patch for
handling the automount of paths with a trailing "/", which I
expect is the cause.

I believe that the problem is mishandling of the trailing
"/" in the kernel path walk of the arguments of the command
"ln -s ../nool/web/ html" since using "ln -s ../nool/web html"
doesn't expose the problem.
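
The difference between the two link forms can be checked in isolation with a small script. The following is a minimal sketch, not part of the original report: check_links is a hypothetical helper and the file names are made up for illustration. If the analysis above is right, running it against a directory on the NFSv4 mount of an affected client should report a failure only for the link whose target ends in "/"; on a local file system all three lines should read "ok".

```shell
#!/bin/sh
# Sketch only: create a directory and two symlinks to it, one whose
# target has a trailing "/" and one without, then try to read a file
# through each path. check_links and its argument are hypothetical.
check_links() {
    root=$1
    mkdir -p "$root/nfs_dir" || return 1
    echo "haha" > "$root/nfs_dir/foo.txt" || return 1

    # Remove stale links from a previous run before recreating them.
    rm -f "$root/link_slash" "$root/link_plain"
    ln -s nfs_dir/ "$root/link_slash" || return 1   # target ends in "/"
    ln -s nfs_dir  "$root/link_plain" || return 1   # no trailing "/"

    status=0
    for name in nfs_dir link_slash link_plain; do
        if cat "$root/$name/foo.txt" > /dev/null 2>&1; then
            echo "$name: ok"
        else
            echo "$name: FAILED"
            status=1
        fi
    done
    return $status
}
```

For example, "check_links /home/linktest" (a hypothetical path on the NFSv4 mount) prints one line per path and returns non-zero if any read fails.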

Comment 3 Ian Kent 2012-04-27 10:34:31 UTC
Created attachment 580720 [details]
Patch - propagate LOOKUP_DIRECTORY flag only for last component

Comment 4 Ian Kent 2012-04-27 10:39:02 UTC
A kernel that should resolve this problem is available at:
http://people.redhat.com/~ikent/kernel-2.6.18-308.5.1.el5.bz814418.3

It includes the patch posted in comment #3.
Please test this kernel and report back.

Comment 5 Jackie Meese 2012-04-27 14:42:16 UTC
This kernel is working on my current setup. Thanks for getting on this.

Comment 6 RHEL Program Management 2012-04-30 02:29:37 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 8 Jackie Meese 2012-06-04 17:50:57 UTC
I have just noticed after the recent kernel update that the error only occurs on the PAE kernel, where the kernel you had put up was a non-PAE kernel. So, the fix you put up did fix the problem, but it is possible it was for the wrong reason.

Comment 9 Ian Kent 2012-06-05 01:16:50 UTC
(In reply to comment #8)
> I have just noticed after the recent kernel update that the error only
> occurs on the PAE kernel, where the kernel you had put up was a non-PAE
> kernel. So, the fix you put up did fix the problem, but it is possible it
> was for the wrong reason.

I think you're mistaken.

I'm a little behind on this one as I haven't yet sent the patch
to the kernel patch review list so it can't be included in the
kernel source tree.

But, more importantly, this won't get pushed out in an update
unless someone adds the ZStream keyword to the bug attributes
and it is then approved and the bug cloned to get the update
underway. Ideally that keyword would be set before I post the
patch so that the ZStream update request gets noticed at the
same time as the patch is posted. I believe I don't have the
ability to set the ZStream keyword so someone else (anyone,
please) will need to do that.

Comment 10 Ondrej Valousek 2012-06-05 07:08:35 UTC
Hi Ian,
Is there anything I can do to have this fixed asap? This bug is very annoying for us as it prevents us from upgrading to the latest kernel. I took it as a matter of fact that it was going to be fixed in the 2.6.18-308.5.1.el5 kernels (no, we are not using the PAE kernel, but we still experience this).
I am a bit surprised that RH does not take this one seriously enough, as it de facto renders NFSv4 unusable.
Shall I open a case with SEG?
Thanks.

Comment 11 Ian Kent 2012-06-05 11:04:21 UTC
(In reply to comment #10)
> Hi Ian,
> Is there anything I can do to have this fixed asap? This bug is very
> annoying for us as it prevents us from upgrading to the latest kernel. I
> took it as a matter of fact that it was going to be fixed in the
> 2.6.18-308.5.1.el5 kernels (no, we are not using the PAE kernel, but we
> still experience this).
> I am a bit surprised that RH does not take this one seriously enough, as it
> de facto renders NFSv4 unusable.
> Shall I open a case with SEG?

I can assure you that I take this seriously but, as I admitted
above, this one slipped through the cracks and I apologize
for that. So now we've lost some time in the process.

But OTOH, to have this pushed out as an update we need (I
believe, since it's not something I do) support to push for 
it to be proposed as an update.

So, if you haven't logged a case with support, then it will
help if you do. Point them to this bug so they can see what's
going on. You're not alone in wanting this updated and I'm keen
to see it pushed out as an update so we should be fine.

Ian

Comment 12 Ondrej Valousek 2012-06-05 11:13:19 UTC
Case 00651762 opened.

Comment 13 Jackie Meese 2012-06-05 14:30:49 UTC
I was incorrect. It appears that one of the other sysadmins had fixed one of the links to remove the trailing slash, so please disregard my comment #8 https://bugzilla.redhat.com/show_bug.cgi?id=814418#c8
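
Since removing the trailing slash from a link target avoids the failure, it may help to locate such links before upgrading a client. A minimal sketch, assuming find and readlink are available; list_slash_links is a hypothetical helper, and it only reports links, it does not modify anything.

```shell
#!/bin/sh
# Sketch only: print every symlink under the given directory whose
# target ends in "/". A target of exactly "/" is skipped, since it has
# no trailing slash that could be removed. File names that contain
# newlines are not handled.
list_slash_links() {
    find "$1" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            /)  ;;                                   # link to the root directory
            */) printf '%s -> %s\n' "$link" "$target" ;;
        esac
    done
}
```

For example, "list_slash_links /mnt/exports/home" on the server from the original description would be expected to list httpd/html, which was created with "ln -s ../nool/web/ html".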

Comment 24 Jeff Blaine 2012-06-15 16:25:23 UTC
(In reply to comment #4)
> A kernel that should resolve this problem is available at:
> http://people.redhat.com/~ikent/kernel-2.6.18-308.5.1.el5.bz814418.3
> 
> It includes the patch posted in comment #3.
> Please test this kernel and report back.

Is there anything available for PAE?

We're using the kernels at the link above just fine, but also need
PAE. Word on the street is that it will be at least 30 days before
this (official) bug fixed kernel shows up in repositories.  Stuck!

Comment 25 Ian Kent 2012-06-19 03:28:03 UTC
(In reply to comment #24)
> (In reply to comment #4)
> > A kernel that should resolve this problem is available at:
> > http://people.redhat.com/~ikent/kernel-2.6.18-308.5.1.el5.bz814418.3
> > 
> > It includes the patch posted in comment #3.
> > Please test this kernel and report back.
> 
> Is there anything available for PAE?
> 
> We're using the kernels at the link above just fine, but also need
> PAE. Word on the street is that it will be at least 30 days before
> this (official) bug fixed kernel shows up in repositories.  Stuck!

I've re-built the kernel from comment #4 and copied across
the PAE packages as well this time.

Have a look and let me know if this is what you are after.

Comment 26 Ian Kent 2012-06-20 04:37:31 UTC
*** Bug 825031 has been marked as a duplicate of this bug. ***

Comment 29 TAO Zhijiang 2012-07-23 06:20:03 UTC
[root@nec-em18 ~]# uname -a
Linux nec-em18.rhts.eng.bos.redhat.com 2.6.18-308.4.1.el5 #1 SMP Wed Mar 28 01:54:56 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@nec-em18 ~]# mkdir -p /exports/home && chmod 777 /exports/hom
chmod: cannot access `/exports/hom': No such file or directory
[root@nec-em18 ~]# mkdir -p /exports/home && chmod 777 /exports/home
[root@nec-em18 ~]# cd /exports/home/
[root@nec-em18 home]# mkdir nfs_dir
[root@nec-em18 home]# echo "haha" > nfs_dir/foo.txt
[root@nec-em18 home]# ln -s nfs_dir/ nfs_link1
[root@nec-em18 home]# ln -s nfs_dir nfs_link2
[root@nec-em18 home]# ll
total 16
drwxr-xr-x 2 root root 4096 Jul 23 02:05 nfs_dir
lrwxrwxrwx 1 root root    8 Jul 23 02:05 nfs_link1 -> nfs_dir/
lrwxrwxrwx 1 root root    7 Jul 23 02:05 nfs_link2 -> nfs_dir
[root@nec-em18 home]# 
[root@nec-em18 home]# tail /etc/exports 
/exports/home *(fsid=0,all_squash,rw,sync)
[root@nec-em18 home]# service portmap restart > /dev/null 2>&1
[root@nec-em18 home]# service nfs restart > /dev/null 2>&1
[root@nec-em18 home]# service nfslock restart > /dev/null 2>&1
[root@nec-em18 home]# /sbin/rpc.statd restart > /dev/null 2>&1
[root@nec-em18 home]# 
[root@nec-em18 home]# mount -t nfs4 `hostname`:/ /mnt -vvv
Warning: rpc.idmapd appears not to be running.
         All uids will be mapped to the nobody uid.
mount: pinging: prog 100003 vers 4 prot tcp port 2049
[root@nec-em18 home]# cd /mnt
[root@nec-em18 mnt]# cat nfs_dir/foo.txt 
haha
[root@nec-em18 mnt]# cat nfs_link1/foo.txt 
cat: nfs_link1/foo.txt: Not a directory
[root@nec-em18 mnt]# cat nfs_link2/foo.txt 
haha
[root@nec-em18 mnt]# 
[root@nec-em18 mnt]# 
[root@nec-em18 mnt]# 


[root@nec-em18 ~]# uname -a
Linux nec-em18.rhts.eng.bos.redhat.com 2.6.18-330.el5 #1 SMP Wed Jul 18 11:18:55 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@nec-em18 ~]# ll /exports/home/
total 16
drwxr-xr-x 2 root root 4096 Jul 23 02:05 nfs_dir
lrwxrwxrwx 1 root root    8 Jul 23 02:05 nfs_link1 -> nfs_dir/
lrwxrwxrwx 1 root root    7 Jul 23 02:05 nfs_link2 -> nfs_dir
[root@nec-em18 ~]# tail /etc/exports 
/exports/home *(fsid=0,all_squash,rw,sync)
[root@nec-em18 ~]# service portmap restart > /dev/null 2>&1
[root@nec-em18 ~]# service nfs restart > /dev/null 2>&1
[root@nec-em18 ~]# service nfslock restart > /dev/null 2>&1
[root@nec-em18 ~]# /sbin/rpc.statd restart > /dev/null 2>&1
[root@nec-em18 ~]# mount -t nfs4 `hostname`:/ /mnt -vvv
Warning: rpc.idmapd appears not to be running.
         All uids will be mapped to the nobody uid.
mount: pinging: prog 100003 vers 4 prot tcp port 2049
[root@nec-em18 ~]# cd /mnt/
[root@nec-em18 mnt]# ll
total 16
drwxr-xr-x 2 root root 4096 Jul 23 02:05 nfs_dir
lrwxrwxrwx 1 root root    8 Jul 23 02:05 nfs_link1 -> nfs_dir/
lrwxrwxrwx 1 root root    7 Jul 23 02:05 nfs_link2 -> nfs_dir
[root@nec-em18 mnt]# cat nfs_dir/foo.txt 
haha
[root@nec-em18 mnt]# cat nfs_link1/foo.txt 
haha
[root@nec-em18 mnt]# cat nfs_link2/foo.txt 
haha
[root@nec-em18 mnt]# 
[root@nec-em18 mnt]# 


This bug has been reproduced & verified.

Comment 30 Steve Dickson 2012-07-30 15:13:50 UTC
*** Bug 834717 has been marked as a duplicate of this bug. ***

Comment 32 Martin Prpič 2012-08-28 12:29:05 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
If a path followed a symlink that ended with the slash ("/") character, the LOOKUP_DIRECTORY flag could be set earlier than the last path component. This led to an ENOTDIR (Not a directory) error. The LOOKUP_DIRECTORY flag is now propagated only for the last component. For the purpose of possible automounting, the flag is not needed for intermediate path components; the LOOKUP_CONTINUE flag is set in such a case. The ENOTDIR error no longer occurs in this scenario.
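
The rule in the note can be illustrated with a toy model. The sketch below is not kernel code and does not model symlink following; walk_flags is a hypothetical function that only prints, for each component of a path, which of the two flags the fixed behaviour would attach according to the note's wording: LOOKUP_CONTINUE for intermediate components, and LOOKUP_DIRECTORY for the last component when the path ends in "/".

```shell
#!/bin/sh
# Toy model only, not kernel code: print "<component> <flag>" for each
# component of a path, following the rule stated in the technical note.
# walk_flags is a hypothetical name used for this illustration.
walk_flags() {
    path=$1

    # A trailing "/" means the final component must be a directory.
    case $path in
        */) want_dir=1 ;;
        *)  want_dir=0 ;;
    esac

    # Drop trailing slashes so the last component is never empty.
    rest=$(printf '%s' "$path" | sed 's:/*$::')

    while [ -n "$rest" ]; do
        comp=${rest%%/*}                 # text before the first "/"
        case $rest in
            */*) rest=${rest#*/}; last=0 ;;
            *)   rest=;           last=1 ;;
        esac
        [ -n "$comp" ] || continue       # skip empty parts ("/a", "a//b")

        if [ "$last" -eq 0 ]; then
            echo "$comp LOOKUP_CONTINUE"     # intermediate component
        elif [ "$want_dir" -eq 1 ]; then
            echo "$comp LOOKUP_DIRECTORY"    # last component, path ended in "/"
        else
            echo "$comp -"                   # last component, no directory requirement
        fi
    done
}
```

Under this model, "walk_flags nfs_link1/foo.txt" marks nfs_link1 as an intermediate component and places no directory requirement on foo.txt, which is the outcome the note describes for the fixed kernel.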

Comment 34 errata-xmlrpc 2013-01-08 04:29:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0006.html