Bug 842312

Summary: nfs_attr_use_mounted_on_file() returns wrong value
Product: Red Hat Enterprise Linux 6 Reporter: Steve Dickson <steved>
Component: kernelAssignee: Steve Dickson <steved>
Status: CLOSED ERRATA QA Contact: Jian Li <jiali>
Severity: high Docs Contact:
Priority: urgent    
Version: 6.4CC: cww, dhoward, dwysocha, eguan, fhrbata, jiali, jlayton, nfs-maint, nmurray, rwheeler
Target Milestone: rcKeywords: ZStream
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.32-302.el6 Doc Type: Bug Fix
Doc Text:
Due to a missing return statement, the nfs_attr_use_mounted_on_file() function returned a wrong value. As a consequence, redundant ESTALE errors could potentially be returned. This update adds the proper return statement to nfs_attr_use_mounted_on_file(), thus preventing this bug. Note that this bug only affects NFSv4 filesystems.
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-02-21 06:42:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 847944, 847945    
Attachments:
Description Flags
newpynfs fault-inject patch none

Description Steve Dickson 2012-07-23 12:39:59 UTC
Description of problem:
A 'return 0;' is missing in nfs_attr_use_mounted_on_file(). This if statement:

    if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
        (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
         ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))

should have a 'return 0;' when all three conditions are met.
 
Version-Release number of selected component (if applicable):
285.el6.fs.1

Comment 1 RHEL Program Management 2012-07-23 12:41:54 UTC
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.

Comment 4 Jian Li 2012-07-27 04:27:58 UTC
(In reply to comment #0)
> Description of problem:
> A 'return 0;' is missing in nfs_attr_use_mounted_on_file(). This if
> statement:
> 
>     if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
>         (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
>          ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))
> 
> should have a 'return 0;' when all three conditions are met.
>  
> Version-Release number of selected component (if applicable):
> 285.el6.fs.1

Can you give some advice to reproduce its effect?

Comment 5 Jeff Layton 2012-08-10 16:05:20 UTC
I'm not sure I know. I suspect that any bugs from this are quite subtle and difficult to detect.

The effect of that bug was that we'd always return 1 in that function, and this would get done whenever the conditions were met:

        fattr->fileid = fattr->mounted_on_fileid;

Which means that the fattr->fileid is almost always being set to the wrong thing when nfs_fhget gets called.

As to what effect that has, it's not immediately clear. Possibly we'd end up running afoul of the fileid check in nfs_update_inode in some cases which could result in spurious ESTALE errors? Or maybe we end up with some inode aliasing in the icache?

Comment 8 Jarod Wilson 2012-08-27 15:21:21 UTC
Patch(es) available on kernel-2.6.32-302.el6

Comment 11 Jian Li 2012-09-05 10:25:10 UTC
Hi Steve, 
I am testing this bug with newpynfs, because rhel nfs server support NFS_ATTR_FATTR_FILEID, which make the bug's patch's function ignored. But with newpynfs, I couldn't mount when I enable NFS_ATTR_FATTR_MOUNTED_ON_FILEID and disable NFS_ATTR_FATTR_FILEID.

Look at the patch:
 	if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
 		(((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
		((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))
			return 0;

Here, for a root inode(rootfh), fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT would be FALSE, so, this cause nfs_fhget always returns -2, fails to find/create the super_block's root inode. Is it right? Maybe I should make (fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) be FALSE. 

messages produced from 'rpcdebug'
#mount localhost:/ /mnt/test
** snip **
decode_getfattr_attrs: xdr returned 0
decode_getfattr_generic: xdr returned 0
nfs_fhget: iget failed with error -2
nfs_get_root: get root inode failed
<-- nfs4_get_root()
--> nfs4_kill_super
NFS: releasing superblock cookie (0xffff88007c4cb800/0xffff88007d270300)
--> nfs_free_server()
--> nfs_put_client({1})
--> nfs_free_client(4)
NFS: releasing client cookie (0xffff88007ccdd000/0xffff88007d2704f8)
<-- nfs_free_client()
<-- nfs_free_server()
<-- nfs4_kill_super
<-- nfs4_try_mount() = -2 [error]

Comment 12 Jeff Layton 2012-09-06 11:34:50 UTC
Ok, so you're setting up a fake server with newpynfs in order to test this?

In any case, let's suppose the server did not send us a regular fileid, so NFS_ATTR_FATTR_FILEID is unset. We then go into nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether the mounted_on_fileid is valid, or whether the attributes indicate that this is a mountpoint or referral. If none of those cases are true then we'll return 0 on the post-patch kernel and you'll get back ENOENT from nfs_fhget().

So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed after the patch goes in. If that's not working, maybe you're falling into this case?

        if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
                goto out_no_inode;

Does the server set FATTR4_WORD0_TYPE?

Comment 13 Jian Li 2012-09-07 02:32:40 UTC
(In reply to comment #12)
> Ok, so you're setting up a fake server with newpynfs in order to test this?
yes
> 
> In any case, let's suppose the server did not send us a regular fileid, so
> NFS_ATTR_FATTR_FILEID is unset. We then go into
> nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether
> the mounted_on_fileid is valid, or whether the attributes indicate that this
> is a mountpoint or referral. If none of those cases are true then we'll
> return 0 on the post-patch kernel and you'll get back ENOENT from
> nfs_fhget().
> 
> So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and
> NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed
> after the patch goes in. If that's not working, maybe you're falling into
> this case?
> 
>         if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
>                 goto out_no_inode;
> 
> Does the server set FATTR4_WORD0_TYPE?
Yes. server set FATTR4_WORK0_TYPE.  
Post-patch kernel cann't mount because NFS_ATTR_FATTR_MOUNTPOINT and NFS_ATTR_FATTR_V4_REFERRAL are not set either, nfs_attr_use_mounted_on_fileid return 0. 
I have adapted newpynfs, which gives root a fileid.

Comment 14 Jian Li 2012-09-07 17:58:25 UTC
I tested this bug with newpynfs, it don't support fattr4_fileid. 

1. When newpynfs don't support fattr4_mounted_on_file, post-patch kernel(kernel-2.6.32.305.el6) cann't mount, pre-patch kernel(kernel-2.6.32.293.el6 are not affected. 

2. When fattr4_mounted_on_fileid is supported, there are two file in server with the same filehandle, and different fattr4_mounted_on_file. pre-patch kernel could n't different them, and 'stat' return the same ino, so kernel couldn't distinguish the two file well.
post-patch kernel could difference them, and worked well as expected.

TEST RESULT:
post-patch kernel:
[root@dhcp-13-96 ~]# uname -a
Linux dhcp-13-96.redhat.com 2.6.32-305.el6.x86_64 #1 SMP Fri Aug 31 13:57:19 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-13-96 ~]# mount | grep /mnt/test
10.66.13.66:/ on /mnt/test type nfs (rw,vers=4,addr=10.66.13.66,clientaddr=10.66.13.96)
[root@dhcp-13-96 ~]# stat  /mnt/test/test1
  File: `/mnt/test/test1'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 16h/22d Inode: 2           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800
[root@dhcp-13-96 ~]# stat  /mnt/test/test2
  File: `/mnt/test/test2'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 16h/22d Inode: 3           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800

pre-post kernel:
[root@dhcp-13-66 nfs4.0]# uname -a
Linux dhcp-13-66.redhat.com 2.6.32-293.el6.x86_64 #1 SMP Wed Jul 25 15:08:44 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-13-66 nfs4.0]# mount | grep /mnt/test
localhost:/ on /mnt/test type nfs (rw,vers=4,addr=127.0.0.1,clientaddr=127.0.0.1)
[root@dhcp-13-66 nfs4.0]# stat /mnt/test/test1
  File: `/mnt/test/test1'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 1eh/30d Inode: 0           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800
[root@dhcp-13-66 nfs4.0]# stat /mnt/test/test2
  File: `/mnt/test/test2'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 1eh/30d Inode: 0           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800

newpynfs fault-injection patch is attached.

Comment 15 Jian Li 2012-09-07 18:03:56 UTC
Created attachment 610824 [details]
newpynfs fault-inject patch

Comment 16 Dave Wysochanski 2013-01-23 22:04:57 UTC
If it wasn't obvious, note that this bug only affects NFSv4.  We should probably make this clear in the doc text so I added a sentence at the end.

The affected function, nfs_attr_use_mounted_on_file() is only called from one place in nfs_fhget().  Inside nfs_fhget() the first part of the 'if' is always false so we never call the function:
       if (((fattr->valid & NFS_ATTR_FATTR_FILEID) == 0) &&
            !nfs_attr_use_mounted_on_fileid(fattr))

Comment 18 errata-xmlrpc 2013-02-21 06:42:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html