Bug 842312 - nfs_attr_use_mounted_on_file() returns wrong value
nfs_attr_use_mounted_on_file() returns wrong value
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel (Show other bugs)
6.4
All Linux
urgent Severity high
: rc
: ---
Assigned To: Steve Dickson
Jian Li
: ZStream
Depends On:
Blocks: 847944 847945
  Show dependency treegraph
 
Reported: 2012-07-23 08:39 EDT by Steve Dickson
Modified: 2014-03-03 19:08 EST (History)
10 users (show)

See Also:
Fixed In Version: kernel-2.6.32-302.el6
Doc Type: Bug Fix
Doc Text:
Due to a missing return statement, the nfs_attr_use_mounted_on_file() function returned a wrong value. As a consequence, redundant ESTALE errors could potentially be returned. This update adds the proper return statement to nfs_attr_use_mounted_on_file(), thus preventing this bug. Note that this bug only affects NFSv4 filesystems.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-21 01:42:39 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
newpynfs fault-inject patch (4.30 KB, patch)
2012-09-07 14:03 EDT, Jian Li
no flags Details | Diff

  None (edit)
Description Steve Dickson 2012-07-23 08:39:59 EDT
Description of problem:
A 'return 0;' is missing in nfs_attr_use_mounted_on_file(). This if statement:

    if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
        (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
         ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))

should have a 'return 0;' when all three conditions are met.
 
Version-Release number of selected component (if applicable):
285.el6.fs.1
Comment 1 RHEL Product and Program Management 2012-07-23 08:41:54 EDT
This request was evaluated by Red Hat Product Management for
inclusion in a Red Hat Enterprise Linux release.  Product
Management has requested further review of this request by
Red Hat Engineering, for potential inclusion in a Red Hat
Enterprise Linux release for currently deployed products.
This request is not yet committed for inclusion in a release.
Comment 4 Jian Li 2012-07-27 00:27:58 EDT
(In reply to comment #0)
> Description of problem:
> A 'return 0;' is missing in nfs_attr_use_mounted_on_file(). This if
> statement:
> 
>     if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
>         (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
>          ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))
> 
> should have a 'return 0;' when all three conditions are met.
>  
> Version-Release number of selected component (if applicable):
> 285.el6.fs.1

Can you give some advice to reproduce its effect?
Comment 5 Jeff Layton 2012-08-10 12:05:20 EDT
I'm not sure I know. I suspect that any bugs from this are quite subtle and difficult to detect.

The effect of that bug was that we'd always return 1 in that function, and this would get done whenever the conditions were met:

        fattr->fileid = fattr->mounted_on_fileid;

Which means that the fattr->fileid is almost always being set to the wrong thing when nfs_fhget gets called.

As to what effect that has, it's not immediately clear. Possibly we'd end up running afoul of the fileid check in nfs_update_inode in some cases which could result in spurious ESTALE errors? Or maybe we end up with some inode aliasing in the icache?
Comment 8 Jarod Wilson 2012-08-27 11:21:21 EDT
Patch(es) available on kernel-2.6.32-302.el6
Comment 11 Jian Li 2012-09-05 06:25:10 EDT
Hi Steve, 
I am testing this bug with newpynfs, because rhel nfs server support NFS_ATTR_FATTR_FILEID, which make the bug's patch's function ignored. But with newpynfs, I couldn't mount when I enable NFS_ATTR_FATTR_MOUNTED_ON_FILEID and disable NFS_ATTR_FATTR_FILEID.

Look at the patch:
 	if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) ||
 		(((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) &&
		((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0)))
			return 0;

Here, for a root inode(rootfh), fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT would be FALSE, so, this cause nfs_fhget always returns -2, fails to find/create the super_block's root inode. Is it right? Maybe I should make (fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) be FALSE. 

messages produced from 'rpcdebug'
#mount localhost:/ /mnt/test
** snip **
decode_getfattr_attrs: xdr returned 0
decode_getfattr_generic: xdr returned 0
nfs_fhget: iget failed with error -2
nfs_get_root: get root inode failed
<-- nfs4_get_root()
--> nfs4_kill_super
NFS: releasing superblock cookie (0xffff88007c4cb800/0xffff88007d270300)
--> nfs_free_server()
--> nfs_put_client({1})
--> nfs_free_client(4)
NFS: releasing client cookie (0xffff88007ccdd000/0xffff88007d2704f8)
<-- nfs_free_client()
<-- nfs_free_server()
<-- nfs4_kill_super
<-- nfs4_try_mount() = -2 [error]
Comment 12 Jeff Layton 2012-09-06 07:34:50 EDT
Ok, so you're setting up a fake server with newpynfs in order to test this?

In any case, let's suppose the server did not send us a regular fileid, so NFS_ATTR_FATTR_FILEID is unset. We then go into nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether the mounted_on_fileid is valid, or whether the attributes indicate that this is a mountpoint or referral. If none of those cases are true then we'll return 0 on the post-patch kernel and you'll get back ENOENT from nfs_fhget().

So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed after the patch goes in. If that's not working, maybe you're falling into this case?

        if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
                goto out_no_inode;

Does the server set FATTR4_WORD0_TYPE?
Comment 13 Jian Li 2012-09-06 22:32:40 EDT
(In reply to comment #12)
> Ok, so you're setting up a fake server with newpynfs in order to test this?
yes
> 
> In any case, let's suppose the server did not send us a regular fileid, so
> NFS_ATTR_FATTR_FILEID is unset. We then go into
> nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether
> the mounted_on_fileid is valid, or whether the attributes indicate that this
> is a mountpoint or referral. If none of those cases are true then we'll
> return 0 on the post-patch kernel and you'll get back ENOENT from
> nfs_fhget().
> 
> So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and
> NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed
> after the patch goes in. If that's not working, maybe you're falling into
> this case?
> 
>         if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0)
>                 goto out_no_inode;
> 
> Does the server set FATTR4_WORD0_TYPE?
Yes. server set FATTR4_WORK0_TYPE.  
Post-patch kernel cann't mount because NFS_ATTR_FATTR_MOUNTPOINT and NFS_ATTR_FATTR_V4_REFERRAL are not set either, nfs_attr_use_mounted_on_fileid return 0. 
I have adapted newpynfs, which gives root a fileid.
Comment 14 Jian Li 2012-09-07 13:58:25 EDT
I tested this bug with newpynfs, it don't support fattr4_fileid. 

1. When newpynfs don't support fattr4_mounted_on_file, post-patch kernel(kernel-2.6.32.305.el6) cann't mount, pre-patch kernel(kernel-2.6.32.293.el6 are not affected. 

2. When fattr4_mounted_on_fileid is supported, there are two file in server with the same filehandle, and different fattr4_mounted_on_file. pre-patch kernel could n't different them, and 'stat' return the same ino, so kernel couldn't distinguish the two file well.
post-patch kernel could difference them, and worked well as expected.

TEST RESULT:
post-patch kernel:
[root@dhcp-13-96 ~]# uname -a
Linux dhcp-13-96.redhat.com 2.6.32-305.el6.x86_64 #1 SMP Fri Aug 31 13:57:19 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-13-96 ~]# mount | grep /mnt/test
10.66.13.66:/ on /mnt/test type nfs (rw,vers=4,addr=10.66.13.66,clientaddr=10.66.13.96)
[root@dhcp-13-96 ~]# stat  /mnt/test/test1
  File: `/mnt/test/test1'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 16h/22d Inode: 2           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800
[root@dhcp-13-96 ~]# stat  /mnt/test/test2
  File: `/mnt/test/test2'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 16h/22d Inode: 3           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800

pre-post kernel:
[root@dhcp-13-66 nfs4.0]# uname -a
Linux dhcp-13-66.redhat.com 2.6.32-293.el6.x86_64 #1 SMP Wed Jul 25 15:08:44 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@dhcp-13-66 nfs4.0]# mount | grep /mnt/test
localhost:/ on /mnt/test type nfs (rw,vers=4,addr=127.0.0.1,clientaddr=127.0.0.1)
[root@dhcp-13-66 nfs4.0]# stat /mnt/test/test1
  File: `/mnt/test/test1'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 1eh/30d Inode: 0           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800
[root@dhcp-13-66 nfs4.0]# stat /mnt/test/test2
  File: `/mnt/test/test2'
  Size: 3               Blocks: 0          IO Block: 4096   regular file
Device: 1eh/30d Inode: 0           Links: 1
Access: (0644/-rw-r--r--)  Uid: (   99/  nobody)   Gid: (   99/  nobody)
Access: 2012-09-08 01:36:21.481472969 +0800
Modify: 2012-09-08 01:46:01.097069025 +0800
Change: 2012-09-08 01:46:01.097074031 +0800

newpynfs fault-injection patch is attached.
Comment 15 Jian Li 2012-09-07 14:03:56 EDT
Created attachment 610824 [details]
newpynfs fault-inject patch
Comment 16 Dave Wysochanski 2013-01-23 17:04:57 EST
If it wasn't obvious, note that this bug only affects NFSv4.  We should probably make this clear in the doc text so I added a sentence at the end.

The affected function, nfs_attr_use_mounted_on_file() is only called from one place in nfs_fhget().  Inside nfs_fhget() the first part of the 'if' is always false so we never call the function:
       if (((fattr->valid & NFS_ATTR_FATTR_FILEID) == 0) &&
            !nfs_attr_use_mounted_on_fileid(fattr))
Comment 18 errata-xmlrpc 2013-02-21 01:42:39 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-0496.html

Note You need to log in before you can comment on or make changes to this bug.