Bug 842312
Summary: | nfs_attr_use_mounted_on_file() returns wrong value | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Steve Dickson <steved> | ||||
Component: | kernel | Assignee: | Steve Dickson <steved> | ||||
Status: | CLOSED ERRATA | QA Contact: | Jian Li <jiali> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | urgent | ||||||
Version: | 6.4 | CC: | cww, dhoward, dwysocha, eguan, fhrbata, jiali, jlayton, nfs-maint, nmurray, rwheeler | ||||
Target Milestone: | rc | Keywords: | ZStream | ||||
Target Release: | --- | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-2.6.32-302.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Due to a missing return statement, the nfs_attr_use_mounted_on_file() function returned a wrong value. As a consequence, redundant ESTALE errors could potentially be returned. This update adds the proper return statement to nfs_attr_use_mounted_on_file(), thus preventing this bug. Note that this bug only affects NFSv4 filesystems.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2013-02-21 06:42:39 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | |||||||
Bug Blocks: | 847944, 847945 | ||||||
Attachments: |
|
Description
Steve Dickson
2012-07-23 12:39:59 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release. (In reply to comment #0) > Description of problem: > A 'return 0;' is missing in nfs_attr_use_mounted_on_file(). This if > statement: > > if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) || > (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) && > ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0))) > > should have a 'return 0;' when all three conditions are met. > > Version-Release number of selected component (if applicable): > 285.el6.fs.1 Can you give some advice to reproduce its effect? I'm not sure I know. I suspect that any bugs from this are quite subtle and difficult to detect. The effect of that bug was that we'd always return 1 in that function, and this would get done whenever the conditions were met: fattr->fileid = fattr->mounted_on_fileid; Which means that the fattr->fileid is almost always being set to the wrong thing when nfs_fhget gets called. As to what effect that has, it's not immediately clear. Possibly we'd end up running afoul of the fileid check in nfs_update_inode in some cases which could result in spurious ESTALE errors? Or maybe we end up with some inode aliasing in the icache? Patch(es) available on kernel-2.6.32-302.el6 Hi Steve, I am testing this bug with newpynfs, because rhel nfs server support NFS_ATTR_FATTR_FILEID, which make the bug's patch's function ignored. But with newpynfs, I couldn't mount when I enable NFS_ATTR_FATTR_MOUNTED_ON_FILEID and disable NFS_ATTR_FATTR_FILEID. Look at the patch: if (((fattr->valid & NFS_ATTR_FATTR_MOUNTED_ON_FILEID) == 0) || (((fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT) == 0) && ((fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) == 0))) return 0; Here, for a root inode(rootfh), fattr->valid & NFS_ATTR_FATTR_MOUNTPOINT would be FALSE, so, this cause nfs_fhget always returns -2, fails to find/create the super_block's root inode. Is it right? Maybe I should make (fattr->valid & NFS_ATTR_FATTR_V4_REFERRAL) be FALSE. messages produced from 'rpcdebug' #mount localhost:/ /mnt/test ** snip ** decode_getfattr_attrs: xdr returned 0 decode_getfattr_generic: xdr returned 0 nfs_fhget: iget failed with error -2 nfs_get_root: get root inode failed <-- nfs4_get_root() --> nfs4_kill_super NFS: releasing superblock cookie (0xffff88007c4cb800/0xffff88007d270300) --> nfs_free_server() --> nfs_put_client({1}) --> nfs_free_client(4) NFS: releasing client cookie (0xffff88007ccdd000/0xffff88007d2704f8) <-- nfs_free_client() <-- nfs_free_server() <-- nfs4_kill_super <-- nfs4_try_mount() = -2 [error] Ok, so you're setting up a fake server with newpynfs in order to test this? In any case, let's suppose the server did not send us a regular fileid, so NFS_ATTR_FATTR_FILEID is unset. We then go into nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether the mounted_on_fileid is valid, or whether the attributes indicate that this is a mountpoint or referral. If none of those cases are true then we'll return 0 on the post-patch kernel and you'll get back ENOENT from nfs_fhget(). So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed after the patch goes in. If that's not working, maybe you're falling into this case? if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0) goto out_no_inode; Does the server set FATTR4_WORD0_TYPE? (In reply to comment #12) > Ok, so you're setting up a fake server with newpynfs in order to test this? yes > > In any case, let's suppose the server did not send us a regular fileid, so > NFS_ATTR_FATTR_FILEID is unset. We then go into > nfs_attr_use_mounted_on_fileid. At that point, the conditional asks whether > the mounted_on_fileid is valid, or whether the attributes indicate that this > is a mountpoint or referral. If none of those cases are true then we'll > return 0 on the post-patch kernel and you'll get back ENOENT from > nfs_fhget(). > > So, I'd expect that making sure that NFS_ATTR_FATTR_FILEID is unset and > NFS_ATTR_FATTR_MOUNTED_ON_FILEID is set should allow nfs_fhget to succeed > after the patch goes in. If that's not working, maybe you're falling into > this case? > > if ((fattr->valid & NFS_ATTR_FATTR_TYPE) == 0) > goto out_no_inode; > > Does the server set FATTR4_WORD0_TYPE? Yes. server set FATTR4_WORK0_TYPE. Post-patch kernel cann't mount because NFS_ATTR_FATTR_MOUNTPOINT and NFS_ATTR_FATTR_V4_REFERRAL are not set either, nfs_attr_use_mounted_on_fileid return 0. I have adapted newpynfs, which gives root a fileid. I tested this bug with newpynfs, it don't support fattr4_fileid. 1. When newpynfs don't support fattr4_mounted_on_file, post-patch kernel(kernel-2.6.32.305.el6) cann't mount, pre-patch kernel(kernel-2.6.32.293.el6 are not affected. 2. When fattr4_mounted_on_fileid is supported, there are two file in server with the same filehandle, and different fattr4_mounted_on_file. pre-patch kernel could n't different them, and 'stat' return the same ino, so kernel couldn't distinguish the two file well. post-patch kernel could difference them, and worked well as expected. TEST RESULT: post-patch kernel: [root@dhcp-13-96 ~]# uname -a Linux dhcp-13-96.redhat.com 2.6.32-305.el6.x86_64 #1 SMP Fri Aug 31 13:57:19 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux [root@dhcp-13-96 ~]# mount | grep /mnt/test 10.66.13.66:/ on /mnt/test type nfs (rw,vers=4,addr=10.66.13.66,clientaddr=10.66.13.96) [root@dhcp-13-96 ~]# stat /mnt/test/test1 File: `/mnt/test/test1' Size: 3 Blocks: 0 IO Block: 4096 regular file Device: 16h/22d Inode: 2 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Access: 2012-09-08 01:36:21.481472969 +0800 Modify: 2012-09-08 01:46:01.097069025 +0800 Change: 2012-09-08 01:46:01.097074031 +0800 [root@dhcp-13-96 ~]# stat /mnt/test/test2 File: `/mnt/test/test2' Size: 3 Blocks: 0 IO Block: 4096 regular file Device: 16h/22d Inode: 3 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Access: 2012-09-08 01:36:21.481472969 +0800 Modify: 2012-09-08 01:46:01.097069025 +0800 Change: 2012-09-08 01:46:01.097074031 +0800 pre-post kernel: [root@dhcp-13-66 nfs4.0]# uname -a Linux dhcp-13-66.redhat.com 2.6.32-293.el6.x86_64 #1 SMP Wed Jul 25 15:08:44 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux [root@dhcp-13-66 nfs4.0]# mount | grep /mnt/test localhost:/ on /mnt/test type nfs (rw,vers=4,addr=127.0.0.1,clientaddr=127.0.0.1) [root@dhcp-13-66 nfs4.0]# stat /mnt/test/test1 File: `/mnt/test/test1' Size: 3 Blocks: 0 IO Block: 4096 regular file Device: 1eh/30d Inode: 0 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Access: 2012-09-08 01:36:21.481472969 +0800 Modify: 2012-09-08 01:46:01.097069025 +0800 Change: 2012-09-08 01:46:01.097074031 +0800 [root@dhcp-13-66 nfs4.0]# stat /mnt/test/test2 File: `/mnt/test/test2' Size: 3 Blocks: 0 IO Block: 4096 regular file Device: 1eh/30d Inode: 0 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 99/ nobody) Gid: ( 99/ nobody) Access: 2012-09-08 01:36:21.481472969 +0800 Modify: 2012-09-08 01:46:01.097069025 +0800 Change: 2012-09-08 01:46:01.097074031 +0800 newpynfs fault-injection patch is attached. Created attachment 610824 [details]
newpynfs fault-inject patch
If it wasn't obvious, note that this bug only affects NFSv4. We should probably make this clear in the doc text so I added a sentence at the end. The affected function, nfs_attr_use_mounted_on_file() is only called from one place in nfs_fhget(). Inside nfs_fhget() the first part of the 'if' is always false so we never call the function: if (((fattr->valid & NFS_ATTR_FATTR_FILEID) == 0) && !nfs_attr_use_mounted_on_fileid(fattr)) Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0496.html |