Bug 913660 - nfs client crashes during open
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.4
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Jeff Layton
QA Contact: Jian Li
Whiteboard: pNFS
Keywords: ZStream
Depends On:
Blocks: 921964 961662
Reported: 2013-02-21 13:20 EST by Benny Halevy
Modified: 2014-03-03 19:09 EST
CC List: 12 users

See Also:
Fixed In Version: kernel-2.6.32-375.el6
Doc Type: Bug Fix
Doc Text:
In a case of a broken or malicious server, an index node (inode) of an incorrect type could be matched. This led to an NFS client NULL pointer dereference, and, consequently, to a kernel oops. To prevent this problem from occurring in this scenario, a check has been added to verify that the inode type is correct.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-11-21 11:05:50 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
patch -- don't allow nfs_find_actor to match inodes of the wrong type (5.35 KB, patch)
2013-02-26 14:32 EST, Jeff Layton
patch -- don't allow nfs_find_actor to match inodes of the wrong type (5.36 KB, patch)
2013-02-26 14:36 EST, Jeff Layton
patch -- don't allow nfs_find_actor to match inodes of the wrong type (5.57 KB, patch)
2013-02-26 20:18 EST, Jeff Layton

Description Benny Halevy 2013-02-21 13:20:29 EST
Description of problem:
During testing at Red Hat, after bug #913655 was observed, an attempt to reproduce it triggered a different bug - hence this report.

The dd command that resulted in the client crash:

dd if=/dev/zero bs=1M count=102400 oflag=direct of=/mnt/pnfs/testfile

vmcore-dmesg.txt excerpt:
<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0 
<4>Oops: 0000 [#1] SMP 
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6 
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6  
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31 
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

Version-Release number of selected component (if applicable):
2.6.32-358.el6

How reproducible:
Happened once.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
nfs client NULL pointer dereference

Expected results:
success

Additional info:
Comment 2 Jeff Layton 2013-02-21 16:05:46 EST
Heh, interesting. I'd wager that this has something to do with the problem:

    <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644

Maybe opened as if it were a directory, and then it changed to a regular file? Anyway, looks like we ended up in this codepath in fs/open.c:

        /* NB: we're sure to have correct a_ops only after f_op->open */
        if (f->f_flags & O_DIRECT) {
                if (!f->f_mapping->a_ops ||
                    ((!f->f_mapping->a_ops->direct_IO) &&
                    (!f->f_mapping->a_ops->get_xip_mem))) {
                        fput(f);
                        f = ERR_PTR(-EINVAL);
                }
        }


...but the f_ops were set to nfs_dir_operations. Weird...
Comment 3 Jeff Layton 2013-02-22 08:52:47 EST
Ok, I suspect we have both a client and server bug here:

 out_changed:
        /*
         * Big trouble! The inode has become a different object.
         */
        printk(KERN_DEBUG "%s: inode %ld mode changed, %07o to %07o\n",
                        __func__, inode->i_ino, inode->i_mode, fattr->mode);

...so, it sounds like the inode on the server got recycled, but maybe the server isn't keeping a proper i_generation counter. The filehandle should change when the server recycles the i_ino.

Granted, none of that should cause the box to crash, so I think we have a client bug here too. We probably ought to be doing something more to ensure that we don't hit a problem when the S_IFMT changes suddenly like this.

Benny, do you have a way to reproduce this at will?
Comment 4 Jeff Layton 2013-02-22 08:57:28 EST
If there is a way to reproduce this, it would be nice to see a stack trace too when that printk fires (just add a dump_stack() right after it).

A capture of the network traffic and/or some nfs_debug output might also be interesting.
Comment 5 Jeff Layton 2013-02-22 09:22:39 EST
Ok, looking over the code...

When nfs_update_inode throws this printk, it returns -ESTALE. That eventually bubbles back up to the caller, but almost all of the nfs_post_op_update_inode() callers ignore the return code.

So I suspect what happened is that we had a directory inode cached with the same fh. We issued an open against a name and got back the same fh in the response, but the S_IFMT was different.

So, nfs_update_inode throws back ESTALE but the i_fop is already set, and the upper layer open code can't do anything about it.

Still looking at what a proper fix would be...
Comment 6 Benny Halevy 2013-02-24 08:26:35 EST
We're working on reproducing the server-side bug with our server (though it's a bit tricky) and on fixing the issue so as to correctly represent the inode generation number in the FH.
Comment 7 Jeff Layton 2013-02-25 10:30:14 EST
Thanks Benny...

Looking over the code, I think the right solution is to simply check for a NULL filp->private_data in nfs_closedir. Checking now to see if this is something we need to push upstream too. I'll note that we already check for a NULL ctx in the NFS file f_op->release. It seems reasonable to do the same for directories.
Comment 8 Jeff Layton 2013-02-26 14:32:27 EST
Created attachment 703090 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Actually, I think we might be best off doing something like this patch. This one is vs. mainline, but backporting to RHEL6 should be trivial. Thoughts?
Comment 9 Jeff Layton 2013-02-26 14:36:25 EST
Created attachment 703093 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Ahem, I mean this patch...
Comment 10 Jeff Layton 2013-02-26 14:37:18 EST
Benny, if you find some way to reproduce this at will, could you test this patch and let me know if it prevents the oops?
Comment 11 Benny Halevy 2013-02-26 15:02:50 EST
So far we've been able to reproduce the warning by instrumenting the client and adding a delay during open yet we haven't been able to reproduce the crash. Will give the patch some soak time regardless.
Comment 12 Jeff Layton 2013-02-26 19:15:58 EST
Interesting. The trick I think will be to make sure you return the new attributes in the open. It may also be important to make sure you're doing the atomic open in a lookup and not in d_revalidate. IOW, make sure you don't touch the dentry until you issue the open call against it.
Comment 13 Jeff Layton 2013-02-26 19:39:26 EST
Yeah...from nfs_open_revalidate:

        /* We can't create new files in nfs_open_revalidate(), so we
         * optimize away revalidation of negative dentries.
         */
        if (inode == NULL) {
                if (!nfs_neg_need_reval(dir, dentry, nd))
                        ret = 1;
                goto out;
        }

...so you need to make sure the reproducer does something like this:

1) do a lookup and find a directory inode
2) then do an open() against a previously untouched dentry, and get back the same fh and fileid as the directory inode
Comment 14 Jeff Layton 2013-02-26 19:40:19 EST
(assuming of course that the dentry you're trying to open was a negative one)
Comment 15 Jeff Layton 2013-02-26 20:18:28 EST
Created attachment 703218 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Revised patch (with some changes suggested by Trond). The S_IFMT checks are cheaper than comparing filehandles, so we might as well do those first. The fileid check should almost always come back negative if it's the wrong inode anyway.

I'll go ahead and send this out soon regardless, but if you can confirm whether it fixes the oops, that would be ideal.
Comment 16 Jeff Layton 2013-02-28 12:14:39 EST
Went ahead and sent the patch to Trond so it will hopefully make 3.9. We'll see about getting this into 6.5 too. 6.4.z I'm not so sure about -- this is ultimately due to some strange behavior by the server, so I'm not sure how generally applicable this is. OTOH, it's a small, targeted fix that's unlikely to break anything so maybe it's reasonable.
Comment 19 Jarod Wilson 2013-05-08 16:29:04 EDT
Patch(es) available on kernel-2.6.32-375.el6
Comment 22 Jian Li 2013-10-13 05:49:59 EDT
Verified by beaker job: https://beaker.engineering.redhat.com/jobs/517100
Call trace data from the job is not easily available.

<4>
<4>Pid: 6878, comm: open Not tainted 2.6.32-370.el6.x86_64 #1 IBM  -[7160I35]-/00D8634
<4>RIP: 0010:[<ffffffffa02892c5>]  [<ffffffffa02892c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff880476f4da78  EFLAGS: 00010296
<4>RAX: ffffffffa02892b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02c8720 RSI: ffff880473739e00 RDI: ffff880453b4da78
<4>RBP: ffff880476f4da88 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000441 R11: 0000000000400002 R12: 0000000040000010
<4>R13: ffff880453b4da78 R14: ffff88045ae79900 R15: ffff88047336ce80
<4>FS:  00007f0343cdd700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 00000004733b4000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process open (pid: 6878, threadinfo ffff880476f4c000, task ffff8804756e4ae0)
<4>Stack:
<4> 0000000040000010 ffff880473739e00 ffff880476f4dad8 ffffffff81182c65
<4><d> ffff880453b4da78 ffff88045ae79900 ffff880476f4dad8 ffffffffffffffea
<4><d> ffff88047336ce80 ffff88045ae79900 ffffffffa02890a0 ffff880453b4da78
<4>Call Trace:
<4> [<ffffffff81182c65>] __fput+0xf5/0x210
<4> [<ffffffffa02890a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182da5>] fput+0x25/0x30
<4> [<ffffffff8117e75e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c3f7a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e9d9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa028a679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa028a86b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff81190a45>] do_lookup+0x1a5/0x230
<4> [<ffffffff81191204>] __link_path_walk+0x734/0x1030
<4> [<ffffffff81143b17>] ? handle_pte_fault+0xf7/0xb50
<4> [<ffffffff81191d8a>] path_walk+0x6a/0xe0
<4> [<ffffffff81191f5b>] do_path_lookup+0x5b/0xa0
<4> [<ffffffff81182980>] ? get_empty_filp+0xa0/0x180
<4> [<ffffffff81192e9b>] do_filp_open+0xfb/0xdc0
<4> [<ffffffff8104759c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119fb32>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117e399>] do_sys_open+0x69/0x140
<4> [<ffffffff8117e4b0>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 f3 1c f8 ff 48 89 df e8 4b 03 ee e0 48 83 c4 08 31
Comment 23 errata-xmlrpc 2013-11-21 11:05:50 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2013-1645.html
