Description of problem:
During tests in Red Hat, after bug #913655 was observed, an attempt was made to reproduce it, which resulted in a different bug - hence this report.

The dd command that resulted in the client crash:

dd if=/dev/zero bs=1M count=102400 oflag=direct of=/mnt/pnfs/testfile

vmcore-dmesg.txt excerpt:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

Version-Release number of selected component (if applicable):
2.6.32-358.el6

How reproducible:
Happened once.

Steps to Reproduce:
1.
2.
3.

Actual results:
nfs client NULL pointer dereference

Expected results:
success

Additional info:
Heh, interesting. I'd wager that this has something to do with the problem:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644

Maybe it was opened as if it were a directory, and then it changed to a regular file? Anyway, it looks like we ended up in this codepath in fs/open.c:

	/* NB: we're sure to have correct a_ops only after f_op->open */
	if (f->f_flags & O_DIRECT) {
		if (!f->f_mapping->a_ops ||
		    ((!f->f_mapping->a_ops->direct_IO) &&
		    (!f->f_mapping->a_ops->get_xip_mem))) {
			fput(f);
			f = ERR_PTR(-EINVAL);
		}
	}

...but the f_ops were set to nfs_dir_operations. Weird...
Ok, I suspect we have both a client and server bug here:

out_changed:
	/*
	 * Big trouble! The inode has become a different object.
	 */
	printk(KERN_DEBUG "%s: inode %ld mode changed, %07o to %07o\n",
			__func__, inode->i_ino, inode->i_mode, fattr->mode);

...so it sounds like the inode on the server got recycled, but maybe the server isn't keeping a proper i_generation counter. The filehandle should change when the server recycles the i_ino.

Granted, none of that should cause the box to crash, so I think we have a client bug here too. We probably ought to be doing something more to ensure that we don't hit a problem when the S_IFMT changes suddenly like this.

Benny, do you have a way to reproduce this at will?
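For illustration, a minimal sketch of what carrying the generation in the filehandle looks like on the server side. All names here (example_fh, example_encode_fh, example_get_fsid) are hypothetical, not taken from any real server:

	#include <linux/fs.h>
	#include <linux/types.h>

	/* The per-file portion of the filehandle should include the inode's
	 * generation counter, so that a recycled inode number yields a
	 * different fh than the old object had. */
	struct example_fh {
		u64 fsid;	/* identifies the exported filesystem */
		u64 fileid;	/* inode number */
		u32 generation;	/* bumped each time this i_ino is reused */
	};

	static void example_encode_fh(struct inode *inode, struct example_fh *fh)
	{
		fh->fsid = example_get_fsid(inode->i_sb);	/* assumed helper */
		fh->fileid = inode->i_ino;
		fh->generation = inode->i_generation;
	}

With a fh built this way, the client would see a brand-new filehandle after the server recycles an i_ino, rather than a "mode changed" surprise on an existing inode.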
If there is a way to reproduce this, it would be nice to see a stack trace too when that printk fires (just add a dump_stack() right after it). A capture of the network traffic and/or some nfs_debug output might also be interesting.
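For reference, that instrumentation on the out_changed path quoted above is just:

	printk(KERN_DEBUG "%s: inode %ld mode changed, %07o to %07o\n",
			__func__, inode->i_ino, inode->i_mode, fattr->mode);
	dump_stack();	/* added: show the call chain that noticed the change */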
Ok, looking over the code... When nfs_update_inode throws this printk, it returns -ESTALE. That eventually bubbles back up to the caller, but almost all of the nfs_post_op_update_inode() callers ignore the return code. So I suspect what happened is that we had a directory inode cached with the same fh. We issued an open against a name and got back the same fh in the response, but the S_IFMT was different. So, nfs_update_inode throws back ESTALE but the i_fop is already set, and the upper layer open code can't do anything about it. Still looking at what a proper fix would be...
We're working on reproducing the server-side bug with our server (though it's a bit tricky) and on fixing the issue so as to correctly represent the inode generation number in the FH.
Thanks Benny... Looking over the code, I think the right solution is to simply check for a NULL filp->private_data in nfs_closedir. Checking now to see if this is something we need to push upstream too. I'll note that we already check for a NULL ctx in the NFS file f_op->release. It seems reasonable to do the same for directories.
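In other words, something along these lines (a sketch using the upstream fs/nfs/dir.c helper names; the RHEL6 code may differ slightly):

	static int
	nfs_closedir(struct inode *inode, struct file *filp)
	{
		struct nfs_open_dir_context *ctx = filp->private_data;

		/* private_data is normally set up in nfs_opendir(), but it
		 * can be NULL if the open never got that far -- e.g. when a
		 * "directory" turned out to be a regular file */
		if (ctx != NULL)
			put_nfs_open_dir_context(ctx);
		return 0;
	}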
Created attachment 703090 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Actually, I think we might be best off doing something like this patch. This one is vs. mainline, but backporting to RHEL6 should be trivial. Thoughts?
Created attachment 703093 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Ahem, I mean this patch...
Benny, if you find some way to reproduce this at will, could you test this patch and let me know if it prevents the oops?
So far we've been able to reproduce the warning by instrumenting the client and adding a delay during open, but we haven't been able to reproduce the crash. Will give the patch some soak time regardless.
Interesting. The trick I think will be to make sure you return the new attributes in the open. It may also be important to make sure you're doing the atomic open in a lookup and not in d_revalidate. IOW, make sure you don't touch the dentry until you issue the open call against it.
Yeah... from nfs_open_revalidate:

	/* We can't create new files in nfs_open_revalidate(), so we
	 * optimize away revalidation of negative dentries. */
	if (inode == NULL) {
		if (!nfs_neg_need_reval(dir, dentry, nd))
			ret = 1;
		goto out;
	}

...so you need to make sure the reproducer does something like this (sketched below):

1) do a lookup and find a directory inode
2) then do an open() against a previously untouched dentry, and get back the same fh and fileid as the directory inode
(assuming of course that the dentry you're trying to open was a negative one)
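Putting those two steps together, a minimal userspace skeleton might look like the following. The paths are hypothetical, and the server has to be rigged to hand back the directory's fh and fileid in the OPEN reply -- this only sketches the client-side sequence:

	#include <sys/types.h>
	#include <sys/stat.h>
	#include <fcntl.h>
	#include <stdio.h>

	int main(void)
	{
		struct stat st;

		/* 1) a lookup that caches the directory's inode on the client */
		if (stat("/mnt/pnfs/dir", &st) == -1)
			perror("stat");

		/* 2) open a previously untouched (negative) name; the OPEN
		 * reply must carry the same fh/fileid as the directory */
		int fd = open("/mnt/pnfs/dir/newfile", O_CREAT | O_RDWR, 0644);
		if (fd == -1)
			perror("open");
		return 0;
	}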
Created attachment 703218 [details]
patch -- don't allow nfs_find_actor to match inodes of the wrong type

Revised patch (with some changes suggested by Trond). The S_IFMT checks are cheaper than comparing filehandles, so we might as well do those first. The fileid check should almost always come back negative if it's the wrong inode anyway.

I'll go ahead and send this out soon regardless, but if you can confirm whether it fixes the oops, that would be ideal.
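For reference, the comparison ordering described above would make nfs_find_actor() in fs/nfs/inode.c look roughly like this (a reconstruction of the idea, not the literal attachment):

	static int
	nfs_find_actor(struct inode *inode, void *opaque)
	{
		struct nfs_find_desc	*desc = (struct nfs_find_desc *)opaque;
		struct nfs_fattr	*fattr = desc->fattr;

		/* cheap integer compare first */
		if (NFS_FILEID(inode) != fattr->fileid)
			return 0;
		/* new check: never match an inode whose type has changed */
		if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
			return 0;
		/* the filehandle compare is the most expensive, so do it last */
		if (nfs_compare_fh(NFS_FH(inode), desc->fh))
			return 0;
		if (is_bad_inode(inode) || NFS_STALE(inode))
			return 0;
		return 1;
	}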
Went ahead and sent the patch to Trond, so it will hopefully make 3.9. We'll see about getting this into 6.5 too. 6.4.z I'm not so sure about -- this is ultimately due to some strange behavior by the server, so I'm not sure how generally applicable this is. OTOH, it's a small, targeted fix that's unlikely to break anything, so maybe it's reasonable.
Patch(es) available on kernel-2.6.32-375.el6
Verified by beaker job: https://beaker.engineering.redhat.com/jobs/517100

The job's call trace data is not easily available, so including it here:

<4>
<4>Pid: 6878, comm: open Not tainted 2.6.32-370.el6.x86_64 #1 IBM  -[7160I35]-/00D8634
<4>RIP: 0010:[<ffffffffa02892c5>]  [<ffffffffa02892c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff880476f4da78  EFLAGS: 00010296
<4>RAX: ffffffffa02892b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02c8720 RSI: ffff880473739e00 RDI: ffff880453b4da78
<4>RBP: ffff880476f4da88 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000441 R11: 0000000000400002 R12: 0000000040000010
<4>R13: ffff880453b4da78 R14: ffff88045ae79900 R15: ffff88047336ce80
<4>FS:  00007f0343cdd700(0000) GS:ffff880028300000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 00000004733b4000 CR4: 00000000000407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process open (pid: 6878, threadinfo ffff880476f4c000, task ffff8804756e4ae0)
<4>Stack:
<4> 0000000040000010 ffff880473739e00 ffff880476f4dad8 ffffffff81182c65
<4><d> ffff880453b4da78 ffff88045ae79900 ffff880476f4dad8 ffffffffffffffea
<4><d> ffff88047336ce80 ffff88045ae79900 ffffffffa02890a0 ffff880453b4da78
<4>Call Trace:
<4> [<ffffffff81182c65>] __fput+0xf5/0x210
<4> [<ffffffffa02890a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182da5>] fput+0x25/0x30
<4> [<ffffffff8117e75e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c3f7a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e9d9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa028a679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa028a86b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff81190a45>] do_lookup+0x1a5/0x230
<4> [<ffffffff81191204>] __link_path_walk+0x734/0x1030
<4> [<ffffffff81143b17>] ? handle_pte_fault+0xf7/0xb50
<4> [<ffffffff81191d8a>] path_walk+0x6a/0xe0
<4> [<ffffffff81191f5b>] do_path_lookup+0x5b/0xa0
<4> [<ffffffff81182980>] ? get_empty_filp+0xa0/0x180
<4> [<ffffffff81192e9b>] do_filp_open+0xfb/0xdc0
<4> [<ffffffff8104759c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119fb32>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117e399>] do_sys_open+0x69/0x140
<4> [<ffffffff8117e4b0>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 f3 1c f8 ff 48 89 df e8 4b 03 ee e0 48 83 c4 08 31
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1645.html