Bug 561275 - kernel: serious ugliness in iget() uses by nfsd [mrg-1]
Summary: kernel: serious ugliness in iget() uses by nfsd [mrg-1]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: Development
Hardware: All
OS: Linux
high
high
Target Milestone: 1.2.5
: ---
Assignee: Luis Claudio R. Goncalves
QA Contact: David Sommerseth
URL:
Whiteboard: impact=important,source=redhat,report...
Depends On: 189918 245197
Blocks: 176344 189920 245198 561268
TreeView+ depends on / blocked
 
Reported: 2010-02-03 08:51 UTC by Eugene Teo (Security Response)
Modified: 2016-05-22 23:29 UTC (History)
21 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 189918
Environment:
Last Closed: 2010-03-23 15:43:06 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2010:0161 0 normal SHIPPED_LIVE Important: kernel-rt security and bug fix update 2010-03-23 15:42:25 UTC

Description Eugene Teo (Security Response) 2010-02-03 08:51:40 UTC
+++ This bug was initially created as a clone of Bug #189918 +++

Scenario 1:

A: calls ext3_new_inode(), blocks in e.g. new_inode()
knfsd: gets an fhandle with inumber ext3_new_inode() will pick
knfsd: calls iget(sb, ino)
knfsd: allocates and hashes inode (locked, new), calls ext3_read_inode(), blocks
A: allocates inode, fills it, hashes, has it written to cache
knfsd: comes back, gets the data left by A, happily fills its struct inode
=> we have two in-core inodes with the same inode number, both in use (by normal
dcache and by anon dentry held by nfsd).

Scenario 2:

same, except that A fails in xattr allocation (after having inode inserted into
hash) and knfsd comes and finds the inode in a normal fashion - via icache.
inode is pinned down by nfsd (and has no ACL or selinux label, BTW).

Scenario 3:
knfsd: does iget(), gets preempted before it can call make_bad_inode()
A: allocates inode with the same inumber, hashes it
the latter gets evicted (not too hard - just rmdir on non-empty parent),
later normal lookup finds half-done one from knfsd and blocks (it's still locked).
knfsd: comes back, marks that puppy bad, unhashes and unlocks
normal lookup gets a pile of crap instead of inode.

Similar fun exists for other exportable filesystems.  On top of that, failure
exits in ext3_new_inode() are leaking like hell - block quota not freed if we'd
allocated xattr block for ACL, etc., but that's ext3-specific (ext2 has
smaller-scale analog of that).

--- Additional comment from holtmann on 2006-04-28 05:51:32 EDT ---

Proposed fix from Al Viro:

* have callers of find_inode()/find_inode_fast() check if the inode they've got
is still in hash after they'd finished wait_on_inode(). If it isn't (i.e. we'd
raced with ->read_inode() called by somebody before us unhashing the inode) -
act as if we hadn't found it at all.

* have iget() check if after ->read_inode() the sucker is unhashed, iput() and
return NULL if it is (that, BTW, simplifies life for export_iget()).

* have foo_new_inode() on affected filesystems use iget_locked() after they
figure out the inode number.  That shall give us a (new,locked) in-core inode
_and_ guarantee that there won't be aliasing issues. Then we fill it and instead
of insert_inode_hash() do unlock_new_inode() in the very end. Or
make_bad_inode()/unlock_new_inode()/iput() on failure exit (with explicit
cleanup rather than relying of foo_delete_inode(); needed anyway since we need
to do cleanups after halfway-failed inode creation).

Comment 2 Eugene Teo (Security Response) 2010-02-03 08:52:52 UTC
Luis, looks like these patches can be applied onto rt kernel directly.

[PATCH] Add an ERR_CAST() function to complement ERR_PTR and co.
d1bc8e95445224276d7896b8b08cbb0b28a0ca80
v2.6.25-rc1~388
[PATCH] bugfix: two read_inode() calls without clear_inode() call between
4120db47198d21d8cd3b2cdbbe1ea6118a50bcd4
v2.6.13-rc3~53
[PATCH] igrab() should check for I_CLEAR
4a3b0a490d49ada8bbf3f426be1a0ace4dcd0a55
v2.6.21-rc1~91^2~363
[PATCH] iget: stop EXT2 from using iget() and read_inode()
52fcf7032935b33158e3998ed399cac97447ab8d
v2.6.25-rc1~377
[PATCH] iget: stop EXT3 from using iget() and read_inode()
473043dcee1874aab99f66b0362b344618eb3790
v2.6.25-rc1~376
[PATCH] iget: stop FAT from using iget() and read_inode()
17f95a7b4416a2c61e35f51b29eaaf1818fb5d7d
v2.6.25-rc1~374
[PATCH] iget: stop EXT4 from using iget() and read_inode()
1d1fe1ee02b9ac2660995b10e35dd41448fef011
v2.6.25-rc1~375
[PATCH] iget: stop ISOFS from using read_inode
c4386c83bf849c56b1f49951595aeb7c9a719d21
v2.6.25-rc1~370
[PATCH] iget: stop JFFS2 from using iget() and read_inode()
5451f79f5f817880958ed063864ad268d94ccd1f
v2.6.25-rc1~369
[PATCH] preparation to jfs iget sanitizing
?? see rhel-4 patch
[PATCH] iget: stop JFS from using iget
eab1df71a0ef6d333b9b826deaa0d0eb4b4f69dc
v2.6.25-rc1~368
[PATCH] iget: stop FUSE from using iget() and read_inode()
fa300b1914f892196acb385677047bc978466de7
v2.6.25-rc1~372
iget: stop PROCFS from using iget() and read_inode()
a1d4aebbfa91c55a6b0c629a9ccf6369be0c6e95
v2.6.25-rc1~366
[PATCH] nfsd/create race fixes, infrastructure
261bca86ed4f7f391d1938167624e78da61dcc6b
v2.6.29-rc1~549
[PATCH] fs: make sure data stored into inode is properly seen before
unlocking new inode
580be0837a7a59b207c3d5c661d044d8dd0a6a30
v2.6.32-rc1~608
[PATCH] ext3/4 with synchronous writes gets wedged by Postfix
72a43d63cb51057393edfbcfc4596066205ad15d
v2.6.30~18^2
[PATCH] nfsd race fixes: ext2
41080b5a240113328c607f22b849f653373db0ce
v2.6.29-rc1~549
[PATCH] nfsd race fixes: ext3
c38012daa7ad902a39a4213ba2b3fe50e81157ea
v2.6.29-rc1~549
nfsd race fixes: ext4
6b38e842bb832a3dbeb17e382404aef3c40ac5f9
v2.6.29-rc1~549
[PATCH] nfsd race fixes: reiserfs
c1eaa26b671299b3ec01d40c6c71ee19a4f81517
v2.6.29-rc1~549
[PATCH] nfsd race fixes: jfs
1f3403fa640f9f7b135dee79f2d39d01c8ad4a08
v2.6.29-rc1~549

Comment 6 David Sommerseth 2010-03-12 18:18:39 UTC
Verified by code review.  Found patch mentioned in comment #1 applied to the
2.6.24.7-149 src.rpm.  Started a 96 hour load and stress test in the -149 kernel.  Awaiting final results from this test to conclude.

Comment 9 errata-xmlrpc 2010-03-23 15:43:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0161.html


Note You need to log in before you can comment on or make changes to this bug.