561275 – kernel: serious ugliness in iget() uses by nfsd [mrg-1]

Bug 561275 - kernel: serious ugliness in iget() uses by nfsd [mrg-1]

Summary: kernel: serious ugliness in iget() uses by nfsd [mrg-1]

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise MRG
Classification:	Red Hat
Component:	realtime-kernel
Sub Component:
Version:	Development
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	1.2.5
Target Release:	---
Assignee:	Luis Claudio R. Goncalves
QA Contact:	David Sommerseth
Docs Contact:
URL:
Whiteboard:	impact=important,source=redhat,report...
Depends On:	189918 245197
Blocks:	176344 189920 245198 561268
TreeView+	depends on / blocked

Reported:	2010-02-03 08:51 UTC by Eugene Teo (Security Response)
Modified:	2016-05-22 23:29 UTC (History)
CC List:	21 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:	189918
Environment:
Last Closed:	2010-03-23 15:43:06 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2010:0161	0	normal	SHIPPED_LIVE	Important: kernel-rt security and bug fix update	2010-03-23 15:42:25 UTC

Description Eugene Teo (Security Response) 2010-02-03 08:51:40 UTC

+++ This bug was initially created as a clone of Bug #189918 +++

Scenario 1:

A: calls ext3_new_inode(), blocks in e.g. new_inode()
knfsd: gets an fhandle with inumber ext3_new_inode() will pick
knfsd: calls iget(sb, ino)
knfsd: allocates and hashes inode (locked, new), calls ext3_read_inode(), blocks
A: allocates inode, fills it, hashes, has it written to cache
knfsd: comes back, gets the data left by A, happily fills its struct inode
=> we have two in-core inodes with the same inode number, both in use (by normal
dcache and by anon dentry held by nfsd).

Scenario 2:

same, except that A fails in xattr allocation (after having inode inserted into
hash) and knfsd comes and finds the inode in a normal fashion - via icache.
inode is pinned down by nfsd (and has no ACL or selinux label, BTW).

Scenario 3:
knfsd: does iget(), gets preempted before it can call make_bad_inode()
A: allocates inode with the same inumber, hashes it
the latter gets evicted (not too hard - just rmdir on non-empty parent),
later normal lookup finds half-done one from knfsd and blocks (it's still locked).
knfsd: comes back, marks that puppy bad, unhashes and unlocks
normal lookup gets a pile of crap instead of inode.

Similar fun exists for other exportable filesystems.  On top of that, failure
exits in ext3_new_inode() are leaking like hell - block quota not freed if we'd
allocated xattr block for ACL, etc., but that's ext3-specific (ext2 has
smaller-scale analog of that).

--- Additional comment from holtmann on 2006-04-28 05:51:32 EDT ---

Proposed fix from Al Viro:

* have callers of find_inode()/find_inode_fast() check if the inode they've got
is still in hash after they'd finished wait_on_inode(). If it isn't (i.e. we'd
raced with ->read_inode() called by somebody before us unhashing the inode) -
act as if we hadn't found it at all.

* have iget() check if after ->read_inode() the sucker is unhashed, iput() and
return NULL if it is (that, BTW, simplifies life for export_iget()).

* have foo_new_inode() on affected filesystems use iget_locked() after they
figure out the inode number.  That shall give us a (new,locked) in-core inode
_and_ guarantee that there won't be aliasing issues. Then we fill it and instead
of insert_inode_hash() do unlock_new_inode() in the very end. Or
make_bad_inode()/unlock_new_inode()/iput() on failure exit (with explicit
cleanup rather than relying of foo_delete_inode(); needed anyway since we need
to do cleanups after halfway-failed inode creation).

Comment 2 Eugene Teo (Security Response) 2010-02-03 08:52:52 UTC

Luis, looks like these patches can be applied onto rt kernel directly.

[PATCH] Add an ERR_CAST() function to complement ERR_PTR and co.
d1bc8e95445224276d7896b8b08cbb0b28a0ca80
v2.6.25-rc1~388
[PATCH] bugfix: two read_inode() calls without clear_inode() call between
4120db47198d21d8cd3b2cdbbe1ea6118a50bcd4
v2.6.13-rc3~53
[PATCH] igrab() should check for I_CLEAR
4a3b0a490d49ada8bbf3f426be1a0ace4dcd0a55
v2.6.21-rc1~91^2~363
[PATCH] iget: stop EXT2 from using iget() and read_inode()
52fcf7032935b33158e3998ed399cac97447ab8d
v2.6.25-rc1~377
[PATCH] iget: stop EXT3 from using iget() and read_inode()
473043dcee1874aab99f66b0362b344618eb3790
v2.6.25-rc1~376
[PATCH] iget: stop FAT from using iget() and read_inode()
17f95a7b4416a2c61e35f51b29eaaf1818fb5d7d
v2.6.25-rc1~374
[PATCH] iget: stop EXT4 from using iget() and read_inode()
1d1fe1ee02b9ac2660995b10e35dd41448fef011
v2.6.25-rc1~375
[PATCH] iget: stop ISOFS from using read_inode
c4386c83bf849c56b1f49951595aeb7c9a719d21
v2.6.25-rc1~370
[PATCH] iget: stop JFFS2 from using iget() and read_inode()
5451f79f5f817880958ed063864ad268d94ccd1f
v2.6.25-rc1~369
[PATCH] preparation to jfs iget sanitizing
?? see rhel-4 patch
[PATCH] iget: stop JFS from using iget
eab1df71a0ef6d333b9b826deaa0d0eb4b4f69dc
v2.6.25-rc1~368
[PATCH] iget: stop FUSE from using iget() and read_inode()
fa300b1914f892196acb385677047bc978466de7
v2.6.25-rc1~372
iget: stop PROCFS from using iget() and read_inode()
a1d4aebbfa91c55a6b0c629a9ccf6369be0c6e95
v2.6.25-rc1~366
[PATCH] nfsd/create race fixes, infrastructure
261bca86ed4f7f391d1938167624e78da61dcc6b
v2.6.29-rc1~549
[PATCH] fs: make sure data stored into inode is properly seen before
unlocking new inode
580be0837a7a59b207c3d5c661d044d8dd0a6a30
v2.6.32-rc1~608
[PATCH] ext3/4 with synchronous writes gets wedged by Postfix
72a43d63cb51057393edfbcfc4596066205ad15d
v2.6.30~18^2
[PATCH] nfsd race fixes: ext2
41080b5a240113328c607f22b849f653373db0ce
v2.6.29-rc1~549
[PATCH] nfsd race fixes: ext3
c38012daa7ad902a39a4213ba2b3fe50e81157ea
v2.6.29-rc1~549
nfsd race fixes: ext4
6b38e842bb832a3dbeb17e382404aef3c40ac5f9
v2.6.29-rc1~549
[PATCH] nfsd race fixes: reiserfs
c1eaa26b671299b3ec01d40c6c71ee19a4f81517
v2.6.29-rc1~549
[PATCH] nfsd race fixes: jfs
1f3403fa640f9f7b135dee79f2d39d01c8ad4a08
v2.6.29-rc1~549

Comment 6 David Sommerseth 2010-03-12 18:18:39 UTC

Verified by code review.  Found patch mentioned in comment #1 applied to the
2.6.24.7-149 src.rpm.  Started a 96 hour load and stress test in the -149 kernel.  Awaiting final results from this test to conclude.

Comment 9 errata-xmlrpc 2010-03-23 15:43:06 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0161.html

Note You need to log in before you can comment on or make changes to this bug.

aviro
bhu
dhoward
dhowells
esandeen
jbaron
jcm
jskrabal
kwirth
lgoncalv
lwang
mjc
ovasik
plyons
rwheeler
sct
security-response-team
steved
tburke
vmayatsk
williams