128794 – (IT_45140) [RHEL3]Oops in de_put due to proc_dir_entry inconsistencies

Bug 128794 (IT_45140) - [RHEL3]Oops in de_put due to proc_dir_entry inconsistencies

Summary: [RHEL3]Oops in de_put due to proc_dir_entry inconsistencies

Keywords:
Status:	CLOSED ERRATA
Alias:	IT_45140
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Alexander Viro
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	123574
TreeView+	depends on / blocked

Reported:	2004-07-29 17:01 UTC by Kevin W. Rudd
Modified:	2007-11-30 22:07 UTC (History)
CC List:	4 users (show)
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-12-20 20:55:48 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:550	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 3 Update 4	2004-12-20 05:00:00 UTC

Description Kevin W. Rudd 2004-07-29 17:01:49 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113

Description of problem:
There are inconsistencies in the proc filesystem's proc_dir_entry that
can cause a proc_dir_entry to be freed while some other process still
holds a reference to it.  This is being seen as panics with the
following fingerprint:

Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
0229c38f
*pde = 00003001
*pte = 00000000
Oops: 0000
netconsole parport_pc lp parport autofs tg3 floppy sg keybdev mousedev
hid input
 usb-uhci usbcore ext3 jbd ips qla2300 aic7xxx sd_mod scsi_mod  
CPU:    4
EIP:    0060:[<0229c38f>]    Not tainted
EFLAGS: 00010097

EIP is at vsnprintf [kernel] 0x2df (2.4.21-15.0.2.ELhugemem/i686)
eax: 00000004   ebx: 0000000a   ecx: 00000004   edx: fffffffe
esi: 02472f6e   edi: 00000000   ebp: 0247335f   esp: d68cfedc
ds: 0068   es: 0068   ss: 0068
Process atop (pid: 12106, stackpage=d68cf000)
Stack: 00006807 00000006 00017201 0210ade2 00000000 00000000 00000011
00000002 
       ffffffff ffffffff 92cd0000 00000246 1120b800 a33d0a80 02128dee
02472f60 
       00000400 022b8063 d68cff40 92cd0000 d68ce000 1120b800 0218f399
022b8054 
Call Trace:   [<0210ade2>] __down [kernel] 0x72 (0xd68cfee8)
[<02128dee>] printk [kernel] 0x7e (0xd68cff14)
[<0218f399>] de_put [kernel] 0xa9 (0xd68cff34)
[<0218f3c0>] proc_delete_inode [kernel] 0x0 (0xd68cff44)
[<0217d8ef>] iput [kernel] 0x14f (0xd68cff48)
[<0217a749>] dput [kernel] 0xc9 (0xd68cff64)
[<0216289a>] __fput [kernel] 0xba (0xd68cff78)
[<02160bae>] filp_close [kernel] 0x8e (0xd68cff94)
[<02160c56>] sys_close [kernel] 0x66 (0xd68cffb0)

The following bit of code is responsible for this particular panic:

static void de_put(struct proc_dir_entry *de)
{
        if (de) {
                lock_kernel();
                if (!atomic_read(&de->count)) {
                        printk("de_put: entry %s already free!\n",
de->name);
                        unlock_kernel();
                        return;
                }

There are a couple of problems with this particular check.  If the
entry has indeed already been freed, then any structure elements
within it should not be referenced as they are by definition no longer
valid.  At best, you end up with messages like the following printed
to the console:

 de_put: entry <NULL> already free!

Worse, if the new value of what was once de->name is not null, the
system panics.  The potentially nasty situation comes in when the
entry has been freed, but the new value of what was once de->count is
not 0.  Then, the next line will actually corrupt someone else's data
and potentially free it:

                if (atomic_dec_and_test(&de->count)) {
                        if (de->deleted) {
                                printk("de_put: deferred delete of %s\n",
                                        de->name);
                                free_proc_entry(de);
                        }
                }


Version-Release number of selected component (if applicable):
2.4.21-15.0.2.ELhugemem/i686

How reproducible:
Always

Steps to Reproduce:
1. Start up a large number of processes (in the environment that the
panic has been seen, 12000 processes were started.
2. Run atop.
3. Stop the processes.
    

Actual Results:  Top appears to race with the deletion of the
associated PID proc entries, and the system panics.

Expected Results:  No panic ;-)

Additional info:

Comment 1 Wendy Cheng 2004-09-03 15:48:42 UTC

Dave,

This issue also reported via support channel....

The difficulty here is that this piece of code is terribly designed
and implemented. First, the proc_dir_entry is allocated via "malloc".
That's a very bad start - should use a fixed set of slab cache objects
rather than a general purpose one (I think). Now let's examine one
possible race condition:

thread A calls remove_proc_entry()
thread A does an atomic_read(&de->count) and finds it is zero
          thread B calls proc_get_inode() and got the "de" pionter
thread A does a "free_proc_entry" - which does a free(de).
("de" now "returns" to malloc pool - God knows who gets it and what it
has done to that piece of memory)
          thread B still has the "de" so it'll trash someone's memory
by either
          thread B does de_get() which incrases de->count (trashes
someone's memory) or.
          thread B does a de_put() which'll either crash and/or
trashes someone's memory.

The "de" needs to get from a fixed set of slab cache and we need a
lock to protect the structure. 

What do you think ?

Comment 5 Alexander Viro 2004-09-13 22:48:30 UTC

First of all, we shouldn't be using any proc_dir_entry for per-process
stuff.  What's more, exclusion should *prevent* proc_get_inode() after
we have removed the entry from lists, which happens before the final
de_put().  BKL used to provide that exclusion, IIRC.  Whether it still
does so or not is the question.
Special slab vs. kmalloc() is a red herring - either we get it right
(in which case it doesn't matter) or we do not (in which case nasty
stuff will happen anyway).

Comment 6 Ernie Petrides 2004-09-24 09:40:28 UTC

A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.11.EL).

Comment 7 John Flanagan 2004-12-20 20:55:48 UTC

An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html

Note You need to log in before you can comment on or make changes to this bug.