Bug 128794 (IT_45140)

Summary: [RHEL3]Oops in de_put due to proc_dir_entry inconsistencies
Product: Red Hat Enterprise Linux 3 Reporter: Kevin W. Rudd <solgato>
Component: kernelAssignee: Alexander Viro <aviro>
Status: CLOSED ERRATA QA Contact:
Severity: high Docs Contact:
Priority: medium    
Version: 3.0CC: anderson, petrides, riel, tao
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2004-12-20 20:55:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 123574    

Description Kevin W. Rudd 2004-07-29 17:01:49 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113

Description of problem:
There are inconsistencies in the proc filesystem's proc_dir_entry that
can cause a proc_dir_entry to be freed while some other process still
holds a reference to it.  This is being seen as panics with the
following fingerprint:

Unable to handle kernel NULL pointer dereference at virtual address
00000004
 printing eip:
0229c38f
*pde = 00003001
*pte = 00000000
Oops: 0000
netconsole parport_pc lp parport autofs tg3 floppy sg keybdev mousedev
hid input
 usb-uhci usbcore ext3 jbd ips qla2300 aic7xxx sd_mod scsi_mod  
CPU:    4
EIP:    0060:[<0229c38f>]    Not tainted
EFLAGS: 00010097

EIP is at vsnprintf [kernel] 0x2df (2.4.21-15.0.2.ELhugemem/i686)
eax: 00000004   ebx: 0000000a   ecx: 00000004   edx: fffffffe
esi: 02472f6e   edi: 00000000   ebp: 0247335f   esp: d68cfedc
ds: 0068   es: 0068   ss: 0068
Process atop (pid: 12106, stackpage=d68cf000)
Stack: 00006807 00000006 00017201 0210ade2 00000000 00000000 00000011
00000002 
       ffffffff ffffffff 92cd0000 00000246 1120b800 a33d0a80 02128dee
02472f60 
       00000400 022b8063 d68cff40 92cd0000 d68ce000 1120b800 0218f399
022b8054 
Call Trace:   [<0210ade2>] __down [kernel] 0x72 (0xd68cfee8)
[<02128dee>] printk [kernel] 0x7e (0xd68cff14)
[<0218f399>] de_put [kernel] 0xa9 (0xd68cff34)
[<0218f3c0>] proc_delete_inode [kernel] 0x0 (0xd68cff44)
[<0217d8ef>] iput [kernel] 0x14f (0xd68cff48)
[<0217a749>] dput [kernel] 0xc9 (0xd68cff64)
[<0216289a>] __fput [kernel] 0xba (0xd68cff78)
[<02160bae>] filp_close [kernel] 0x8e (0xd68cff94)
[<02160c56>] sys_close [kernel] 0x66 (0xd68cffb0)

The following bit of code is responsible for this particular panic:

static void de_put(struct proc_dir_entry *de)
{
        if (de) {
                lock_kernel();
                if (!atomic_read(&de->count)) {
                        printk("de_put: entry %s already free!\n",
de->name);
                        unlock_kernel();
                        return;
                }

There are a couple of problems with this particular check.  If the
entry has indeed already been freed, then any structure elements
within it should not be referenced as they are by definition no longer
valid.  At best, you end up with messages like the following printed
to the console:

 de_put: entry <NULL> already free!

Worse, if the new value of what was once de->name is not null, the
system panics.  The potentially nasty situation comes in when the
entry has been freed, but the new value of what was once de->count is
not 0.  Then, the next line will actually corrupt someone else's data
and potentially free it:

                if (atomic_dec_and_test(&de->count)) {
                        if (de->deleted) {
                                printk("de_put: deferred delete of %s\n",
                                        de->name);
                                free_proc_entry(de);
                        }
                }


Version-Release number of selected component (if applicable):
2.4.21-15.0.2.ELhugemem/i686

How reproducible:
Always

Steps to Reproduce:
1. Start up a large number of processes (in the environment that the
panic has been seen, 12000 processes were started.
2. Run atop.
3. Stop the processes.
    

Actual Results:  Top appears to race with the deletion of the
associated PID proc entries, and the system panics.

Expected Results:  No panic ;-)

Additional info:

Comment 1 Wendy Cheng 2004-09-03 15:48:42 UTC
Dave,

This issue also reported via support channel....

The difficulty here is that this piece of code is terribly designed
and implemented. First, the proc_dir_entry is allocated via "malloc".
That's a very bad start - should use a fixed set of slab cache objects
rather than a general purpose one (I think). Now let's examine one
possible race condition:

thread A calls remove_proc_entry()
thread A does an atomic_read(&de->count) and finds it is zero
          thread B calls proc_get_inode() and got the "de" pionter
thread A does a "free_proc_entry" - which does a free(de).
("de" now "returns" to malloc pool - God knows who gets it and what it
has done to that piece of memory)
          thread B still has the "de" so it'll trash someone's memory
by either
          thread B does de_get() which incrases de->count (trashes
someone's memory) or.
          thread B does a de_put() which'll either crash and/or
trashes someone's memory.

The "de" needs to get from a fixed set of slab cache and we need a
lock to protect the structure. 

What do you think ?


Comment 5 Alexander Viro 2004-09-13 22:48:30 UTC
First of all, we shouldn't be using any proc_dir_entry for per-process
stuff.  What's more, exclusion should *prevent* proc_get_inode() after
we have removed the entry from lists, which happens before the final
de_put().  BKL used to provide that exclusion, IIRC.  Whether it still
does so or not is the question.
Special slab vs. kmalloc() is a red herring - either we get it right
(in which case it doesn't matter) or we do not (in which case nasty
stuff will happen anyway).

Comment 6 Ernie Petrides 2004-09-24 09:40:28 UTC
A fix for this problem has just been committed to the RHEL3 U4
patch pool this evening (in kernel version 2.4.21-20.11.EL).


Comment 7 John Flanagan 2004-12-20 20:55:48 UTC
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-550.html