Bug 161597 - sysfs_remove_dir() de-references NULL pointer
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All Linux
Priority: medium
Severity: high
Assigned To: Pete Zaitcev
QA Contact: Brian Brock
Duplicates: 158580
Depends On:
Blocks: 168429
Reported: 2005-06-24 14:42 EDT by Bob Miller
Modified: 2007-11-30 17:07 EST
Fixed In Version: RHSA-2006-0132
Doc Type: Bug Fix
Last Closed: 2006-03-07 14:11:22 EST

Attachments
Fix for sysfs_remove_dir() panic. (915 bytes, patch)
2005-06-24 14:45 EDT, Bob Miller
Description Bob Miller 2005-06-24 14:42:44 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) (KHTML, like Gecko)

Description of problem:
The code that assigns parent_sd in sysfs_remove_dir() needs to wait until
after dentry has been tested and proven to point to something.  It is valid
for some kobjects in the system to not have sysfs entries.  The attached
patch fixes the problem.
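
For readers without access to the attachment, the ordering problem can be
sketched in userspace C. This is not the attached patch. The structs are
simplified stand-ins, the function names are hypothetical, and the field name
d_fsdata is an assumption about what the parent_sd assignment reads.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins; NOT the real kernel definitions. */
struct sysfs_dirent { int unused; };
struct dentry { struct sysfs_dirent *d_fsdata; };
struct kobject { struct dentry *dentry; };

/*
 * Buggy ordering: dentry is dereferenced in the initializer of parent_sd,
 * so the NULL check below comes too late. Calling this with
 * kobj->dentry == NULL is undefined behaviour (a panic in the kernel).
 */
int remove_dir_buggy(struct kobject *kobj)
{
    struct dentry *dentry = kobj->dentry;
    struct sysfs_dirent *parent_sd = dentry->d_fsdata; /* too early */

    if (!dentry)
        return 0;
    (void)parent_sd; /* real code would walk the children here */
    return 1;
}

/*
 * Fixed ordering: test dentry first, assign parent_sd only afterwards.
 * Returns 1 if there was a directory to tear down, 0 otherwise.
 */
int remove_dir_fixed(struct kobject *kobj)
{
    struct dentry *dentry;
    struct sysfs_dirent *parent_sd;

    assert(kobj != NULL);
    dentry = kobj->dentry;
    if (!dentry)
        return 0; /* kobject has no sysfs entry: nothing to do */

    parent_sd = dentry->d_fsdata; /* safe: dentry is known non-NULL */
    (void)parent_sd; /* real code would walk the children here */
    kobj->dentry = NULL;
    return 1;
}
```

With the fixed ordering, a kobject that never got a sysfs directory is simply
skipped instead of faulting.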

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
Close a kernel device that doesn't have a sysfs entry. 

Actual Results:  The system panics 

Expected Results:  Not panic. 

Additional info:
Comment 1 Bob Miller 2005-06-24 14:45:51 EDT
Created attachment 115949 [details]
Fix for sysfs_remove_dir() panic.
Comment 3 Pete Zaitcev 2005-08-18 15:35:17 EDT
It would be helpful if the requestor provided a bug description
and attached actual traces (unedited), for a couple of reasons.

First, I need material to explain to reviewers how dentry can
be NULL here.

Second, this is a patch which floated around in conjunction
with a problem of devices which insist on failing revalidation
across add_disk(). If this is what Polyserve is doing, it's not
the right fix for them.

I know that the patch is included in 2.6.12. So are many other
things.
Comment 5 Bob Miller 2005-08-19 14:13:49 EDT
Sorry.  Don't have a kernel stack trace of this problem.  It only happened 
once before I figured out what was causing the problem and we've been testing 
with the attached patch.  I can't, off the top of my head, come up with a way 
that the dentry would be NULL. But the fact that there is a check for the 
condition and we've had a panic means that it is possible for dentry to be 
NULL and it shouldn't be de-referenced before that check. 
 
BTW.  I can't find any kernel.org kernel that is/was coded to de-reference 
dentry before the check.  So, it looks like it was added by Red Hat when a bug 
fix for a different problem was backported from a 2.6.10 kernel.org kernel. 
Comment 6 Jason Baron 2005-08-19 14:20:51 EDT
*** Bug 158580 has been marked as a duplicate of this bug. ***
Comment 7 Guy Streeter 2005-08-19 15:42:20 EDT
Bug 158580, which was closed as a dup of this, has a stack traceback and some
additional analysis. I don't know why it is the one that got closed as a dup.
Comment 8 Jason Baron 2005-08-19 15:45:50 EDT
Both have analysis. Would you prefer to dup them the other way? I don't really
care, tbh.
Comment 10 Charlie Brady 2005-08-27 20:02:38 EDT
Is this what you're after? This was reported by someone using a CentOS 4.1
kernel - 2.6.9-11.ELsmp. Panic followed removal of a USB memory stick. Original
report is at
https://sourceforge.net/tracker/index.php?func=detail&aid=1274847&group_id=96750&atid=615772


Aug 27 23:29:55 pc7 kernel: sdb : READ CAPACITY failed.
Aug 27 23:29:55 pc7 kernel: sdb : status=0, message=00, host=7, driver=00
Aug 27 23:29:55 pc7 kernel: sdb : sense not available.
Aug 27 23:29:55 pc7 kernel: sdb: Write Protect is off
Aug 27 23:29:55 pc7 kernel: sdb: Mode Sense: 00 00 00 00
Aug 27 23:29:55 pc7 kernel: sdb: assuming drive cache: write through
Aug 27 23:29:55 pc7 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000054
Aug 27 23:29:55 pc7 kernel:  printing eip:
Aug 27 23:29:55 pc7 kernel: c018901a
Aug 27 23:29:55 pc7 kernel: *pde = 35c3a001
Aug 27 23:29:55 pc7 kernel: Oops: 0000 [#1]
Aug 27 23:29:55 pc7 kernel: SMP
Aug 27 23:29:55 pc7 kernel: Modules linked in: usb_storage appletalk(U) ipt_ULOG
ipt_REJECT ipt_MASQUERADE ipt_state ipt_TOS ip_nat_ftp ip_conntrack_ftp
iptable_mangle iptable_nat ip_conntrack iptable_filter ip_tables joydev button
battery ac uhci_hcd ohci_hcd ehci_hcd hw_random e1000 sg dm_snapshot dm_zero
dm_mirror ext3 jbd raid1 dm_mod mptscsih mptbase sd_mod scsi_mod
Aug 27 23:29:55 pc7 kernel: CPU:    1
Aug 27 23:29:55 pc7 kernel: EIP:    0060:[sysfs_remove_dir+31/249]    Not tainted VLI
Aug 27 23:29:55 pc7 kernel: EIP:    0060:[<c018901a>]    Not tainted VLI
Aug 27 23:29:55 pc7 kernel: EFLAGS: 00010246   (2.6.9-11.ELsmp)
Aug 27 23:29:55 pc7 kernel: EIP is at sysfs_remove_dir+0x1f/0xf9
Aug 27 23:29:55 pc7 kernel: eax: c2215190   ebx: c2215190   ecx: f7fff480   edx: c1000000
Aug 27 23:29:55 pc7 kernel: esi: 00000002   edi: 00000000   ebp: f7fe7580   esp: f5d16ec0
Aug 27 23:29:55 pc7 kernel: ds: 007b   es: 007b   ss: 0068
Aug 27 23:29:55 pc7 kernel: Process hald (pid: 3620, threadinfo=f5d16000 task=f5d791b0)
Aug 27 23:29:55 pc7 kernel: Stack: c1000000 c2215190 00000002 00000000 f7fe7580 c01b5ac4 c2215190 c01b5ad4
Aug 27 23:29:55 pc7 kernel:        f5d16000 c018638b f5db3400 f5db3400 f882266c f5d1a900 f5d16000 f5d1a900
Aug 27 23:29:55 pc7 kernel:        00000000 f5e8a580 c015cda2 f5d1a90c f7fe7580 f8825e80 00000000 f5e8a580
Aug 27 23:29:55 pc7 kernel: Call Trace:
Aug 27 23:29:55 pc7 kernel:  [kobject_del+22/30] kobject_del+0x16/0x1e
Aug 27 23:29:55 pc7 kernel:  [<c01b5ac4>] kobject_del+0x16/0x1e
Aug 27 23:29:55 pc7 kernel:  [kobject_unregister+8/16] kobject_unregister+0x8/0x10
Aug 27 23:29:55 pc7 kernel:  [<c01b5ad4>] kobject_unregister+0x8/0x10
Aug 27 23:29:55 pc7 kernel:  [rescan_partitions+76/253] rescan_partitions+0x4c/0xfd
Aug 27 23:29:55 pc7 kernel:  [<c018638b>] rescan_partitions+0x4c/0xfd
Aug 27 23:29:55 pc7 kernel:  [<f882266c>] sd_open+0xe3/0xf6 [sd_mod]
Aug 27 23:29:55 pc7 kernel:  [do_open+655/952] do_open+0x28f/0x3b8
Aug 27 23:29:55 pc7 kernel:  [<c015cda2>] do_open+0x28f/0x3b8
Aug 27 23:29:55 pc7 kernel:  [blkdev_open+26/66] blkdev_open+0x1a/0x42
Aug 27 23:29:55 pc7 kernel:  [<c015cf4d>] blkdev_open+0x1a/0x42
Aug 27 23:29:55 pc7 kernel:  [dentry_open+205/421] dentry_open+0xcd/0x1a5
Aug 27 23:29:55 pc7 kernel:  [<c015560e>] dentry_open+0xcd/0x1a5
Aug 27 23:29:55 pc7 kernel:  [filp_open+54/60] filp_open+0x36/0x3c
Aug 27 23:29:55 pc7 kernel:  [<c015553b>] filp_open+0x36/0x3c
Aug 27 23:29:55 pc7 kernel:  [__cond_resched+20/57] __cond_resched+0x14/0x39
Aug 27 23:29:55 pc7 kernel:  [<c02c597f>] __cond_resched+0x14/0x39
Aug 27 23:29:56 pc7 kernel:  [direct_strncpy_from_user+62/93] direct_strncpy_from_user+0x3e/0x5d
Aug 27 23:29:56 pc7 kernel:  [<c01b8516>] direct_strncpy_from_user+0x3e/0x5d
Aug 27 23:29:56 pc7 kernel:  [sys_open+49/125] sys_open+0x31/0x7d
Aug 27 23:29:56 pc7 kernel:  [<c015583a>] sys_open+0x31/0x7d
Aug 27 23:29:56 pc7 kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Aug 27 23:29:56 pc7 kernel:  [<c02c7377>] syscall_call+0x7/0xb
Aug 27 23:29:56 pc7 kernel: Code: 5f 5d e9 d2 0c fe ff e9 3c ff ff ff 55 57 56 53 52 8b 78 30 85 ff 74 11 8b 07 85 c0 75 08 0f 0b 1a 01 e2 89 2d c0 f0 ff 07 85 ff <8b> 6f 54 0f 84 cb 00 00 00 8b 77 10 31 c9 ba 6b 00 00 00 b8 6f
Aug 27 23:29:56 pc7 kernel:  <0>Fatal exception: panic in 5 seconds
Aug 27 23:29:56 pc7 kernel: usb 1-3.1: USB disconnect, address 4
Aug 27 23:37:02 pc7 syslogd 1.4.1: restart.
Aug 27 23:37:02 pc7 syslog: syslogd startup succeeded
Aug 27 23:37:02 pc7 syslog: ESC[60G
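
A note on reading the oops above: the faulting bytes marked in the Code: line,
<8b> 6f 54, decode on i386 to a 32-bit load from 0x54(%edi), and the register
dump shows edi: 00000000. A NULL base plus a field offset of 0x54 gives the
reported fault address 00000054. The small sketch below only illustrates that
arithmetic. The struct is a hypothetical mock padded so that a field lands at
0x54; it is not the real kernel layout, and the function names are made up
for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mock: padding chosen so one field sits at offset 0x54. */
struct mock_dentry {
    unsigned char leading_fields[0x54];
    uint32_t field_at_0x54; /* stands in for a 32-bit pointer field */
};

/* Address touched by a load of the form disp(%base) on a 32-bit machine. */
uint32_t effective_address(uint32_t base_reg, uint32_t displacement)
{
    return base_reg + displacement;
}

/* Address the faulting load would touch for a given dentry pointer value. */
uint32_t fault_address_for(uint32_t dentry_reg)
{
    size_t off = offsetof(struct mock_dentry, field_at_0x54);

    assert(off == 0x54); /* sanity check on the mock layout */
    return effective_address(dentry_reg, (uint32_t)off);
}
```

Passing 0 for the dentry register value reproduces the address from the
"Unable to handle kernel NULL pointer dereference" line.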
Comment 11 Pete Zaitcev 2005-08-28 00:39:40 EDT
Charlie, my request for information pertains to Polyserve.
It's Polyserve's problem. We can work around it with the patch which
Bob helpfully attached, and probably will. However, the correct course
of action is to find out what Polyserve does. Unfortunately, I cannot
look at their code, because it's proprietary. So Bob has to explain it
in English. I hope this clears up the situation.

I know exactly how USB triggers this (by calling add_disk for a device
which fails revalidation immediately). Please see bug 153971 for specifics.
So, thanks for the effort, but unfortunately your update was not helpful.
Comment 22 Red Hat Bugzilla 2006-03-07 14:11:22 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0132.html
