From Bugzilla Helper:
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.3; Linux) (KHTML, like Gecko)

Description of problem:
The assignment of parent_sd in sysfs_remove_dir() must wait until after dentry has been tested and proven to point to something. It is valid for some kobjects in the system not to have sysfs entries. The attached patch fixes the problem.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.EL

How reproducible:
Always

Steps to Reproduce:
Close a kernel device that doesn't have a sysfs entry.

Actual Results:
The system panics.

Expected Results:
No panic.

Additional info:
Created attachment 115949: Fix for sysfs_remove_dir() panic.
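The ordering problem described in the report can be modeled in userspace. This is a minimal sketch, not the attached patch and not the real sysfs code: `struct fake_dentry`, `struct fake_kobject` and `remove_dir_fixed()` are hypothetical stand-ins that keep only the fields needed to show why parent_sd must be derived after the NULL test on dentry.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct dentry: only the parent link is modeled. */
struct fake_dentry {
    struct fake_dentry *d_parent;
};

/* Hypothetical stand-in for struct kobject: dentry is NULL when the
 * kobject was never given a sysfs entry, which the report says is valid. */
struct fake_kobject {
    struct fake_dentry *dentry;
};

/*
 * Buggy ordering (shown only as a comment, never executed here):
 *
 *     struct fake_dentry *parent_sd = kobj->dentry->d_parent;
 *     if (!kobj->dentry)
 *         return;
 *
 * The load of d_parent happens before the check, so a kobject without
 * a sysfs entry dereferences a NULL pointer.
 */

/*
 * Fixed ordering: test dentry first, then derive parent_sd from it.
 * Returns 0 when there is nothing to remove and 1 when a directory was
 * "removed". If parent_out is non-NULL, it receives the parent dentry.
 */
static int remove_dir_fixed(struct fake_kobject *kobj,
                            struct fake_dentry **parent_out)
{
    struct fake_dentry *dentry = kobj->dentry;
    struct fake_dentry *parent_sd;

    if (!dentry)
        return 0; /* no sysfs entry: nothing to tear down */

    parent_sd = dentry->d_parent; /* safe: dentry is known to be non-NULL */
    if (parent_out)
        *parent_out = parent_sd;

    kobj->dentry = NULL; /* model the directory going away */
    return 1;
}
```

With this ordering, a kobject whose dentry is NULL simply returns early instead of faulting on the d_parent load.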
It would be helpful if the requestor provided a bug description and attached actual (unedited) traces, for a couple of reasons. First, I need material to explain to reviewers how dentry can be NULL here. Second, this patch floated around in conjunction with a problem of devices which insist on failing revalidation across add_disk(). If that is what Polyserve is doing, this is not the right fix for them. I know that the patch is included in 2.6.12. So are many other things.
Sorry, I don't have a kernel stack trace of this problem. It only happened once before I figured out what was causing it, and we've been testing with the attached patch since. I can't, off the top of my head, come up with a way that the dentry would be NULL. But the fact that there is a check for the condition, and that we've had a panic, means that it is possible for dentry to be NULL, and it shouldn't be dereferenced before that check. BTW, I can't find any kernel.org kernel that is or was coded to dereference dentry before the check. So it looks like the early dereference was introduced by Red Hat when a fix for a different problem was backported from a 2.6.10 kernel.org kernel.
*** Bug 158580 has been marked as a duplicate of this bug. ***
Bug 158580, which was closed as a dup of this, has a stack traceback and some additional analysis. I don't know why it is the one that got closed as a dup.
Both have analysis. Would you prefer to dup them the other way? I don't really care, tbh.
Is this what you're after? This was reported by someone using a CentOS 4.1 kernel (2.6.9-11.ELsmp). The panic followed removal of a USB memory stick. The original report is at https://sourceforge.net/tracker/index.php?func=detail&aid=1274847&group_id=96750&atid=615772

Aug 27 23:29:55 pc7 kernel: sdb : READ CAPACITY failed.
Aug 27 23:29:55 pc7 kernel: sdb : status=0, message=00, host=7, driver=00
Aug 27 23:29:55 pc7 kernel: sdb : sense not available.
Aug 27 23:29:55 pc7 kernel: sdb: Write Protect is off
Aug 27 23:29:55 pc7 kernel: sdb: Mode Sense: 00 00 00 00
Aug 27 23:29:55 pc7 kernel: sdb: assuming drive cache: write through
Aug 27 23:29:55 pc7 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000054
Aug 27 23:29:55 pc7 kernel: printing eip:
Aug 27 23:29:55 pc7 kernel: c018901a
Aug 27 23:29:55 pc7 kernel: *pde = 35c3a001
Aug 27 23:29:55 pc7 kernel: Oops: 0000 [#1]
Aug 27 23:29:55 pc7 kernel: SMP
Aug 27 23:29:55 pc7 kernel: Modules linked in: usb_storage appletalk(U) ipt_ULOG ipt_REJECT ipt_MASQUERADE ipt_state ipt_TOS ip_nat_ftp ip_conntrack_ftp iptable_mangle iptable_nat ip_conntrack iptable_filter ip_tables joydev button battery ac uhci_hcd ohci_hcd ehci_hcd hw_random e1000 sg dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod mptscsih mptbase sd_mod scsi_mod
Aug 27 23:29:55 pc7 kernel: CPU: 1
Aug 27 23:29:55 pc7 kernel: EIP: 0060:[sysfs_remove_dir+31/249] Not tainted VLI
Aug 27 23:29:55 pc7 kernel: EIP: 0060:[<c018901a>] Not tainted VLI
Aug 27 23:29:55 pc7 kernel: EFLAGS: 00010246 (2.6.9-11.ELsmp)
Aug 27 23:29:55 pc7 kernel: EIP is at sysfs_remove_dir+0x1f/0xf9
Aug 27 23:29:55 pc7 kernel: eax: c2215190 ebx: c2215190 ecx: f7fff480 edx: c1000000
Aug 27 23:29:55 pc7 kernel: esi: 00000002 edi: 00000000 ebp: f7fe7580 esp: f5d16ec0
Aug 27 23:29:55 pc7 kernel: ds: 007b es: 007b ss: 0068
Aug 27 23:29:55 pc7 kernel: Process hald (pid: 3620, threadinfo=f5d16000 task=f5d791b0)
Aug 27 23:29:55 pc7 kernel: Stack: c1000000 c2215190 00000002 00000000 f7fe7580 c01b5ac4 c2215190 c01b5ad4
Aug 27 23:29:55 pc7 kernel: f5d16000 c018638b f5db3400 f5db3400 f882266c f5d1a900 f5d16000 f5d1a900
Aug 27 23:29:55 pc7 kernel: 00000000 f5e8a580 c015cda2 f5d1a90c f7fe7580 f8825e80 00000000 f5e8a580
Aug 27 23:29:55 pc7 kernel: Call Trace:
Aug 27 23:29:55 pc7 kernel: [kobject_del+22/30] kobject_del+0x16/0x1e
Aug 27 23:29:55 pc7 kernel: [<c01b5ac4>] kobject_del+0x16/0x1e
Aug 27 23:29:55 pc7 kernel: [kobject_unregister+8/16] kobject_unregister+0x8/0x10
Aug 27 23:29:55 pc7 kernel: [<c01b5ad4>] kobject_unregister+0x8/0x10
Aug 27 23:29:55 pc7 kernel: [rescan_partitions+76/253] rescan_partitions+0x4c/0xfd
Aug 27 23:29:55 pc7 kernel: [<c018638b>] rescan_partitions+0x4c/0xfd
Aug 27 23:29:55 pc7 kernel: [<f882266c>] sd_open+0xe3/0xf6 [sd_mod]
Aug 27 23:29:55 pc7 kernel: [do_open+655/952] do_open+0x28f/0x3b8
Aug 27 23:29:55 pc7 kernel: [<c015cda2>] do_open+0x28f/0x3b8
Aug 27 23:29:55 pc7 kernel: [blkdev_open+26/66] blkdev_open+0x1a/0x42
Aug 27 23:29:55 pc7 kernel: [<c015cf4d>] blkdev_open+0x1a/0x42
Aug 27 23:29:55 pc7 kernel: [dentry_open+205/421] dentry_open+0xcd/0x1a5
Aug 27 23:29:55 pc7 kernel: [<c015560e>] dentry_open+0xcd/0x1a5
Aug 27 23:29:55 pc7 kernel: [filp_open+54/60] filp_open+0x36/0x3c
Aug 27 23:29:55 pc7 kernel: [<c015553b>] filp_open+0x36/0x3c
Aug 27 23:29:55 pc7 kernel: [__cond_resched+20/57] __cond_resched+0x14/0x39
Aug 27 23:29:55 pc7 kernel: [<c02c597f>] __cond_resched+0x14/0x39
Aug 27 23:29:56 pc7 kernel: [direct_strncpy_from_user+62/93] direct_strncpy_from_user+0x3e/0x5d
Aug 27 23:29:56 pc7 kernel: [<c01b8516>] direct_strncpy_from_user+0x3e/0x5d
Aug 27 23:29:56 pc7 kernel: [sys_open+49/125] sys_open+0x31/0x7d
Aug 27 23:29:56 pc7 kernel: [<c015583a>] sys_open+0x31/0x7d
Aug 27 23:29:56 pc7 kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Aug 27 23:29:56 pc7 kernel: [<c02c7377>] syscall_call+0x7/0xb
Aug 27 23:29:56 pc7 kernel: Code: 5f 5d e9 d2 0c fe ff e9 3c ff ff ff 55 57 56 53 52 8b 78 30 85 ff 74 11 8b 07 85 c0 75 08 0f 0b 1a 01 e2 89 2d c0 f0 ff 07 85 ff <8b> 6f 54 0f 84 cb 00 00 00 8b 77 10 31 c9 ba 6b 00 00 00 b8 6f
Aug 27 23:29:56 pc7 kernel: <0>Fatal exception: panic in 5 seconds
Aug 27 23:29:56 pc7 kernel: usb 1-3.1: USB disconnect, address 4
Aug 27 23:37:02 pc7 syslogd 1.4.1: restart.
Aug 27 23:37:02 pc7 syslog: syslogd startup succeeded
Charlie, my request for information pertains to Polyserve. It's Polyserve's problem. We can work around it with the patch which Bob helpfully attached, and probably will. However, the correct course of action is to find out what Polyserve is doing. Unfortunately, I cannot look at their code because it's proprietary, so Bob has to explain it in English. I hope this clears up the situation. I know exactly how USB triggers this (by calling add_disk() for a device which fails revalidation immediately). Please see bug 153971 for specifics. So, thanks for the effort, but unfortunately your update was not helpful.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html