Description of problem: kernel panic during surprise device removals: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: [<ffffffff800579b5>] kobject_get_path+0x81/0xc1 PGD 3033d067 PUD 3033e067 PMD 0 Oops: 0000 [1] SMP last sysfs file: /block/md0/md/dev-sdh1/slot CPU 2 Modules linked in: nfs lockd fscache nfs_acl ppp_deflate ppp_async crc_ccitt ppp_generic slhc deflate zlib_deflate af_key autofs4 hidp rfcomm l2cap bluetooth sunrpc bonding(U) iscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_srp ib_sdp ib_ipoib ipv6(U) xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror(U) dm_multipath(U) dm_mod(U) video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac parport_pc lp parport txen(U) ipmi_devintf ftmod(PU) ipmi_msghandler vtm(FU) sr_mod cdrom(U) sg(U) radeonfb(FU) fosil(U) e1000(U) i5000_edac edac_mc pcspkr ata_piix(U) aic94xx(U) libsas(U) libata(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3 jbd uhci_hcd(U) ohci_hcd ehci_hcd Pid: 15278, comm: egrep Tainted: PF 2.6.18-92.el5 #1 RIP: 0010:[<ffffffff800579b5>] [<ffffffff800579b5>] kobject_get_path+0x81/0xc1 RSP: 0018:ffff81006053de48 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000016 RCX: ffffffffffffffff RDX: 0000000000000016 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff81007ebb28e0 R08: ffff81006053c000 R09: 00007fff893edcf0 R10: 00000000487c2ef9 R11: 0000000000000206 R12: ffff81007ebb28e0 R13: ffff810041c466d8 R14: 0000000000000017 R15: 0000000000000000 FS: 00002ba0b33e67e0(0000) GS:ffff81007fe0ce40(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000006adba000 CR4: 00000000000006e0 Process egrep (pid: 15278, threadinfo ffff81006053c000, task ffff81003bff1040) Stack: ffff81003d46c540 ffffffff80292151 ffff810041c46000 ffff810041c467e8 ffff81007e50fe40 0000000000000000 000000000268b000 ffffffff801f4657 ffff81007e50fe40 ffff810041c467e8 00000000000000a9 0000000000008000 Call Trace: [<ffffffff801f4657>] input_devices_seq_show+0x2d/0x261 [<ffffffff8003f209>] seq_read+0x1b8/0x28c [<ffffffff8000b337>] vfs_read+0xcb/0x171 [<ffffffff80011715>] sys_read+0x45/0x6e [<ffffffff8005d28d>] tracesys+0xd5/0xe0 Code: f2 ae 48 f7 d1 ff c9 29 cb 48 63 d1 48 63 fb 49 8d 3c 3c e8 RIP [<ffffffff800579b5>] kobject_get_path+0x81/0xc1 RSP <ffff81006053de48> Version-Release number of selected component (if applicable): 2.6.18-92.el5 How reproducible: Unexpected removals of USB devices have caused this condition to occur. We have hit this panic a few times during testing at Stratus. Actual results: kernel panic Expected results: no kernel panic Additional info: Our analysis shows the failing code to be taking a strlen of a null k_name. This struct can be found from the address on the stack: crash> eval ffff810041c467e8+fffffffffffff818 hexadecimal: ffff810041c46000 crash> eval ffff810041c46000+0x6d8 hexadecimal: ffff810041c466d8 crash> kobject ffff810041c466d8 struct kobject { k_name = 0x0, name = "input7535\000\000\000\000\000\000\000\000\000\000", kref = { refcount = { counter = 1 } }, ... k_name is only NULL'd in kobject_cleanup. That gets called as follows: */ 382 void kobject_put(struct kobject * kobj) 383 { 384 if (kobj) 385 kref_put(&kobj->kref, kobject_release); 386 } 49 */ 50 int kref_put(struct kref *kref, void (*release)(struct kref *kref)) 51 { 52 WARN_ON(release == NULL); 53 WARN_ON(release == (void (*)(struct kref *))kfree); 54 55 /* 56 * if current count is one, we are the last user and can release object 57 * right now, avoiding an atomic operation on 'refcount' 58 */ 59 if ((atomic_read(&kref->refcount) == 1) || 60 (atomic_dec_and_test(&kref->refcount))) { 61 release(kref); 62 return 1; 63 } 64 return 0; 65 } The optimization of doing the atomic_read of 1 made me nervous. I've not seen this before, and the fact that the current count is 1 seemed like a smoking gun. There could be a race condition here. 1. Looked at the 2.6.9 sources. This line of code was not there. 2. Looked at the 2.6.25 sources. This line of code was not there either! A co-worker noted that via google an article was found that shows a patch that removed the check for refcount of 1. Look backward in the thread to see that there were lots of "false positive" issues with this, and I think this means that this code path was falsely thinking the kref should get removed and was clearing it out from under other code paths.
Googling for "kref refcnt and false positives" leads to the email chain where this was discussed. The patch to fix kref_put() appeared in 2.6.20: --- gregkh-2.6.orig/lib/kref.c +++ gregkh-2.6/lib/kref.c @@ -52,12 +52,7 @@ int kref_put(struct kref *kref, void (*r WARN_ON(release == NULL); WARN_ON(release == (void (*)(struct kref *))kfree); - /* - * if current count is one, we are the last user and can release object - * right now, avoiding an atomic operation on 'refcount' - */ - if ((atomic_read(&kref->refcount) == 1) || - (atomic_dec_and_test(&kref->refcount))) { + if (atomic_dec_and_test(&kref->refcount)) { release(kref); return 1; }
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=f334b60b43a0927f4ab1187cbdb4582f5227c3b1
I need more info on how to reproduce this issue, so I can test my patch. What USB device are you using? On what machine this issue occurs? Is it reproducible on any other machine? Can you please explain reproduce steps in more details? Thanks.
Created attachment 312267 [details] PCI and USB devices before and after fault insertion (In reply to comment #3) Our machine has redundant USB root hubs. The externally accessible USB bus is connected to the active root hub. When an IO fault occurs on the active IO unit, its PCI devices including the active USB root hub are removed and the externally USB bus is switched over to the standby root hub. The attached listing shows all PCI devices and USB devices. The active USB root hub is PCI device 0b:1d. Then a fault is inserted and PCI and USB devices are again listed. The listing shows PCI buses 03 through 0c are no longer present after the fault. And the active USB root hub now is the PCI device 7e:1d. This machine has 2GB of memory and /proc/cpuinfo shows 4 of these processors: vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz stepping : 6 cpu MHz : 2000.117 cache size : 4096 KB We are running the x86_64 kernel. Usually the problem is hit when an IO fault has been inserted as discussed above. However, we have also hit this problem by simply switching which IO unit is active. That would includes moving all connected USB devices from one root hub to another. Thus, this panic should be possible on any x86_64 SMP system when removing a USB hub with several downstream devices. Although this problem should be reproducible on non-Stratus hardware, but my experience shows that we only hit it infrequently. An automated test that does repeated USB device removals may be required to verify this fix.
From Robert Evans at Stratus: I have performed tests with this kernel that included over 1000 device removals. I was not able to reproduce the problem reported in bug 455460. Please consider including this patch in RHEL5.3.
in kernel-2.6.18-107.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Short form: the fix described in Comment #1 does not fix this problem. I have been assigned to investigate this issue for Stratus. I have a reliable way to reproduce this system on a Stratus system. In one window, run a tight loop that swaps the Active Console: $ while true ; do /opt/ft/bin/ftsmaint acSwitch ; done In another window, run a tight loop that reads /proc/bus/input/devices: $ while true ; do cat /proc/bus/input/devices > /dev/null ; done I have reliably crashed my system within ten minutes using this method. More importantly, I *still* reliably crash the system even with a kernel that I've verified has the fix (kernel-2.6.18-92.1.13.el5; this is the errata kernel that we use to run the latest version of ftlinux). We suspect that there may be something else going on. In particular, it may be a longstanding locking issue in drivers/input/input.c. The code that initializes the traversal of the input_dev_list on behalf of /proc carries the following comment: static void *input_devices_seq_start(struct seq_file *seq, loff_t *pos) { /* acquire lock here ... Yes, we do need locking, I knowi, I know... */ return list_get_nth_element(&input_dev_list, pos); } Under normal circumstances on a running system, the input_dev_list has three entries on it. *EVERY* time we crash in this manner and I examine the crash dump, there are two entries on the list. This indicates that one entry has been removed. This suggests that the traversal code is dereferencing a pointer relating to the deleted entry. When I run this same test on the debug kernel, instead of dereferencing a NULL pointer, we get a GPF trying to dereference 0x6b6b6b6b6b6b6b6b. This is the "slab poisoning" pattern indicating a block of memory that's been freed. This add weight to the theory that the traversal code is traversing a list that is in the process of being modified by another processor. Locking has been added to this code in a later kernel; I'll backport and see what happens...
setting back to assigned based on comment 13
Mea culpa.... I've been informed that the problem I'm dealing with is a separate issue that presents in a similar manner to this one. This bug should be set back to the state it was in (ON_QA) and I'll file my issue as a separate bug.
Partners, this bug should be fixed in the latest RHEL 5.3 Snapshot. We believe that you have some interest in its correct functionality, so we're making a friendly request to send us some testing feedback. If you have a chance to test it, please share with us your findings. If you have successfully VERIFIED the fix, please add PartnerVerified to the Bugzilla keywords, along with a description of the results. Thanks!
Fix was verified per comment 9. Stratus did not have the opportunity to retest with a 5.3 beta snapshot, but source inspection of kref.c in kernel-2.6.18-124 verifies the fix is still present.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-0225.html