Bug 455460 - kernel NULL pointer dereference in kobject_get_path
kernel NULL pointer dereference in kobject_get_path
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel (Show other bugs)
All Linux
urgent Severity high
: rc
: ---
Assigned To: Jiri Pirko
Martin Jenner
: OtherQA, ZStream
Depends On:
Blocks: 432518 459776
  Show dependency treegraph
Reported: 2008-07-15 12:48 EDT by Robert N. Evans
Modified: 2015-05-04 21:15 EDT (History)
8 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2009-01-20 14:38:49 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments (Terms of Use)
PCI and USB devices before and after fault insertion (74.89 KB, text/plain)
2008-07-21 10:51 EDT, Robert N. Evans
no flags Details

  None (edit)
Description Robert N. Evans 2008-07-15 12:48:34 EDT
Description of problem:
kernel panic during surprise device removals:
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 [<ffffffff800579b5>] kobject_get_path+0x81/0xc1
PGD 3033d067 PUD 3033e067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /block/md0/md/dev-sdh1/slot
Modules linked in: nfs lockd fscache nfs_acl ppp_deflate ppp_async crc_ccitt
ppp_generic slhc deflate zlib_deflate af_key autofs4 hidp rfcomm l2cap bluetooth
sunrpc bonding(U) iscsi_tcp ib_iser libiscsi scsi_transport_iscsi ib_srp ib_sdp
ib_ipoib ipv6(U) xfrm_nalgo crypto_api rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm
ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror(U) dm_multipath(U) dm_mod(U)
video sbs backlight i2c_ec i2c_core button battery asus_acpi acpi_memhotplug ac
parport_pc lp parport txen(U) ipmi_devintf ftmod(PU) ipmi_msghandler vtm(FU)
sr_mod cdrom(U) sg(U) radeonfb(FU) fosil(U) e1000(U) i5000_edac edac_mc pcspkr
ata_piix(U) aic94xx(U) libsas(U) libata(U) scsi_transport_sas(U) sd_mod(U)
scsi_mod(U) raid1(U) ext3 jbd uhci_hcd(U) ohci_hcd ehci_hcd
Pid: 15278, comm: egrep Tainted: PF     2.6.18-92.el5 #1
RIP: 0010:[<ffffffff800579b5>]  [<ffffffff800579b5>] kobject_get_path+0x81/0xc1
RSP: 0018:ffff81006053de48  EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000016 RCX: ffffffffffffffff
RDX: 0000000000000016 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff81007ebb28e0 R08: ffff81006053c000 R09: 00007fff893edcf0
R10: 00000000487c2ef9 R11: 0000000000000206 R12: ffff81007ebb28e0
R13: ffff810041c466d8 R14: 0000000000000017 R15: 0000000000000000
FS:  00002ba0b33e67e0(0000) GS:ffff81007fe0ce40(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000006adba000 CR4: 00000000000006e0
Process egrep (pid: 15278, threadinfo ffff81006053c000, task ffff81003bff1040)
Stack:  ffff81003d46c540 ffffffff80292151 ffff810041c46000 ffff810041c467e8
 ffff81007e50fe40 0000000000000000 000000000268b000 ffffffff801f4657
 ffff81007e50fe40 ffff810041c467e8 00000000000000a9 0000000000008000
Call Trace:
 [<ffffffff801f4657>] input_devices_seq_show+0x2d/0x261
 [<ffffffff8003f209>] seq_read+0x1b8/0x28c
 [<ffffffff8000b337>] vfs_read+0xcb/0x171
 [<ffffffff80011715>] sys_read+0x45/0x6e
 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

Code: f2 ae 48 f7 d1 ff c9 29 cb 48 63 d1 48 63 fb 49 8d 3c 3c e8
RIP  [<ffffffff800579b5>] kobject_get_path+0x81/0xc1
 RSP <ffff81006053de48>

Version-Release number of selected component (if applicable):

How reproducible:
Unexpected removals of USB devices have caused this condition to occur.  We have
hit this panic a few times during testing at Stratus.

Actual results:
kernel panic

Expected results:
no kernel panic

Additional info:
Our analysis shows the failing code to be taking a strlen of a null k_name. 
This struct can be found from the address on the stack:
crash> eval ffff810041c467e8+fffffffffffff818
hexadecimal: ffff810041c46000
crash> eval ffff810041c46000+0x6d8
hexadecimal: ffff810041c466d8
crash> kobject ffff810041c466d8
struct kobject {
  k_name = 0x0,
  name = "input7535\000\000\000\000\000\000\000\000\000\000",
  kref = {
    refcount = {
      counter = 1

k_name is only NULL'd in kobject_cleanup.  

That gets called as follows:

382 void kobject_put(struct kobject * kobj)
383 {
384         if (kobj)
385                 kref_put(&kobj->kref, kobject_release);
386 }

 49  */
 50 int kref_put(struct kref *kref, void (*release)(struct kref *kref))
 51 {
 52         WARN_ON(release == NULL);
 53         WARN_ON(release == (void (*)(struct kref *))kfree);
 55         /*
 56          * if current count is one, we are the last user and can release
 57          * right now, avoiding an atomic operation on 'refcount'
 58          */
 59         if ((atomic_read(&kref->refcount) == 1) ||
 60             (atomic_dec_and_test(&kref->refcount))) {
 61                 release(kref);
 62                 return 1;
 63         }
 64         return 0;
 65 }

The optimization of doing the atomic_read of 1 made me nervous.  I've not seen
this before, and the fact that the current count is 1 seemed like a smoking
gun.  There could be a race condition here.

1.  Looked at the 2.6.9 sources.  This line of code was not there.  
2.  Looked at the 2.6.25 sources.  This line of code was not there either!

A co-worker noted that via google an article was found that shows a patch that
removed the check for refcount of 1.  Look backward in the thread to see that
there were lots of "false positive" issues with this, and I think this means
that this code path was falsely thinking the kref should get removed and was
clearing it out from under other code paths.
Comment 1 Robert N. Evans 2008-07-15 14:54:23 EDT
Googling for "kref refcnt and false positives" leads to the email chain where
this was discussed.  The patch to fix kref_put() appeared in 2.6.20:

--- gregkh-2.6.orig/lib/kref.c
+++ gregkh-2.6/lib/kref.c
@@ -52,12 +52,7 @@ int kref_put(struct kref *kref, void (*r
 	WARN_ON(release == NULL);
 	WARN_ON(release == (void (*)(struct kref *))kfree);
-	/*
-	 * if current count is one, we are the last user and can release object
-	 * right now, avoiding an atomic operation on 'refcount'
-	 */
-	if ((atomic_read(&kref->refcount) == 1) ||
-	    (atomic_dec_and_test(&kref->refcount))) {
+	if (atomic_dec_and_test(&kref->refcount)) {
 		return 1;
Comment 3 Jiri Pirko 2008-07-21 06:49:23 EDT
I need more info on how to reproduce this issue, so I can test my patch. What
USB device are you using? On what machine this issue occurs? Is it reproducible
on any other machine? Can you please explain reproduce steps in more details?
Comment 4 Robert N. Evans 2008-07-21 10:51:22 EDT
Created attachment 312267 [details]
PCI and USB devices before and after fault insertion

(In reply to comment #3)

Our machine has redundant USB root hubs.  The externally accessible USB bus is
connected to the active root hub.  When an IO fault occurs on the active IO
unit, its PCI devices including the active USB root hub are removed and the
externally USB bus is switched over to the standby root hub.

The attached listing shows all PCI devices and USB devices.  The active USB
root hub is PCI device 0b:1d.  Then a fault is inserted and PCI and USB devices
are again listed.  The listing shows PCI buses 03 through 0c are no longer
present after the fault.  And the active USB root hub now is the PCI device

This machine has 2GB of memory and /proc/cpuinfo shows 4 of these processors:
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU		  5130	@ 2.00GHz
stepping	: 6
cpu MHz 	: 2000.117
cache size	: 4096 KB

We are running the x86_64 kernel.

Usually the problem is hit when an IO fault has been inserted as discussed
above.	However, we have also hit this problem by simply switching which IO
unit is active.  That would includes moving all connected USB devices from one
root hub to another.  Thus, this panic should be possible on any x86_64 SMP
system when removing a USB hub with several downstream devices.  Although this
problem should be reproducible on non-Stratus hardware, but my experience shows
that we only hit it infrequently.  An automated test that does repeated USB
device removals may be required to verify this fix.
Comment 7 Andrius Benokraitis 2008-08-01 10:42:20 EDT
From Robert Evans at Stratus:

I have performed tests with this kernel that included over 1000 device
removals.  I was not able to reproduce the problem reported in bug
455460.  Please consider including this patch in RHEL5.3.
Comment 10 Don Zickus 2008-09-02 23:40:43 EDT
in kernel-2.6.18-107.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 13 Jim Paradis 2008-10-24 20:35:36 EDT
Short form: the fix described in Comment #1 does not fix this problem.

I have been assigned to investigate this issue for Stratus.  I have a reliable way to reproduce this system on a Stratus system.  In one window, run a tight loop that swaps the Active Console:

$ while true ; do /opt/ft/bin/ftsmaint acSwitch ; done

In another window, run a tight loop that reads /proc/bus/input/devices:

$ while true ; do cat /proc/bus/input/devices > /dev/null ; done

I have reliably crashed my system within ten minutes using this method.  More importantly, I *still* reliably crash the system even with a kernel that I've verified has the fix (kernel-2.6.18-92.1.13.el5; this is the errata kernel that we use to run the latest version of ftlinux).

We suspect that there may be something else going on.  In particular, it may be a longstanding locking issue in drivers/input/input.c.  The code that initializes the traversal of the input_dev_list on behalf of /proc carries the following comment:

static void *input_devices_seq_start(struct seq_file *seq, loff_t *pos)
        /* acquire lock here ... Yes, we do need locking, I knowi, I know... */

        return list_get_nth_element(&input_dev_list, pos);

Under normal circumstances on a running system, the input_dev_list has three entries on it.  *EVERY* time we crash in this manner and I examine the crash dump, there are two entries on the list.  This indicates that one entry has been removed.  This suggests that the traversal code is dereferencing a pointer relating to the deleted entry.

When I run this same test on the debug kernel, instead of dereferencing a NULL pointer, we get a GPF trying to dereference 0x6b6b6b6b6b6b6b6b.  This is the "slab poisoning" pattern indicating a block of memory that's been freed.  This add weight to the theory that the traversal code is traversing a list that is in the process of being modified by another processor.

Locking has been added to this code in a later kernel; I'll backport and see what happens...
Comment 14 Mike Gahagan 2008-10-27 10:27:28 EDT
setting back to assigned based on comment 13
Comment 15 Jim Paradis 2008-10-28 15:25:10 EDT
Mea culpa.... I've been informed that the problem I'm dealing with is a separate issue that presents in a similar manner to this one.  This bug should be set back to the state it was in (ON_QA) and I'll file my issue as a separate bug.
Comment 16 Chris Ward 2008-11-28 02:15:02 EST
Partners, this bug should be fixed in the latest RHEL 5.3 Snapshot. We believe that you have some interest in its correct functionality, so we're making a friendly request to send us some testing feedback. 

If you have a chance to test it, please share with us your findings. If you have successfully VERIFIED the fix, please add PartnerVerified to the Bugzilla keywords, along with a description of the results. Thanks!
Comment 17 Robert N. Evans 2008-12-02 12:45:25 EST
Fix was verified per comment 9.  Stratus did not have the opportunity to retest with a 5.3 beta snapshot, but source inspection of kref.c in kernel-2.6.18-124 verifies the fix is still present.
Comment 19 errata-xmlrpc 2009-01-20 14:38:49 EST
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.