Bug 205076

Summary: kernel oops at make_class_name during udev initialization
Product: [Fedora] Fedora
Reporter: Christian Nolte <ch.nolte>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: pfrields, wtogami
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2006-10-21 05:52:37 UTC
Attachments:
patch against libata (2174) (flags: none)

Description Christian Nolte 2006-09-03 20:54:24 UTC
Description of problem:

I updated the BIOS of my ASUS M2N-SLI DELUXE from ACPI BIOS Revision 0201 to
Revision 0307 today and went through some kind of hell. After the BIOS update
the ACPI table was screwed up, which gave me a kernel panic ("IO-APIC + timer
doesn't work") on the installed 2.6.17-1.2174_FC5smp kernel. Rebooting with
'noapic' resolved this issue. As I also had trouble with my SIL680 IDE
controller, I decided to remove the card from the system. After restarting the
system I got the following kernel oops (which I wrote down by hand, as I have
no second PC to debug with):

---
Starting udev: BUG: unable to handle kernel NULL pointer dereference at virtual
address 000000
printing eip:
 *pde = 3701c001
Oops : 0000 [#1]
SMP
last sysfs file: /class/input/input0/event0/dev

Modules linked in: sata_nv libata scsi_mod snd soundcore snd_page_alloc
dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd

CPU: 1
EIP: 0060:[<c05507e0>]  Not tainted VLI
EFLAGS: 00010206      (2.6.17-1.2174_FC5smp#1)
EIP is at make_class_name+0x29/0x88
eax: 00000000  ebx: ffffffff ecx: ffffffff edx: 00000009
esi: f88f28fc  edi: 00000000 ebp: 00000000 esp: c215ad48
ds: 007b   es: 007b   ss:0068

Process modprobe (pid:405, threadinfo=x215a000 task=c2198bb0)

Stack: [...]

Call Trace
   class_device_del+0x91/0x14b  class_device_unregister+0x8/0x18 [libata]
   scsi_remove_host [scsi_mod]  ata_host_remove [libata]
   ata_device_add [libata]      nv_init_one [sata_nv]
   __driver_attach              pci_device_probe
   driver_probe_device          __driver_attach
   bus_for_each_dev             driver_attach
   __driver_attach              bus_add_driver
   __pci_register_driver        sys_init_module
   ata_port_start [libata]      vfs_read
   sysenter_past_esp

Code: [...]
EIP: [<c05507e0>] make_class_name+0x29/0x88  SS: ESP  0068:x21ad48
---
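
For anyone reading the trace: on i386 the number printed after "Oops" is the page-fault error code, and its low three bits say what kind of access faulted. The following is a minimal userspace sketch of that decoding, not kernel code; the struct and both function names are hypothetical and exist only for illustration. For the value 0000 above it reports a read from kernel mode of a page that was not present, which fits a NULL pointer dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Decoded view of the low three bits of an i386 page-fault error code. */
struct oops_code {
    bool protection_fault; /* bit 0: 0 = page not present, 1 = protection fault */
    bool write_access;     /* bit 1: 0 = read access, 1 = write access */
    bool user_mode;        /* bit 2: 0 = kernel mode, 1 = user mode */
};

/* Hypothetical helper (not a kernel function): split the code into flags. */
struct oops_code decode_oops_code(unsigned long code)
{
    struct oops_code d;

    d.protection_fault = (code & 0x1) != 0;
    d.write_access = (code & 0x2) != 0;
    d.user_mode = (code & 0x4) != 0;
    return d;
}

/* Hypothetical helper: write a one-line description into buf. */
int describe_oops_code(unsigned long code, char *buf, size_t len)
{
    struct oops_code d = decode_oops_code(code);

    return snprintf(buf, len, "%s, %s access, %s mode",
                    d.protection_fault ? "protection fault" : "page not present",
                    d.write_access ? "write" : "read",
                    d.user_mode ? "user" : "kernel");
}
```

Calling describe_oops_code(0x0000, buf, sizeof buf) with a 64-byte buffer is enough to hold the description.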

After a few reboots without further luck, I booted the non-smp kernel
2.6.17-1.2174_FC5 and to my surprise no error occurred. The system was stable
again, but this had no effect on the smp kernel, which still did not work (the
same oops over and over). The next thing I did was to reinsert the SIL680
IDE controller, which was a stupid idea because now even the non-smp kernel
stopped working with a similar oops (which I did not have the patience to
write down). Even removing the controller again did not solve anything.
The system was unusable. Using a rescue disk I tried the older kernel version
2.6.16-1.2133_FC5smp, with no luck and the same problem.

The last thing I did was to install kernel 2.6.17-1.2611.fc6 which solved the
problem for me. This kernel has some other issues but at least it runs. Perhaps
reinitializing udev somehow would work around this problem, but I do not know
how to do this.

Unfortunately I do not have much time at hand to find out which patch
solves the problem.

Version-Release number of selected component (if applicable):

I tested this with the following kernels:

2.6.17-1.2174_FC5[smp]
2.6.16-1.2133_FC5[smp]

udev-084-13.fc5.2

Comment 1 David Lawrence 2006-09-05 15:58:55 UTC
Changing to proper owner, kernel-maint.

Comment 2 Christian Nolte 2006-09-05 20:33:58 UTC
I have diffed the changelog of the current development kernel-2.6.17-1.2611.fc6
against that of kernel-2.6.17-1.2174_FC5 and stumbled upon the
libata-device_add patch (see
http://www.redhat.com/archives/fedora-cvs-commits/2006-August/msg00076.html).
I applied it to 2.6.17-1.2174_FC5 today, with no success: the same oops
appears.

I will try the other PATA/SATA patches from the current development kernel.

Comment 3 Christian Nolte 2006-09-10 11:17:28 UTC
Created attachment 135920 [details]
patch against libata (2174)

Comment 4 Christian Nolte 2006-09-10 11:18:30 UTC
I managed to put together a patch which gets kernel-2.6.17-1.2174_FC5 up and
running again (but don't trust what I have done here, this is ugly!). I made a
diff of libata and all dependent SCSI drivers between 2611 and 2174. The only
thing I left untouched was the IRQ-handling macros IRQF_SHARED and
IRQF_DISABLED defined in include/linux/interrupt.h (2611).

The oops during udev initialization is gone now and no strange messages show
up in dmesg, but some side effects have occurred: playing back sounds via
ESD (gnome) or players using the xine backend results in a looping playback of
the first second of the sound until the process is killed (mplayer has no such
issues). Furthermore, it is not possible to start the bluetooth service (it
hangs). These problems could be related to what I have reported in bug #205479
for 2611.



Comment 5 Christian Nolte 2006-09-10 20:35:25 UTC
Referring to the original posting: putting some printk's in make_class_name()
shows that kobject_name(&class_dev->kobj) returns NULL for the class_device
that is to be removed via class_device_del(). I suppose that the problem lies
somewhere in the sata_nv driver code, but I don't know where to begin here. Some
help would be appreciated.
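
To illustrate the failure mode described above: if a function builds a "<name>:<kobject name>" string and takes the length of the kobject name without checking it, a NULL name would lead to a read at address 0, which matches the oops. The following is a minimal userspace sketch under that assumption. It is not the kernel's make_class_name() and not the fix that went into any kernel release; struct mock_kobject, mock_kobject_name() and make_class_name_guarded() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the kernel's struct kobject; only the name matters here. */
struct mock_kobject {
    const char *name; /* NULL models the state seen in the printk output */
};

/* Stand-in for kobject_name(): returns the stored name, possibly NULL. */
const char *mock_kobject_name(const struct mock_kobject *kobj)
{
    return kobj->name;
}

/*
 * Builds "<name>:<kobject name>" in freshly allocated memory.
 * Returns NULL instead of dereferencing a missing name; the caller
 * frees the result.
 */
char *make_class_name_guarded(const char *name, const struct mock_kobject *kobj)
{
    const char *kname = kobj ? mock_kobject_name(kobj) : NULL;
    size_t size;
    char *class_name;

    if (!name || !kname)
        return NULL; /* an unguarded strlen(kname) would fault here */

    size = strlen(name) + strlen(kname) + 2; /* room for ':' and '\0' */
    class_name = malloc(size);
    if (!class_name)
        return NULL;

    snprintf(class_name, size, "%s:%s", name, kname);
    return class_name;
}
```

A guard like this would only hide the symptom; the open question is still why the class_device has no name by the time class_device_del() runs.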

Comment 6 Dave Jones 2006-10-16 18:23:57 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 7 Christian Nolte 2006-10-21 05:43:28 UTC
This bug has been fixed with kernel 2.6.18-1.2200.fc5.