|Summary:||kernel oops at make_class_name during udev initialization|
|Product:||[Fedora] Fedora||Reporter:||Christian Nolte <ch.nolte>|
|Component:||kernel||Assignee:||Dave Jones <davej>|
|Status:||CLOSED ERRATA||QA Contact:||Brian Brock <bbrock>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Last Closed:||2006-10-21 05:52:37 UTC||Type:||---|
Description Christian Nolte 2006-09-03 20:54:24 UTC
Description of problem:
I did a BIOS update on my ASUS M2N-SLI DELUXE today, from ACPI BIOS Revision 0201 to Revision 0307, and went through some kind of hell. After the BIOS update the ACPI table was screwed up, which gave me a kernel panic (IO-APIC + timer doesn't work) on the installed 2.6.17-1.2174_FC5smp kernel. A reboot with 'noapic' resolved this issue. As I also had trouble with my SIL680 IDE controller, I decided to remove the card from the system. After restarting the system I got the following kernel oops (which I wrote down by hand, as I have no second PC to debug with):
---
Starting udev: BUG: unable to handle kernel NULL pointer dereference at virtual address 000000
printing eip:
*pde = 3701c001
Oops : 0000 [#1]
SMP
last sysfs file: /class/input/input0/event0/dev
Modules linked in: sata_nv libata scsi_mod snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd
CPU: 1
EIP: 0060:[<c05507e0>] Not tainted VLI
EFLAGS: 00010206 (2.6.17-1.2174_FC5smp#1)
EIP is at make_class_name+0x29/0x88
eax: 00000000 ebx: ffffffff ecx: ffffffff edx: 00000009
esi: f88f28fc edi: 00000000 ebp: 00000000 esp: c215ad48
ds: 007b es: 007b ss:0068
Process modprobe (pid:405, threadinfo=x215a000 task=c2198bb0)
Stack: [...]
Call Trace
class_device_del+0x91/0x14b
class_device_unregister+0x8/0x18 [libata]
scsi_remove_host [scsi_mod]
ata_host_remove [libata]
ata_device_add [libata]
nv_init_one [sata_nv]
__driver_attach
pci_device_probe
driver_probe_device
__driver_attach
bus_for_each_dev
driver_attach
__driver_attach
bus_add_driver
__pci_register_driver
sys_init_module
ata_port_start [libata]
vfs_read
sysenter_past_esp
Code: [...]
EIP: [<c05507e0>] make_class_name+0x29/0x88 SS: ESP 0068:x21ad48
---
After a few reboots without further luck, I booted the non-smp kernel 2.6.17-1.2174_FC5 and to my surprise no error occurred. The system was stable again under that kernel, but this had no effect on the smp kernel, which still failed with the same oops on every boot.
The next thing I did was to reinsert the SIL680 IDE controller, which was a stupid idea, because now even the non-smp kernel stopped working with a similar oops (which I did not have the patience to write down). Even removing the controller again did not solve anything. The system was unusable. Using a rescue disk I tried the older kernel version 2.6.16-1.2133_FC5smp, with no luck: the same problem. The last thing I did was to install kernel 2.6.17-1.2611.fc6, which solved the problem for me. This kernel has some other issues, but at least it runs. Perhaps reinitializing udev somehow would work around this problem, but I do not know how to do this. Unfortunately I do not have much time at hand to find out which patch solves the problem.
Version-Release number of selected component (if applicable):
I tested this with the following kernels:
2.6.17-1.2174_FC5[smp]
2.6.16-1.2133_FC5[smp]
udev-084-13.fc5.2
Comment 1 David Lawrence 2006-09-05 15:58:55 UTC
Changing to proper owner, kernel-maint.
Comment 2 Christian Nolte 2006-09-05 20:33:58 UTC
I have diffed the changelog of the current development kernel-2.6.17-1.2611.fc6 against kernel-2.6.17-1.2174_FC5 and stumbled upon the libata-device_add patch (see http://www.redhat.com/archives/fedora-cvs-commits/2006-August/msg00076.html). I applied it to 2.6.17-1.2174_FC5 today, without success: the same oops still occurs. I will try the other PATA/SATA patches from the current development kernel.
Comment 3 Christian Nolte 2006-09-10 11:17:28 UTC
Created attachment 135920: patch against libata (2174)
Comment 4 Christian Nolte 2006-09-10 11:18:30 UTC
I managed to put together a patch which gets kernel-2.6.17-1.2174_FC5 up and running again (but don't trust what I have done here, this is ugly!). I made a diff of libata between 2611 and 2174, including all dependent SCSI drivers. The only thing I left untouched was the IRQ-handling macros IRQF_SHARED and IRQF_DISABLED defined in include/linux/interrupt.h (2611). The oops during udev initialization is gone now and dmesg shows no strange messages, but some side effects have occurred: playing back sounds via ESD (gnome) or via players using the xine backend results in a looping playback of the first second of the sound until the process is killed (mplayer has no such issues). Furthermore, it is not possible to start the bluetooth service (it hangs). These problems could be related to what I have reported in bug #205479 for 2611.
Comment 5 Christian Nolte 2006-09-10 20:35:25 UTC
Referring to the original posting: putting some printk's in make_class_name() shows that kobject_name(&class_dev->kobj) returns NULL for the class_device that is about to be removed via class_device_del(). I suppose that the problem lies somewhere in the sata_nv driver code, but I don't know where to begin here. Some help would be appreciated.
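To illustrate why a NULL object name would crash at this point, here is a minimal user-space sketch. It is not the kernel's make_class_name() and not the fix that later shipped. The function build_class_name() and its parameters are hypothetical stand-ins, and the "class:name" output format is an assumption made for the example. The idea is only that a helper which measures and concatenates two strings must not be handed a NULL name, so the sketch checks for NULL before calling strlen().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in, NOT the kernel function.
 * Builds "class_name:kobj_name" in a freshly allocated buffer.
 * Returns NULL if either input is NULL or if allocation fails;
 * the caller must free() a non-NULL result.
 */
char *build_class_name(const char *class_name, const char *kobj_name)
{
    size_t size;
    char *out;

    /* Calling strlen() on a NULL pointer is undefined behaviour,
     * so reject NULL inputs before touching them. */
    if (class_name == NULL || kobj_name == NULL)
        return NULL;

    /* +2 leaves room for the ':' separator and the trailing '\0'. */
    size = strlen(class_name) + strlen(kobj_name) + 2;
    out = malloc(size);
    if (out == NULL)
        return NULL;

    snprintf(out, size, "%s:%s", class_name, kobj_name);
    return out;
}
```

With a guard like this, a call such as build_class_name("scsi_host", NULL) returns NULL instead of dereferencing a null pointer. Whether the real problem is better handled in the caller, for example by never unregistering a device whose name was not set, cannot be decided from this sketch.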
Comment 6 Dave Jones 2006-10-16 18:23:57 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.
This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.
In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem, please check that you have only one version of device-mapper & lvm2 installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.
Thank you.
Comment 7 Christian Nolte 2006-10-21 05:43:28 UTC
This bug has been fixed with kernel 2.6.18-1.2200.fc5.