Bug 518660

Summary: [regression] kernels following 2.6.27.12-170.2.5.fc10.i686 fail to boot on Thinkpad 600E
Product: [Fedora] Fedora
Reporter: williamnorfleet2000
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Docs Contact:
Priority: low
Version: 10
CC: itamar, kernel-maint, vedran
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-12-18 09:40:54 UTC

Description williamnorfleet2000 2009-08-21 14:45:01 UTC
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.13) Gecko/2009080317 Fedora/3.0.13-1.fc10 Firefox/3.0.13

My Thinkpad 600E continues to work fine under kernel 2.6.27.12-170.2.5.fc10.i686. All subsequent kernels I have tried, including the most current (kernel 2.6.29.6-97.fc10.i686 from the updates-testing repo) fail to complete the boot process about 60% of the time in a variety of ways at a variety of points in the boot process. I often can find no trace of the problem in /var/log/messages or dmesg. The intermittent and variable nature of the problem makes formulating a coherent bug report difficult. As an example, boot sometimes fails immediately following startup of sm-client. The following console messages appear (I have manually transcribed this output and edited out large portions of the text, but, since I have no idea what this information means, I may have deleted important information - sorry):

<snip>
Starting sendmail: [OK]
Starting sm-client: [OK]
BUG: unable to handle kernel paging request at 01740000
IP: [<c043cb72>] find_pid_ns+0x44/0x5a
Oops: 0000 [#1] SMP
Modules linked in: sunrpc ipv6 <snip> pata-acpi [last unloaded: microcode]

Pid: 2282, comm: pidof Not tainted (2.6.27.29-170.2.78.fc10.i686 #1) 26454BU
<snip>
---[ end trace 46ddcb1d20fbcba6 ]---

No further console output occurs following this. The service that should start immediately following sm-client is crond.

Errors also occur in a similar fashion but at other points in the execution of startup scripts. Also, sometimes boot freezes after login of a user but before all the icons appear on the desktop. And sometimes starting Firefox causes a Firefox pid to be created, but firefox never actually opens a window, and the process can not be killed with "kill -9" (it's stuck in the kernel?). And sometimes a kerneloops is generated such as:

BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0523915>] list_del+0x9/0x60
*pde = 00000000
Oops: 0000 [#1] SMP
Modules linked in: fuse sunrpc ipv6 nf_conntrack_ftp dm_multipath uinput orinoco_cs orinoco hermes ppdev snd_cs4236 thinkpad_acpi snd_opl3_lib rfkill snd_hwdep snd_cs4236_lib snd_mpu401_uart hwmon snd_rawmidi snd_seq_dummy snd_cs4231_lib snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_piix4 snd_timer parport_pc pcspkr i2c_core snd parport soundcore floppy nsc_ircc ns558 irda crc_ccitt snd_page_alloc gameport video output yenta_socket rsrc_nonstatic ata_generic pata_acpi [last unloaded: microcode]

Pid: 1572, comm: kjournald Not tainted (2.6.27.29-170.2.78.fc10.i686 #1) 26454BU
EIP: 0060:[<c0523915>] EFLAGS: 00010206 CPU: 0
EIP is at list_del+0x9/0x60
EAX: 00000000 EBX: cb88f988 ECX: cba12180 EDX: c38e3000
ESI: 00000014 EDI: cd3d0a00 EBP: ccb15ed0 ESP: ccb15ecc
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process kjournald (pid: 1572, ti=ccb15000 task=ccc259b0 task.ti=ccb15000)
Stack: cbb02070 ccb15efc c04e4d7b cc458300 cb88f988 cbb02070 cd44f030 cb88f188
00000031 c79a2800 c79a2834 cd67c5e8 ccb15f88 c04e32c8 ccb15f54 cc43d540
cd3d0a00 cc458300 cd3d0a14 cd3d0ab8 00000000 cb88f800 c087c67c 00000000
Call Trace:
[<c04e4d7b>] ? journal_write_revoke_records+0xd1/0x11c
[<c04e32c8>] ? journal_commit_transaction+0x5c2/0xc92
[<c043694d>] ? lock_timer_base+0x1f/0x3e
[<c04369b4>] ? try_to_del_timer_sync+0x48/0x4f
[<c04e6090>] ? kjournald+0xbb/0x1ee
[<c043ef86>] ? autoremove_wake_function+0x0/0x33
[<c04e5fd5>] ? kjournald+0x0/0x1ee
[<c043ece3>] ? kthread+0x3b/0x61
[<c043eca8>] ? kthread+0x0/0x61
[<c040590b>] ? kernel_thread_helper+0x7/0x10
======================
Code: 53 08 8d 4b 04 8d 46 04 e8 75 00 00 00 8b 53 10 8d 4b 0c 8d 46 0c e8 67 00 00 00 5b 5e 5f 5d c3 90 90 55 89 e5 53 89 c3 8b 40 04 <8b> 00 39 d8 74 16 50 53 68 ed d3 77 c0 6a 30 68 27 d4 77 c0 e8
EIP: [<c0523915>] list_del+0x9/0x60 SS:ESP 0068:ccb15ecc
---[ end trace d09494b80408a4e6 ]---

Another example of failure to boot:

Aug 20 12:34:13 lap kernel: ------------[ cut here ]------------
Aug 20 12:34:13 lap kernel: kernel BUG at mm/vmalloc.c:79!
Aug 20 12:34:13 lap kernel: invalid opcode: 0000 [#1] SMP
Aug 20 12:34:13 lap kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:07.1/host0/target0:0:0/0:0:0:0/block/sda/size
Aug 20 12:34:13 lap kernel: Modules linked in: sunrpc ipv6 nf_conntrack_ftp dm_multipath uinput ppdev snd_seq_dummy thinkpad_acpi hwmon snd_cs4236 snd_seq_oss snd_opl3_lib snd_seq_midi_event snd_seq snd_hwdep snd_cs4236_lib snd_pcm_oss snd_wss_lib snd_mixer_oss snd_pcm i2c_piix4 snd_timer snd_page_alloc pcspkr i2c_core snd_mpu401_uart irda snd_rawmidi snd_seq_device parport_pc floppy snd ns558 parport crc_ccitt video soundcore gameport output yenta_socket rsrc_nonstatic ata_generic pata_acpi [last unloaded: microcode]
Aug 20 12:34:13 lap kernel:
Aug 20 12:34:13 lap kernel: Pid: 1812, comm: Xorg Not tainted (2.6.29.6-97.fc10.i686 #1) 26454BU
Aug 20 12:34:13 lap kernel: EIP: 0060:[<c048d23a>] EFLAGS: 00210212 CPU: 0
Aug 20 12:34:13 lap kernel: EIP is at vunmap_page_range+0x12/0x102
Aug 20 12:34:13 lap kernel: EAX: 00ecf000 EBX: cd369d80 ECX: 000ffa11 EDX: 008e0098
Aug 20 12:34:13 lap kernel: ESI: 00ecf000 EDI: cccd8e60 EBP: cccd8e20 ESP: cccd8e00
Aug 20 12:34:13 lap kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Aug 20 12:34:13 lap kernel: Process Xorg (pid: 1812, ti=cccd8000 task=cd071940 task.ti=cccd8000)
Aug 20 12:34:13 lap kernel: Stack:
Aug 20 12:34:13 lap kernel: 00000000 cf33dfff 008e0098 cf33e000 cd827cf4 cd369d80 000ffff2 cccd8e60
Aug 20 12:34:13 lap kernel: cccd8e40 c048d3ff cccd8e64 cdac8260 ccd2bae0 00000000 00000000 c11c6fac
Aug 20 12:34:13 lap kernel: cccd8e74 c048e52f 00000000 c08fffcc c11c6fd0 c0505d73 00000002 00000020
Aug 20 12:34:13 lap kernel: Call Trace:
Aug 20 12:34:13 lap kernel: [<c048d3ff>] ? __purge_vmap_area_lazy+0x72/0x118
Aug 20 12:34:13 lap kernel: [<c048e52f>] ? vm_unmap_aliases+0x11e/0x127
Aug 20 12:34:13 lap kernel: [<c0505d73>] ? inode_has_perm+0x58/0x62
Aug 20 12:34:13 lap kernel: [<c041ca7f>] ? change_page_attr_set_clr+0xd5/0x2d6
Aug 20 12:34:13 lap kernel: [<c041cd2f>] ? _set_memory_wb+0x19/0x1b
Aug 20 12:34:13 lap kernel: [<c041bd75>] ? ioremap_change_attr+0x22/0x24
Aug 20 12:34:13 lap kernel: [<c041e0ab>] ? phys_mem_access_prot_allowed+0xe9/0x17e
Aug 20 12:34:13 lap kernel: [<c0589d6a>] ? mmap_mem+0x31/0x88
Aug 20 12:34:13 lap kernel: [<c048a985>] ? mmap_region+0x243/0x3f2
Aug 20 12:34:13 lap kernel: [<c048ad70>] ? do_mmap_pgoff+0x23c/0x28c
Aug 20 12:34:13 lap kernel: [<c0406c84>] ? sys_mmap2+0x5a/0x7b
Aug 20 12:34:13 lap kernel: [<c0403dde>] ? syscall_call+0x7/0xb
Aug 20 12:34:13 lap kernel: Code: c2 02 74 02 66 ab f6 c2 01 74 01 aa 5f 5d c3 55 89 e5 e8 9b 9f 00 00 5d c3 55 89 e5 57 56 89 c6 53 83 ec 14 39 d0 89 55 e8 72 04 <0f> 0b eb fe c1 e8 16 8d 3c 85 00 00 00 00 8b 45 e8 03 3d c8 61
Aug 20 12:34:13 lap kernel: EIP: [<c048d23a>] vunmap_page_range+0x12/0x102 SS:ESP 0068:cccd8e00
Aug 20 12:34:13 lap kernel: ---[ end trace 92aa3e6369c423ed ]---

memtest completes a pass with no errors. Resetting the BIOS and re-disabling Quickboot has no effect. Adding "boot_delay=5" to the kernel line in grub.conf does not help.  All recent kernels work fine in my hands on a home-brew AMD Duron desktop and a Dell Inspiron 7500, so this problem is somehow Thinkpad-centric. Just for fun I tried booting with the wlan pcmcia card removed, but no improvement.

Reproducible: Sometimes

Steps to Reproduce:
1. Boot with a kernel more recent than 2.6.27.12-170.2.5.fc10.i686


Actual Results:  
Failure of the boot process at a variety of points about 60% of the time.

Expected Results:  
Boot into normal desktop environment.
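
Because the failures differ from boot to boot, it may help to reduce each oops in a saved log to the function named on its fault line. The following is a minimal Python sketch and not part of the original report. The names fault_symbols and summarise are hypothetical, and the patterns only assume the two fault-line formats quoted in the traces above ("IP: [<addr>] func+0x.../0x..." and "EIP is at func+0x.../0x...") plus the "end trace" marker.

```python
import re
from collections import Counter

# Fault line in either format seen in this report:
#   "IP: [<c043cb72>] find_pid_ns+0x44/0x5a"
#   "EIP is at list_del+0x9/0x60"
FAULT_RE = re.compile(
    r"\bE?IP(?:: \[<[0-9a-f]+>\]| is at) ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

# Marker that closes one oops, e.g. "---[ end trace 46ddcb1d20fbcba6 ]---"
END_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def fault_symbols(log_text):
    """Return one faulting symbol per oops, in order of appearance."""
    symbols = []
    current = None
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        # Keep only the first fault line of each oops; later lines repeat it.
        if match and current is None:
            current = match.group(1)
        if END_RE.search(line):
            if current is not None:
                symbols.append(current)
            current = None
    # A log cut off before its "end trace" marker still counts.
    if current is not None:
        symbols.append(current)
    return symbols


def summarise(log_text):
    """Count how often each symbol is the faulting function."""
    return Counter(fault_symbols(log_text))
```

Running summarise over the text of /var/log/messages, or over manually transcribed console output, gives a per-function count. That count can show whether the crashes cluster in one subsystem or are scattered, as the three traces in this report appear to be.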

Comment 1 Vedran Miletić 2009-08-26 13:56:35 UTC
Please retest with Fedora 11.

Comment 2 Chuck Ebbert 2009-10-12 01:16:51 UTC
This is a long shot, but can you try booting with the kernel option 'noclflush' and see if that works around the problem?

Comment 3 williamnorfleet2000 2009-10-12 22:45:24 UTC
Thank you for the suggestion, Mr. Ebbert!  Alas, appending "linux noclflush" to the kernel line for kernel 2.6.29.6-99.fc10.i686 in grub.conf still results in a failure to boot about 40% of the time.  Some kernel oopses are (looks like the same or similar bug concerning memory allocation):

http://www.kerneloops.org/submitresult.php?number=801107
http://www.kerneloops.org/submitresult.php?number=801155

Hope this helps.
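
When testing a boot option such as noclflush, one way to confirm that it actually reached the running kernel is to inspect /proc/cmdline after boot. The following is a minimal Python sketch under stated assumptions: the function names kernel_options and has_option are hypothetical, the example command lines in the usage note are illustrative rather than taken from this machine, and quoted values containing spaces are not handled.

```python
def kernel_options(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    options = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # "boot_delay=5" -> ("boot_delay", "5"); "noclflush" -> ("noclflush", None)
        options[name] = value if sep else None
    return options


def has_option(cmdline, name):
    """Return True if the named option appears on the command line."""
    return name in kernel_options(cmdline)
```

For example, has_option(open("/proc/cmdline").read(), "noclflush") should be true on a boot where the option was passed. A stray extra token on the line, such as the word "linux", is treated here as just another bare flag and does not affect the check.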

Comment 4 Vedran Miletić 2009-11-01 12:56:45 UTC
Reporter, if possible, please retest with one of the nightly composes
http://alt.fedoraproject.org/pub/alt/nightly-composes/desktop/
and report back whether this is still an issue.

Comment 5 Bug Zapper 2009-11-18 12:12:21 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Bug Zapper 2009-12-18 09:40:54 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.