Red Hat Bugzilla – Bug 201471
Kernel crashes on turion x2 laptop
Last modified: 2009-07-14 13:22:01 EDT
Description of problem:
There appears to be either a crash caused by TSC timer drift (which is somewhat
common on AMD X2 machines), as it crashes on boot with a message reporting
milliseconds of drift between the cores (+/- with the exact same number on both
cores, just opposite sign), or a locking issue. I have only received a few errors. The machine is
legacy free so I cannot do a log to a serial port or printer. However, I am
attaching the few error messages that I get. clock=pit doesn't work, it makes it
worse. maxcpus=1 is what I am doing now and it is stable other than a problem
with X (another bug).
Older kernels have fewer problems, test 1 rescue disc from FC6 works ...
reasonably well. I don't know what the difference is.
This is with kernel 2.6.17-2517. It happens with both x86-64 and 32. I am
currently running 32 as it is FAR more stable (completely separate installs, so
no, I don't have library mismatch).
Created attachment 133699 [details]
syslog of the few errors I get (nothing ever shows on screen)
maxcpus=1 was still a little unstable. However, with 2564 I was able to do
noapic and nolapic and get stability. With nolapic I lost some functionality
(powernow for example). I am now running noapic and things appear stable.
What do I need to do to help debug apic as it works fine in windows?
2583 still crashed without noapic. Would a dmidecode dump help?
I am not a very advanced user but
I would just like to note that I
have had the same/similar problems
with FC5-64 and FC6test2-64. I
will try FC6test2-32 and report
This is on a Turion notebook
(HP Pavilion dv6040us).
ok, mine is dv6045nr. From what I have been told all dv6000 series are the same.
Mine has the webcam, yours may not. We have the same wifi chipset, video, etc.
Hard drive sizes and memory may also be different, but I believe the cpu is also
the same (not just series, but the actual model).
Yes, the dv6040us has a webcam.
I could not care less whether the webcam
works under linux, however. I just want
the machine to work basically.
FC5-32 seems to be slightly more stable.
Now that I can boot the darn thing,
someone please let me know what system
info I should post here, and I will.
(And please give me the command I need
to produce it, thanks.)
Would you recommend trying F6-test2-32 ?
At this point I don't even want to run
pup for fear of breaking something. I was
even contemplating a severe downgrade
(to like FC3) as an experiment.
(Can you email me or post here what I
have to do to turn off apic, if this
seems to help?)
Also, and here is some information that may
actually be useful to folks: Knoppix boots
and seems to be rock-solid. It's just not
a very practical solution in the long-run.
It was my friend's disk, I think the June
2006 release of Knoppix.
The webcam is the zrsomethingxx driver. It is currently not v4l2 (v4l was
removed), it will be soon.
Boot with option "noapic" on the command line (don't use nolapic so that you
keep speed step and other power management features). This does seem to fix 99%
of all my stability. I have issues with suspend/hibernate not resuming properly
(black screen or freeze and black screen depending on what options I give X and
I am running 32bit as the 64 was just too unstable.
As for system info, I think since we have the same machine (memory and disk size
being the difference it looks like), if they ask for dmidecode we should both
give it, what else they will want, I do not know.
title Fedora Core (2.6.17-1.2586.fc6)
kernel /vmlinuz-2.6.17-1.2586.fc6 ro root=/dev/VolGroup00/LogVol00 quiet
rhgb <-- put noapic here.
This is a clip from my desktop grub.conf... therefore it doesn't have the noapic.
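For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch of the same idea: append a boot option to every "kernel" line of a GRUB legacy grub.conf. The function name add_boot_option and the sample text are made up for illustration; the sketch only transforms a string and does not touch any file, so writing the result back to the real grub.conf (as root, after making a backup) is left to the reader.

```python
def add_boot_option(grub_conf_text, option):
    """Return grub.conf text with `option` appended to each kernel line.

    Kernel lines that already carry the option are left unchanged.
    """
    out = []
    for line in grub_conf_text.splitlines():
        words = line.split()
        # GRUB legacy kernel lines start with the keyword "kernel"
        # (possibly indented), followed by the image path and options.
        if words and words[0] == "kernel" and option not in words[1:]:
            line = line.rstrip() + " " + option
        out.append(line)
    return "\n".join(out)


# Sample based on the clip above, with the kernel line unwrapped.
sample = (
    "title Fedora Core (2.6.17-1.2586.fc6)\n"
    "kernel /vmlinuz-2.6.17-1.2586.fc6 ro root=/dev/VolGroup00/LogVol00 quiet rhgb"
)
print(add_boot_option(sample, "noapic"))
```

Running the function a second time on its own output changes nothing, so it is safe to apply repeatedly.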
Ok, starting in 259x kernels, I could often get into X. When I can (having some
trouble with 2600), it works great provided I do not try to put the machine to
sleep or switch to another console. (This may be the same bug as 201482.)
If I don't start X, and I am at the console, if I cat /var/log/messages several
times (some times even just once), the machine crashes.
Is this a locking bug? If so, why does it not crash with noapic? Does it change
enough of the timing or the way interrupts are handled?
If this is not a locking bug, why does it seem to only hit on output to the
console or switching virtual consoles? (Yes, this machine crashes on resume from
suspend with or without noapic, it may be related because it seems to be at the
point it would resume the screen... i.e. network is up, disks are up, etc. and
dpms work seems to lock the machine hard instead of just failing to bring up the
screen... which is what normally seems to happen.)
Is there a way to capture an oops without serial, parallel or screen? The
netdump packages don't seem to be functioning.
Slowly able to get oopses out of the crashes. Bug 205183 may or may not be
related to this one. However, it is the same machine. (One more coming shortly.)
Bug 205185 may or may not be related to this one. However, it is affecting the
functionality of the same machine.
Bug 205185 is now closed. That leaves the crash on resume from suspend and if I
don't add noapic. There are also some bugs with Intel HDA audio which I will add
Ok, it seems that the crash seems to happen on VC switch when noapic is not
provided. Most of the time I now boot fine, rhgb starts. The system crashes a
lot when rhgb hands off to gdm. It will always crash if I switch from X to a
text mode VC.
Finally, I was able to get some trace backs out of my logs. I believe this is a
resume from suspend or suspend problem and not a hibernate/resume from hibernate
problem, as hibernate works and there were several of those in my logs without
Oct 18 12:51:12 mysystem kernel: BUG: sleeping function called from invalid context at kernel/rwsem.c:20
Oct 18 12:51:12 mysystem kernel: in_atomic():0, irqs_disabled():1
Oct 18 12:51:12 mysystem kernel: [<c04051db>] dump_trace+0x69/0x1af
Oct 18 12:51:12 mysystem kernel: [<c0405339>] show_trace_log_lvl+0x18/0x2c
Oct 18 12:51:12 mysystem kernel: [<c04058ed>] show_trace+0xf/0x11
Oct 18 12:51:12 mysystem kernel: [<c04059ea>] dump_stack+0x15/0x17
Oct 18 12:51:12 mysystem kernel: [<c0439446>] down_read+0x12/0x20
Oct 18 12:51:12 mysystem kernel: [<c0431601>] blocking_notifier_call_chain+0xe/0x29
Oct 18 12:51:12 mysystem kernel: [<c05a9798>] cpufreq_resume+0x118/0x135
Oct 18 12:51:12 mysystem kernel: [<c0551440>] __sysdev_resume+0x20/0x53
Oct 18 12:51:12 mysystem kernel: [<c0551583>] sysdev_resume+0x16/0x47
Oct 18 12:51:12 mysystem kernel: [<c0555767>] device_power_up+0x5/0xa
Oct 18 12:51:12 mysystem kernel: [<c04418fd>] suspend_enter+0x3b/0x44
Oct 18 12:51:12 mysystem kernel: [<c0441a2c>] enter_state+0x126/0x176
Oct 18 12:51:12 mysystem kernel: [<c0441b01>] state_store+0x85/0x99
Oct 18 12:51:12 mysystem kernel: [<c04a5fe6>] subsys_attr_store+0x1e/0x22
Oct 18 12:51:14 mysystem kernel: [<c04a60d9>] sysfs_write_file+0xa7/0xce
Oct 18 12:51:14 mysystem kernel: [<c046f805>] vfs_write+0xa8/0x159
Oct 18 12:51:14 mysystem kernel: [<c046fe32>] sys_write+0x41/0x67
Oct 18 12:51:14 mysystem kernel: [<c0404013>] syscall_call+0x7/0xb
Oct 18 12:51:14 mysystem kernel: DWARF2 unwinder stuck at syscall_call+0x7/0xb
Oct 18 12:51:14 mysystem kernel: Leftover inexact backtrace:
Oct 18 12:51:14 mysystem kernel: =======================
Sorry, the oops came from: kernel-2.6.18-1.2798.fc6 (i686 version I believe).
The above oops may be like the acpi_cpufreq one that was fixed within the last
When this one gets fixed, please don't close this bug as there are apic or
locking issues that remain.
Largely, the last year of kernels have been better. I no longer need special
options on boot, except to setup a vgafb. If I don't, it crashes switching
between X and console. Occasionally, I still get odd crashes. I also can't
hibernate or sleep (due to crashes on resume). I haven't messed with the new
workarounds in rawhide yet. This bug may soon be closed.
I have a HP laptop that has the same issues. I see there hasn't been any
traffic on this ticket in a month, what can I do to help?
I have an HP dualcore Turion running Rawhide and it locks up randomly unless I
use "noapic noirqdebug". Nobody has the answer for this problem...
I don't have an answer. I am the original reporter.
noapic, nolapic, and noirqdebug are not a solution to why the kernel doesn't run
properly on this hardware.
Unfortunately, it looks like the debugging Trever has done seems to be the only
work posted in this bug.
I get hard lockups (SysRQ doesn't work) w/o noapic. w/ noapic, I lose USB
ports. I've tried booting w/ apic=debug ignore_loglevel vga=0x0f07 to try to
see where the laptop locks up, but it locks up so solid that the kernel doesn't
print an oops.
I'm a little lost on what to do here, I posted to LKML and got zero response.
We seem to be getting little attention here as well.
AMD has some driver updates to CPU frequency scaling for the Turion processors,
I'm going to see if I can compile a custom kernel w/o any CPU frequency scaling
and see if that has any effect.
(In reply to comment #21)
> I get hard lockups (SysRQ doesn't work) w/o noapic. w/ noapic, I lose USB
And post your hardware information (make and model of system.)
Created attachment 253781 [details]
dv6408nr noapic noirqdebug info
126.96.36.199-10.fc7 w/ noapic noirqdebug on hp pavilion dv6408nr turion x2 amd
noapic noirqdebug works better (doesn't lock up) but /proc/interrupts shows the
error counter steadily rising. USB (problem device) and the multimedia hotkeys
trigger an increase in error interrupts.
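A small Python sketch of how the rising error counter could be measured, in case it helps others compare machines. It assumes the counter in question is the "ERR:" line that x86 kernels print near the end of /proc/interrupts; the function names and the two snapshots below are invented for illustration and are not taken from the attachments.

```python
def err_count(interrupts_text):
    """Return the integer on the ERR: line of /proc/interrupts text, or None."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "ERR:":
            return int(fields[1])
    return None


def err_delta(before_text, after_text):
    """Difference in the ERR counter between two snapshots, or None."""
    before = err_count(before_text)
    after = err_count(after_text)
    if before is None or after is None:
        return None
    return after - before


# Made-up snapshots for illustration only.
snap1 = "  0:  1000  0  IO-APIC-edge  timer\nERR:  12\nMIS:  0\n"
snap2 = "  0:  2000  0  IO-APIC-edge  timer\nERR:  40\nMIS:  0\n"
print(err_delta(snap1, snap2))
```

On the real machine the two snapshots would come from reading /proc/interrupts twice, for example before and after plugging in the problem USB device or pressing a multimedia hotkey.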
The laptop is a HP Pavilion dv6409nr. Attached is a small tarball containing
Created attachment 254751 [details]
dv6408nr acpi dsdt disassembly
Created attachment 254771 [details]
dv6408nr all acpi tables (dsdt, apic, hpet, etc) binary & disassembled
Just installed Fedora 8/i386, I have the same problems as I did under Fedora
7/i386. For whatever reason, x86_64 kernels seem to boot better on this
hardware. I downloaded both Fedora 8 i386 and x86_64, and will try out the
x86_64 distro later and provide the same info as above.
I have lost patience with this and I am dumping
the machine, which is now over 14 months old.
Good luck, everybody.
bump? or something? hello?
Looks like this may be one and the same problem. How can I apply the patch from
this thread to a FC8 kernel? I'd like to test to see if this solves problems.
Changing version to 8 since bug exists there.
(In reply to comment #30)
> Looks like this may be one in the same problem. How can I apply the patch from
> this thread to a FC8 kernel? I'd like to test to see if this solves problems.
I tried the patch and while it did let me use tickless, the system still locks
up after a while. Only "noapic noirqdebug" seems to work.
The recent kernels seem to be better, but once in a great while I still get
problems. (This is on a newer machine as the 6045nr from HP died like most
others sold at that time... garbage.)
The problem is the lapic timer is broken because of the C1E state. If you
boot with 'nolapic_timer clocksource=hpet hpet=force' it boots pretty reliably.
I still see infrequent lockups during boot of the kernel, and infrequent lockups post boot.
My lockups may be thermal related, I'm not entirely sure.
The above kernel boot parameters permit usage of the APIC, which is essential to
having properly functioning IRQs.
The linux kernel developers are supposedly working on a hpet based nohz
implementation, but since this particular CPU type on laptops seems to be the
only one affected, nobody seems to want or have any interest in fixing this.
So in the mean time, these laptops run full tilt, at around 56c. Every minute
or so, the fans kick in, and bring it down to around 50c. Lather, rinse, repeat.
If the C1E state is disabled, the system runs with nohz enabled, but it still runs
at full clock speed and gets rather warm.
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
Confirmed still broken in FC9-final.
Confirmed working as previously described in FC9 using FC8
This sounds like another instance of the 'ati chipset has timer problems' bug
that we've seen a few flavours of. If my guess is right, the good news is that
.26rc seems to have fixed this for me, so when we either backport .26 to F9, or
identify the changeset(s) for backport, we'll have this fixed.
Is that the kernel you're talking about?
I can no longer help with this bug as the laptop I have doesn't seem to be
affected. (The old one died.)
(In reply to comment #37)
> This sounds like another instance of the 'ati chipset has timer problems' bug
> that we've seen a few flavours of. If my guess is right, the good news is that
> .26rc seems to have fixed this for me, so when we either backport .26 to F9, or
> identify the changeset(s) for backport, we'll have this fixed.
This is nvidia MCP51/C51 chipset. My HP TX1000 is affected too...
Trying out kernel-2.6.27-0.208.rc1.git2.fc10.src.rpm, to see if it works at all. Just had to replace the HD on the laptop that has this problem, will report back in a few days.
kernel-2.6.27-0.208.rc1.git2.fc10.src.rpm doesn't help any. Still random lockups.
Is there something I can do to help figure out what is causing the lockups? The system locks up hard, sysrq doesn't help any. If a system-wide timer isn't working, how can I debug that?
(In reply to comment #42)
> kernel-2.6.27-0.208.rc1.git2.fc10.src.rpm doesn't help any. Still random
> Is there something I can do to help figure out what is causing the lockups?
> The system locks up hard, sysrq doesn't help any. If a system-wide timer isn't
> working, how can I debug that?
Try adding 'io_delay=0xed' to the kernel boot options.
That notebook is not in the table for the IO delay quirk, but it should be.
(In reply to comment #43)
> Try adding 'io_delay=0xed' to the kernel boot options.
any other ideas? Or is this something that is simply going to have to take development time on the part of the mainstream kernel developers?
(In reply to comment #45)
> any other ideas? Or is this something that is simply going to have to take
> development time on the part of the mainstream kernel developers?
Some people report that adding "nolapic_timer" to the boot options helps.
The only way this laptop is usable is to never turn it off. Otherwise it will randomly lock up on boot still, as I described above.
nohz still does not work, which makes the cpu run full tilt and get quite toasty.
The logs attached to this Bug seem to indicate a failure in the TSC code and also a failure in the cpufreq code. Can we debug both problems a little more? Both issues are big problems.
First, does the problem still exist if the CPUs run at a static freq (i.e. so cpufreq doesn't initiate a frequency transition at any point)? Set the governor in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor to 'performance' so that the system runs at a static CPU freq. Let the system run for a while. See if the problem still occurs.
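One possible way to apply the governor setting to every core at once, sketched in Python. The sysfs path is the one given above; the function name set_governor and its sysfs_root parameter are hypothetical and exist only so the sketch can be pointed at a scratch directory for a dry run. On the real system it needs root, and the setting does not survive a reboot.

```python
import glob
import os


def set_governor(governor, sysfs_root="/sys/devices/system/cpu"):
    """Write `governor` to every cpu*/cpufreq/scaling_governor file.

    Returns the list of files that were written, in sorted order.
    """
    pattern = os.path.join(sysfs_root, "cpu*", "cpufreq", "scaling_governor")
    written = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "w") as f:
            f.write(governor)
        written.append(path)
    return written
```

Calling set_governor("performance") as root should pin each core that exposes a cpufreq directory; an empty return list would suggest that cpufreq is not active on that kernel.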
Second, does this problem still exist if 'notsc' is used as a boot arg? This should help determine if lack of TSC synchronization is partly to blame.
Brian - 99% of the problem occurs on boot. I'm not entirely sure why, but intermittently the laptop locks up solid. After hard power cycling a random number of times, the kernel manages to get past the point it gets stuck. Once the system is booted, it stays up. That's why I try not to turn it off.
Right now I've got it "booting" (a relative term) with io_delay=0xed clocksource=hpet hpet=force nolapic_timer
The only options that seem to help so far are the hpet and lapic timer options.
Nov 3 22:01:14 xrap kernel: Linux version 2.6.27-0.244.rc2.git1.c1e.fc9.i686 (warewolf@xrap) (gcc version 4.3.0 20080428 (Red Hat 4.3.0-8) (GCC) ) #1 SMP Thu Aug 28 01:27:13 EDT 2008
Nov 3 22:01:14 xrap kernel: Kernel command line: ro root=/dev/encrypted/Eroot io_delay=0xed clocksource=hpet hpet=force nolapic_timer
Nov 3 22:01:14 xrap kernel: Clocksource tsc unstable (delta = -97636198 ns)
It looks like the system detects that the TSC is hosed, and decides not to use it.
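To put the reported delta into more readable units, here is a short Python sketch that pulls the nanosecond figure out of a "Clocksource tsc unstable" log line and converts it to milliseconds. The function name tsc_delta_ms is made up, and the regular expression only assumes the message format shown in the log line above.

```python
import re


def tsc_delta_ms(log_line):
    """Return the TSC delta from an 'unstable' log line in ms, or None."""
    m = re.search(r"Clocksource tsc unstable \(delta = (-?\d+) ns\)", log_line)
    if m is None:
        return None
    # 1 millisecond = 1,000,000 nanoseconds.
    return int(m.group(1)) / 1e6


line = "Nov 3 22:01:14 xrap kernel: Clocksource tsc unstable (delta = -97636198 ns)"
print(tsc_delta_ms(line))
```

For the line above this works out to roughly -97.6 ms, which is in the same "milliseconds" range the original report mentions.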
[root@xrap clocksource0]# cat available_clocksource
hpet acpi_pm jiffies tsc
[root@xrap clocksource0]# cat current_clocksource
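The same two files can be read from a script, which may be handy for attaching the values to a report. This Python sketch assumes the usual sysfs location /sys/devices/system/clocksource/clocksource0 (only the directory name "clocksource0" appears in the prompt above); the function name read_clocksources and its base parameter are hypothetical.

```python
import os

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (list of available clocksources, current clocksource)."""
    with open(os.path.join(base, "available_clocksource")) as f:
        available = f.read().split()
    with open(os.path.join(base, "current_clocksource")) as f:
        current = f.read().strip()
    return available, current
```

Calling read_clocksources() with no arguments reads the live values and does not need root.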
This laptop has no serial ports, so I have little to no chance of getting a console log, unless I hand transcribe it.
Point me at a kernel, and I'll do everything I can to get to the bottom of this issue.
Thanks for the info on TSC. Looks like that's another thing that has to get fixed.
Could you try building a kernel with cpufreq disabled (i.e. not built into the kernel) and see if that makes a difference? It would be nice to determine if this is cpufreq related. I want to try and eliminate the most obvious points of failure first. I'm suspicious of cpufreq because I saw some cpufreq related BUG output in one of the logs attached to this BZ. It might not be related though.
Most helpful though would be a stacktrace output from lockup so we can figure out where exactly things are breaking. That would save a lot of debugging time. Take a digital pic of the screen with a camera and attach it to this Bug if it's possible.
Created attachment 325634 [details]
photo of laptop screen vga=0x0f07 of nocpufreq kernel (no other cmdline args)
kernel-188.8.131.52-117.fc10.src.rpm kernel w/ modified config to disable cpufreq, including acpi based cpufreq. This was with the command line args vga=0x0f07, and no other arguments.
I just realised I didn't mention the above screen shot was of the laptop locking up. It does sort-of look like I snapped a photo mid-boot, heh.
So there is no stack trace really, it just hangs during boot?
Yep, that's it exactly.
You want me to try installing FC10 on this laptop? .. I'm at a loss for what to do.
Trying a newer kernel could help establish if this is fixed in a newer kernel. If this is the case the fix can easily be backported.
I updated this box last night and the latest fedora 9 kernel (184.108.40.206-73.fc9.i686) doesn't fix it. I still have to try to boot it multiple times with nolapic nolapic_timer, and randomly one out of ten it boots successfully.
Do you know an AMD kernel dev that I could hop on IRC with or something? I get the feeling some interactive "okay try this" debugging may help.
In the mean time, I'll pull down the latest fedora 10 kernel and try that, but I'm not expecting it to work :(
Alright, update; and this may be something you can roll with. I should have let the system sit for a minute, because it took a moment for the BUG to finally occur.
Fedora 10 kernel BUGs out with this (hand transcribed):
ACPI: processor limited to max C-state 1
BUG: spinlock lockup on CPU#0, swapper/0, c080766c (Not tainted)
PID: 0, comm: swapper Not tainted 220.127.116.11-159.fc10.i686.debug #1
[<c06bd3e5>] ? printk+0xf/0x12
I'm going to try playing with acpi kernel options now.
The problem still exists in the fedora 11 alpha. :(
It looks like Ubuntu's 2.6.27-11-generic works just fine on this laptop, and even drops (correctly) into C1 state thus solving my long lasting power problem.
After a year of trying to help, I've lost interest. Good luck. I'm switching to Ubuntu, because it -just works-.
For those keeping count, that's three people of the community who have given up. Of course, I'm not counting people with @redhat.com e-mail addresses.
This message is a reminder that Fedora 9 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 9. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '9'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 9's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 9 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Fedora 9 changed to end-of-life (EOL) status on 2009-07-10. Fedora 9 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.