Description of problem:
Kernel freezes

Version-Release number of selected component (if applicable):
Any kernel version released up to the time this bug was filed.

How reproducible:
Always

Steps to Reproduce:
1. Install F8-Test2
2. Turn on.

Actual results:
The system freezes up

Expected results:
The system does not freeze

Additional info:
Freezes at the point where it displays the following:

PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report

This suggestion doesn't help since it is not related. Boot does not pass this point until the POWER button is tapped. It then continues, but freezes again after "starting udev". Pressing keys on the keyboard prompts it to continue until X starts. After that it again freezes continuously unless keys are pressed or the mouse is moved.

System: Acer Aspire 9300
CPU: AMD Turion X2-TL50

Workaround: Add kernel parameter "idle=poll". Should I really have to poll my CPU on a laptop??

More info: kernel parameter "nohz=off" does nothing for this problem, so it may not be related to dynamic ticks.
"idle=poll" will make the CPU run too hot. And "idle=halt" either doesn't work or was removed as an option, apparently. Can you try a vanilla kernel to see if this is caused by Fedora-specific patches (probably cpuidle)? http://people.redhat.com/cebbert/kernels/F8/x86_64/kernel-vanilla-2.6.23-0.185.rc6.git7.V.x86_64.rpm
Yes, that kernel appears to be working.
Vanilla kernel works, our kernel hangs unless we have idle=poll. And our cpuidle patch is mixed in with the highres-timers patch. What we have in there doesn't match what's in the ACPI git tree for cpuidle, either.
I'm hunting a regression. Will publish a new -hrt queue with a bunch of fixes later today.
0.202.rc8 still hangs on boot without idle=poll.
Just sent out a patch to LKML which addresses an AMD X2 problem. http://lkml.org/lkml/2007/9/25/343
Kernel option "noapictimer" works. Apparently this is the same as "nolapic_timer" on i386??
yes
Mine still hangs on boot with the c1e patch applied and kernel options:

nolapic nohz=off highres=off

(pressing power still makes it continue)

Freezes randomly without 'nolapic' (solid lockup.)

Does not hang at boot with:

nolapic nohz=off highres=off noapictimer

Clock interrupts seem to be sent all over the place, even to IRQ7 (until disabled as spurious)

           CPU0       CPU1
  0:     392541      11375    XT-PIC-XT        timer
  1:          0        233   IO-APIC-edge      i8042
  2:          0          0    XT-PIC-XT        cascade
  5:          8       5582   IO-APIC-edge      sata_nv
  7:       9403      90597   IO-APIC-edge      ehci_hcd:usb1
  8:          0          1   IO-APIC-edge      rtc
  9:          0        348   IO-APIC-edge      acpi
 10:          1        177   IO-APIC-edge      HDA Intel
 11:         23      38555   IO-APIC-edge      ohci_hcd:usb2, eth0
 12:          0        879   IO-APIC-edge      i8042
 14:          4       1491   IO-APIC-edge      libata
 15:          0          0   IO-APIC-edge      libata
NMI:          0          0
LOC:      11367     392248
ERR:          0

$ cat /proc/cmdline
ro root=LABEL=/ nohz=off highres=off acpi_irq_nobalance nolapic noapictimer
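To compare how timer interrupts are split across CPUs, a listing like the one above can be parsed mechanically. This is only a sketch: `parse_interrupts` and `SAMPLE` are names made up for illustration, and the sample rows are copied from the listing in this comment.

```python
# Hypothetical helper (not part of any kernel tool): split
# /proc/interrupts-style text into per-CPU counts so the timer (IRQ 0)
# and local APIC (LOC) rows can be compared.

SAMPLE = """\
           CPU0       CPU1
  0:     392541      11375    XT-PIC-XT        timer
  7:       9403      90597   IO-APIC-edge      ehci_hcd:usb1
LOC:      11367     392248
ERR:          0
"""


def parse_interrupts(text):
    """Return {row label: [count per CPU]} for /proc/interrupts-style text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())      # header row has one column per CPU
    table = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        counts = []
        for field in rest.split()[:ncpu]:
            if not field.isdigit():   # stop at controller/device names
                break
            counts.append(int(field))
        table[label.strip()] = counts
    return table


if __name__ == "__main__":
    table = parse_interrupts(SAMPLE)
    for label in ("0", "7", "LOC"):
        print(label, table[label])
```

On a live system the same function could be fed the contents of /proc/interrupts instead of `SAMPLE`.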
Can you provide a boot log of mainline and the above kernel please ?
(In reply to comment #10)
> Can you provide a boot log of mainline and the above kernel please ?

Building kernel-vanilla now. Any specific options to boot with?
Standard FC8 config is fine. I'll look into it tomorrow.
Created attachment 208991
dmesg from kernel 0.211 with noapictimer option

Kernel 0.211 has the fixed AMD c1e patch applied (disable_apic_timer is __cpuinitdata, not __initdata). Even with noapictimer, the local apic interrupt on CPU 0 is incrementing.
*** Bug 237325 has been marked as a duplicate of this bug. ***
When booting on battery, system stalls unless I keep pressing keys on the keyboard. This does not happen when plugged in. Kernel is 0.214, from F8T3. Boot options: noapic noirqdebug
*** Bug 316811 has been marked as a duplicate of this bug. ***
This problem appears to be solved in 0.222.
Not fixed in kernel 2.6.23-4 here (with new highres-timers patch.) Still using "noapic noirqdebug" -- kernel hangs and cooling fan spins up until power switch is pressed, then bootup continues normally. Haven't tried booting on battery yet...
My notebook works with "noapictimer" with the latest rawhide kernel (0.224) without problems. Without it, it hangs whether or not it is started on battery. After pressing the power button the bootup continues, but later, when some services are started, I need to press some keys (Shift etc.) or move the mouse (after gpm is started) to keep it from hanging.
Chuck, is nolapictimer working for you as well? Comment #7 https://bugzilla.redhat.com/show_bug.cgi?id=299031#c7 says it worked for you before. If it works, I can provide a debug patch that lets us diagnose the problem better.
OK, I stared at the code long enough and found the reason why the AMD C1E detection works on 32-bit and not on 64-bit. It's not trivial to fix, but I already have a solution in mind. Patch will follow ASAP.
Created attachment 226751
AMD c1e detection fix

Chuck, does this fix your problem?
(In reply to comment #22)
> Created an attachment (id=226751)
> AMD c1e detection fix
>
> Chuck, does this fix your problem ?

It doesn't hang at boot any more, but now the tickless mode seems to be disabled.
Created attachment 227881
nvidia timer override patch

This patch (on top of the new C1E patch) fixes some of my problems with this nVidia C51/MCP51 chipset x86_64 machine. No messages about c1e print anymore and it hangs on boot even with those last c1e fixes, but the spurious interrupts are all gone now. Somehow this patch is making the c1e detection code get skipped, I guess?
I have this patch in my pile of crap already. I'll take a look at it right now. The tickless mode is disabled due to the C1E detection. Sorry, that's the fallout from broken hardware. We could get away with permanent broadcasting though, but this only makes sense when we have a working HPET in the system. The PIT is slow to program, and for tickless it is just useless due to its small maximum next-event delta.
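For a rough sense of the "small max. next event delta" mentioned above, the longest interval a one-shot timer can be programmed for is bounded by its counter width divided by its input clock. The sketch below assumes the standard PC PIT (16-bit counter, 1.193182 MHz input) and, purely as an assumption for illustration, a 32-bit HPET comparator clocked at 14.31818 MHz; the real HPET rate is chipset-specific, and the kernel may use a lower cap than the hardware limit. `max_delta_seconds` is a made-up name.

```python
# Rough arithmetic only: upper bound on the interval a one-shot timer
# can be programmed for, given its counter width and input frequency.

PIT_HZ = 1_193_182           # standard PC PIT input clock
HPET_HZ_ASSUMED = 14_318_180  # assumption: a common HPET rate, varies by chipset


def max_delta_seconds(counter_bits, freq_hz):
    """Largest programmable interval in seconds for a counter of this width."""
    return (2 ** counter_bits - 1) / freq_hz


if __name__ == "__main__":
    pit = max_delta_seconds(16, PIT_HZ)
    hpet = max_delta_seconds(32, HPET_HZ_ASSUMED)
    print(f"PIT  (16 bit): {pit * 1000:.1f} ms")
    print(f"HPET (32 bit, assumed clock): {hpet:.0f} s")
```

Under these assumptions the PIT tops out at a few tens of milliseconds, so a tickless kernel would still have to wake up frequently just to reprogram it, while a 32-bit HPET channel can sleep for minutes.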
(In reply to comment #25)
> The tickless mode is disabled due to the C1E detection. Sorry, that's the
> fallout from broken hardware. We could get away with permanent broadcasting
> though, but this makes only sense, when we have a working HPET in the system.

Most new systems have one, I know this one does. (3 channels, 32 bit)
Yeah, I know. I hope that Venki will come up with the per-CPU HPET code soon, so we can really utilize the HPET instead of just broadcasting.
I *think* I can see what is happening with the latest c1e detection code. We've set up local APIC timer interrupts on CPU0 before detecting the problem and apparently aren't tearing them down properly, causing them to be both broadcast and fired by the hardware. If I force noapictimer on the command line with the latest F8 code, everything works fine: CPU1 has 50000 timer interrupts and CPU0 has 50000 local apic interrupts. So now instead of "noapic noirqdebug" I have to use "noapictimer acpi_use_timer_override" (the nVidia quirk code from Andi apparently doesn't work)
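Since the set of workaround options has changed several times in this report, it can help to check which ones a running kernel was actually booted with. A minimal sketch, assuming simple space-separated options with no quoted values; `parse_cmdline` is a hypothetical helper, and the sample string is the /proc/cmdline output quoted earlier in this bug.

```python
# Hypothetical helper: turn a kernel command line into a dict so
# individual workaround options can be checked. It does not handle
# quoted values that contain spaces.

SAMPLE_CMDLINE = (
    "ro root=LABEL=/ nohz=off highres=off "
    "acpi_irq_nobalance nolapic noapictimer"
)


def parse_cmdline(cmdline):
    """Map each option to its value, or to True for bare flags."""
    opts = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        opts[key] = value if sep else True
    return opts


if __name__ == "__main__":
    opts = parse_cmdline(SAMPLE_CMDLINE)
    for name in ("noapictimer", "nolapic", "nohz", "highres", "idle"):
        print(name, opts.get(name, "not set"))
```

On a live system the input would come from reading /proc/cmdline rather than from `SAMPLE_CMDLINE`.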
Fixed by the -hrt3 patch.