Bug 746097
Summary: | 3.1.0 kernel will not boot with ACPI enabled on Thinkpad T510
---|---
Product: | [Fedora] Fedora
Component: | kernel
Version: | 16
Hardware: | x86_64
OS: | Linux
Status: | CLOSED ERRATA
Severity: | high
Priority: | unspecified
Reporter: | David L. Crow <crow>
Assignee: | Kernel Maintainer List <kernel-maint>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | gansalmon, itamar, jfeeney, jonathan, kernel-maint, madhu.chinakonda, mike.reid, redhat, stefan.hoelldampf, tomi.ollila
Doc Type: | Bug Fix
Last Closed: | 2012-02-23 23:01:43 UTC
Description
David L. Crow
2011-10-13 21:10:46 UTC
Created attachment 528102 [details]
Output from "cat /proc/cpuinfo"
Created attachment 528103 [details]
Output from acpidump
Created attachment 528105 [details]
screen shot of last kernel messages
If you install kernel-debug, do you get a trace instead of just a hang?

After many tries, I can't get the boot to hang with the debug kernel. I'll continue to try, but in the meantime, I'll attach the dmesg output after booting with the debug kernel in case it is useful.

Created attachment 528135 [details]
Output of dmesg command after boot with debug kernel.
If you can get the normal kernel to still hang, try booting with initcall_debug (and remove 'quiet'). This should tell you the last function we entered before the kernel hangs.

Actually, the screenshot in attachment 528105 [details] is with initcall_debug enabled and the normal kernel. I did just repro again and had the exact same screen.
Still no luck in getting the debug kernel to fail :-(.
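For anyone who can capture the console text rather than a screenshot, the initcall_debug suggestion above can be automated. This is only a rough sketch in Python: it assumes the usual "calling <fn> @ <pid>" and "initcall <fn> returned <rc> after <n> usecs" message pairs that initcall_debug prints, and the function names in the sample are hypothetical placeholders, not lines taken from the attachments on this bug.

```python
import re

# Patterns for the two messages that initcall_debug emits per initcall.
CALLING = re.compile(r"calling\s+(\S+)\s+@\s+\d+")
RETURNED = re.compile(r"initcall\s+(\S+)\s+returned\s+-?\d+\s+after\s+\d+\s+usecs")


def unfinished_initcalls(lines):
    """Return initcalls that were entered but never reported a return."""
    pending = []
    for line in lines:
        match = CALLING.search(line)
        if match:
            pending.append(match.group(1))
            continue
        match = RETURNED.search(line)
        if match and match.group(1) in pending:
            pending.remove(match.group(1))
    return pending


# Hypothetical console text; the function names are made up for illustration.
sample = [
    "[    1.000000] calling  example_a_init+0x0/0x40 @ 1",
    "[    1.000100] initcall example_a_init+0x0/0x40 returned 0 after 97 usecs",
    "[    1.000200] calling  example_b_init+0x0/0x80 @ 1",
]

print(unfinished_initcalls(sample))
```

Whatever is left in the returned list is the initcall the kernel was still inside when the log ended, which is the "last function we entered" mentioned above.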
I assume adding clocksource=hpet works?

Sorry for the delay in responding. The only change when adding clocksource=hpet is that the "Switching to clocksource tsc" line is not printed. Otherwise it still hangs at the same place.

I have updated to the 3.1.0-0.rc10.git0.1.fc16.x86_64 kernel and the behaviour is exactly the same. The debug kernel works every time and the non-debug kernel hangs about 80-90% of the time.

Can you get a backtrace by using the sysrq key? Add "sysrq_always_enabled" to the boot options and try hitting alt-sysrq-p for a dump of the current CPU state and/or alt-sysrq-l to show all CPUs:

> * How do I use the magic SysRq key?
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> On x86 - You press the key combo 'ALT-SysRq-<command key>'. Note - Some keyboards may not have a key labeled 'SysRq'. The 'SysRq' key is also known as the 'Print Screen' key. Also some keyboards cannot handle so many keys being pressed at the same time, so you might have better luck with "press Alt", "press SysRq", "release SysRq", "press <command key>", release everything.

See also: http://en.wikipedia.org/wiki/Magic_SysRq_key

I'm now on 3.1.0-1.fc16.x86_64. The good news (I guess) is that the failure is happening less. I am now successful about 40% of the time. I still do not know what is different between success and failure.

The bad news is that the magic SysRq key is not working. Once booted, it works just fine, so I feel confident I have the magic keyboard incantation (while holding down alt, hold fn, press sysrq, release fn, then press <command key>) down. Perhaps this is not enabled this early in the boot?

Created attachment 530528 [details]
another screen shot of boot at hang point
In one of the failed boots today, the screen output was a little different. Two of the output lines that were previously about 5-10 lines up were at the bottom as is shown in the attachment.
I'm not sure if that means anything or provides any clues.
*** Bug 756154 has been marked as a duplicate of this bug. ***

*** Bug 768133 has been marked as a duplicate of this bug. ***

I'm having similar problems with a Lenovo T510. However, it turns out that if I leave it long enough it works:
Booting: 3.1.5-6.fc16.i686.PAE (which is in updates-testing)
from dmesg:
...
[ 1.178143] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[ 1.178412] ACPI: Battery Slot [BAT0] (battery present)
[ 1.521036] isapnp: No Plug & Play device found
[ 1.521293] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 1.522723] Non-volatile memory driver v1.3
[ 1.522891] Linux agpgart interface v0.103
[ 1.523298] tpm_tis 00:0b: 1.2 TPM (device-id 0x0, rev-id 78)
[ 2.115995] Refined TSC clocksource calibration: 2526.999 MHz.
[ 2.116178] Switching to clocksource tsc
[ 121.457029] tpm_tis 00:0b: Operation Timed out
...
So it actually works, but takes a VERY long time to time-out...

Are all of you seeing the same thing that is reported in comment #16? If so, this sounds like bug 733964

Sure enough, that is what I see
[ 1.029206] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[ 1.029418] ACPI: Battery Slot [BAT0] (battery present)
[ 1.030659] serial 0000:00:16.3: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[ 1.051314] 0000:00:16.3: ttyS0 at I/O 0x1800 (irq = 17) is a 16550A
[ 1.060647] Non-volatile memory driver v1.3
[ 1.060789] Linux agpgart interface v0.103
[ 1.061093] tpm_tis 00:0b: 1.2 TPM (device-id 0x0, rev-id 78)
[ 1.930427] Refined TSC clocksource calibration: 2659.999 MHz.
[ 1.930573] Switching to clocksource tsc
[ 120.760356] tpm_tis 00:0b: Operation Timed out
[ 120.784825] loop: module loaded
[ 120.785014] ahci 0000:00:1f.2: version 3.0
[ 120.785023] ahci 0000:00:1f.2: PCI INT B -> GSI 16 (level, low) -> IRQ 16
[ 120.785203] ahci 0000:00:1f.2: irq 41 for MSI/MSI-X
[ 120.785238] ahci: SSS flag set, parallel bus scan disabled
[ 120.785432] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 3 Gbps 0x33 impl SATA mode
[ 120.785641] ahci 0000:00:1f.2: flags: 64bit ncq sntf ilck stag pm led clo pio slum part ems sxs apst
[ 120.785849] ahci 0000:00:1f.2: setting latency timer to 64
I apologize that I did not have the patience to wait.

Just to keep up-to-date, I am now running kernel-3.1.5-6.fc16.x86_64. Regarding whether this is a duplicate, I never had a problem in Fedora 15 and the problem only started when Fedora 16 alpha/beta moved to the 3.1 kernel. The 3.0 kernel never showed problems for me.

As described for bug 733964, I have tried adding:
tpm_tis.interrupts=0
to the boot options. This does seem to be helpful. Sometimes it boots immediately, but sometimes not. I can't see a pattern in which boots are slow after trying various combinations of restart, cold boot, with external power, with only battery power...

Can you try adding tpm_tis.itpm=1 to the kernel command line and see if that helps? If not, can you try booting with nohz=off and see if the hang goes away? The TPM driver is doing some weird stuff and I'd like to see if this is a problem with the iTPM probe function, or something more general.

I added tpm_tis.itpm=1 to my Lenovo T510 and it's booted 5 times now without hanging using a variety of battery power/mains power/warm restart/cold boot.
Any other info you need?

(In reply to comment #22)
> I added tpm_tis.itpm=1 to my Lenovo T510 and it's booted 5 times now without
> hanging using a variety of battery power/mains power/warm restart/cold boot.
>
> Any other info you need?
I don't think so.
The problem is that the tpm driver is built into the kernel in f15/f16 and on this particular machine it goes off and probes for an iTPM because it's detected via ACPI. Except for some reason nothing returns data and the driver sits there for up to 2 minutes waiting for it to respond. This timeout seems rather excessive, and it probably should be blacklisted (or something) on these machines anyway.
For rawhide, we actually switched this to a module, where it can be loaded and hang as long as it wants without stalling the boot. I'll think of what to do for F15/F16. I've also emailed the upstream maintainers to see if there are other options or debug to pursue.

OK, thanks. Of course for now, from my point of view the problem is "solved" (in that I don't have to wait 2 minutes to boot). Just let me know if you want me to test something else.

Try the build at http://koji.fedoraproject.org/koji/buildinfo?buildID=279607 (without any boot parameters; it contains a patch which should make it just do the right thing by default).

OK, booted twice OK on my Lenovo T510. Can see a slight pause when it hits the point where it used to hang, as you can see from this dmesg extract:
...
[ 1.974235] Refined TSC clocksource calibration: 2526.999 MHz.
[ 1.974411] Switching to clocksource tsc
[ 3.414125] loop: module loaded
...
And I guess this is part of what was causing the problem:
...
[ 3.625251] IMA: No TPM chip found, activating TPM-bypass!
...

I had the same problem -- boot hang for ... 2 minutes (?) after 'loading initial ramdisk' and then said something about timeout.
Now I yum updated my F16. kernel 3.1.6-1.fc16.x86_64. Now it stopped for a while (for 1 second or 2), if what I paste below is what happened:
...
[ 0.994495] tpm_tis 00:0b: 1.2 TPM (device-id 0x0, rev-id 78)
[ 1.922197] Refined TSC clocksource calibration: 2393.999 MHz.
[ 1.922204] Switching to clocksource tsc
[ 3.021751] loop: module loaded
[ 3.021836] ahci 0000:00:1f.2: version 3.0
...
[ 3.077076] IMA: No TPM chip found, activating TPM-bypass!
...
This is a definite improvement (presuming it lasts...). Thanks!

I haven't seen this problem in a while, so as far as I am concerned, this bug can be closed.
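
For anyone trying to confirm they are hitting the same stall, one way to spot it is to look for the largest jump between consecutive dmesg timestamps. The following is a minimal Python sketch, not part of any fix; the helper name largest_gap is made up for illustration, and the sample lines are copied from the dmesg excerpt quoted earlier in this report.

```python
import re

# Matches the "[ seconds.micros] message" prefix printed by dmesg.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, message_before, message_after) for the longest pause."""
    previous = None
    best = (0.0, None, None)
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if not match:
            continue  # skip "..." and any other untimestamped text
        stamp, message = float(match.group(1)), match.group(2)
        if previous is not None and stamp - previous[0] > best[0]:
            best = (stamp - previous[0], previous[1], message)
        previous = (stamp, message)
    return best


# Excerpt copied from the dmesg output quoted in this report.
sample = """\
[ 1.523298] tpm_tis 00:0b: 1.2 TPM (device-id 0x0, rev-id 78)
[ 2.115995] Refined TSC clocksource calibration: 2526.999 MHz.
[ 2.116178] Switching to clocksource tsc
[ 121.457029] tpm_tis 00:0b: Operation Timed out
""".splitlines()

gap, before, after = largest_gap(sample)
print(f"{gap:.1f}s between {before!r} and {after!r}")
```

On a live system the same function could be fed the lines of saved dmesg output instead of the pasted sample. A gap of roughly two minutes ending at the tpm_tis "Operation Timed out" message matches the behaviour described in this bug.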