Bug 804351 - Change since 3.3.0-0.rc7.git1.2 causes kernel hang on boot
Summary: Change since 3.3.0-0.rc7.git1.2 causes kernel hang on boot
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 17
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: John Feeney
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-03-18 02:09 UTC by Vallimar
Modified: 2013-01-10 07:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-04-30 15:52:59 UTC
Type: ---


Attachments
Section of git patchset causing kernel hang (457 bytes, patch)
2012-03-18 02:09 UTC, Vallimar
Output of dmesg | grep aspm (595 bytes, text/plain)
2012-03-18 02:10 UTC, Vallimar
dmesg dump (66.91 KB, text/x-log)
2012-03-18 22:08 UTC, Andy Lawrence

Description Vallimar 2012-03-18 02:09:20 UTC
Created attachment 570858 [details]
Section of git patchset causing kernel hang

Description of problem:
When choosing any of the 3.3.0-rc7 kernels from git1 onward,
the boot process hangs after grub hands off to the kernel. I cannot
access the machine, dracut, or anything else. I have to forcibly reset
the system and choose an earlier kernel.

Version-Release number of selected component (if applicable):
Tested these:
  kernel-3.3.0-0.rc7.git1.2.fc17 
  kernel-3.3.0-0.rc7.git2.3.fc17

How reproducible:
Always.

Steps to Reproduce:
1. Turn on pc.
2. Choose kernel from grub
3. The kernel and initrd loading lines are echoed, then nothing.
  
Actual results:
Plymouth splash or bootup text should appear, but system hangs immediately
upon handoff from grub.

Expected results:
Graphics or text should kick in and kernel should boot.

Additional info:
I am booting through EFI. I do not have BIOS boot configured to test, but
I am not sure that matters in this case.

Did some comparisons and further testing by removing bits from the
git patches and testing reboots with kernels and I narrowed the cause
of my problem to one specific change, attached.

After seeing this, I did a test by setting the kernel option "pcie_aspm=force"
and I was then able to boot. This was not needed previously. Grepping the
'dmesg' output for aspm on the hanging and non-hanging kernels gives the
same result except for the 'force' acknowledgment. Snippet is attached.
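
A minimal sketch of the check described above, for anyone comparing kernels on the same board. It assumes a POSIX shell; aspm_setting is a hypothetical helper name, not an existing tool. It reports which pcie_aspm= value (if any) the running kernel was booted with, then lists the ASPM-related boot messages.

```shell
# Print the value of pcie_aspm= found in a kernel command line string,
# or "unset" if the option is absent.
# aspm_setting is a hypothetical helper name used for illustration.
aspm_setting() {
    padded=" $1 "                          # pad so the option matches as a whole word
    case "$padded" in
        *" pcie_aspm="*)
            rest="${padded#* pcie_aspm=}"  # drop everything up to the value
            echo "${rest%% *}"             # keep the value up to the next space
            ;;
        *)
            echo "unset"
            ;;
    esac
}

# Report the setting of the running kernel, if /proc/cmdline is readable.
if [ -r /proc/cmdline ]; then
    aspm_setting "$(cat /proc/cmdline)"
fi

# List ASPM-related kernel messages; tolerate no matches or no permission.
dmesg 2>/dev/null | grep -i aspm || true
```

Running this once on a hanging kernel booted with pcie_aspm=force and once on a working kernel gives two outputs that can be diffed directly.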

In case it matters: My mobo is an ASUS P8Z68-M PRO. Video is attached
to an Nvidia GeForce 520 in the PCIe x16 slot. I did not test with the
integrated graphics.

Comment 1 Vallimar 2012-03-18 02:10:09 UTC
Created attachment 570859 [details]
Output of dmesg | grep aspm

Comment 2 Andy Lawrence 2012-03-18 22:06:51 UTC
I'm having the same issue; however, pcie_aspm=force does not allow newer kernels to boot.  I even tried a vanilla 3.3-rc7 (git pull as of today) with no luck.

My MB happens to also be a P8Z68-M PRO, Intel graphics.

System is F17 with all updates as of today.

Comment 3 Andy Lawrence 2012-03-18 22:08:05 UTC
Created attachment 570961 [details]
dmesg dump

Comment 4 Andy Lawrence 2012-03-18 22:13:01 UTC
Clarification:

Newer Fedora kernels do not boot with pcie_aspm=force; however, kernels built from Linus's git tree with pcie_aspm=force do.

Comment 5 Josh Boyer 2012-03-19 13:21:43 UTC
There's been an upstream report that commit 4949be16822e92a18ea0cc1616319926628092ee is causing similar boot issues:

http://article.gmane.org/gmane.linux.kernel/1269495

Matthew?

Comment 6 Vallimar 2012-03-19 14:08:51 UTC
Just to be thorough, I tested with the kernel-3.3.0-1.fc17 rpm from koji and can confirm the problem still exists, as Josh indicates in comment #5.

I am still able to work around it using pcie_aspm=force.
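
A sketch of how the workaround could be made persistent instead of being typed at the grub prompt on each boot. It assumes the system manages its boot entries with grubby, which nobody in this report has confirmed for their setup. aspm_workaround_cmd is a hypothetical helper that only prints the command; it changes nothing by itself.

```shell
# Print (but do not run) the grubby command that adds or removes the
# pcie_aspm=force argument for all installed kernels.
# aspm_workaround_cmd is a hypothetical helper name; the grubby options
# --update-kernel, --args and --remove-args are existing grubby options.
aspm_workaround_cmd() {
    case "$1" in
        add)
            echo 'grubby --update-kernel=ALL --args=pcie_aspm=force'
            ;;
        remove)
            echo 'grubby --update-kernel=ALL --remove-args=pcie_aspm=force'
            ;;
        *)
            echo "usage: aspm_workaround_cmd add|remove" >&2
            return 1
            ;;
    esac
}

# Show the command that would apply the workaround.
aspm_workaround_cmd add
```

The printed command has to be run as root. The "remove" form is meant for undoing the workaround once a kernel that no longer needs it is installed.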

Comment 7 Pumpino 2012-03-22 23:26:05 UTC
I too am experiencing this issue with kernel 3.3.0 in Fedora 16 (and Xubuntu and Arch). Hardware is an Asus P8Z68-M Pro motherboard, 8 GB Kingston 1600 RAM, an i5 2500 and a Crucial m4 SSD.

If it's of any assistance, here is the output taken with my camera. http://img600.imageshack.us/img600/4395/p1000229w.jpg

Comment 8 Pumpino 2012-03-22 23:30:34 UTC
I just realised two of you have the same Asus motherboard as me. That can't be a coincidence. Now I'm thinking I should've chosen an Intel board instead of Asus. ;)

Comment 9 Vallimar 2012-03-23 13:23:26 UTC
There is definitely something off with the board, but it seems to work fine for the most part so far. I've discovered that if you want to be able to poll your sensors, you'll need to pass "acpi_enforce_resources=lax" as well, or you are limited to CPU temps. FWIW, I'm running with the latest 3702 firmware and it makes no difference.

Maybe Fedora/Red Hat can whitelist this mobo from the recent ASPM change, or hopefully include an errata/known issues bullet point somewhere.

Comment 10 Peter Levart 2012-04-01 09:58:37 UTC
I have the same board (Asus P8Z68-M Pro) and it worked with the latest F16 kernel (3.3.0-4.fc16.x86_64) without problems until I upgraded the mobo BIOS (from 0402 to 3702). Now I get a system panic as soon as the kernel starts to boot. (And the BIOS is not allowing me to downgrade!)

I'm writing this message while booted from the F17 Alpha USB Live image (3.3.0-0.rc3.git7.2.fc17.x86_64) though.

None of the F16 kernels or the F17 Alpha kernel boot cleanly on this motherboard, though. I have to change the mode of the storage controller (from AHCI to IDE) and blacklist the pata_acpi driver. But that's another story.

Comment 11 Peter Levart 2012-04-01 10:15:56 UTC
I can confirm that pcie_aspm=force workaround works on latest F16 kernel (3.3.0-4.fc16.x86_64).

Comment 12 Peter Levart 2012-04-01 15:59:24 UTC
And an even later F16 kernel (updated today: 3.3.0-8.fc16.x86_64) does not need the pcie_aspm=force workaround any more. It seems that this is fixed now.

Comment 13 Pumpino 2012-04-01 22:36:26 UTC
I haven't tested, but it appears that the issue was fixed in kernel-3.3.0-8.fc16.

* Wed Mar 28 2012 Josh Boyer <jwboyer>
- Fix disabled ASPM regression

Do we know if the patch will be included upstream in 3.3.1?

