Description of problem:

I was unable to boot the Fedora 15 and Fedora 16 installation media. The system would crash following a kernel panic. I was able to distro-sync up to 16 and continue to use the 14 kernel, so I concluded that this is an upstream problem with the kernel.

I was able to get my system working in a very crippled manner using pci=noacpi. The keyboard did not work correctly. Key press and release events would be missed. Keystrokes would appear very slowly or not at all and every key was at risk of becoming stuck (including the arrow keys, Page Up, Page Down, Control, Shift, and all other keys). Suspend stopped working. There were problems with IRQs being missing ("irq n: nobody cared" in the logs, followed by call traces). The system seemed to run more slowly (but I didn't perform any benchmarks to prove this).

Then I happened upon this document:

http://fedoraproject.org/wiki/Common_kernel_problems

Under "Can't find installation CD/DVD or hard drives" there is this line: "Try the boot option pci=nocrs on 2.6.34 and later kernels."

I'd almost recommend turning this on by default, as it seems to cause no harm whatsoever and without it my hardware becomes unbootable. What I would definitely recommend is suggesting this option if there is some level of success with pci=noacpi in the "Crashes/Hangs" section of that document.

I've done extensive searches over the past week and I've seen many instances of people having problems with Dell keyboard hardware. I feel like correct initialization of recent Dell keyboard hardware is a prerequisite of it working correctly and that noacpi causes that not to happen. This lockup issue also seems to be common with Ubuntu. I hope that Ubuntu users with this issue will eventually see my report.

Version-Release number of selected component (if applicable):

All versions shipped with Fedora 16. [At least] current versions shipped with Fedora 15.

How reproducible:

Always.

Steps to Reproduce:

1. Attempt to boot the kernel with installation media.

Actual results:

Failure to boot.

Expected results:

Successful booting and system operation.

Additional info:
please attach output of dmidecode from systems that need this option.
Created attachment 549856
Output of dmidecode from a Dell Studio 1536

Here is the output of dmidecode from the Dell Studio 1536, which requires pci=nocrs to boot 3.1.x.
Thanks. I've just added a patch that should make this work without the need to pass pci=nocrs. When the build at http://koji.fedoraproject.org/koji/taskinfo?taskID=3609574 finishes, give it a try.
The patch is good. It would fix the Fedora 15 and 16 installation media for this hardware but more importantly it will fix Fedora 17. It's no fun to have to boot the installation media with pci=nocrs or any kernel flag. The machine seems to run better with the patch than with pci=nocrs, but I haven't taken any scientific measurements. Good work!
thanks for testing, I'll get this sent upstream.
John, would you mind attaching the complete dmesg log? I think the log from a boot with the patch or with "pci=nocrs" is sufficient. This will identify the exact problem and help create a fix that's more generic than the blacklist. Turning off _CRS is definitely a workaround for this machine, but it does mean that we can't handle multiple PCI host bridges correctly, so we can't turn off _CRS across the board. This is likely the same problem as https://bugzilla.kernel.org/show_bug.cgi?id=31602, which is on a Dell 1546. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/647043 is another report (with many Ubuntu duplicates).
The Dell Studio 1557 will also need a separate blacklist entry (or the more general fix you would like to create):

https://bugzilla.redhat.com/show_bug.cgi?id=769657

kernel.org is not responding right now, but beware that the 1546 is a different line of hardware, an Inspiron not a Studio. The Studio line is/was Dell's entry-level hardware. I believe the Inspiron is slightly better than entry level, but the hardware may indeed be similar enough...

As far as the Ubuntu bug report goes, the underlying problem might be the same, but the result is that the Fedora 15 and 16 installation media and all shipped Fedora 15 and 16 kernels crash on my hardware during boot (without pci=nocrs). There is no opportunity to test USB or any kernel functionality for that matter.

One data point worth mentioning is that I have 4 GB of system memory and I'm using the PAE kernel. I read something in the bug report about a mapping issue of some sort.

In any case, I am attaching the output of dmesg with Dave Jones' patched kernel. If you need me to boot with pci=nocrs and attach that dmesg let me know. (The machine is unstable in that mode.)

Note that Fedora 14 ran fine on my system. This bug came about one or two kernel revisions before Linus jumped to version 3.x.
Created attachment 550500
dmesg output with Dave Jones' patched 3.1.6-2.fc16 kernel
There are two problems on the 1536, both of which look like BIOS bugs:

1) the BIOS left USB devices outside all the PCI host bridge apertures
2) the apertures include an unreported MMCONFIG area

When Linux tries to fix problem 1, it moves the USB devices into the unreported MMCONFIG area, where they don't work. Windows will also fix problem 1 by moving the USB devices, but it chooses a different area that doesn't fall into the MMCONFIG area.

The chipset is programmed to claim [mem 0xd0000000-0xdfffffff] as MMCONFIG space, but the ACPI MCFG description only tells the OS about [mem 0xd0000000-0xd3ffffff]. In fact, the rest of that MMCONFIG region is reported as the PCI host bridge aperture [mem 0xd4000000-0xfebfffff].

Here are the relevant parts of your dmesg log:

  BIOS-e820: 00000000c7e60800 - 00000000d4000000 (reserved)
  Fam 10h mmconf [d0000000, dfffffff]
  PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xd0000000-0xd3ffffff] (base 0xd0000000)
  pci_root PNP0A03:00: host bridge window [mem 0x000a0000-0x000bffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0x000d0000-0x000dffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xd4000000-0xfebfffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xfec10000-0xfecfffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xfed00400-0xfedfffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xfee10000-0xff9fffff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0xffc00000-0xffdffbff] (ignored)
  pci_root PNP0A03:00: host bridge window [mem 0x130000000-0x327ffffff] (ignored)
  pci 0000:00:12.0: reg 10: [mem 0xffb00000-0xffb00fff]
  pci 0000:00:12.1: reg 10: [mem 0xffb01000-0xffb01fff]
  pci 0000:00:12.2: reg 10: [mem 0xffa80000-0xffa800ff]
  pci 0000:00:13.0: reg 10: [mem 0xffb02000-0xffb02fff]
  pci 0000:00:13.1: reg 10: [mem 0xffb03000-0xffb03fff]
  pci 0000:00:13.2: reg 10: [mem 0xffa80400-0xffa804ff]

Since this kernel ignores _CRS, we don't move the USB devices. The unpatched F16 & F17 kernels will move them to the area at 0xd4000000, since that appears unused.
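For anyone who wants to check the same thing on another machine, the range arithmetic behind the two problems can be sketched in a few lines of Python. This is only an illustration, not kernel code: the address ranges are copied from the dmesg excerpt in this comment, while the MemRange helper and the variable names are made up for the example. It also assumes the reported MCFG range starts at the same base as the chipset MMCONFIG range, which is true for this log.

```python
from typing import NamedTuple, Optional


class MemRange(NamedTuple):
    """Hypothetical helper: an inclusive address range, as printed in dmesg."""
    start: int
    end: int

    def overlap(self, other: "MemRange") -> Optional["MemRange"]:
        lo = max(self.start, other.start)
        hi = min(self.end, other.end)
        return MemRange(lo, hi) if lo <= hi else None

    def contains(self, other: "MemRange") -> bool:
        return self.start <= other.start and other.end <= self.end

    def __str__(self) -> str:
        return f"[mem {self.start:#010x}-{self.end:#010x}]"


# Values copied from the dmesg excerpt.
chipset_mmconfig = MemRange(0xD0000000, 0xDFFFFFFF)  # "Fam 10h mmconf"
mcfg_reported = MemRange(0xD0000000, 0xD3FFFFFF)     # ACPI MCFG

host_bridge_windows = [
    MemRange(0x000A0000, 0x000BFFFF),
    MemRange(0x000D0000, 0x000DFFFF),
    MemRange(0xD4000000, 0xFEBFFFFF),
    MemRange(0xFEC10000, 0xFECFFFFF),
    MemRange(0xFED00400, 0xFEDFFFFF),
    MemRange(0xFEE10000, 0xFF9FFFFF),
    MemRange(0xFFC00000, 0xFFDFFBFF),
    MemRange(0x130000000, 0x327FFFFFF),
]

usb_bars = {
    "0000:00:12.0": MemRange(0xFFB00000, 0xFFB00FFF),
    "0000:00:12.1": MemRange(0xFFB01000, 0xFFB01FFF),
    "0000:00:12.2": MemRange(0xFFA80000, 0xFFA800FF),
    "0000:00:13.0": MemRange(0xFFB02000, 0xFFB02FFF),
    "0000:00:13.1": MemRange(0xFFB03000, 0xFFB03FFF),
    "0000:00:13.2": MemRange(0xFFA80400, 0xFFA804FF),
}

# Problem 2: the part of MMCONFIG the chipset claims but MCFG does not report.
# Assumes both ranges share the same base address.
unreported = MemRange(mcfg_reported.end + 1, chipset_mmconfig.end)

# Host bridge windows that overlap the unreported MMCONFIG area.
conflicts = []
for window in host_bridge_windows:
    hit = window.overlap(unreported)
    if hit is not None:
        conflicts.append((window, hit))

# Problem 1: USB BARs that are not inside any host bridge window.
stranded = [
    dev for dev, bar in usb_bars.items()
    if not any(window.contains(bar) for window in host_bridge_windows)
]

if __name__ == "__main__":
    print("unreported MMCONFIG:", unreported)
    for window, hit in conflicts:
        print("window", window, "overlaps it at", hit)
    for dev in stranded:
        print("BAR outside all windows:", dev, usb_bars[dev])
```

With the values from this log, the only window that overlaps the unreported area is the one starting at 0xd4000000, and all six USB BARs fall outside every window, which matches the two problems described above.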
> In any case, I am attaching the output of dmesg with Dave Jones' patched
> kernel. If you need me to boot with pci=nocrs and attach that dmesg let me
> know. (The machine is unstable in that mode.)

I don't know what patch is in Dave's kernel, but the patch he posted upstream (https://lkml.org/lkml/2011/12/30/59) should be exactly equivalent to booting with "pci=nocrs". If they behave differently, I'm missing something or there's another issue.
(In reply to comment #10)
> > In any case, I am attaching the output of dmesg with Dave Jones' patched
> > kernel. If you need me to boot with pci=nocrs and attach that dmesg let me
> > know. (The machine is unstable in that mode.)
>
> I don't know what patch is in Dave's kernel, but the patch he posted upstream
> (https://lkml.org/lkml/2011/12/30/59) should be exactly equivalent to booting
> with "pci=nocrs". If they behave differently, I'm missing something or there's
> another issue.

After a few days of testing, they do appear to be equivalent.
Yes, same patch that I posted upstream.
Created attachment 550776
PNP quirk for unreported MMCONFIG space

Dave, John, I'd appreciate it if you could test this patch in a kernel without the Dell blacklist. If it works as intended, it should fix the issue on all AMD systems, not just ones in the blacklist.
Dave, can you easily generate a Fedora 16 kernel package to test this change? Thanks!
sure, give me a few hours.
build started. the rpm's will show up at the bottom of this page once they're built. http://koji.fedoraproject.org/koji/taskinfo?taskID=3620090
3.1.7-2 looks good to me. It seems to correctly work around the BIOS bugs.
The official 3.1.7-1 and 3.1.9 were good but 3.2.1-3 now crashes in the same way that all previous did and booting with pci=nocrs is required again. Did Bjorn's change get reverted?
(In reply to comment #18)
> The official 3.1.7-1 and 3.1.9 were good but 3.2.1-3 now crashes in the same
> way that all previous did and booting with pci=nocrs is required again.
>
> Did Bjorn's change get reverted?

Sigh. Sort of. It was dropped as a patch to the Fedora kernel when we rebased to 3.1.10, because the upstream 3.1.10 release contained that patch already. Then we rebased to 3.2.1, which apparently doesn't contain said patch. It looks like 3.2.2 will include it and that should be released later today, so the next Fedora kernel should have this fixed again.

Sorry for the inconvenience.
kernel-3.2.2-1.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/FEDORA-2012-0949/kernel-3.2.2-1.fc16
kernel-3.2.2-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.