Bug 665109
Summary: | e100 problems on old Compaq Proliant DL320 | | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | joshua
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint>
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | medium | Docs Contact: |
Priority: | low
Version: | 14 | CC: | bjorn.helgaas, dwmw2, gansalmon, itamar, jonathan, kernel-maint, kmcmartin, madhu.chinakonda
Target Milestone: | ---
Target Release: | ---
Hardware: | Unspecified
OS: | Linux
Whiteboard: |
Fixed In Version: | kernel-2.6.35.14-96.fc14 | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2011-09-06 23:58:40 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: |
Created attachment 470285
sosreport from the machine

We would love to use F14 and future Fedora releases on these machines... please fix!

Try upgrading to the rawhide 2.6.37-rc6 kernel and let us know whether it's resolved there.

Is there any way you can capture the full oops? The attached photo only shows the tail end of it.

Trying http://download.fedora.redhat.com/pub/fedora/linux/development/rawhide/i386/os/Packages/kernel-2.6.37-0.rc7.git0.1.fc15.i686.rpm now...

OK... that didn't work either, but at least it didn't oops. I have dmesg output from both the F14 and rawhide kernels. dmesg.f14 contains the full EIP info that you were looking for. I'm not sure what dmesg.rawhide contains, but it does have several lines complaining about the PCI device base addresses for the e100 NICs.

Created attachment 470321
dmesg output from the EIP'ing F14 kernel

Created attachment 470322
dmesg output from the rawhide kernel
Thanks. If you boot with "e100.use_io=1" on the kernel command line, does that help? I'll pull the debuginfo and try to figure out where it's dying.

e100.use_io=1 doesn't change anything for the rawhide or the F14 kernels.

Well, that's odd. Can you attach the output of "sudo lspci -vvnn -s 0000:01:03.0" (which should be the Ethernet card)? It looks like the BARs are not being set up correctly, resulting in a NULL pointer dereference when the driver tries to use them in 2.6.35, while 2.6.37 somehow handles it correctly.

    # sudo lspci -vvnn -s 0000:01:03.0
    01:03.0 Ethernet controller [0200]: Intel Corporation 82557/8/9/0/1 Ethernet Pro 100 [8086:1229] (rev 08)
        Subsystem: Compaq Computer Corporation NC3163 Fast Ethernet NIC (embedded, WOL) [0e11:b134]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 66 (2000ns min, 14000ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 21
        Region 0: Memory at d0200000 (32-bit, non-prefetchable) [size=4K]
        Region 1: I/O ports at b000 [size=64]
        Region 2: Memory at d0000000 (32-bit, non-prefetchable) [size=1M]
        [virtual] Expansion ROM at 40100000 [disabled] [size=1M]
        Capabilities: [dc] Power Management version 2
            Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
            Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
        Kernel driver in use: e100
        Kernel modules: e100

Ah, never mind, I can see that in the dmesg from 2.6.37... It looks like it can't claim the resources and then decides to fail. I'll try to fix 2.6.35 so that it doesn't misbehave and at least handles this the way 2.6.37 does.

If you boot with pci=use_crs, does your e100 work? What about pnpacpi=off?

pnpacpi=off doesn't change anything for either kernel (rawhide or F14), but pci=use_crs does in fact do the trick for both! Is this a "bug" or a "feature"?
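The first check above was whether the e100's BARs look sane. As an illustration only, the Python sketch below pulls the `Region N:` lines out of `lspci -vv` output like the listing in this report, so each BAR's start and inclusive end address can be compared against the "can't claim" messages in dmesg. The `Bar` type and `parse_bars` helper are hypothetical, and the regular expression assumes exactly the line format shown here.

```python
import re
from typing import List, NamedTuple

# Matches lspci -vv lines such as
#   "Region 0: Memory at d0200000 (32-bit, non-prefetchable) [size=4K]"
#   "Region 1: I/O ports at b000 [size=64]"
REGION_RE = re.compile(
    r"Region (?P<idx>\d+): (?P<kind>Memory|I/O ports) at (?P<addr>[0-9a-f]+)"
    r".*?\[size=(?P<size>\d+)(?P<unit>[KMG]?)\]"
)

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


class Bar(NamedTuple):
    index: int
    kind: str  # "mem" or "io"
    start: int
    size: int

    @property
    def end(self) -> int:
        """Inclusive end address, the way the kernel prints resources."""
        return self.start + self.size - 1


def parse_bars(lspci_text: str) -> List[Bar]:
    """Extract the BARs of one device from `lspci -vv` output."""
    bars = []
    for line in lspci_text.splitlines():
        m = REGION_RE.search(line)
        if m is None:
            continue  # not a BAR line (e.g. the expansion ROM)
        kind = "mem" if m.group("kind") == "Memory" else "io"
        size = int(m.group("size")) * UNITS[m.group("unit")]
        bars.append(Bar(int(m.group("idx")), kind, int(m.group("addr"), 16), size))
    return bars
```

Fed the three `Region` lines from this report, it returns a 4 KiB memory BAR at d0200000, a 64-byte I/O BAR at b000 and a 1 MiB memory BAR at d0000000; the expansion ROM line is skipped because it is not a `Region` entry.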
It would be nice to have things just work in F14 without having to specify this, like in F13.

Yeah, we don't enable _CRS on BIOSes older than 2008 because they were buggy (I assume). I'll point Bjorn at this; hopefully he can shed some light on it.

I'm on vacation until Jan 4, but will look in more detail then.

Ugh, what a mess. I think arch/x86/pci/broadcom_bus.c is screwing things up here. IIRC, that was added for machines where we don't have ACPI, so it was the best we could figure out. But in this case, we *do* have ACPI, and it's telling us more reliable stuff than broadcom_bus.c is.

We don't enable pci=use_crs automatically on machines before 2008 just out of paranoia about the reliability of ACPI _CRS. But it's only fear, not any real data, behind that date. We could easily add a quirk (see arch/x86/pci/acpi.c) to turn it on for this specific machine. Or, if we felt daring, we could adjust or remove that 2008 date for enabling pci=use_crs automatically. It would be nice to have some data showing that Windows relies on it on boxes this old (e.g., an Everest report or something).

Interesting. I can say that F13, which apparently doesn't have this phobia about old BIOSes, works perfectly right out of the box, though I can't provide overarching statistics about other older machines.

The broadcom_bus.c file wasn't introduced until after 2.6.34 was released. :/

Created attachment 471767
ignore broadcom_bus.c if machine supports ACPI
The problem is that broadcom_bus.c discovers bogus windows on this machine (it doesn't know how to discover I/O windows, and it looks like it should ignore the upper 32 bits of some of the mem windows). This machine is older than 2008, so we don't use ACPI _CRS information, and bus_numa.c uses the faulty information from broadcom_bus.c.
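To make the failure mode concrete, here is a hedged Python sketch of why bogus windows break the e100: a BAR can only be claimed if some host-bridge window of the same type fully contains it. The `Window`, `sanitize_mem_window` and `claimable` names are hypothetical. Masking to the low 32 bits models the remark about ignoring the upper 32 bits; it is not the actual broadcom_bus.c code, and it assumes the start and end of a window carry the same stray upper bits.

```python
from typing import List, NamedTuple


class Window(NamedTuple):
    """A host-bridge address window; `end` is inclusive."""
    start: int
    end: int

    def contains(self, start: int, end: int) -> bool:
        return self.start <= start and end <= self.end


def sanitize_mem_window(raw: Window) -> Window:
    """Drop the upper 32 bits of a memory window (assumed bogus on this bridge)."""
    mask = 0xFFFF_FFFF
    return Window(raw.start & mask, raw.end & mask)


def claimable(bar_start: int, bar_end: int, windows: List[Window]) -> bool:
    """A BAR can be claimed only if one window of the same type fully contains it.

    Callers pass memory windows for memory BARs and I/O windows for I/O BARs.
    """
    return any(w.contains(bar_start, bar_end) for w in windows)
```

For example, a hypothetical window 0x1d0000000 to 0x1dfffffff does not contain the BAR at d0200000 from the lspci output; after masking it becomes d0000000 to dfffffff, which does. With no I/O windows discovered at all, the I/O BAR at b000 can never be claimed.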
We're in the gray area of pre-2008 machines with ACPI. I think the possibilities are:
1) Fix broadcom_bus.c. We don't have documentation to do a complete job of this, and there's no reason to think the result would be better than using the generic ACPI driver.
2) Ignore _CRS and make broadcom_bus.c do nothing. This gets us back to the working situation of F13. Since we don't have any host bridge information, things like PCI hotplug and option ROM mapping may not work, but that's the way it's always been on these boxes.
3) Turn on _CRS, either just for this machine with a DMI quirk or for a whole class of machines. This should make hotplug work, but changing lots of machines is risky.
I think (2) is the safest, and it will likely fix other CNB20LE-based systems as well. That's what this patch should do.
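The three options can be summarized as a small decision model. This Python sketch is a simplification, assuming a CNB20LE-based machine: `use_crs` mirrors the pre-2008 BIOS cutoff, the optional DMI quirk from option 3, and the `pci=use_crs`/`pci=nocrs` command-line overrides, while `root_window_source` shows which source supplies root-bus windows before and after option 2. All function and enum names are hypothetical; the real logic lives in arch/x86/pci/acpi.c, broadcom_bus.c and bus_numa.c and differs in detail.

```python
from enum import Enum
from typing import List, Optional


class WindowSource(Enum):
    ACPI_CRS = "ACPI _CRS"
    BROADCOM_BUS = "broadcom_bus.c register probe"
    NONE = "no host bridge windows (legacy behaviour, as in F13)"


def pci_options(cmdline: str) -> List[str]:
    """Collect the comma-separated values of every pci= token on the command line."""
    opts: List[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            opts.extend(o for o in token[len("pci="):].split(",") if o)
    return opts


def use_crs(bios_year: Optional[int], dmi_quirk: bool, cmdline: str) -> bool:
    """Decide whether ACPI _CRS is trusted (simplified model)."""
    enabled = True
    # Paranoia cutoff: BIOSes dated before 2008 don't get _CRS by default.
    if bios_year is not None and bios_year < 2008:
        enabled = False
    # Option 3: a per-machine DMI quirk can force _CRS back on.
    if dmi_quirk:
        enabled = True
    # An explicit pci=nocrs or pci=use_crs from the user overrides everything.
    opts = pci_options(cmdline)
    if "nocrs" in opts:
        enabled = False
    elif "use_crs" in opts:
        enabled = True
    return enabled


def root_window_source(acpi: bool, crs: bool, patched: bool) -> WindowSource:
    """Where root-bus windows come from on a CNB20LE box (simplified model)."""
    if acpi and crs:
        return WindowSource.ACPI_CRS
    if acpi and patched:
        # Option 2: with ACPI present, broadcom_bus.c does nothing.
        return WindowSource.NONE
    return WindowSource.BROADCOM_BUS
```

With a pre-2008 BIOS and no boot options, the unpatched model picks broadcom_bus.c (the F14 failure); adding pci=use_crs switches it to ACPI _CRS (the workaround); with the patch it returns no windows at all, matching the F13 behaviour.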
The patch in comment #19 went upstream in 2.6.38 with commit 30e664afb5cb597dd6f7651e6d116e10b9741084.

Joshua, are you still using F14 or have you moved on to F15/F16 at this point?

I'm using Fedora 15 now, which doesn't need the pci=use_crs workaround. That said, shouldn't F14 pick up this change from F15, since we know F15 works on these older systems, just like F13 did?

(In reply to comment #21)
> I'm using Fedora 15 now, which doesn't need the pci=use_crs workaround.

Excellent.

> That said, shouldn't F14 pick up this change from F15, since we know F15 works
> on these older systems, just like F13 did?

Yes. I have it prepped to go into the F14 kernel. I asked to make sure the upstream commit worked for you. Thanks for letting us know!

kernel-2.6.35.14-96.fc14 has been submitted as an update for Fedora 14.
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14

Package kernel-2.6.35.14-96.fc14:
* should fix your issue,
* was pushed to the Fedora 14 testing repository,
* should be available at your local mirror within two days.

Update it with:

    # su -c 'yum update --enablerepo=updates-testing kernel-2.6.35.14-96.fc14'

as soon as you are able to, then reboot. Please go to the following URL:
https://admin.fedoraproject.org/updates/kernel-2.6.35.14-96.fc14
then log in and leave karma (feedback).

Yes, this works on my old servers without the need for pci=use_crs. Thank you!

kernel-2.6.35.14-96.fc14 has been pushed to the Fedora 14 stable repository. If problems still persist, please make note of it in this bug report.
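To confirm that a machine is running the fixed kernel, one could compare the running kernel's version-release against 2.6.35.14-96.fc14. The sketch below is a simplified stand-in for RPM's own version comparison (it ignores epochs and tilde handling, and compares version before release the way RPM does); `is_at_least` is a hypothetical helper, and any architecture suffix such as `.i686` reported by `uname -r` should be stripped before calling it.

```python
import re
from itertools import zip_longest

# Alphanumeric segments; everything else (".", "_", "+") acts as a separator.
SEGMENT_RE = re.compile(r"\d+|[A-Za-z]+")


def compare_segments(a: str, b: str) -> int:
    """Simplified rpmvercmp: returns -1, 0 or 1."""
    for x, y in zip_longest(SEGMENT_RE.findall(a), SEGMENT_RE.findall(b)):
        if x is None:
            return -1  # fewer segments means older
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            xi, yi = int(x), int(y)
            if xi != yi:
                return -1 if xi < yi else 1
        elif x.isdigit() != y.isdigit():
            # RPM treats a numeric segment as newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            return -1 if x < y else 1
    return 0


def is_at_least(installed: str, fixed: str) -> bool:
    """Compare 'version-release' strings: version first, then release."""
    inst_ver, _, inst_rel = installed.rpartition("-")
    fix_ver, _, fix_rel = fixed.rpartition("-")
    result = compare_segments(inst_ver, fix_ver) or compare_segments(inst_rel, fix_rel)
    return result >= 0
```

Splitting on the last "-" before comparing matters: comparing the whole string as one run of segments could rank a shorter version with a large release number above 2.6.35.14-96.fc14.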
Created attachment 470284
picture of the boottime e100_probe EIP

Description of problem:
F13 i386 works perfectly fine, but F14 i386 can't initialize the e100 cards on my old Compaq Proliant DL320s. Screenshot attached.

Version-Release number of selected component (if applicable):
Any F14 i386 kernel

How reproducible:
Try the installer for F14, or install F13 and then install any F14 kernel.