Description of problem:
07:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8040T PCI-E Fast Ethernet Controller (rev 12)

kernel-2.6.36-0.27.rc5.git6.fc15.x86_64 says:
Sep 26 21:11:08 laptop14 kernel: sky2: driver version 1.28
Sep 26 21:11:08 laptop14 kernel: sky2 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Sep 26 21:11:08 laptop14 kernel: sky2 0000:07:00.0: unsupported chip type 0xff
Sep 26 21:11:08 laptop14 kernel: sky2 0000:07:00.0: PCI INT A disabled
Sep 26 21:11:08 laptop14 kernel: sky2: probe of 0000:07:00.0 failed with error -95

kernel-2.6.36-0.24.rc5.git0.fc15.x86_64 says:
Sep 26 21:28:28 laptop14 kernel: sky2: driver version 1.28
Sep 26 21:28:28 laptop14 kernel: sky2 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Sep 26 21:28:28 laptop14 kernel: sky2 0000:07:00.0: Yukon-2 FE+ chip revision 0
Sep 26 21:28:28 laptop14 kernel: sky2 0000:07:00.0: eth0: addr 00:1e:68:63:4d:74
Sep 26 21:29:05 laptop14 NetworkManager[1211]: <info> (eth0): new Ethernet device (driver: 'sky2' ifindex: 2)
Sep 26 21:29:05 laptop14 kernel: sky2 0000:07:00.0: eth0: enabling interface
Sep 26 21:29:06 laptop14 kernel: sky2 0000:07:00.0: eth0: Link is up at 100 Mbps, full duplex, flow control both

Version-Release number of selected component (if applicable):
kernel-2.6.36-0.27.rc5.git6.fc15.x86_64

How reproducible:
Tried twice

Steps to Reproduce:
1. Boot...
2.
3.

Actual results:
Ethernet not working

Expected results:

Additional info:
We do not have any sky2 changes between 2.6.35 and 2.6.36-rc6, so this problem is caused by a change in some other subsystem, probably PCI. It's hard to say what is broken. Unfortunately, some work from you will be needed to fix this: you have to compile an upstream kernel and, if it still does not work, bisect to find the commit that broke the driver.
First, please clone the current Linus git tree:
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
Then you have to build it. To install all the needed tools, run (as root):
> yum-builddep kernel
You can use the Fedora kernel config to compile the kernel. It is better to customise the config and remove unneeded options/drivers to speed up compilation, but remember that the kernel still needs to boot and run on your machine. If you don't want to customise the config, use the Fedora config as in the example below:
> $ cp /boot/config-2.6.36-0.27.rc5.git6.fc15.x86_64 linux-2.6/
> $ cd linux-2.6/
> $ make oldconfig
Then compile. To speed this up, use -j with the number of processors you have, e.g.:
> $ make -j 3
Then install (as root):
> $ make modules_install
> $ make install
Then boot the compiled kernel. If the problem is fixed, that means either some Fedora patches broke the driver, or the problem was upstream and is now fixed. If the problem still occurs, bisect with "git bisect" between the last known working commit (e.g. 2.6.36-rc4) and HEAD. Bisection is described here: http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
You need to compile and boot a kernel at every step; about 14 steps will be needed.
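As a rough illustration of where an estimate like "about 14 steps" can come from: bisection halves the range of candidate commits at every step, so the number of build-and-boot rounds grows with the base-2 logarithm of the commit count. The sketch below is only that arithmetic. The commit count used is a made-up figure, not the real size of the range discussed here, and git may need one step more or fewer in practice.

```python
def bisect_steps(num_commits: int) -> int:
    """Approximate number of build-and-boot rounds needed to bisect num_commits."""
    if num_commits < 1:
        raise ValueError("need at least one commit")
    # ceil(log2(num_commits)), computed with integers only
    return (num_commits - 1).bit_length()


# Hypothetical commit count, chosen only for illustration.
print(bisect_steps(10_000))  # prints 14
```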
Sorry, didn't see the above response until today. But with yesterday's kernel (kernel-2.6.36-0.28.rc6.git0.fc15.x86_64) both sky2 and iwlagn are broken the same way.
Sep 30 08:59:34 laptop14 kernel: sky2: driver version 1.28
Sep 30 08:59:34 laptop14 kernel: sky2 0000:07:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Sep 30 08:59:34 laptop14 kernel: sky2 0000:07:00.0: unsupported chip type 0xff
Sep 30 08:59:34 laptop14 kernel: sky2 0000:07:00.0: PCI INT A disabled
Sep 30 08:59:34 laptop14 kernel: sky2: probe of 0000:07:00.0 failed with error -95
Sep 30 08:59:34 laptop14 kernel: iwlagn: Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:d
Sep 30 08:59:34 laptop14 kernel: iwlagn: Copyright(c) 2003-2010 Intel Corporation
Sep 30 08:59:34 laptop14 kernel: iwlagn 0000:08:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Sep 30 08:59:34 laptop14 kernel: iwlagn 0000:08:00.0: Detected Intel(R) Wireless WiFi Link 4965AGN, REV=0xFFFFFFFF
Sep 30 08:59:34 laptop14 kernel: iwlagn 0000:08:00.0: Unknown hardware type
Sep 30 08:59:34 laptop14 kernel: iwlagn 0000:08:00.0: Unable to init EEPROM
Sep 30 08:59:34 laptop14 kernel: iwlagn 0000:08:00.0: PCI INT A disabled
Sep 30 08:59:34 laptop14 kernel: iwlagn: probe of 0000:08:00.0 failed with error -2
BTW, I need to copy:
$ cp /boot/config-2.6.36-0.27.rc5.git6.fc15.x86_64 linux-2.6/.config
for the above recipe to work (yes, noticed too late ;-)
Got lucky... v2.6.36-rc6-6-g4193d91 works (both sky2 and iwlagn). Thanks!
Annoyingly, with this kernel now the speaker beeps (for example when a bash completion isn't unique). I suppose _that_ was broken before ;-)
This is probably caused by the PCI patches I added that are queued for 2.6.37. Does booting with "pci=nocrs" fix the problem? Can you post boot logs from older working kernels and the new failing one?
Created attachment 450929 [details] Boot log for kernel-2.6.36-0.24.rc5.git0.fc15.x86_64 (works)
Created attachment 450930 [details] Boot log for kernel-2.6.36-0.27.rc5.git6.fc15.x86_64 (broken)
Created attachment 450931 [details] Boot log for kernel-2.6.36-0.28.rc6.git0.fc15.x86_64 (broken)
Tried booting with pci=nocrs, same result.
Below is a link to a rawhide kernel build with these patches removed:
> pci-v2-1-4-resources-ensure-alignment-callback-doesn-t-allocate-below-available-start.patch
> pci-v2-2-4-x86-PCI-allocate-space-from-the-end-of-a-region-not-the-beginning.patch
> pci-v2-3-4-resources-allocate-space-within-a-region-from-the-top-down.patch
> pci-v2-4-4-PCI-allocate-bus-resources-from-the-top-down.patch
http://koji.fedoraproject.org/koji/taskinfo?taskID=2506125
Does it also work?
That one does work. Just booted it, have eth0 and wlan0.
Created attachment 451138 [details] Boot log for kernel-2.6.36-0.30.rc6.git0.fc15.x86_64 (broken)
kernel-2.6.36-0.30.rc6.git0.fc15.x86_64 is again broken. Just checked 2.6.36-0.30.rc6.git0.bz637647.fc15.x86_64 again, that one _does_ work (running it right now, in fact).
We have not fixed the bug yet. 2.6.36-0.30.rc6.git0.bz637647 was just a test kernel to prove where the problem is. I'm not sure whether we will remove these four broken pci-v2-* patches or try to fix them (for sure we need to report the problem to the patches' author).
OK. Do the patches make sense each one separately? Where do they come from?
Still the same with kernel-2.6.36-0.35.rc7.git0.fc15.x86_64. I guess it will be vanilla kernels for me from here on... Am I the *only* one to see this? This is a Toshiba Satellite Pro U400 notebook. It seems my Samsung N210 netbook is not affected.
The patches that caused the problem are from here: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c49
Thanks for pointing me at this bugzilla, Chuck.
Horst, could you please try a boot with the "pci=use_crs" options and attach the dmesg log and the contents of /proc/iomem? (The other logs look like they came from somewhere else; they're missing the KERN_DEBUG output.)
Apparently the BIOS did configure the sky2 and iwlagn devices because the broken kernel log shows this:
pci 0000:07:00.0: BAR 0: trying firmware assignment [mem 0xf0200000-0xf0203fff 64bit]
pci 0000:08:00.0: BAR 0: trying firmware assignment [mem 0xf0300000-0xf0301fff 64bit]
but left the bridge windows leading to them disabled.
The working 2.6.36-0.24 kernel assigned space for the windows from the available area at [mem 0xc0000000-0xdfffffff] and then moved the sky2 and iwlagn devices into the windows:
pci 0000:00:1c.4: BAR 14: assigned [mem 0xc1000000-0xc11fffff] (a mem window)
pci 0000:07:00.0: BAR 0: assigned [mem 0xc1000000-0xc1003fff 64bit]
pci 0000:00:1c.5: BAR 14: assigned [mem 0xc1400000-0xc15fffff] (a mem window)
pci 0000:08:00.0: BAR 0: assigned [mem 0xc1400000-0xc1401fff 64bit]
The broken 2.6.36-0.35 kernel failed to assign space for the bridge windows so it left them disabled. Disabling the windows means we can't allocate space for the devices behind the bridge either, so we fell back to the original BIOS assignments, which still don't work because the bridge window is still disabled.
The question is why we couldn't allocate window space. There should be plenty of space available. Maybe the /proc/iomem will have a clue.
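The relationship described above, where a device BAR only works if it sits inside an enabled window on the bridge leading to it, can be sketched as a simple containment check. This is a toy model in Python and not kernel code: the `bar_reachable` helper and the use of `None` for a disabled window are assumptions made for illustration, while the address ranges are the ones quoted from the logs.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # inclusive (start, end) address range


def bar_reachable(bar: Range, window: Optional[Range]) -> bool:
    """Return True if the device BAR lies entirely inside the bridge window."""
    if window is None:  # bridge window left disabled
        return False
    return window[0] <= bar[0] and bar[1] <= window[1]


# Working kernel: 1c.4 window assigned, sky2 BAR moved into it.
print(bar_reachable((0xC1000000, 0xC1003FFF), (0xC1000000, 0xC11FFFFF)))  # True

# Broken kernel: sky2 keeps the firmware assignment, window stays disabled.
print(bar_reachable((0xF0200000, 0xF0203FFF), None))  # False
```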
Created attachment 452589 [details] update iomem_resource end
One thing that's wrong is that on x86, we statically initialize iomem_resource to [mem 0x00000000-0xffffffffffffffff] (the entire 64-bit physical address space) and never update it based on the CPU capabilities. My patches make us allocate from the top-down, but of course no current x86 CPU supports a full 64-bit physical address space, so the end of that range, which we assigned to a 1c.0 bridge window, is useless:
pci 0000:00:1c.0: BAR 15: assigned [mem 0xffffffffffe00000-0xffffffffffffffff 64bit pref]
I don't think this patch will fix the sky2 and iwlagn problems, but at least we shouldn't assign this useless window.
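To see why a top-down allocator lands on that useless window, here is a toy model in Python (not the kernel allocator and not the attached patch). The helper names are invented for illustration, and the 36-bit physical address width is an assumed example value, not something reported for this machine. With the region ending at the full 64-bit limit, a 2 MiB window comes out at exactly the range shown in the log line above; clamping the end to the assumed CPU limit moves it into addressable space.

```python
FULL_64BIT_END = (1 << 64) - 1


def phys_end(phys_addr_bits: int) -> int:
    """Highest physical address for a CPU with the given address width."""
    return (1 << phys_addr_bits) - 1


def alloc_top_down(region_end: int, size: int) -> tuple:
    """Place a size-aligned block as high as possible in [0, region_end].

    size must be a power of two; returns an inclusive (start, end) pair.
    """
    start = (region_end + 1 - size) & ~(size - 1)
    return start, start + size - 1


WINDOW_SIZE = 0x200000  # 2 MiB, the size of the 1c.0 window in the log

# Unclamped iomem_resource: window ends up at the very top of 64-bit space.
print([hex(a) for a in alloc_top_down(FULL_64BIT_END, WINDOW_SIZE)])

# Clamped to an assumed 36-bit CPU: window is within addressable memory.
print([hex(a) for a in alloc_top_down(phys_end(36), WINDOW_SIZE)])
```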
Created attachment 452593 [details] fix resource 64-bit wrap
I think I see the problem. The resource allocator doesn't handle the case where a child ends exactly ~0, because it looks for space after the child and computes ~0 + 1, which equals 0. This makes it mistakenly hand out space that may already be in use.
So the previous patch probably *will* fix sky2 and iwlagn, because it prevents the case where a resource ends at ~0 by restricting iomem_resource to end earlier. But we should also do something like this patch to fix the allocator in general.
My allocator changes (the ones referenced in comment 17) haven't been merged upstream yet, so I'll probably incorporate these two patches into the series and repost it so that upstream never sees this problem.
It would be very helpful if we could test these two fixes on this machine first to make sure they actually fix the problem.
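The wrap described above can be shown with a few lines of arithmetic. This is a toy model in Python, where 64-bit unsigned behaviour is emulated with an explicit mask; both function names are hypothetical, and the guarded variant only illustrates the general idea of checking for the wrap, not what the attached patch actually does.

```python
MASK64 = (1 << 64) - 1  # ~0 for a 64-bit unsigned value


def next_free_start_naive(child_end: int) -> int:
    """Start of the space after a child, with 64-bit unsigned wraparound."""
    return (child_end + 1) & MASK64


def next_free_start_checked(child_end: int):
    """Same, but report 'no space after this child' instead of wrapping."""
    if child_end == MASK64:
        return None
    return child_end + 1


# A child ending exactly at ~0: the naive search restarts at address 0,
# which may already be in use.
print(hex(next_free_start_naive(MASK64)))  # 0x0
print(next_free_start_checked(MASK64))     # None
```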
Exactly which patches should I apply? To the vanilla kernel or the Fedora patched one? BTW, this can't be an "all 64 bit" problem: I've got an assortment of 64 bit machines and only one of them shows this problem. Sure, they have other eth/WiFi controllers, but the above discussion sounds like "(almost) all PCI is broken". [I don't want to waste my/your time here, I'm quite comfortable building my own kernels and fooling around with git]
If you could apply the patches from comment 19 and comment 20 to the Fedora kernel, I think that would be what we want. I'm also going to send you the complete updated series against upstream via email. If it's convenient for you to test that, that would be even better. It's not really that all PCI is broken. To hit this, you need these: - Machine old enough that we don't turn on "pci=use_crs" automatically - A device behind a bridge, where the BIOS left the bridge disabled Most machines won't have the second situation, so they won't see the problem.
OK, applied in turn: 189182 189232 189242 189252 as discussed in comment 17, and then the patches in comments 19 and 20 to vanilla 2.6.36-rc8. Compiled clean (a bunch of warnings, unrelated AFAICS); the result crashes on boot (somewhere in the read(2) system call, didn't get the whole backtrace on screen in any case). Currently compiling plain 2.6.36-rc8. My earlier vanilla kernel was v2.6.36-rc7-199-gae42d8d, which works fine.
Sorry, Linus sneaked in a commit. The vanilla kernel I'm running now (and which I patched for comment 23) is v2.6.36-rc8-1-g8fd01d6.
Comment on attachment 452593 [details] fix resource 64-bit wrap
This fix doesn't work. Current version of this series starts here: http://marc.info/?l=linux-pci&m=128709830705469&w=2
(In reply to comment #25)
> This fix doesn't work. Current version of this series starts here:
> http://marc.info/?l=linux-pci&m=128709830705469&w=2
Those patches are now in kernel-2.6.36-0.41.rc8.git5.fc15
kernel-2.6.36-0.40.rc8.git0.fc15.x86_64 does work fine here...
(In reply to comment #27)
> kernel-2.6.36-0.40.rc8.git0.fc15.x86_64 does work fine here...
That version should have the same bug as the previous one. The only change that went in there was to use _CRS by default.
OK, now I'm confused :-) From comment 26, I thought kernel-2.6.36-0.41.rc8.git5.fc15 included the v4 patches from the series at http://marc.info/?l=linux-pci&m=128709830705469&w=2 . If that's true, I expect that kernel to work, because it has all the known issues fixed.
Something changed between 0.40 and 0.41 that broke pcmcia on my old thinkpad 770Z, and adding "resource_alloc_from_bottom" when booting 0.41 makes it work again. I put a partial dmesg diff in bz #646027
It hasn't happened again since comment #27.