Bug 229099 - No networking with kernel 2930/2932/2936
Summary: No networking with kernel 2930/2932/2936
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: i686
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 229519
Depends On:
Blocks:
 
Reported: 2007-02-17 00:41 UTC by Guido Ledermann
Modified: 2007-11-30 22:11 UTC (History)
CC List: 7 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-03-01 18:32:36 UTC
Type: ---
Embargoed:


Attachments
Output of lspci -vvv (19.12 KB, application/octet-stream)
2007-02-21 21:20 UTC, Guido Ledermann
lspci -vvv with kernel 2914 (19.12 KB, application/octet-stream)
2007-02-21 22:32 UTC, Guido Ledermann
patch-forcedeth-msix (308 bytes, patch)
2007-02-23 20:16 UTC, Andy Gospodarek
the culprit according to git bisect (2.74 KB, text/plain)
2007-02-26 06:07 UTC, John Reiser

Description Guido Ledermann 2007-02-17 00:41:56 UTC
Description of problem:

After updating the kernel from 2925 to 2930 and then to 2932, I could not get
networking up and running. I could not ping any host other than my own machine.
Booting the older kernel works fine. It makes no difference whether the
interface uses a static IP or DHCP.

Version-Release number of selected component (if applicable):
2.6.20-1.2930.fc7
2.6.20-1.2932.fc7

How reproducible:

Always.

Steps to Reproduce:
1. Boot with one of the mentioned kernels
2. "ping 192.168.1.1" or something similar.
  
Actual results:
192.168.1.50 Destination host unreachable

Expected results:
ping should work as usual.

Additional info:
dmesg reports forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.59.

Comment 1 Andy Gospodarek 2007-02-20 20:34:41 UTC
Can you attach the output from lspci -vvv from one of these systems?

Comment 2 Guido Ledermann 2007-02-21 21:20:32 UTC
Created attachment 148537 [details]
Output of lspci -vvv

Comment 3 Andy Gospodarek 2007-02-21 22:08:49 UTC
Was that the output from the working (2925) kernel or one of the non-working
(2930 and 2932) kernels?

Comment 4 Guido Ledermann 2007-02-21 22:18:59 UTC
This was the output with 2932. My next post will be with 2914. I'll be back in a
minute.

Comment 5 Guido Ledermann 2007-02-21 22:32:53 UTC
Created attachment 148545 [details]
lspci -vvv with kernel 2914

This is the lspci -vvv output where my networking is running.

Comment 6 Mladen Kuntner 2007-02-22 22:11:29 UTC
Have the same problem on
 Giga-byte Technology GA-K8N Ultra-9 Mainboard

Still not working on the 2936 kernel.
Please tell me if I can help somehow.

Comment 7 Andy Gospodarek 2007-02-23 14:56:07 UTC
On these systems where the network doesn't work, does the module not load or
does it load and the network just doesn't work?

Comment 8 Guido Ledermann 2007-02-23 15:01:02 UTC
How can I check this?
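
One way to answer this, as a minimal sketch: a loaded module is listed in /proc/modules (the same data that lsmod prints), so searching that listing for forcedeth shows whether the driver loaded at all. The helper name module_loaded and its optional file argument are made up for this illustration, and a driver built into the kernel rather than built as a module would not be listed.

```shell
# Succeeds if the module name ($1) appears in a /proc/modules-style listing.
# The listing defaults to /proc/modules; the second argument is a parameter
# only so the check can be run against a saved copy.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded forcedeth; then
    echo "forcedeth is loaded"
else
    echo "forcedeth is not loaded"
fi
```

If the module is listed and ping still fails, that points to the second case asked about in Comment 7: the module loads but the network does not work.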

Comment 9 Andy Gospodarek 2007-02-23 15:31:47 UTC
Guido, don't worry about it for now.  I was able to reproduce this with the
latest kernels, so I'll let you know when I have some more info.


Comment 10 Guido Ledermann 2007-02-23 15:39:46 UTC
Great, thank you, Andy!

Comment 11 John Reiser 2007-02-23 18:18:39 UTC
*** Bug 229519 has been marked as a duplicate of this bug. ***

Comment 12 Andy Gospodarek 2007-02-23 20:16:48 UTC
Created attachment 148711 [details]
patch-forcedeth-msix

This patch was recently posted to netdev and might be worth trying.

"There seems to be an issue when both MSI-X is enabled and NAPI is
configured. This patch disables MSI-X until the issue is root caused.

Signed-Off-By: Ayaz Abdulla <aabdulla>"

Comment 13 John Reiser 2007-02-24 20:25:19 UTC
Applying the patch of the previous Comment #12 to kernel 2942 did not work for
me: still no networking.  DHCP on ipv4 seems to get an address via multi-user
startup scripts, but 'ping' does not work ["Destination Host Unreachable"] from
multi-user mode.  system-config-network thinks eth0 is active, and allows me to
deactivate, but trying to re-activate fails.  I have turned off ipv6 by adding
"alias net-pf-10 off" to /etc/modprobe.conf.  Hardware is nVidia nForce4 CK804.
 Kernel 2914 is my last kernel with operational networking.

Comment 14 Andy Gospodarek 2007-02-25 17:40:52 UTC
Thanks for the update, John.  Since it doesn't appear that patch worked, I'll
continue to work to determine if the gcc used is the issue.  Was your system
fully updated or did you just update the kernel?  Also are you running a 32-bit
or 64-bit kernel?  



Comment 15 John Reiser 2007-02-25 19:34:10 UTC
Hardware is x86_64, but I am running i686 kernel (32-bit only) on it for this
test.  The rest of the system is fully updated to Saturday's fc7 (development
2007-02-24) using yum.

I'm now trying git bisect.  So far:
-----
git-bisect start
# bad: [9654640d0af8f2de40ff3807d3695109d3463f54] Merge master.kernel.org:/pub/scm/linux/kernel/git/sfrench/cifs-2.6
git-bisect bad 9654640d0af8f2de40ff3807d3695109d3463f54
# good: [bf81b46482c0fa8ea638e409d39768ea92a6b0f0] Linux 2.6.20-rc4
git-bisect good bf81b46482c0fa8ea638e409d39768ea92a6b0f0
# bad: [27aa6ef3c0e8220b27b0a8d2d0bae7cd0a6d2f78] uml: make signal handlers static
git-bisect bad 27aa6ef3c0e8220b27b0a8d2d0bae7cd0a6d2f78
# good: [64358164f5bfe5e11d4040c1eb674c29e1436ce5] USB: remove duplicate device id from zc0301
git-bisect good 64358164f5bfe5e11d4040c1eb674c29e1436ce5
-----
and it's taking about 1.3 hours per iteration, with some 500 changesets remaining.
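
For a rough sense of how much longer this will take: each bisection step roughly halves the set of candidate changesets, so about 500 remaining changesets need around nine more build-and-boot cycles. The sketch below works through that arithmetic. The function name bisect_steps is hypothetical, 78 minutes stands in for the 1.3 hours quoted above, and steps that fail to compile would add to the total.

```shell
# Count how many halvings it takes to narrow n candidates down to one.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))      # keep the larger half, as a worst case
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

remaining=500          # changesets left, from the comment above
minutes_per_step=78    # 1.3 hours per iteration
steps=$(bisect_steps "$remaining")
total=$(( steps * minutes_per_step ))
echo "about $steps steps, roughly $(( total / 60 ))h $(( total % 60 ))m"
```

With these inputs the estimate comes to 9 steps, or a little under 12 hours of building and rebooting.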

Comment 16 John Reiser 2007-02-25 19:35:54 UTC
In particular the "gcc --version" is:
gcc (GCC) 4.1.2 20070214 (Red Hat 4.1.2-1)


Comment 17 Andy Gospodarek 2007-02-25 20:31:27 UTC
Thanks for the update, John.

I'm running a 32-bit kernel as well.  My first test was actually on fc6 with fc7
kernels (and could reproduce the issue), but I've since installed fc7test1 and
plan to test with a few different kernels before doing a `yum update` and moving
all the packages forward.  It seems the kernels after 14 Feb were broken for you
and I also noticed there were some gcc changes checked in on Feb 14, so I want
to try a build with both the version before the breakage and the version after
to see if it makes a difference.  I should know more tomorrow morning or
afternoon when I can get physical access to a box where I can recreate this.



Comment 18 John Reiser 2007-02-25 21:43:59 UTC
"git bisect" is running into compilation errors.  At changeset:
31321bc946527f2e4c50b6b08459d1c0d81fa978  Remove a couple final references to
obsolete verify_area()

I see:
-----
drivers/net/cxgb3/cxgb3_main.c:77:1: warning: "to_net_dev" redefined
In file included from drivers/net/cxgb3/cxgb3_main.c:37:
include/linux/netdevice.h:536:1: warning: this is the location of the previous
definition
drivers/net/cxgb3/cxgb3_main.c: In function ‘attr_show’:
drivers/net/cxgb3/cxgb3_main.c:441: error: ‘struct net_device’ has no member
named ‘class_dev’
drivers/net/cxgb3/cxgb3_main.c:441: warning: type defaults to ‘int’ in
declaration of ‘__mptr’
drivers/net/cxgb3/cxgb3_main.c:441: warning: initialization from incompatible
pointer type
drivers/net/cxgb3/cxgb3_main.c:441: error: ‘struct net_device’ has no member
named ‘class_dev’
-----
so I am trying "git bisect reset --hard HEAD~n" with guesses about 'n'.

Comment 19 John Reiser 2007-02-26 06:07:17 UTC
Created attachment 148780 [details]
the culprit according to git bisect

This is the first bad commit according to git bisect, as well as the full
bisection log.  The short answer is:
-----
86b22b0dfbf462e6ed75e54fc83575dae01e3c69 is first bad commit
commit 86b22b0dfbf462e6ed75e54fc83575dae01e3c69
Author: Ayaz Abdulla <aabdulla>
Date:	Sun Jan 21 18:10:37 2007 -0500

    forcedeth: optimized routines
-----

I worked around a compilation error at some stage in the middle of the
bisection by editing the .config to have 'n' (no) instead of 'm' (module) for
all CONFIG_CHELSIO* options.  Adjusting the bisection point using "git bisect
reset --hard HEAD~n" was taking too long to find something that would compile.
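
The .config edit described above can be scripted so that it is easy to repeat at each bisection step. This is a sketch only: disable_chelsio is a made-up helper that reads a .config on standard input and rewrites every modular CONFIG_CHELSIO* entry into the "is not set" form that kernel config files use for disabled options. The file names in the usage comment are assumptions.

```shell
# Rewrite CONFIG_CHELSIO*=m lines as disabled; pass every other line through.
disable_chelsio() {
    sed 's/^\(CONFIG_CHELSIO[A-Za-z0-9_]*\)=m$/# \1 is not set/'
}

# Usage (from the top of the kernel tree):
#   disable_chelsio < .config > .config.new && mv .config.new .config
#   make oldconfig
```

Running make oldconfig afterwards lets the build system settle any options that depended on the disabled drivers.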

Comment 20 Andy Gospodarek 2007-02-26 13:55:26 UTC
Adding Ayaz to the CC list for this one.

Comment 21 Mladen Kuntner 2007-02-26 19:46:38 UTC
From bad to worse:

kernel 2940: no networking
kernel 2942: kernel panic - no networking with 'noapic'
kernel 2947: kernel panic - no networking with 'noapic'

kernel panic is:
.....
ACPI: Core revision 20070126
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + TIMER doesn't work!
Try using the 'noapic' kernel parameter

all on fully updated 64bit rawhide


Comment 22 Andy Gospodarek 2007-02-26 20:46:44 UTC
The optimizations posted are the ones that were part of the move to 0.60 of the
forcedeth driver (which I think happened in build 2930), so hopefully Ayaz can
provide some feedback about why this might be happening.

Comment 23 Chuck Ebbert 2007-02-26 21:10:09 UTC
Did anyone try booting with the "pci=nomsi" kernel option?
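
Before comparing results it helps to confirm that the option actually reached the kernel, since a typo in the boot loader entry is easy to miss. A minimal sketch, assuming the running kernel's command line is readable from /proc/cmdline; has_boot_param is a hypothetical helper, and its second argument exists only so it can be pointed at a saved copy.

```shell
# Succeeds if the exact parameter ($1) is present as a whole word on the
# kernel command line (read from /proc/cmdline unless a file is given as $2).
has_boot_param() {
    cmdline=$(cat "${2:-/proc/cmdline}" 2>/dev/null) || cmdline=""
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_boot_param pci=nomsi; then
    echo "booted with pci=nomsi"
else
    echo "pci=nomsi is not on the kernel command line"
fi
```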


Comment 24 Ayaz 2007-02-26 21:24:28 UTC
Can you try this patch?
http://marc.theaimsgroup.com/?l=linux-netdev&m=117199770027017&w=2


Comment 25 John Reiser 2007-02-26 21:38:23 UTC
Please suggest how to get a real .patch file.   Copy+paste from
marc.theaimsgroup.com (Comment #24) requires extensive fixup to restore tabs,
etc.  wget from the listed URL results in tons of HTML markup that is impossible
to remove.

Comment 26 Chuck Ebbert 2007-02-26 21:44:47 UTC
Click on the "Download message RAW" link to get:

http://marc.theaimsgroup.com/?l=linux-netdev&m=117199770027017&q=raw

Comment 27 John Reiser 2007-02-26 22:10:55 UTC
Thanks for the hint; building is in progress.

Comment 28 John Reiser 2007-02-26 23:15:20 UTC
The patch referenced from Comment #26 works for me in git kernel version
f8f2de40ff3807d3695109d3463f54 which is 2.6.21-rc1 plus (was master head on
Sunday Feb.25).  My Athlon64 3200+ with nVidia nForce4 CK804 ethernet, running
an i686 kernel (32-bit mode), can now ping and surf the net.

Comment 29 Andy Gospodarek 2007-03-01 18:32:36 UTC
Sounds great.  This patch will make its way into upcoming builds.

Comment 30 Bill Nottingham 2007-03-02 17:45:48 UTC
Moving to 'devel' as discussed on
https://www.redhat.com/archives/fedora-devel-list/2007-March/msg00095.html.

Comment 31 Guido Ledermann 2007-03-03 18:55:04 UTC
2960 works now for me.

