Bug 242317 - e1000 NIC causes system hang
Summary: e1000 NIC causes system hang
Keywords:
Status: CLOSED DUPLICATE of bug 241783
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: i686
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-06-03 08:34 UTC by Byron Li
Modified: 2007-11-30 22:12 UTC (History)
CC List: 9 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-06-08 18:08:24 UTC
Type: ---
Embargoed:


Attachments
output of `lspci -vvv' (20.72 KB, text/plain)
2007-06-03 12:17 UTC, David Kovalsky

Description Byron Li 2007-06-03 08:34:16 UTC
Description of problem:
The e1000 NIC doesn't work properly under F7.
There seem to be two major problems with it:
1. It can rarely get an IP address from DHCP.
2. Unplugging the network cable from the NIC causes the system to hang.

The kernel I'm currently using is kernel-2.6.21-1.3194.fc7 on the i686 platform.


How reproducible:
This problem is always reproducible on my IBM ThinkPad T42 2373-NTH. 

Steps to Reproduce:
1. Start Fedora 7 normally with the network configured to get its IP address from DHCP.
2. It may not get an IP address on boot up. This is the first problem.
3. If it does get an IP address on boot up, open a terminal afterwards and run "service
network restart". It cannot get an IP address at this point.
4. Unplug the network cable from the NIC. The system hangs and no longer responds
to the keyboard. You have to shut off the power and reboot your machine.
  
Actual results:
1. Cannot get an IP address on boot up or when restarting the network service.
2. System hangs when unplugging the network cable.

Expected results:
1. It gets an IP address correctly.
2. System doesn't hang when unplugging the cable.

Additional info:
If I replace the kernel with RHEL5's kernel-2.6.18-8.el5 or FC6's
kernel-2.6.20-1.2952.fc6 this problem no longer exists.
If I rebuild the F7 kernel from source, this problem still exists.
It seems that this bug is similar to 
http://www.uwsg.iu.edu/hypermail/linux/kernel/0705.3/0086.html
But I cannot confirm this.

Comment 1 Joe Rozner 2007-06-03 09:23:09 UTC
I am using 2.6.21-1.3194.fc7 as well on my Thinkpad T40 and am also running into
this problem. I also run into it after unplugging my NIC and moving my laptop to
another network, even when it is turned off.

Comment 2 David Kovalsky 2007-06-03 12:16:11 UTC
I have a Lenovo T60, same problem, only with 1.3194.fc7. 1.3189.fc7 does not
suffer from this issue.

1) eth0 does not get an IP during boot time
2) ifup eth0 powers up the if, but does not acquire the IP
3) dhclient eth0 and ifconfig eth0 shows that there are packets leaving the if,
   unfortunately the system stops responding to the keyboard in about 2 minutes
   and becomes totally unresponsive if trying to reboot or poweroff

I see nothing interesting in messages nor on the terminal.

Comment 3 David Kovalsky 2007-06-03 12:17:41 UTC
Created attachment 156021 [details]
output of `lspci -vvv'

Comment 4 Damien McVeigh 2007-06-03 17:10:39 UTC
I also experienced this issue on Dell Poweredge 1800, using Intel 82541GI gige 
controller.  For me the problem appears when attempting to re-configure the IP 
following a successful initialisation.  

For example, when kickstarting the new build the box was able to initially 
acquire an IP and successfully download the kickstart configuration, which 
included a dhcp network statement; once this was applied the kickstart would 
hang as it was unable to mount the nfs on the server to connect to the 
installation files as it re-configures the interface before proceeding. (dhcpd 
and nfsd run on the same server).  At both points the dhcpd logs that it 
offers the client the correct IP but on the second attempt the client does not 
reply with a dhcp request packet and eventually becomes unresponsive at the 
console.

Workaround for this was to edit the kickstart config to use a static IP and 
the build completes successfully.  Following firstboot I re-configured the NIC 
back to use dhcp and restarted the network service - the NIC did not re-
acquire an IP and the system became unresponsive.  On all subsequent boots the 
system always acquires an IP via dhcp during boot but will fail post-boot when 
the NIC is brought down then back up.

Comment 5 Joe Rozner 2007-06-03 18:50:27 UTC
I've been playing with my laptop this morning trying to discover the exact
conditions in which this happens, and I've discovered some interesting things.
These system hangs and failures of the NIC only happen when the link is
unplugged while the box has power and maybe only when it's booted. I was
successfully able to turn off my laptop, unplug the power and then unplug the
link then replug in the link and the power (in that order) and still have my NIC
work. I then attempted to bootup without link, shutdown, bootup with link and it
worked fine. I haven't tried plugging the link into a booted system or
unplugging while the NIC has power.

Comment 6 Thomas Bittermann 2007-06-04 07:52:55 UTC
The e1000 problem was not the only one after a fresh f7 install:
- suspend didn't work
- postfix didn't stop after network mess and so system didn't shutdown

My system is a Lenovo ThinkPad T60.

I replaced the kernel this morning with 2.6.21-1.3200.fc8 from development and
everything is working fine again.

BTW: Why is %dist tag still named "fc"?

Comment 7 Giuseppe Paterno 2007-06-04 13:36:05 UTC
I have an x32, same problem with the integrated ethernet using e1000.
I'm using 2.6.20-1.2948.fc6 in order to make the network work.
No messages in /var/log/messages when it hangs.


Comment 8 Ian Laurie 2007-06-05 03:25:42 UTC
Similar trouble on an Intel OEM server board (with on board e1000) SE7210TP1E.

1. Can't always get an IP from DHCP server (usually can't).
2. Ethernet is intermittent.  FTP & SSH sessions get randomly dropped.
3. Can't download updates.
4. System hangs when shutting down.
5. Other inconsistent weirdness.

Was a fresh F7 install.


Comment 9 Alex 2007-06-05 05:35:58 UTC
The same issue happened on my Benq S72G-110.
When I "Activate" the network, the computer hangs.
After replacing kernel-2.6.21-1.3194.fc7 with kernel-2.6.20-1.2952.fc6, the 
issue seems to be solved; it no longer happens.

Comment 10 Andy Lawrence 2007-06-05 14:16:53 UTC
Same result with the VIA VT6102 Rhine-II rev 7c network card.  Heavy network
traffic = hang.

Kernel 3212.i686 is better, but still hangs.

Comment 11 George Avrunin 2007-06-05 23:57:51 UTC
Same symptoms on a Lenovo X60 running kernel 3194, x86_64.  I don't normally get
an address from DHCP when I plug in the wire, though I seem to be ok if I boot
the machine with the wire plugged in or (at least sometimes) if it's the first
network connection made by NetworkManager even if I don't have the wire plugged
in when the machine boots.  If I remove the wire and reinsert it later, the
machine locks up completely and I have to turn off the power. 

Comment 12 Stanis Trendelenburg 2007-06-06 08:18:09 UTC
Switching to kernel-2.6.21-1.3207.fc8.i686
(http://koji.fedoraproject.org/koji/buildinfo?buildID=8063) solved this issue
for me: DHCP at boot time now works fine, ifup/ifdown and 'modprobe -r e1000' no
longer block indefinitely.

Comment 13 bobsyeruncle 2007-06-07 16:36:00 UTC
Seeing similar on T60 IBM/Lenovo running the F7 GA

What I see:

NIC plugged in
boot single user
modprobe/rmmod e1000 as often as you like, NO problems
FIRST time service network start, no problem, NIC gets IP from DHCP bootps request
The service network stop is OK

then ANY command that touches eth0

i.e. ifconfig eth0, ifdown/up eth0 is a FULL SYSTEM HANG....

need to power off.

Thanks,


Comment 14 Auke Kok 2007-06-07 17:47:39 UTC
Several patches have made it upstream to fix some netif_poll_* issues. Once they
are available for FC users it should be all OK.

Comment 15 Andy Lawrence 2007-06-07 18:14:49 UTC
(In reply to comment #14)
> Several patches have made it upstream to fix some netif_poll_* issues. Once they
> are available for FC users it should be all OK.

I tested the 3212 FC kernel, which has the fix:

http://koji.fedoraproject.org/koji/buildinfo?buildID=7769

It was better but still hard locked.

I had to drop back to 3143, which had an older e1000 driver.  I do not believe 
the netif_poll fix in e1000_open fixes the issue.  Well, unless it was implemented 
wrong.... 



Comment 16 Auke Kok 2007-06-07 18:42:26 UTC
As I said, there are "several" patches upstream.

Please test the latest git tree from Linus to make sure you have all of the
latest patches. I personally do not track fedora kernels and do not know which
patches are merged in there.

Comment 17 Chuck Ebbert 2007-06-07 18:45:57 UTC
There are more fixes in kernel 3218.


Comment 18 Chuck Ebbert 2007-06-07 22:26:44 UTC
kernel 3218 or later can be found at

http://people.redhat.com/davej/kernels/Fedora/fc7/

Would someone with this bug please test?

Comment 19 Thomas Davis 2007-06-08 02:49:31 UTC
I'm having this problem, and I've just installed kernel-2.6.21-1.3219.fc7.i686.rpm.
I'll let you know if it fixes this.

Comment 20 Thomas Davis 2007-06-08 03:00:52 UTC
Ok, it failed.  Simply doing ifdown eth0/ifup eth0 demonstrates this bug.
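
The down/up cycle described above can be sketched as a small shell helper. This is only an illustration: the names `run`, `cycle_interface` and `DRY_RUN` are made up for this sketch, and it assumes the Fedora initscripts `ifdown`/`ifup` commands and an interface called eth0. By default it only prints the commands, since running them on an affected kernel may hang the machine.

```shell
#!/bin/sh
# Sketch only: run, cycle_interface and DRY_RUN are hypothetical names.
# DRY_RUN defaults to 1, so commands are printed but not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Echo the command, then execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

cycle_interface() {
    # Take the interface down and bring it back up (default: eth0).
    iface="${1:-eth0}"
    run ifdown "$iface"
    run ifup "$iface"
}
```

To run the commands for real, set `DRY_RUN=0` before sourcing the script and call `cycle_interface eth0` as root. Save any open work first, because the reports here suggest a power cycle may be needed afterwards.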


Comment 21 Jared Hoover 2007-06-08 04:59:36 UTC
This one is a real showstopper for me and stops me from being able to do
kickstart installations.  I am seeing the DHCP timeout issue in every system
that requires the e1000 driver so far.  I will try and gather as much
information as I can to help solve this problem next week.

This problem also arose previously for me when I rebuilt the Fedora Core 6
kernel with the same version e1000 driver that is in the Fedora 7 default kernel.

Comment 22 Jared Hoover 2007-06-08 05:04:58 UTC
In addition to my previous comment I can confirm that this issue is on both
x86_64 and i386 architectures.

Comment 23 bobsyeruncle 2007-06-08 06:04:27 UTC
I tried this kernel 
http://koji.fedoraproject.org/koji/buildinfo?buildID=8282

And found that 

[1] I could get a DHCP address every time
[2] That I could remove/add the physical ethernet cable during a ping 
    and NetworkManager WOULD now get my link back BUT

[a] very variable round trip latencies pinging a one hop
    router, 2-500ms where it's normally 2ms

[b] and ifconfig eth0 down/ifconfig eth0 up 

still hangs everything.

Thanks


Comment 24 George Avrunin 2007-06-08 15:51:48 UTC
I am running kernel-2.6.21-1.3219.fc7.x86_64 from koji now (Lenovo X60) with
NetworkManager.  It still doesn't notice if the network cable is removed from
the machine (and so doesn't try to start up the wireless).  The ping latencies
seem very high and quite variable, ranging from 7ms to 309ms on successive
packets.  

If I click on the wired network in NetworkManager, I do get a popup saying the
network is disconnected but it then quickly reconnects correctly.  If I disable
networking in NetworkManager, I am unable to re-enable it.  And doing "service
NetworkManager restart" at that point hangs, leaving the machine in a state
where I can't open a new gnome-terminal and restarting from the menu in the
panel doesn't work.  In fact, although I can switch to VT1 and login as root,
"shutdown -h now" from there gives repeated error messages about failing to
umount /home and then hangs at "Synchronizing SCSI cache for disk sda".  So I
still have to use the power button to turn the machine off.  

So I basically confirm the information from bobsyeruncle in comment #23.  It
does get DHCP correctly, but still doesn't work properly.
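
The latency spread reported in comments #23 and #24 could be measured with a small filter over `ping` output. This is a sketch, assuming the usual Linux ping reply format with a `time=<value> ms` field on each reply line; the function name `rtt_range` is made up here.

```shell
#!/bin/sh
# Sketch only: rtt_range is a hypothetical helper name.
# Reads ping output on stdin and prints "min max" of the time= values (ms).
rtt_range() {
    awk -F'time=' '
        /time=/ {
            t = $2 + 0                      # leading number of "7.00 ms"
            if (n == 0 || t < min) min = t
            if (n == 0 || t > max) max = t
            n++
        }
        END { if (n) print min, max }       # print nothing if no replies
    '
}
```

For example, `ping -c 20 <router> | rtt_range` prints the smallest and largest round-trip time seen, which makes it easy to compare a good and a bad kernel against the same one-hop router.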

Comment 25 Warren Togami 2007-06-08 18:08:24 UTC

*** This bug has been marked as a duplicate of 241783 ***

