All kernels from 3189 onward began to exhibit this new problem.

- Networking stops working after you restart NetworkManager or suspend/resume.
- At this point, DHCP attempts fail.
- kill -9 of NetworkManager fails.
- Sometimes processes like ifconfig or gedit get stuck.
- The system deadlocks during a subsequent reboot attempt.

There seem to be other ways of triggering this networking breakage, but "service NetworkManager restart" and suspend/resume seem to be the most easily reproducible.

This is NOT related at all to iwl3945. Testing was done with iwl3945 deleted and the same behavior persists.

Tested Kernels
==============
kernel-2.6.21-1.3163.fc7 WORKING
kernel-2.6.21-1.3175.fc7 WORKING
kernel-2.6.21-1.3176.fc7 WORKING
kernel-2.6.21-1.3180.fc7 WORKING
kernel-2.6.21-1.3186.fc7 CANNOT BOOT: UNABLE TO FIND LV's
kernel-2.6.21-1.3189.fc7 FAILURE
kernel-2.6.21-1.3194.fc7 FAILURE
kernel-2.6.21-1.3201.fc7 FAILURE
kernel-2.6.21-1.3209.fc7 FAILURE

Tested Hardware/Arch
====================
T60 x86_64
T42 i386
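The tested-kernel list above already narrows down where the regression was introduced. As a minimal sketch (the function name is hypothetical, and it assumes that a higher release number means a later build), the window can be read off like this:

```python
# Release numbers and results copied from the "Tested Kernels" list above.
results = [
    (3163, "WORKING"),
    (3175, "WORKING"),
    (3176, "WORKING"),
    (3180, "WORKING"),
    (3186, "CANNOT BOOT"),  # could not be tested for the networking bug
    (3189, "FAILURE"),
    (3194, "FAILURE"),
    (3201, "FAILURE"),
    (3209, "FAILURE"),
]

def regression_window(results):
    """Return (last known-good release, first known-bad release).

    Assumes release numbers increase with build order; builds that could
    not be tested are ignored.
    """
    last_good = max(rel for rel, status in results if status == "WORKING")
    first_bad = min(rel for rel, status in results if status == "FAILURE")
    return last_good, first_bad

print(regression_window(results))
```

Because 3186 could not boot, it neither narrows nor widens the window: the change has to be somewhere after 3180 and no later than 3189.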
Tested vanilla 2.6.22-rc3. It is just as broken as Fedora 3189+.
Just tested 3209 here on a Dell D820. No problems at all. I can suspend, restart NetworkManager, reboot, etc. The wired net on here is a tg3. I wonder if this is an e1000 problem somehow? Warren: can you try blacklisting the e1000 module and see if that makes any difference?
Confirmed that e1000 seems to trigger this issue. Something changed between 3180 and 3189 to cause this issue. Further testing indicates that restarting NetworkManager is not necessary to trigger this problem. It seems that switching between wired and wireless a few times can trigger it.
GOOD NEWS! http://kernel.org/pub/linux/kernel/v2.6/testing/ChangeLog-2.6.22-rc4 Upstream 2.6.22-rc4 seems to have fixed this problem. Please backport?
devel kernel was updated to 2.6.22-rc4 on Jun 5th.
*** Bug 242317 has been marked as a duplicate of this bug. ***
...and the fix from 2.6.22-rc4 is in 2.6.21.5-rc1
2.6.21.5-rc1 renders previous "e1000: fix netif_poll_enable crash" patch obsolete, so I removed it. Please try kernel-2.6.21-1.3223.fc7 (or later)...does this fix it for you?
Lenovo (IBM) T60 e1000 with kernel

Linux localhost.localdomain 2.6.21-1.3224.fc7 #1 SMP Fri Jun 8 22:04:55 EDT 2007 i686 i686 i386 GNU/Linux

from http://koji.fedoraproject.org/koji/

I am NOT seeing any more issues with
[1] ifconfig up/down on eth0
[2] NetworkManager restarts
[3] wlan0 stop/starts
(THANKS!)

BUT I am seeing large latency issues on a single hop to a router:

[root@localhost Desktop]# ping 10.1.1.254
PING 10.1.1.254 (10.1.1.254) 56(84) bytes of data.
64 bytes from 10.1.1.254: icmp_seq=1 ttl=254 time=381 ms
64 bytes from 10.1.1.254: icmp_seq=2 ttl=254 time=256 ms
64 bytes from 10.1.1.254: icmp_seq=3 ttl=254 time=349 ms
64 bytes from 10.1.1.254: icmp_seq=4 ttl=254 time=224 ms
64 bytes from 10.1.1.254: icmp_seq=5 ttl=254 time=350 ms
64 bytes from 10.1.1.254: icmp_seq=6 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=7 ttl=254 time=351 ms
64 bytes from 10.1.1.254: icmp_seq=8 ttl=254 time=716 ms
64 bytes from 10.1.1.254: icmp_seq=9 ttl=254 time=353 ms
64 bytes from 10.1.1.254: icmp_seq=10 ttl=254 time=1033 ms
64 bytes from 10.1.1.254: icmp_seq=11 ttl=254 time=322 ms
64 bytes from 10.1.1.254: icmp_seq=12 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=13 ttl=254 time=322 ms
64 bytes from 10.1.1.254: icmp_seq=14 ttl=254 time=1001 ms
64 bytes from 10.1.1.254: icmp_seq=15 ttl=254 time=323 ms
64 bytes from 10.1.1.254: icmp_seq=16 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=17 ttl=254 time=184 ms
64 bytes from 10.1.1.254: icmp_seq=18 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=19 ttl=254 time=325 ms
64 bytes from 10.1.1.254: icmp_seq=20 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=21 ttl=254 time=291 ms
64 bytes from 10.1.1.254: icmp_seq=22 ttl=254 time=1001 ms
64 bytes from 10.1.1.254: icmp_seq=23 ttl=254 time=291 ms
64 bytes from 10.1.1.254: icmp_seq=24 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=25 ttl=254 time=291 ms
64 bytes from 10.1.1.254: icmp_seq=26 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=27 ttl=254 time=291 ms
64 bytes from 10.1.1.254: icmp_seq=28 ttl=254 time=1001 ms
64 bytes from 10.1.1.254: icmp_seq=29 ttl=254 time=291 ms
64 bytes from 10.1.1.254: icmp_seq=30 ttl=254 time=1001 ms
64 bytes from 10.1.1.254: icmp_seq=31 ttl=254 time=291 ms

--- 10.1.1.254 ping statistics ---
31 packets transmitted, 31 received, 0% packet loss, time 30089ms
rtt min/avg/max/mdev = 184.523/588.866/1033.012/339.314 ms, pipe 2
[root@localhost Desktop]#

I'd say THIS kernel fixes the 'show stopper' effect of system hangs now, but there still may be some performance issues.

Thanks
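The striking part of this ping run is how many replies land at almost exactly 1000 ms. A minimal sketch for counting them in saved ping output follows; the function names and the 5 ms tolerance are assumptions made for illustration, and the sample lines are replies 4 to 8 from the run above.

```python
import re

# Matches the "time=381 ms" field of a ping reply line.
TIME_RE = re.compile(r"time=([\d.]+) ?ms")

def rtt_values(ping_output):
    """Return all round-trip times (in ms) found in ping output."""
    return [float(t) for t in TIME_RE.findall(ping_output)]

def count_near(values, target_ms=1000.0, tolerance_ms=5.0):
    """Count RTTs within tolerance_ms of target_ms (both are assumed defaults)."""
    return sum(1 for v in values if abs(v - target_ms) <= tolerance_ms)

sample = """\
64 bytes from 10.1.1.254: icmp_seq=4 ttl=254 time=224 ms
64 bytes from 10.1.1.254: icmp_seq=5 ttl=254 time=350 ms
64 bytes from 10.1.1.254: icmp_seq=6 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=7 ttl=254 time=351 ms
64 bytes from 10.1.1.254: icmp_seq=8 ttl=254 time=716 ms
"""

values = rtt_values(sample)
print(count_near(values), "of", len(values), "replies are within 5 ms of 1000 ms")
```

Feeding the whole run through the same functions gives a quick way to compare kernels without reading every line by eye.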
Using 2.6.21-1.3224.fc7, heavy network traffic still causes system freeze.
kernel 2.6.21-1.3224.fc7.x86_64 on Lenovo X60, running NetworkManager-0.6.5-3.fc7.

NetworkManager still doesn't seem to know when the network cable is removed or reinserted (except that it did notice the *first* time I removed the cable), but it is possible to get NetworkManager to switch to/from ethernet and wireless. Suspend and resume work, though NetworkManager does not reconnect to the network automatically and I have to do that manually.

I'm also seeing pretty extreme variation in ping times to a host plugged into the same switch. From this machine,

20 packets transmitted, 20 received, 0% packet loss, time 18997ms
rtt min/avg/max/mdev = 0.408/8.474/70.235/19.240 ms

while from a machine running FC6 on the same switch

20 packets transmitted, 20 received, 0% packet loss, time 19009ms
rtt min/avg/max/mdev = 0.342/0.464/0.590/0.075 ms

I haven't tested under a sustained high network load yet. I also tried with NetworkManager-0.6.5-4.fc7 from updates-testing and got essentially the same results.

So this kernel is an improvement (with a bit of manual intervention, I can switch connections, etc.), but there's still some serious performance problem.
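The two summary lines in this comment can be compared directly. The following is a rough sketch, not a measurement tool: the `Rtt` tuple and `parse_rtt_summary` are names made up for this example, and the two input strings are the summary lines quoted above.

```python
import re
from collections import namedtuple

# Hypothetical container for the four values of a ping summary line.
Rtt = namedtuple("Rtt", "min avg max mdev")

SUMMARY_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def parse_rtt_summary(line):
    """Parse a ping 'rtt min/avg/max/mdev' summary line into an Rtt tuple."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("no rtt summary found")
    return Rtt(*(float(g) for g in m.groups()))

# Summary lines quoted in the comment above.
f7 = parse_rtt_summary("rtt min/avg/max/mdev = 0.408/8.474/70.235/19.240 ms")
fc6 = parse_rtt_summary("rtt min/avg/max/mdev = 0.342/0.464/0.590/0.075 ms")

print("avg ratio (F7 / FC6):  %.1f" % (f7.avg / fc6.avg))
print("mdev ratio (F7 / FC6): %.1f" % (f7.mdev / fc6.mdev))
```

The minimum RTTs are close, so the link itself is fine; the difference shows up in the average and, far more, in the deviation.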
Trying kernel http://koji.fedoraproject.org/koji/buildinfo?buildID=8514
Version 2.6.21 Release 1.3225.fc7

T60/E1000 NIC. Local router at 10.1.1.254, DHCP from 10.1.1.5, IP leased at 10.1.1.216

I still see very variable latencies in ping that were not present on FC6 on the same network and laptop hardware. I performed no stress test by loading the E1000.

NetworkManager, ifconfig eth0 up/down and wlan0 up/down all seemed to work OK, as did plugging in/removing the ethernet cable.

Thanks

[root@localhost Desktop]# ping 10.1.1.254
PING 10.1.1.254 (10.1.1.254) 56(84) bytes of data.
64 bytes from 10.1.1.254: icmp_seq=1 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=2 ttl=254 time=200 ms
64 bytes from 10.1.1.254: icmp_seq=3 ttl=254 time=732 ms
64 bytes from 10.1.1.254: icmp_seq=4 ttl=254 time=591 ms
64 bytes from 10.1.1.254: icmp_seq=5 ttl=254 time=1000 ms
64 bytes from 10.1.1.254: icmp_seq=6 ttl=254 time=441 ms
64 bytes from 10.1.1.254: icmp_seq=7 ttl=254 time=3.14 ms
64 bytes from 10.1.1.254: icmp_seq=8 ttl=254 time=27.3 ms
64 bytes from 10.1.1.254: icmp_seq=9 ttl=254 time=2.68 ms
64 bytes from 10.1.1.254: icmp_seq=10 ttl=254 time=819 ms
64 bytes from 10.1.1.254: icmp_seq=11 ttl=254 time=2.20 ms
64 bytes from 10.1.1.254: icmp_seq=12 ttl=254 time=247 ms
64 bytes from 10.1.1.254: icmp_seq=13 ttl=254 time=375 ms
64 bytes from 10.1.1.254: icmp_seq=14 ttl=254 time=580 ms
64 bytes from 10.1.1.254: icmp_seq=15 ttl=254 time=97.1 ms

--- 10.1.1.254 ping statistics ---
16 packets transmitted, 15 received, 6% packet loss, time 15024ms
rtt min/avg/max/mdev = 2.202/408.174/1000.869/350.769 ms, pipe 2
I've seen these kinds of strange latencies, where some responses are EXACTLY 1000 ms, with pre-3189 kernels and e1000-7.5.5 from http://sourceforge.net/projects/e1000. I am not seeing any latency problems with Fedora F7 3226 yet.
With 2.6.21-1.3226.fc7.x86_64, pings are still strange. And much more in one direction than the other.

From the Fedora 7 laptop (Lenovo X60) to an FC6 box on another port on the same switch (and there's essentially no other traffic on this segment right now):

64 bytes from g2 (192.168.1.6): icmp_seq=1 ttl=64 time=0.670 ms
64 bytes from g2 (192.168.1.6): icmp_seq=2 ttl=64 time=495 ms
64 bytes from g2 (192.168.1.6): icmp_seq=3 ttl=64 time=0.512 ms
64 bytes from g2 (192.168.1.6): icmp_seq=4 ttl=64 time=1.07 ms
64 bytes from g2 (192.168.1.6): icmp_seq=5 ttl=64 time=1.53 ms
64 bytes from g2 (192.168.1.6): icmp_seq=6 ttl=64 time=0.899 ms
64 bytes from g2 (192.168.1.6): icmp_seq=7 ttl=64 time=1.35 ms
64 bytes from g2 (192.168.1.6): icmp_seq=8 ttl=64 time=0.701 ms
64 bytes from g2 (192.168.1.6): icmp_seq=9 ttl=64 time=1.12 ms
64 bytes from g2 (192.168.1.6): icmp_seq=10 ttl=64 time=0.516 ms

--- g2 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.512/50.436/495.979/148.514 ms

And going the other way:

64 bytes from gsa-wired (192.168.1.22): icmp_seq=1 ttl=64 time=1.08 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=2 ttl=64 time=240 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=3 ttl=64 time=491 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=4 ttl=64 time=240 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=5 ttl=64 time=491 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=6 ttl=64 time=1000 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=7 ttl=64 time=1.15 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=8 ttl=64 time=1001 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=9 ttl=64 time=1.37 ms
64 bytes from gsa-wired (192.168.1.22): icmp_seq=10 ttl=64 time=1170 ms

--- gsa-wired ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9002ms
rtt min/avg/max/mdev = 1.083/464.086/1170.262/426.521 ms, pipe 2

And if I disable wireless in Network Manager and remove the network cable, NM pops up the disconnect message correctly. If I reinsert the cable, it makes a new connection to the wired network. But if I remove the cable a second time, NM doesn't notice that there's no connection.
> And if I disable wireless in Network Manager and remove the network cable, NM
> pops up the disconnect message correctly. If I reinsert the cable, it makes a
> new connection to the wired network. But if I remove the cable a second time,
> NM doesn't notice that there's no connection.

I see this behavior as well. However, testing 3180, it doesn't notice that ethernet was unplugged even the first time.
With the final release image of Fedora 7 I get timeouts trying to do a kickstart installation with nfs and http installation methods and pxe boot. On the same system Fedora Core 6/5/4/3 installs just fine with pxe and nfs. Earlier today I rebuilt the installation media with the 2.6.21-1.3226.fc7.x86_64 kernel and attempted kickstart installation on a system that requires the e1000 driver to get the packages over the network. I still get dhcp timeouts causing the install to fail to get the stage 2 image files from the server. I have tried using nfs and http installation methods and both fail the same way. Tomorrow I will try taking an old version of the e1000 driver and compile it into the latest davej kernel for f7 and run a rebuild on the installation media and see if I am able to kickstart it.
Is it known yet what in the module is causing the slow responses and such large variations in response time?
*** Bug 243977 has been marked as a duplicate of this bug. ***
The new 2.6.21-1.3228.fc7 fixes this issue on my two 32 bit boxes running F7.
3228 does not fix the problem for me (though I'm on 64 bit). I'm still seeing weird ping times, e.g.,

64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=1 ttl=63 time=53.3 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=2 ttl=63 time=24.3 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=3 ttl=63 time=0.816 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=4 ttl=63 time=491 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=5 ttl=63 time=0.634 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=6 ttl=63 time=1.01 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=7 ttl=63 time=625 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=8 ttl=63 time=0.850 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=9 ttl=63 time=411 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=10 ttl=63 time=0.663 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=11 ttl=63 time=1.05 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=12 ttl=63 time=15.4 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=13 ttl=63 time=0.884 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=14 ttl=63 time=207 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=15 ttl=63 time=0.698 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=16 ttl=63 time=1.11 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=17 ttl=63 time=30.4 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=18 ttl=63 time=0.916 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=19 ttl=63 time=485 ms
64 bytes from ah.math.umass.edu (128.119.47.96): icmp_seq=20 ttl=63 time=0.732 ms

--- ah.math.umass.edu ping statistics ---
20 packets transmitted, 20 received, 0% packet loss, time 18999ms
rtt min/avg/max/mdev = 0.634/117.797/625.977/201.200 ms

And NetworkManager seems not to always know whether the network cable is attached or not. (Though restarting NM does seem to resolve the problem.)
Ok, I seemed to have jumped the gun a bit. Initially all appeared to be fixed, but over time I'm starting to see the exact same ping time weirdness. The issue with it locking up the system is definitely fixed though.
No good for me either....
I haven't had any of the ping issues and don't use NetworkManager, but the new kernel has fixed all the DHCP problems and the system doesn't die when I unplug the cable anymore.
Is the NetworkManager working? Is it interacting poorly with the E1000?
FWIW, while working on a suspend issue (#241310), I incidentally noticed that booting with the option 'hpet=disable' greatly improved the regularity of the ping RTT (Intel Corporation 82573L Gigabit Ethernet Controller, on a ThinkPad T60p).
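When comparing ping results with and without 'hpet=disable', it helps to confirm that the option was really on the kernel command line for the boot being measured. A minimal sketch, assuming a Linux system that exposes /proc/cmdline; `has_boot_option` is a hypothetical helper name:

```python
from pathlib import Path

def has_boot_option(cmdline, option):
    """Return True if option appears as a whole word on the kernel command line."""
    return option in cmdline.split()

cmdline_path = Path("/proc/cmdline")
if cmdline_path.exists():
    active = has_boot_option(cmdline_path.read_text(), "hpet=disable")
    print("hpet=disable active:", active)
else:
    # Not running on Linux, or /proc is not mounted.
    print("/proc/cmdline not available")
```

Splitting on whitespace rather than doing a substring search avoids a false match on a longer option that merely contains the same text.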
I've updated the e1000.ko module in the kickstart initrd.img with modules built from the following e1000 sources:

kernel-2.6.20-1.3104.fc7
kernel-2.6.21-1.3228.fc7
vanilla 2.6.21.5 kernel

and I still can't get kickstart to work with DHCP. Yet it works fine with static IPs in the ks.cfg.

Anyone here able to kickstart a box with an e1000 using DHCP? If so, please share your secret...
I grabbed the latest e1000 version from Intel, built it for the F7 release kernel (i386), and rebuilt initrd.img with the updated e1000. With that, kickstart DHCP works for me. For anyone that wants to try this, I put up my RPMs and rebuilt initrd.img files at http://www.cmadams.net/fedora/f7-e1000/ (the x86_64 files are untested; I don't have an x86_64 with e1000 to test at the moment).
Thanks, I meant to update the ticket with my (eventual) success. I took the -3228 e1000 code and rebuilt it for 3194 and got kickstart working again. Guess the first time I tried I screwed it up somehow.
*** Bug 247480 has been marked as a duplicate of this bug. ***
I tested Chris Adams' initrd.img for x86_64 (see comment #27) and it worked just fine.
Also tested Chris Adams' initrd.img for x86_64 and it resolves the dhcp+ks problem.
*** Bug 244493 has been marked as a duplicate of this bug. ***
This seems to have been fixed for a while. Closing.
Although this has been listed as closed, I have been seeing behavior similar to comment #20 intermittently using kernels:

kernel-2.6.21-1.3194.fc7
kernel-2.6.22.5-76.fc7

The following are the ping statistics from an F7 machine to the switch:

--- 192.168.8.1 ping statistics ---
63 packets transmitted, 63 received, 0% packet loss, time 62476ms
rtt min/avg/max/mdev = 0.308/10.963/641.857/80.124 ms

Although most packets are in the 1 ms range, every so often a few end up in the 200-1000 ms range, giving the statistics above.

lspci:
02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

I have tried installing the vanilla e1000 7.6.5 drivers from Intel and this has fixed the latency problems; however, it appears to cause the machines to freeze during high network load.

I have not noticed any issues with DHCP as referenced above.
For all of you having latency problems or crashes during high network loads on a T60/X60: we (Intel) have a patch that disables ASPM on the PCI-Express interface and which (I believe) solves both problems. Please let me know if you would like me to follow up with the patch.
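The patch mentioned here is not shown in this thread. Separately from it, and only as an assumption about kernels newer than the ones discussed in this bug, systems built with PCIe ASPM support expose the active policy in /sys/module/pcie_aspm/parameters/policy, with the selected entry in square brackets. A small sketch for reading it (`active_policy` is a hypothetical helper name):

```python
import re
from pathlib import Path

# Assumed location; only present on kernels built with PCIe ASPM support.
POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")

def active_policy(text):
    """Return the bracketed (currently selected) policy name, or None."""
    m = re.search(r"\[(\w+)\]", text)
    return m.group(1) if m else None

if POLICY_PATH.exists():
    print("active ASPM policy:", active_policy(POLICY_PATH.read_text()))
else:
    print("no pcie_aspm policy file on this system")
```

This only reports the system-wide policy. It says nothing about whether a particular driver, such as e1000, has disabled ASPM for its own device.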
Yes, Jesse. The patch would be appreciated. My T60 suffers!!! Also, would this roll into Fedora 8? Is the SAME error present in newer Redhat core commercial products on E1000's? Thanks Ivan
For anyone looking for the followup to Jesse's reference to the patch, it has been supplied in bug #400561. I've only used the patch supplied in comment #10, but it appears to work perfectly.
Fix is in F7 kernel CVS
This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists. Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs: http://docs.fedoraproject.org/release-notes/ The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
assuming fixed.