Description of problem:
The Intel ESB2/Gilgal (82563EB) NIC (used, for instance, on Supermicro motherboards like this one: http://www.supermicro.com/products/motherboard/Xeon1333/5000V/X7DVL-E.cfm) requires driver version 7.6.5-NAPI or later. Although driver versions before 7.6.5-NAPI announce support for PCI ID 0x8086:0x1096, in practice a system with such a NIC becomes unreachable over the network 5-10 minutes after boot.

Version-Release number of selected component (if applicable):
e1000 7.3.20-k2 as included in the latest rhel5.1 kernels

How reproducible:
The problem manifests itself on the specified hardware: network connectivity is lost several minutes after the server starts up.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I have also added a description of this issue to bug #398921, and submitted a bug report to the OpenVZ bugzilla (they use RHEL5.1 kernels and enhance them with OpenVZ functionality) here: http://bugzilla.openvz.org/show_bug.cgi?id=530#c6
The Intel driver from sourceforge has some interesting heritage. Intel did a major refactor of the driver, but when they went to push it upstream it wasn't too well received. The changes were so drastic that it was determined that Intel should split the driver into 2 versions. The first submission of the e1000e driver simply added support for some hardware that didn't previously exist -- this was the best way to not disturb the e1000 driver. Recently e1000e was considered stable enough to move all the PCIe hardware over to e1000e and those changes were made upstream.

The older e1000 driver did a poor job of driving much of the newer e1000 PCIe hardware, so I expect this will help, but I have not tested on your specific hardware to be sure.

Just yesterday I pulled these changes into my experimental gtest kernels. It would help if you could try them out here:

http://people.redhat.com/agospoda/#rhel5

You will probably have to change your /etc/modprobe.conf to use e1000e instead of e1000 for your NIC[s], but other than that you should be fine.
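As a rough illustration of the /etc/modprobe.conf change mentioned above, here is a minimal sketch. It assumes the NICs are bound through the usual `alias ethN e1000` lines. The helper name `switch_to_e1000e` is made up for this example. It only prints the modified file, so the result can be reviewed before anything is replaced.

```shell
# Hypothetical helper: print a copy of a modprobe.conf-style file in which
# every "alias ethN e1000" line is switched to e1000e.
# Lines that already name e1000e, or any other driver, are left untouched.
switch_to_e1000e() {
    sed 's/^\(alias[[:space:]]\{1,\}eth[0-9]\{1,\}[[:space:]]\{1,\}\)e1000[[:space:]]*$/\1e1000e/' "$1"
}
```

For example, `switch_to_e1000e /etc/modprobe.conf > /tmp/modprobe.conf.new` writes the candidate file without touching the original. If only some interfaces sit on PCIe hardware, edit those alias lines by hand instead.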
(In reply to comment #2)
> Just yesterday I pulled these changes into my experimental gtest kernels. It
> would help if you could try them out here:

I have tried to boot one of our servers with e1000e instead of e1000, no luck. e1000e just didn't recognize the 82563EB.

BTW, e1000 (7.3.20-k2) from vanilla kernel 2.6.22.1 works with the 82563EB without a single failure.
(In reply to comment #3)
> BTW, e1000 (7.3.20-k2) from vanilla kernel 2.6.22.1 works with 82563EB without a
> single failure.

Oh, I just spotted that NAPI is not enabled in that build of e1000, but we need this functionality. Perhaps there is something in the NAPI code of older e1000 modules that locks up the NIC? This is just speculation, since I haven't investigated it.
Interesting that you say your device (0x8086,0x1096) isn't supported by e1000e, since my latest test kernels show it as an included device.

# modinfo e1000e | grep 1096
alias:          pci:v00008086d00001096sv*sd*bc*sc*i*

Is that not the PCI ID of the card you are using?
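To make the alias check above repeatable for other IDs, here is a small sketch. The function names `pci_alias_prefix` and `drivers_claiming` are made up for this example. It relies on `modinfo -F alias` to list a module's aliases, and the result depends entirely on which kernel and modules are installed.

```shell
# Build the leading part of a PCI modalias string from vendor and device IDs.
# The IDs are passed as hex constants, e.g. 0x8086 0x1096.
pci_alias_prefix() {
    printf 'pci:v%08Xd%08X\n' "$1" "$2"
}

# Hypothetical helper: print each of the named modules that lists an alias
# starting with the given prefix. Modules that are not installed are skipped.
drivers_claiming() {
    prefix=$1
    shift
    for module in "$@"; do
        if modinfo -F alias "$module" 2>/dev/null | grep -q "^$prefix"; then
            echo "$module"
        fi
    done
}
```

`pci_alias_prefix 0x8086 0x1096` prints `pci:v00008086d00001096`, the same prefix as in the `modinfo` output above. `drivers_claiming "$(pci_alias_prefix 0x8086 0x1096)" e1000 e1000e` then shows whether one or both drivers claim the device on the running system.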
My test kernels moved those PCI IDs to e1000e, so they should work much better. Install them from here:

http://people.redhat.com/agospoda/#rhel5

and then switch modprobe.conf to use e1000e instead of e1000 for those devices. You should notice significantly better (or at least more stable) performance.
Could this be the cause of a problem I'm seeing? I upgraded a server from FC5 (worked 100% OK) to F8. Instantly our samba file sharing gives strange TCP errors and drops connections. Keeping F8 but using the F6/F5 samba version seems to help a bit, but not much.

The box has onboard Intel LAN:

04:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)

Intel(R) PRO/1000 Network Driver - version 7.3.20-k2-NAPI
Copyright (c) 1999-2006 Intel Corporation.
ACPI: PCI Interrupt 0000:04:00.0[A] -> GSI 17 (level, low) -> IRQ 17
PCI: Setting latency timer of device 0000:04:00.0 to 64
e1000: 0000:04:00.0: e1000_probe: (PCI Express:2.5Gb/s:Width x1) 00:13:20:d3:5b:18
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection

Looks like the onboard LAN is PCIe in disguise. It's using the e1000 driver, and I cannot seem to force it to e1000e. I'm using the latest stock F8 kernel, and modinfo shows this card's ID is set to use e1000.

Could e1000e possibly help with my issue? I've exhausted all other ideas at this point. If these cards should use e1000e, why aren't the latest F8 modules set up that way?

I am heading out in the next few days to try an (ah, always reliable) 8139 NIC if I can.

Thanks!
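When checking which driver actually ended up bound to a NIC such as the one at 04:00.0, the `driver` symlink in sysfs is a quick cross-check alongside the boot messages. The sketch below assumes the standard sysfs layout. The function name `bound_driver` and its optional second argument (an alternative sysfs root, useful only for testing) are made up for this example.

```shell
# Hypothetical helper: print the name of the driver bound to a PCI device,
# given its full address (domain:bus:slot.function, e.g. 0000:04:00.0).
# Prints "none" and returns non-zero if no driver is bound.
bound_driver() {
    sysfs_root=${2:-/sys}
    link="$sysfs_root/bus/pci/devices/$1/driver"
    if [ ! -L "$link" ]; then
        echo "none"
        return 1
    fi
    # The symlink target ends in the driver's directory name.
    basename "$(readlink "$link")"
}
```

On the box described above, `bound_driver 0000:04:00.0` would be expected to print `e1000`, matching the probe messages. `ethtool -i eth0` reports the same driver name along with its version.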
Trevor, you may want to check this out if the kernels you are using don't allow use of e1000e for 82573 hardware. The e1000e driver that supports this hardware has a workaround for the power-saving issue, but the firmware fix described here supposedly works with the older drivers.

http://e1000.sourceforge.net/doku.php?id=known_issues#v_l_e_tx_unit_hang_messages
Update to comment #7, please ignore completely. The strange problem was not the e1000, it was a flaky 48p Gb switch! It was randomly corrupting/dropping packets. I expected more of a $1k switch. I did try the power-saving fix, which can't hurt. I'll see if F8's newer kernels support e1000e, otherwise I'm sure it will be in F9. Thanks!
GalaxyMaster, I'm guessing this is no longer a problem since I haven't heard from you in over a year. Please reopen this bug if the problem still persists. Thanks!