Bug 190262
Field | Value
---|---
Summary: | channel bonding causes system lockup / freeze
Product: | Red Hat Enterprise Linux 4
Component: | kernel
Version: | 4.0
Hardware: | i386
OS: | Linux
Status: | CLOSED CANTFIX
Severity: | high
Priority: | medium
Reporter: | Randy Zagar <zagar>
Assignee: | John W. Linville <linville>
QA Contact: | Brian Brock <bbrock>
CC: | jbaron, mmahudha
Doc Type: | Bug Fix
Last Closed: | 2006-10-12 15:07:47 UTC
Description
Randy Zagar
2006-04-29 16:05:57 UTC
Created attachment 128396 [details]
Kernel messages captured by syslog prior to latest lockup
Obviously the hang is bad, but you have a 10.8.17.227 and that's messed up too. Has 10.8.17.227 been messed up for a long time or is that something new this week?

10.8.17.227 works just fine on my private non-routable network... What do _you_ think should be wrong with this?

hmmm, i believe this should already be fixed in the beta. please try: http://people.redhat.com/~jbaron/rhel4/

Any test results?

The log looks more like an e1000 problem, although it is possible that the bonding transmit scheduler is interfering w/ the expected operation of the driver. Have you tried any other bonding modes? Mode 2 or even mode 4 are likely to be "drop-in" replacements for mode 0. Please note that after changing the "mode=..." option in modprobe.conf you will need to either reboot or explicitly remove the bonding module before reloading it.

Just to be clear, are you able to successfully complete the same operation using only individual e1000 interfaces rather than a bond?

The log output looks a lot like bug 182215 (a Fedora bug), FWIW...

Unfortunately, I no longer have e1000 systems available for testing. I have had two server failures and all my test equipment is now in full production. For those of you working for RedHat, there is an open service request with more information. It is Service Request 874560. I do, however, have 4 HP DL360s with dual broadcom BCM5703X adapters. I will try to reproduce the problem on those systems... 8 copies of netcat shoving local:/dev/zero into remote:/dev/null ought to do the trick...

Randy, could you try "ethtool -K eth0 tso off" (replace eth0 as appropriate)? Please do so and post the results here...thanks!

Copy/Posting from BZ #194460 (for FC5)

One of our test (internal) file servers had the same problem yesterday, and it took the network down along with it as well (very serious) .. Only eth0 i.e. onboard 82573V was in use at the time of the problem.

Currently this interface has been downed, and the server is currently running off the other onboard NIC.

# lspci
00:00.0 Host bridge: Intel Corporation E7230 Memory Controller Hub
00:1c.0 PCI bridge: Intel Corporation 82801G (ICH7 Family) PCI Express Port 1 (rev 01)
00:1c.4 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 5 (rev 01)
00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
00:1d.0 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #1 (rev 01)
00:1d.1 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #2 (rev 01)
00:1d.2 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #3 (rev 01)
00:1d.3 USB Controller: Intel Corporation 82801G (ICH7 Family) USB UHCI #4 (rev 01)
00:1d.7 USB Controller: Intel Corporation 82801G (ICH7 Family) USB2 EHCI Controller (rev 01)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
00:1f.0 ISA bridge: Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
00:1f.2 SATA controller: Intel Corporation 82801GR/GH (ICH7 Family) Serial ATA Storage Controller AHCI (rev 01)
00:1f.3 SMBus: Intel Corporation 82801G (ICH7 Family) SMBus Controller (rev 01)
03:00.0 Ethernet controller: Intel Corporation 82573V Gigabit Ethernet Controller (Copper) (rev 03)
04:04.0 VGA compatible controller: ATI Technologies Inc ES1000 (rev 02)
04:05.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)

# lspci -n
00:00.0 Class 0600: 8086:2778
00:1c.0 Class 0604: 8086:27d0 (rev 01)
00:1c.4 Class 0604: 8086:27e0 (rev 01)
00:1c.5 Class 0604: 8086:27e2 (rev 01)
00:1d.0 Class 0c03: 8086:27c8 (rev 01)
00:1d.1 Class 0c03: 8086:27c9 (rev 01)
00:1d.2 Class 0c03: 8086:27ca (rev 01)
00:1d.3 Class 0c03: 8086:27cb (rev 01)
00:1d.7 Class 0c03: 8086:27cc (rev 01)
00:1e.0 Class 0604: 8086:244e (rev e1)
00:1f.0 Class 0601: 8086:27b8 (rev 01)
00:1f.1 Class 0101: 8086:27df (rev 01)
00:1f.2 Class 0106: 8086:27c1 (rev 01)
00:1f.3 Class 0c05: 8086:27da (rev 01)
03:00.0 Class 0200: 8086:108b (rev 03)
04:04.0 Class 0300: 1002:515e (rev 02)
04:05.0 Class 0200: 8086:1076 (rev 05)

ifconfig before taking down the problem NIC:

# cat ifconfig.out
eth0      Link encap:Ethernet  HWaddr 00:13:20:D6:AD:E3
          inet addr:10.65.6.1  Bcast:10.65.6.255  Mask:255.255.255.0
          inet6 addr: fe80::213:20ff:fed6:ade3/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:84132882 errors:297966072 dropped:297966072 overruns:297966072 frame:0
          TX packets:10677632885 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:75213992657 (70.0 GiB)  TX bytes:854693824469 (795.9 GiB)
          Base address:0x2000 Memory:88100000-88120000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6082 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6082 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:868538 (848.1 KiB)  TX bytes:868538 (848.1 KiB)

# ethtool -e eth0
Offset  Values
------  ------
0x0000  00 13 20 d6 ad e3 30 0b 46 f7 01 10 ff ff ff ff
0x0010  ff ff ff ff 6b 02 a3 30 86 80 8b 10 86 80 de 80
0x0020  00 00 00 20 14 7e 00 00 00 00 d8 00 00 00 00 27
0x0030  c9 6c 50 31 22 07 0b 04 84 09 00 00 00 c0 06 07
0x0040  08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff
0x0050  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0060  00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 22 57

# ethtool -e eth1
Offset  Values
------  ------
0x0000  00 13 20 d6 ad e4 10 02 ff ff 00 10 ff ff ff ff
0x0010  ff ff ff ff 0b 64 a1 30 86 80 76 10 86 80 84 b2
0x0020  dd 20 22 22 00 00 90 2f 80 23 12 00 20 1e 12 00
0x0030  20 1e 12 00 20 1e 12 00 20 1e 09 00 00 02 00 00
0x0040  0c 00 a6 93 0b 28 00 00 00 04 ff ff ff ff ff ff
0x0050  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 02 06
0x0060  00 01 00 40 1c 12 07 40 ff ff ff ff ff ff ff ff
0x0070  ff ff ff ff ff ff ff ff ff ff ff ff ff ff 83 18

# uname -rmpio
2.6.9-34.ELsmp x86_64 x86_64 x86_64 GNU/Linux

Bug 194460 is closed as UPSTREAM. Can you try the test kernels here?

http://people.redhat.com/linville/kernels/rhel4/

Those have a very late driver from upstream. Please give them a try and post the results here...thanks!

Closed due to lack of response. Please reopen when the requested information becomes available...thanks!
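
For anyone picking this up later, the two workarounds suggested in the comments (turning TSO off with ethtool, and removing the bonding module so that a changed "mode=..." option takes effect) might be sketched roughly as follows. This is only a sketch under assumptions that are not in the report: the bond device is assumed to be bond0, its slaves eth0 and eth1, and ifdown/ifup are assumed to stop the bond and to reload the module through the alias in modprobe.conf. The run helper and the APPLY, BOND and SLAVES variables are made up for illustration. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of the workarounds suggested in this bug.
# Assumed names (not from the report): bond0, eth0, eth1.
BOND=${BOND:-bond0}
SLAVES=${SLAVES:-"eth0 eth1"}

run() {
    # Print the command by default; execute it only when APPLY=1 (needs root).
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

disable_tso() {
    # "ethtool -K <nic> tso off" is the command requested in the comments.
    for nic in $SLAVES; do
        run ethtool -K "$nic" tso off
    done
}

reload_bonding() {
    # After editing "mode=..." in modprobe.conf the bonding module has to be
    # removed and loaded again (or the machine rebooted).
    run ifdown "$BOND"          # assumption: the bond must be down before removal
    run modprobe -r bonding
    run ifup "$BOND"            # assumption: ifup reloads the module via its alias
}
```

Calling disable_tso or reload_bonding after sourcing the script prints the commands for review; setting APPLY=1 runs them instead, which will interrupt traffic on the bond.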