Hello Andy,

Since the bnx2 update to 1.7.9, all my servers have been dropping packets randomly. I've tested with and without TSO and rx-checksumming, and the behaviour is the same.

[12:49:52] root@mail-fe05(temora):~# ifconfig eth1 | grep pack
RX packets:5795441 errors:0 dropped:1922 overruns:0 frame:0
TX packets:5054322 errors:0 dropped:0 overruns:0 carrier:0
[12:50:48] root@mail-fe05(temora):~# ifconfig eth1 | grep pack
RX packets:6162389 errors:0 dropped:2294 overruns:0 frame:0
TX packets:5373380 errors:0 dropped:0 overruns:0 carrier:0
[12:57:13] root@mail-fe05(temora):~# uname -a
Linux temora.hst.terra.com.br 2.6.9-80.0.2.ELsmp #1 SMP Sun Feb 8 15:03:49 UTC 2009 i686 athlon i386 GNU/Linux
[12:59:35] root@mail-fe05(temora):~# ethtool -i eth1
driver: bnx2
version: 1.7.9-1
firmware-version: 1.9.6
bus-info: 0000:07:05.0
[12:59:39] root@mail-fe05(temora):~# ethtool -i eth0
driver: bnx2
version: 1.7.9-1
firmware-version: 1.9.6
bus-info: 0000:06:04.0
[12:59:46] root@mail-fe05(temora):~# lspci | grep -i eth
03:04.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
04:05.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
06:04.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
07:05.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
41:01.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)
41:02.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5706 Gigabit Ethernet (rev 02)

Extra patches:

[13:01:07] root@mail-fe05(temora):~# rpm -q --changelog kernel-smp-2.6.9-80.0.2.EL | more
* Sun Feb 08 2009 Marcus Alves Grando <marcus.grando.br> [TERRA VERSION]
-kernel: Enable REISERFS and XFS modules
-kernel: Change default kernel HZ to 250
-relatime: Relative atime updates (default: off)
-e1000: descriptor ring dump
-e1000: msi test and switch to intx
-bonding: keep all traffic when inactive device in promiscuous mode
-bnx2: fixup poll_controller routine
-bnx2: enable netdump again
-bnx2: fixup needed to allow netdump operation to complete
-bnx2: enable netdump again
* Fri Jan 23 2009 Vivek Goyal <vgoyal> [2.6.9-80]
...

Is there something I should test?

Regards
Andy,

After those patches, things work worse.

-bnx2: fixup poll_controller routine
-bnx2: enable netdump again
-bnx2: fixup needed to allow netdump operation to complete
-bnx2: enable netdump again

The NIC is resetting randomly:

NETDEV WATCHDOG: eth1: transmit timed out
bnx2: eth1 NIC Copper Link is Down
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
NETDEV WATCHDOG: eth1: transmit timed out
bnx2: eth1 NIC Copper Link is Down
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
NETDEV WATCHDOG: eth1: transmit timed out
bnx2: eth1 NIC Copper Link is Down
bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
NETDEV WATCHDOG: eth0: transmit timed out
bnx2: eth0 NIC Copper Link is Down
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
NETDEV WATCHDOG: eth0: transmit timed out
bnx2: eth0 NIC Copper Link is Down
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex

Regards
Marcus, which version was the last one that worked well?
(Sorry to not ask these questions in one comment.) Marcus, can you send me the output from ethtool -S eth1 and eth0? I'd like to understand why the frames are being dropped and the ethtool output might give us more information. I would like to see if 'rx_fw_discards' is incrementing.
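A rough way to check whether the counter is still climbing is to sample it twice and print the difference. The sketch below assumes the `name: value` layout that `ethtool -S` prints in this report; the helper names `fw_discards`, `delta` and `watch_fw_discards` are made up for illustration and are not part of ethtool.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `ethtool -S` output on stdin and print the rx_fw_discards value.
fw_discards() {
    awk '$1 == "rx_fw_discards:" { print $2 }'
}

# Print how much a counter grew between two samples (first, then second).
delta() {
    echo $(( $2 - $1 ))
}

# Sample one interface twice, some seconds apart, and report the growth.
# Only defined here, not called, because it needs a real interface.
watch_fw_discards() {
    ifname=$1
    interval=${2:-10}
    before=$(ethtool -S "$ifname" | fw_discards)
    sleep "$interval"
    after=$(ethtool -S "$ifname" | fw_discards)
    echo "$ifname rx_fw_discards +$(delta "$before" "$after") in ${interval}s"
}
```

Running `watch_fw_discards eth1 10` on the affected host would then print a single line per sample pair, which is easier to compare across driver versions than raw counter dumps.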
(In reply to comment #3) > (Sorry to not ask these questions in one comment.) > > Marcus, can you send me the output from ethtool -S eth1 and eth0? I'd like to > understand why the frames are being dropped and the ethtool output might give > us more information. I would like to see if 'rx_fw_discards' is incrementing. Yes, sure. [16:39:56] root@mail-fe05(temora):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; echo "---------"; sleep 10; done rx_discards: 0 rx_fw_discards: 1875 rx_discards: 0 rx_fw_discards: 4368 --------- rx_discards: 0 rx_fw_discards: 1875 rx_discards: 0 rx_fw_discards: 4412 --------- rx_discards: 0 rx_fw_discards: 1875 rx_discards: 0 rx_fw_discards: 4503 [16:40:59] root@mail-fe05(temora):~# ethtool -S eth0 NIC statistics: rx_bytes: 1746844574 rx_error_bytes: 0 tx_bytes: 3297469196 tx_error_bytes: 0 rx_ucast_packets: 2909104 rx_mcast_packets: 0 rx_bcast_packets: 1486 tx_ucast_packets: 3655832 tx_mcast_packets: 0 tx_bcast_packets: 0 tx_mac_errors: 0 tx_carrier_errors: 0 rx_crc_errors: 0 rx_align_errors: 0 tx_single_collisions: 0 tx_multi_collisions: 0 tx_deferred: 0 tx_excess_collisions: 0 tx_late_collisions: 0 tx_total_collisions: 0 rx_fragments: 0 rx_jabbers: 0 rx_undersize_packets: 0 rx_oversize_packets: 0 rx_64_byte_packets: 1228957 rx_65_to_127_byte_packets: 418654 rx_128_to_255_byte_packets: 114947 rx_256_to_511_byte_packets: 5520 rx_512_to_1023_byte_packets: 137437 rx_1024_to_1522_byte_packets: 1005075 rx_1523_to_9022_byte_packets: 0 tx_64_byte_packets: 959754 tx_65_to_127_byte_packets: 477018 tx_128_to_255_byte_packets: 37313 tx_256_to_511_byte_packets: 36872 tx_512_to_1023_byte_packets: 78072 tx_1024_to_1522_byte_packets: 2066803 tx_1523_to_9022_byte_packets: 0 rx_xon_frames: 0 rx_xoff_frames: 0 tx_xon_frames: 0 tx_xoff_frames: 0 rx_mac_ctrl_frames: 0 rx_filtered_packets: 27327 rx_discards: 0 rx_fw_discards: 2278 [16:41:02] root@mail-fe05(temora):~# ethtool -S eth1 NIC statistics: rx_bytes: 133768241 
rx_error_bytes: 0 tx_bytes: 17685864 tx_error_bytes: 0 rx_ucast_packets: 137602 rx_mcast_packets: 1 rx_bcast_packets: 84 tx_ucast_packets: 120623 tx_mcast_packets: 0 tx_bcast_packets: 0 tx_mac_errors: 0 tx_carrier_errors: 0 rx_crc_errors: 0 rx_align_errors: 0 tx_single_collisions: 0 tx_multi_collisions: 0 tx_deferred: 0 tx_excess_collisions: 0 tx_late_collisions: 0 tx_total_collisions: 0 rx_fragments: 0 rx_jabbers: 0 rx_undersize_packets: 0 rx_oversize_packets: 0 rx_64_byte_packets: 6863 rx_65_to_127_byte_packets: 2033 rx_128_to_255_byte_packets: 25042 rx_256_to_511_byte_packets: 17473 rx_512_to_1023_byte_packets: 4543 rx_1024_to_1522_byte_packets: 81733 rx_1523_to_9022_byte_packets: 0 tx_64_byte_packets: 47359 tx_65_to_127_byte_packets: 14982 tx_128_to_255_byte_packets: 54104 tx_256_to_511_byte_packets: 2921 tx_512_to_1023_byte_packets: 191 tx_1024_to_1522_byte_packets: 1066 tx_1523_to_9022_byte_packets: 0 rx_xon_frames: 0 rx_xoff_frames: 0 tx_xon_frames: 0 tx_xoff_frames: 0 rx_mac_ctrl_frames: 0 rx_filtered_packets: 50 rx_discards: 0 rx_fw_discards: 90 Regards
(In reply to comment #2) > Marcus, which version was the last one that worked well? 1.6.9 does not work too. I'll try to find out. [16:44:33] root@mail-fe01(beleterro):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; echo "---------"; sleep 10; done rx_discards: 0 rx_fw_discards: 7553697 rx_discards: 0 rx_fw_discards: 823645 --------- rx_discards: 0 rx_fw_discards: 7556198 rx_discards: 0 rx_fw_discards: 823788 --------- rx_discards: 0 rx_fw_discards: 7556320 rx_discards: 0 rx_fw_discards: 823788 [16:45:11] root@mail-fe01(beleterro):~# ethtool -i eth1 driver: bnx2 version: 1.6.9 firmware-version: 1.9.6 bus-info: 0000:07:05.0
(In reply to comment #2) > Marcus, which version was the last one that worked well? Andy, I've tried with 2.6.9-67.0.22 and works well. A litte bit strange but much better than >2.6.9-78 [17:17:55] root@mail-fe05(temora):~# ethtool -i eth0 driver: bnx2 version: 1.5.11-rh firmware-version: 1.9.6 bus-info: 0000:06:04.0 [17:10:29] root@mail-fe05(temora):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; ifconfig | grep dropp; echo "-----------"; sleep 10; done rx_discards: 0 rx_fw_discards: 5537 rx_discards: 0 rx_fw_discards: 0 RX packets:8225543 errors:0 dropped:5537 overruns:0 frame:0 TX packets:11102060 errors:0 dropped:0 overruns:0 carrier:0 RX packets:16763930 errors:0 dropped:0 overruns:0 frame:0 TX packets:14512118 errors:0 dropped:0 overruns:0 carrier:0 RX packets:4591321 errors:0 dropped:0 overruns:0 frame:0 TX packets:4591321 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 5537 rx_discards: 0 rx_fw_discards: 0 RX packets:8302261 errors:0 dropped:5537 overruns:0 frame:0 TX packets:11197698 errors:0 dropped:0 overruns:0 carrier:0 RX packets:16873455 errors:0 dropped:0 overruns:0 frame:0 TX packets:14606980 errors:0 dropped:0 overruns:0 carrier:0 RX packets:4623614 errors:0 dropped:0 overruns:0 frame:0 TX packets:4623614 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 5840 rx_discards: 0 rx_fw_discards: 0 RX packets:8377853 errors:0 dropped:5840 overruns:0 frame:0 TX packets:11305200 errors:0 dropped:0 overruns:0 carrier:0 RX packets:17009128 errors:0 dropped:0 overruns:0 frame:0 TX packets:14726534 errors:0 dropped:0 overruns:0 carrier:0 RX packets:4658835 errors:0 dropped:0 overruns:0 frame:0 TX packets:4658835 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 5840 rx_discards: 0 rx_fw_discards: 0 RX packets:8433314 errors:0 dropped:5840 overruns:0 frame:0 TX packets:11378551 errors:0 dropped:0 overruns:0 carrier:0 RX 
packets:17113454 errors:0 dropped:0 overruns:0 frame:0 TX packets:14816876 errors:0 dropped:0 overruns:0 carrier:0 RX packets:4697507 errors:0 dropped:0 overruns:0 frame:0 TX packets:4697507 errors:0 dropped:0 overruns:0 carrier:0 Maybe those dropped packets in eth0 is something between client and server? Regards
It looks like the 'rx_fw_discards' number matches the 'dropped' count in ifconfig. This is what I was expecting. When the firmware drops frames, it is often because large amounts of data are coming into the box and:

- the driver wants to put more frames in the ring buffer than there are available slots, OR
- interrupts are not being serviced fast enough and frames are being dropped

My first suggestion would be to bump up the size of the ring buffer. First check the ring buffer setting with 'ethtool -g eth0' and pick a size larger than what is used right now (use 'ethtool -G eth0 rx [num]' to set it). You could try setting it to the maximum allowed if you want, but the larger it is, the more memory will be consumed by each interface.

If you are still seeing 'rx_fw_discards' increment, then you can modify the coalesce settings, since it seems likely that bursts of traffic are causing frames to be dropped before the interrupt handler can even service them. You can view and change the coalesce settings with 'ethtool -c eth0' and 'ethtool -C eth0 [new values]' if needed.

If you are willing, I would try with the latest driver and increase the rx ring buffer size to see if that helps.
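The ring-buffer check described above can be scripted. This is only a sketch: it assumes the `ethtool -g` layout (a "Pre-set maximums" section followed by "Current hardware settings"), and the function names `rx_ring_limits` and `raise_rx_ring` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Read `ethtool -g` output on stdin; print "<max RX> <current RX>".
# "RX Mini:" and "RX Jumbo:" are skipped because their first field is "RX".
rx_ring_limits() {
    awk '
        /^Pre-set maximums/          { section = "max" }
        /^Current hardware settings/ { section = "cur" }
        $1 == "RX:"                  { rx[section] = $2 }
        END                          { print rx["max"], rx["cur"] }
    '
}

# Raise the RX ring to the preset maximum if it is currently below it.
# Needs root and a real interface, so it is only defined here, not called.
raise_rx_ring() {
    ifname=$1
    set -- $(ethtool -g "$ifname" | rx_ring_limits)
    if [ "$2" -lt "$1" ]; then
        ethtool -G "$ifname" rx "$1"
    fi
}
```

Going straight to the maximum is the bluntest option; as noted above, a larger ring costs more memory per interface, so an intermediate value may be a better first step. Coalesce values for `ethtool -C` depend on the workload and are deliberately not guessed here.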
(In reply to comment #7) Andy, I've tried to increase ring buffer and I've got different behavior. In bnx2 1.6.9 does not work: [18:48:13] root@mail-fe01(beleterro):~# ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 Current hardware settings: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 [18:48:17] root@mail-fe01(beleterro):~# ethtool -g eth1 Ring parameters for eth1: Pre-set maximums: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 Current hardware settings: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 [18:48:44] root@mail-fe01(beleterro):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; ifconfig | grep dropp; echo "-----------"; sleep 30; done rx_discards: 0 rx_fw_discards: 11034 rx_discards: 0 rx_fw_discards: 0 RX packets:6057474 errors:0 dropped:11034 overruns:0 frame:0 TX packets:7862766 errors:0 dropped:0 overruns:0 carrier:0 RX packets:9741815 errors:0 dropped:0 overruns:0 frame:0 TX packets:8255022 errors:0 dropped:0 overruns:0 carrier:0 RX packets:1942205347 errors:0 dropped:0 overruns:0 frame:0 TX packets:1942205347 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 11946 rx_discards: 0 rx_fw_discards: 0 RX packets:6312287 errors:0 dropped:11946 overruns:0 frame:0 TX packets:8190477 errors:0 dropped:0 overruns:0 carrier:0 RX packets:10070560 errors:0 dropped:0 overruns:0 frame:0 TX packets:8532193 errors:0 dropped:0 overruns:0 carrier:0 RX packets:1942287609 errors:0 dropped:0 overruns:0 frame:0 TX packets:1942287609 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 12012 rx_discards: 0 rx_fw_discards: 0 RX packets:6548230 errors:0 dropped:12012 overruns:0 frame:0 TX packets:8484342 errors:0 dropped:0 overruns:0 carrier:0 RX packets:10421532 errors:0 dropped:0 overruns:0 frame:0 TX packets:8832216 errors:0 dropped:0 overruns:0 carrier:0 RX packets:1942374543 errors:0 dropped:0 overruns:0 frame:0 TX packets:1942374543 errors:0 
dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 12657 rx_discards: 0 rx_fw_discards: 0 RX packets:6794666 errors:0 dropped:12657 overruns:0 frame:0 TX packets:8826395 errors:0 dropped:0 overruns:0 carrier:0 RX packets:10825648 errors:0 dropped:0 overruns:0 frame:0 TX packets:9167128 errors:0 dropped:0 overruns:0 carrier:0 RX packets:1942466484 errors:0 dropped:0 overruns:0 frame:0 TX packets:1942466484 errors:0 dropped:0 overruns:0 carrier:0 In bnx2 1.7.9 + your netdump patches works fine until reset nic: [18:54:35] root@mail-fe05(temora):~# ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 1020 RX Mini: 0 RX Jumbo: 4080 TX: 255 Current hardware settings: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 [18:54:38] root@mail-fe05(temora):~# ethtool -g eth1 Ring parameters for eth1: Pre-set maximums: RX: 1020 RX Mini: 0 RX Jumbo: 4080 TX: 255 Current hardware settings: RX: 1020 RX Mini: 0 RX Jumbo: 0 TX: 255 [18:54:39] root@mail-fe05(temora):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; ifconfig | grep dropp; echo "-----------"; sleep 30; done rx_discards: 0 rx_fw_discards: 0 rx_discards: 0 rx_fw_discards: 0 RX packets:1180452 errors:0 dropped:0 overruns:0 frame:0 TX packets:1614812 errors:0 dropped:0 overruns:0 carrier:0 RX packets:10638187 errors:0 dropped:0 overruns:0 frame:0 TX packets:9327734 errors:0 dropped:0 overruns:0 carrier:0 RX packets:3015973 errors:0 dropped:0 overruns:0 frame:0 TX packets:3015973 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 rx_fw_discards: 0 rx_discards: 0 rx_fw_discards: 0 RX packets:1402519 errors:0 dropped:0 overruns:0 frame:0 TX packets:1904640 errors:0 dropped:0 overruns:0 carrier:0 RX packets:11024476 errors:0 dropped:0 overruns:0 frame:0 TX packets:9667036 errors:0 dropped:0 overruns:0 carrier:0 RX packets:3129453 errors:0 dropped:0 overruns:0 frame:0 TX packets:3129453 errors:0 dropped:0 overruns:0 carrier:0 ----------- rx_discards: 0 
rx_fw_discards: 0 rx_discards: 0 rx_fw_discards: 0 RX packets:1623194 errors:0 dropped:0 overruns:0 frame:0 TX packets:2209672 errors:0 dropped:0 overruns:0 carrier:0 RX packets:11390446 errors:0 dropped:0 overruns:0 frame:0 TX packets:9988863 errors:0 dropped:0 overruns:0 carrier:0 RX packets:3238349 errors:0 dropped:0 overruns:0 frame:0 TX packets:3238349 errors:0 dropped:0 overruns:0 carrier:0 After +-15min without any error, nic reseting, see below: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex NETDEV WATCHDOG: eth0: transmit timed out bnx2: eth0 NIC Copper Link is Down bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex NETDEV WATCHDOG: eth0: transmit timed out bnx2: eth0 NIC Copper Link is Down bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex NETDEV WATCHDOG: eth0: transmit timed out bnx2: eth0 NIC Copper Link is Down bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex NETDEV WATCHDOG: eth0: transmit timed out bnx2: eth0 NIC Copper Link is Down bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex * Coalesce default: [18:59:48] root@mail-fe05(temora):~# ethtool -c eth0 Coalesce parameters for eth0: Adaptive RX: off TX: off stats-block-usecs: 999936 sample-interval: 0 pkt-rate-low: 0 pkt-rate-high: 0 rx-usecs: 18 rx-frames: 6 rx-usecs-irq: 18 rx-frames-irq: 6 tx-usecs: 80 tx-frames: 20 tx-usecs-irq: 80 tx-frames-irq: 20 rx-usecs-low: 0 rx-frame-low: 0 tx-usecs-low: 0 tx-frame-low: 0 rx-usecs-high: 0 rx-frame-high: 0 tx-usecs-high: 0 tx-frame-high: 0 What do you suggest to change? Regards
Marcus, I'll look at the transmit timeouts and try to figure out why it's not coming back.
Andy, another point: I have other servers with Intel NICs, and they work fine with the same traffic. Why is the bnx2 driver not optimized for high traffic like e1000? Why is the ring buffer so small? Regards
(In reply to comment #10) > Andy, another point is: > > I have other servers with Intel NICs and works fine with same traffic. Why bnx2 > driver is not optimized for high traffic like e1000? Why ring buffer is too > low? > > Regards Many of the kernel developers (myself included) feel like we should not waste kernel memory for things like driver ring buffers when many users will never need them. That is the main reason why the number of bnx2 ring buffer entries is so small.
(In reply to comment #8) > > After +-15min without any error, nic reseting, see below: > > bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex > bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex > NETDEV WATCHDOG: eth0: transmit timed out > bnx2: eth0 NIC Copper Link is Down > bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex > NETDEV WATCHDOG: eth0: transmit timed out > bnx2: eth0 NIC Copper Link is Down > bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex > NETDEV WATCHDOG: eth0: transmit timed out > bnx2: eth0 NIC Copper Link is Down > bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex > NETDEV WATCHDOG: eth0: transmit timed out > bnx2: eth0 NIC Copper Link is Down > bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex > Marcus, are there any logs before these messages that look different? I find it strange that you are suddenly getting tx timeouts. Does this box transmit a lot of frames as well as receive them? Specifically I wonder if you see any messages like this: BUG! Tx ring full when queue awake! Any other messages are obviously helpful too. I don't like to blindly recommend increasing the tx ring buffers too, but it might be worth a try if this box often sends as much data as it receives.
Adding upstream maintainer in case Michael has any thoughts on this.
(In reply to comment #12)
> Marcus, are there any logs before these messages that look different?

No, only these. Before that there are only boot messages.

> I find
> it strange that you are suddenly getting tx timeouts. Does this box transmit a
> lot of frames as well as receive them? Specifically I wonder if you see any
> messages like this:
>
> BUG! Tx ring full when queue awake!

No. I have never seen this message on any of my servers.

>
> Any other messages are obviously helpful too.
>
> I don't like to blindly recommend increasing the tx ring buffers too, but it
> might be worth a try if this box often sends as much data as it receives.

These servers are mail servers: IMAP/POP/SMTP, mounting via NFS. Nothing special. eth0 (user interface) usually has 80Mb/s OUT and 30MB/s IN, and eth1 (NFS interface) has 100Mb/s IN and 15Mb/s OUT.

There are no additional logs. Any other ideas?
Increasing the rx ring size to 1020 usually will be enough to prevent or reduce the number of dropped packets. But sometimes, 1020 may not be big enough still. DaveM would not allow me to increase the max beyond 1020 several years ago. I don't know why some versions will continue to drop with ring size 1020, and some versions will not drop anymore. Is the traffic pattern the same when trying different versions? If it continues to drop, we can experiment by increasing the ring beyond 1020. Just change the MAX_RX_RINGS from 4 to 8, and MAX_RX_PG_RINGS from 16 to 32 in bnx2.h. I don't know why we get transmit timeouts. We haven't seen those for a long time in our lab. One possibility is that the bnx2 NIC is receiving a ton of flow control packets, preventing it from sending out any packets and causing the timeout. But this happens very rarely.
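The effect of that change on the ring limits can be estimated with simple arithmetic. The sketch below assumes 255 usable descriptors per ring page; that figure is inferred from the limits ethtool reports in this thread (1020 for 4 rings, 4080 jumbo for 16 page rings), not taken from the driver source, and `max_rx_desc` is a made-up helper name.

```shell
#!/bin/sh
# Assumed usable descriptors per ring page (inferred: 1020 / 4 = 255).
DESC_PER_RING=255

# Print the maximum descriptor count for a given number of ring pages.
max_rx_desc() {
    echo $(( $DESC_PER_RING * $1 ))
}

max_rx_desc 4     # MAX_RX_RINGS today      -> 1020
max_rx_desc 8     # MAX_RX_RINGS doubled    -> 2040
max_rx_desc 16    # MAX_RX_PG_RINGS today   -> 4080
max_rx_desc 32    # MAX_RX_PG_RINGS doubled -> 8160
```

If the assumption holds, doubling both constants would let `ethtool -G` accept an RX ring of up to 2040 entries, at roughly twice the memory cost per interface.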
Guys,

I tested some scenarios, and the results are:

- kernel 2.6.9-78.0.1, bnx2 1.6.9, with ring size 1020: dropped packets often.
- kernel 2.6.9-78.0.13, bnx2 1.7.9-1, without ring size 1020: dropped packets often.
- kernel 2.6.9-78.0.13, bnx2 1.7.9-1, with ring size 1020: worked fine.
- kernel 2.6.9-78.0.13, bnx2 1.7.9-1, with ring size 1020 and netdump patches[1]: did not drop packets, but transmit timed out.

[1]:
http://people.redhat.com/agospoda/rhel4/0069-bnx2-fixup-poll_controller-routine.patch
http://people.redhat.com/agospoda/rhel4/0072-bnx2-enable-netdump-again.patch
http://people.redhat.com/agospoda/rhel4/0073-bnx2-fixup-needed-to-allow-netdump-operation-to-com.patch
http://people.redhat.com/agospoda/rhel4/0074-bnx2-enable-netdump-again.patch

I can test Michael's tip, but why did bnx2 1.5.11 work much better than 1.[6-7].9? It did not drop packets on the NFS interface, and on the front network the number of dropped packets was low.

The transmit timeouts seem to be caused by something in the netdump patches, don't they, Andy?

Regards
Marcus, thank you so much for the detailed testing and analysis! I will take a look at the netdump patches and see if I can understand what might be happening.
(In reply to comment #16)
> Guys,
>
>
> kernel 2.6.9-78.0.13 bnx2 1.7.9-1 with ring size 1020: worked fine.
> kernel 2.6.9-78.0.13 bnx2 1.7.9-1 with ring size 1020 and netdump patches[1]:
> does not dropped packets but transmited timed out
>

Since adding these patches caused problems, I wanted to break down which ones could be the problem, with a short description of each:

> [1]:
> http://people.redhat.com/agospoda/rhel4/0069-bnx2-fixup-poll_controller-routine.patch

This patch was only used when doing netconsole or netdump, so I do not think it is affecting you.

> http://people.redhat.com/agospoda/rhel4/0072-bnx2-enable-netdump-again.patch

This patch was the first one that could be causing problems. It was the first attempt to poll each dummy_netdev contained in each napi instance individually. The dummy_netdevs were added so we could do msi-x, but we needed to unify them under the regular netdev and use that for polling so netdump would work.

> http://people.redhat.com/agospoda/rhel4/0073-bnx2-fixup-needed-to-allow-netdump-operation-to-com.patch

First attempt to fix up the netdump problems found with the previous patch.

> http://people.redhat.com/agospoda/rhel4/0074-bnx2-enable-netdump-again.patch

Final changes that seemed to make netdump behave as I expected.

Marcus, can you paste the contents of /proc/interrupts when running any version of 1.7.9-1? I presume you are using MSI?
I was briefly able to reproduce ONE TX timeout on my system, so I think that's progress. I'm not exactly sure how I did it: I ran netperf for a while, and when it stopped working I started pinging and doing all sorts of other network tests until I finally made it die. I'm not sure if I can reproduce it again, but I've made a small change (part of a patch from upstream), and I'm going to let it run while I sleep.

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index 22580ab..364294d 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -3195,14 +3195,15 @@ static int bnx2_poll(struct net_device *dev, int *budget)
 
 		work_done = bnx2_poll_work(bp, bnapi, work_done, *budget);
 
-		if (unlikely(work_done >= *budget))
-			break;
-
 		/* bnapi->last_status_idx is used below to tell the hw how
 		 * much work has been processed, so we must read it before
 		 * checking for more work.
 		 */
 		bnapi->last_status_idx = sblk->status_idx;
+
+		if (unlikely(work_done >= *budget))
+			break;
+
 		rmb();
 		if (likely(!bnx2_has_work(bnapi))) {
 			if (likely(bp->flags & BNX2_FLAG_USING_MSI_OR_MSIX)) {
That patch didn't do much -- I just managed to hang the network interface again. I'll do more testing and inspection tomorrow.
(In reply to comment #18) > Marcus, can you paste the contents of /proc/interrupts when running any version > of 1.7.9-1? I presume you are using MSI? root@mail-fe05(temora):~# cat /proc/interrupts CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 0: 285342 282634 125053 122559 2506162 2506190 2506205 2513340 IO-APIC-edge timer 1: 0 0 0 0 0 0 0 9 IO-APIC-edge i8042 8: 6 8 5 3 59 55 56 63 IO-APIC-edge rtc 9: 0 0 0 0 0 0 0 0 IO-APIC-level acpi 14: 0 1 0 6 0 408 60 412 IO-APIC-edge ide0 66: 5977 0 0 0 1914233 98700 1809029 94163 PCI-MSI-X cciss0 169: 0 0 0 0 0 0 1 38 IO-APIC-level ohci_hcd 177: 0 0 0 0 0 0 0 17 IO-APIC-level ehci_hcd 193: 115293886 0 0 1 1 0 0 5763 IO-APIC-level eth1 201: 0 0 0 0 0 1 0 78 IO-APIC-level uhci_hcd 209: 0 0 121990831 0 0 0 0 197 IO-APIC-level eth0 NMI: 0 0 0 0 0 0 0 0 LOC: 10847365 10847365 10847364 10847363 10847361 10847361 10847360 10847359 ERR: 1 MIS: 0 No. I'm not using MSI. My nics appear does not support MSI. Regards
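The interrupt mode can be read straight off the table: the column just before the device name is the interrupt type (IO-APIC-level for eth0 and eth1 here, PCI-MSI-X for cciss0). A small sketch that extracts it follows. It assumes the 2.6.9-era layout shown in this comment, with one device name at the end of each row; shared interrupt lines or newer kernels may not match, and the helper name `irq_type` is made up.

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# Read /proc/interrupts-style text on stdin and print the interrupt type
# for the named device, taken from the column just before the device name.
irq_type() {
    awk -v dev="$1" '$NF == dev { print $(NF - 1) }'
}
```

On a live system this would be used as `irq_type eth0 < /proc/interrupts`; any value other than a PCI-MSI variant means the interface is on legacy interrupts.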
Ok, I discovered what is wrong. I should have a patch for you to test in a few minutes.
Created attachment 331629 [details] bnx2-fixup-problems-with-netpoll-implementation.patch

I tested this patch and everything seems to work well. Netdump still works and I was able to run netperf for about 2 hours without any issues. Please apply it on top of the other patches you have already applied.
My test kernels have been updated to include a patch for this bugzilla. http://people.redhat.com/agospoda/#rhel4 Please test them and report back your results. Without immediate feedback there is a good chance this or any other fix for this driver will not be included in the upcoming update.
(In reply to comment #23) > Created an attachment (id=331629) [details] > bnx2-fixup-problems-with-netpoll-implementation.patch > > I tested this patch and everything seems to work well. Netdump still works and > I as able to run netperf for about 2 hours without any issues. Please apply it > on top of the other patches you have already applied. Andy, I tested and works fine for at least 3 hours. But I tested with high ring size. I'll test with ring size default now. rx_discards: 0 rx_fw_discards: 1871 rx_discards: 0 rx_fw_discards: 0 RX packets:103321728 errors:0 dropped:1871 overruns:0 frame:0 TX packets:140809181 errors:0 dropped:0 overruns:0 carrier:0 RX packets:174826927 errors:0 dropped:0 overruns:0 frame:0 TX packets:152450304 errors:0 dropped:0 overruns:0 carrier:0 RX packets:47868681 errors:0 dropped:0 overruns:0 frame:0 TX packets:47868681 errors:0 dropped:0 overruns:0 carrier:0 Regards
Andy, I tested again without ring changes... works better than 1.7.9 without your patches. Now eth1 interface does not dropped any packet and eth0 (users interface) dropped some, but less than before. Now with your patches, works fine like 1.5.11. # ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc rx_discards: 0 rx_fw_discards: 30871 rx_discards: 0 rx_fw_discards: 0 # ifconfig | grep dropp RX packets:101959202 errors:0 dropped:30871 overruns:0 frame:0 TX packets:138883699 errors:0 dropped:0 overruns:0 carrier:0 RX packets:175810912 errors:0 dropped:0 overruns:0 frame:0 TX packets:153077867 errors:0 dropped:0 overruns:0 carrier:0 RX packets:49959776 errors:0 dropped:0 overruns:0 frame:0 TX packets:49959776 errors:0 dropped:0 overruns:0 carrier:0 # dmesg | tail -n 3 ip_conntrack version 2.1 (32768 buckets, 262144 max) - 348 bytes per conntrack bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex # uptime 17:47:59 up 3:10, 1 user, load average: 35.79, 34.05, 33.77
Created attachment 331879 [details] bnx2-revert-multiqueue-support-and-fix-netdump.patch

Marcus, I took a look at those huge patches I did, and decided I would rather not support multiqueue receive on RHEL4 than risk all that change. I realize you have done quite a bit of testing for me, but if you can test one more patch I would really appreciate it.

This one can be applied directly on top of the -80 kernel. No other patches from my test kernels are needed. Thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
(In reply to comment #28)
> Created an attachment (id=331879) [details]
> bnx2-revert-multiqueue-support-and-fix-netdump.patch
>
> Marcus, I took at look at those huge patches I did, and decided I would rather
> not support multiqueue receive on RHEL4 than risk all that change. I realize
> you have done quite a bit of testing for me, but if you can test one more patch
> I would really appreciate it.
>
> This one can be applied directly on top of the -80 kernel. No other patches
> from my test kernels are needed. Thanks!

Andy, the first try does not boot. I'll check the crash dump and send it to you.

Regards
Marcus, sorry to hear that. I will try and reproduce that here too.
Info from crash: please wait... (gathering module symbol data) WARNING: cannot access vmalloc'd module memory KERNEL: /usr/lib/debug/lib/modules/2.6.9-81.0.1.ELsmp/vmlinux DUMPFILE: vmcore-incomplete [PARTIAL DUMP] CPUS: 8 DATE: Mon Feb 16 13:25:37 2009 UPTIME: 00:03:50 LOAD AVERAGE: 0.20, 0.05, 0.01 TASKS: 88 NODENAME: temora.hst.terra.com.br RELEASE: 2.6.9-81.0.1.ELsmp VERSION: #1 SMP Sun Feb 15 00:20:41 UTC 2009 MACHINE: i686 (2612 Mhz) MEMORY: 18 GB PANIC: "Oops: 0002 [#1]" (check log for details) PID: 3584 COMMAND: "ip" TASK: f63e7930 [THREAD_INFO: f7acc000] CPU: 2 STATE: TASK_RUNNING (PANIC) crash> log [...] disk_dump: total blocks required: 4193920 (header 3 + bitmap 144 + memory 4193773) ip_tables: (C) 2000-2002 Netfilter core team ip_conntrack version 2.1 (32768 buckets, 262144 max) - 348 bytes per conntrack Unable to handle kernel NULL pointer dereference at virtual address 00000024 printing eip: f88f63ba *pde = 0a383001 Oops: 0002 [#1] SMP Modules linked in: iptable_filter ipt_REDIRECT iptable_nat ip_conntrack ip_tables ide_dump cciss_dump scsi_dump diskdump zlib_deflate xfs joydev dm_mirror dm_mod button battery ac ohci_hcd ehci_hcd uhci_hcd k8_ edac edac_mc bnx2 floppy ext3 jbd cciss sd_mod scsi_mod CPU: 2 EIP: 0060:[<f88f63ba>] Not tainted VLI EFLAGS: 00010297 (2.6.9-81.0.1.ELsmp) EIP is at bnx2_napi_enable+0x15/0x29 [bnx2] eax: 00000000 ebx: 00000000 ecx: f6919240 edx: f69192d0 esi: f6919240 edi: f6919000 ebp: 00000000 esp: f7acced4 ds: 007b es: 007b ss: 0068 Process ip (pid: 3584, threadinfo=f7acc000 task=f63e7930) Stack: f6919000 f88fcce6 f6919000 00000000 00001002 00000000 c0288515 f6919000 00001003 c0289988 00000000 ffffff9d f7accf38 00000000 c02c1f02 00000000 00000000 f6919000 00000000 bfe8f840 00008914 08041003 bfe8f8c4 00e76638 Call Trace: [<f88fcce6>] bnx2_open+0x52/0x154 [bnx2] [<c0288515>] dev_open+0x2e/0x6d [<c0289988>] dev_change_flags+0x48/0xed [<c02c1f02>] devinet_ioctl+0x2b2/0x61d [<c02c3aff>] inet_ioctl+0x79/0xa5 [<c0280801>] 
sock_ioctl+0x28c/0x2b4 [<c016db72>] sys_ioctl+0x227/0x269 [<c02ddb2f>] syscall_call+0x7/0xb Code: 5d 9e c7 eb d9 45 83 c3 28 3b ae 6c 04 00 00 7c cb 5b 5e 5f 5d c3 53 31 db 89 c1 3b 98 6c 04 00 00 7d 1a 8d 90 90 00 00 00 8b 02 <f0> 0f ba 70 24 05 43 83 c2 28 3b 99 6c 04 00 00 7c ec 5b c3 57 crash> bt PID: 3584 TASK: f63e7930 CPU: 2 COMMAND: "ip" #0 [f7accd84] die at c010604e #1 [f7accdb4] do_page_fault at c011bad4 #2 [f7acceec] dev_open at c0288513 #3 [f7accef8] dev_change_flags at c0289986 #4 [f7accf0c] devinet_ioctl at c02c1efd #5 [f7accf68] inet_ioctl at c02c3afa #6 [f7accf7c] sock_ioctl at c02807fe #7 [f7accf94] sys_ioctl at c016db6f #8 [f7accfc0] system_call at c02ddb28 EAX: 00000036 EBX: 00000003 ECX: 00008914 EDX: bfe8f840 DS: 007b ESI: 00000003 ES: 007b EDI: bfe8f840 SS: 007b ESP: bfe8f7e8 EBP: bfe8f958 CS: 0073 EIP: 00206834 ERR: 00000036 EFLAGS: 00000206 Anything else? Regards
Thanks, Marcus. I think I found this problem. Just testing with netperf and netdump with msi enabled and disabled and then I will post a new patch to obsolete the patch in comment #28.
Created attachment 332069 [details] bnx2-final-working-and-tested.patch This patch applied to 2.6.9-80 should work. I've been running netperf for about 30 minutes and I have been able to netdump with msi and legacy mode interrupts without problems. Thanks again for testing these patches, Marcus!
(In reply to comment #34)
> Created an attachment (id=332069) [details]
> bnx2-final-working-and-tested.patch

Andy, this one works perfectly with a larger RX ring. Without touching the RX ring, the NFS interface and the user interface dropped packets too often. With a larger RX ring, the interface drops packets only when traffic is near 4x that of a normal server; otherwise it does not drop any packets.

------ Without high RX ring --------
[19:10:43] root@mail-fe05(temora):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; ifconfig | grep dropp; sleep 10; done
rx_discards: 0
rx_fw_discards: 1288
rx_discards: 0
rx_fw_discards: 229
RX packets:3458894 errors:0 dropped:1288 overruns:0 frame:0
TX packets:4670440 errors:0 dropped:0 overruns:0 carrier:0
RX packets:6672043 errors:0 dropped:229 overruns:0 frame:0
TX packets:5863899 errors:0 dropped:0 overruns:0 carrier:0
RX packets:2006154 errors:0 dropped:0 overruns:0 frame:0
TX packets:2006154 errors:0 dropped:0 overruns:0 carrier:0
rx_discards: 0
rx_fw_discards: 1288
rx_discards: 0
rx_fw_discards: 346
RX packets:3604345 errors:0 dropped:1288 overruns:0 frame:0
TX packets:4878953 errors:0 dropped:0 overruns:0 carrier:0
RX packets:6960446 errors:0 dropped:346 overruns:0 frame:0
TX packets:6113120 errors:0 dropped:0 overruns:0 carrier:0
RX packets:2074435 errors:0 dropped:0 overruns:0 frame:0
TX packets:2074435 errors:0 dropped:0 overruns:0 carrier:0
rx_discards: 0
rx_fw_discards: 1426
rx_discards: 0
rx_fw_discards: 352
RX packets:3762686 errors:0 dropped:1426 overruns:0 frame:0
TX packets:5104817 errors:0 dropped:0 overruns:0 carrier:0
RX packets:7215366 errors:0 dropped:352 overruns:0 frame:0
TX packets:6336175 errors:0 dropped:0 overruns:0 carrier:0
RX packets:2141348 errors:0 dropped:0 overruns:0 frame:0
TX packets:2141348 errors:0 dropped:0 overruns:0 carrier:0
------ Without high RX ring --------

------ With high RX ring --------
[19:24:48] root@mail-fe05(temora):~# while true; do ethtool -S eth0 | grep disc; ethtool -S eth1 | grep disc; ifconfig | grep dropp; sleep 30; done
rx_discards: 0
rx_fw_discards: 649
rx_discards: 0
rx_fw_discards: 0
RX packets:12079733 errors:0 dropped:649 overruns:0 frame:0
TX packets:16469995 errors:0 dropped:0 overruns:0 carrier:0
RX packets:21344168 errors:0 dropped:0 overruns:0 frame:0
TX packets:18720164 errors:0 dropped:0 overruns:0 carrier:0
RX packets:7777040 errors:0 dropped:0 overruns:0 frame:0
TX packets:7777040 errors:0 dropped:0 overruns:0 carrier:0
rx_discards: 0
rx_fw_discards: 649
rx_discards: 0
rx_fw_discards: 0
RX packets:12500338 errors:0 dropped:649 overruns:0 frame:0
TX packets:17042594 errors:0 dropped:0 overruns:0 carrier:0
RX packets:22055818 errors:0 dropped:0 overruns:0 frame:0
TX packets:19346640 errors:0 dropped:0 overruns:0 carrier:0
RX packets:7979074 errors:0 dropped:0 overruns:0 frame:0
TX packets:7979074 errors:0 dropped:0 overruns:0 carrier:0
rx_discards: 0
rx_fw_discards: 649
rx_discards: 0
rx_fw_discards: 0
RX packets:12909651 errors:0 dropped:649 overruns:0 frame:0
TX packets:17587961 errors:0 dropped:0 overruns:0 carrier:0
RX packets:22724569 errors:0 dropped:0 overruns:0 frame:0
TX packets:19930921 errors:0 dropped:0 overruns:0 carrier:0
RX packets:8176045 errors:0 dropped:0 overruns:0 frame:0
TX packets:8176045 errors:0 dropped:0 overruns:0 carrier:0
------ With high RX ring --------

If this is the final patch, would it be possible to bump the default RX ring size? Otherwise the default RX ring works worse than the 1.5.11-rh bnx2 version.

Regards
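Note for readers: the loops above print cumulative counters, so successive samples have to be subtracted by hand. The following is a minimal sketch that prints the per-interval increase in `rx_fw_discards` instead. The helper names `fw_discards` and `watch_fw_discards` are invented for illustration; only `ethtool -S` and the counter name come from the output in this comment.

```shell
#!/bin/sh
# Extract the rx_fw_discards value from `ethtool -S` output read on stdin.
# (Hypothetical helper; the counter name is taken from the output above.)
fw_discards() {
    awk -F': *' '$1 ~ /rx_fw_discards$/ { print $2; exit }'
}

# Print how much rx_fw_discards grew on interface $1 every $2 seconds
# (default 10). Runs until interrupted.
watch_fw_discards() {
    dev=$1
    interval=${2:-10}
    prev=$(ethtool -S "$dev" | fw_discards)
    while sleep "$interval"; do
        cur=$(ethtool -S "$dev" | fw_discards)
        echo "$(date +%T) $dev rx_fw_discards +$(( ${cur:-0} - ${prev:-0} ))"
        prev=$cur
    done
}
```

For example, `watch_fw_discards eth0 10` would report only the new discards in each 10-second window, which makes it easier to see whether drops arrive steadily or in isolated jumps.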
When the traffic that is dropping frames is running, does it burst a large amount of data over a short period of time, or does it just send a continuous amount of traffic at a high rate?
(In reply to comment #36)
> When the traffic that is dropping frames is running, does it burst a large
> amount of data over a short period of time or does it just send a continuous
> amount of traffic at a high rate?

I don't know, but I'll collect some tcpdumps to verify.

Regards
Andy,

These samples were taken without ring changes, at about half the load of the other servers. I sampled every five seconds; some drops are visible below.

eth0 RXbytes:10437437 RXpackets:19062 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:16397765 TXpackets:21448 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:23878763 RXpackets:30482 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:4360726 TXpackets:27378 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth0 RXbytes:9810484 RXpackets:23023 RXerrs:0 RXdrop:551 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:39043646 TXpackets:36548 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:38111763 RXpackets:42258 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:5396406 TXpackets:37600 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth0 RXbytes:8344257 RXpackets:23698 RXerrs:0 RXdrop:15 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:35800397 TXpackets:34272 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:37071245 RXpackets:41511 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:5262631 TXpackets:37639 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
----------------
eth0 RXbytes:10403610 RXpackets:19183 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:20079846 TXpackets:23413 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:24871627 RXpackets:30783 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:4258572 TXpackets:27103 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth0 RXbytes:9026210 RXpackets:24906 RXerrs:0 RXdrop:197 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:53757271 TXpackets:45666 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:32971682 RXpackets:35901 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:4444185 TXpackets:30640 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0

Anything else?

Regards
These samples were taken every second:

eth0 RXbytes:1810340 RXpackets:3426 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:3036694 TXpackets:3936 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:3450924 RXpackets:4760 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:717496 TXpackets:4316 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth0 RXbytes:2194973 RXpackets:7155 RXerrs:0 RXdrop:193 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:23052541 TXpackets:17168 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:3356591 RXpackets:4336 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:616822 TXpackets:3887 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth0 RXbytes:1513523 RXpackets:3472 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:3394439 TXpackets:4311 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
eth1 RXbytes:5009266 RXpackets:6344 RXerrs:0 RXdrop:0 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:869859 TXpackets:5564 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0
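Note for readers: the script that produced the per-interval samples above is not shown in this report. The following is only a rough sketch of how similar per-second RX deltas could be collected from `/proc/net/dev`, which can help tell a short burst from sustained load. The helper names `rx_counters` and `watch_rx_drops` and the output layout are assumptions, not the reporter's tool.

```shell
#!/bin/sh
# Print "<rx_packets> <rx_drop>" for interface $1 from /proc/net/dev-format
# text on stdin. The colon is replaced by a space first, because the kernel
# can print "eth0:123" with no space once the counters get wide.
rx_counters() {
    awk -v dev="$1" '{ sub(/:/, " ") } $1 == dev { print $3, $5 }'
}

# Print per-interval RX packet and drop deltas for interface $1 every $2
# seconds (default 1). Hypothetical helper; runs until interrupted.
watch_rx_drops() {
    dev=$1
    interval=${2:-1}
    read -r prev_pkts prev_drop <<EOF
$(rx_counters "$dev" < /proc/net/dev)
EOF
    while sleep "$interval"; do
        read -r pkts drop <<EOF
$(rx_counters "$dev" < /proc/net/dev)
EOF
        echo "$dev RXpackets:$((pkts - prev_pkts)) RXdrop:$((drop - prev_drop))"
        prev_pkts=$pkts
        prev_drop=$drop
    done
}
```

Running `watch_rx_drops eth0 1` next to the mail workload would show whether seconds with a non-zero `RXdrop` line up with spikes in `RXpackets`.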
Marcus, I'm not sure what to think about this. These numbers are the same as the rx_fw_discards ethtool stat, right? Was this a problem with 1.5.11 or 1.6.9?
(In reply to comment #40)
> Marcus, I'm not sure what to think about this. These numbers are the same as
> the rx_fw_discards ethtool stat, right?

Yes.

>
> Was this a problem with 1.5.11 or 1.6.9?

I'll check tomorrow.

Regards
(In reply to comment #40)
> Marcus, I'm not sure what to think about this. These numbers are the same as
> the rx_fw_discards ethtool stat, right?
>
> Was this a problem with 1.5.11 or 1.6.9?

It happens with 1.5.11 too.

# ethtool -i eth0
driver: bnx2
version: 1.5.11-rh
firmware-version: 1.9.6
bus-info: 0000:06:04.0

Collected at a 10-second interval from netstat:

eth0 RXbytes:17812447 RXpackets:37704 RXerrs:0 RXdrop:95 RXfifo:0 RXframe:0 RXcompressed:0 RXmulticast:0 TXbytes:62352098 TXpackets:56126 TXerrs:0 TXdrop:0 TXfifo:0 TXcolls:0 TXcarrier:0 TXcompressed:0

# ethtool -S eth0
NIC statistics:
rx_bytes: 241903798
rx_error_bytes: 0
tx_bytes: 1069444900
tx_error_bytes: 0
rx_ucast_packets: 693609
rx_mcast_packets: 0
rx_bcast_packets: 59
tx_ucast_packets: 1034147
tx_mcast_packets: 0
tx_bcast_packets: 5
tx_mac_errors: 0
tx_carrier_errors: 0
rx_crc_errors: 0
rx_align_errors: 0
tx_single_collisions: 0
tx_multi_collisions: 0
tx_deferred: 0
tx_excess_collisions: 0
tx_late_collisions: 0
tx_total_collisions: 0
rx_fragments: 0
rx_jabbers: 0
rx_undersize_packets: 0
rx_oversize_packets: 0
rx_64_byte_packets: 423534
rx_65_to_127_byte_packets: 119297
rx_128_to_255_byte_packets: 8056
rx_256_to_511_byte_packets: 2227
rx_512_to_1023_byte_packets: 8902
rx_1024_to_1522_byte_packets: 131652
rx_1523_to_9022_byte_packets: 0
tx_64_byte_packets: 172773
tx_65_to_127_byte_packets: 103993
tx_128_to_255_byte_packets: 18100
tx_256_to_511_byte_packets: 33356
tx_512_to_1023_byte_packets: 36079
tx_1024_to_1522_byte_packets: 669851
tx_1523_to_9022_byte_packets: 0
rx_xon_frames: 0
rx_xoff_frames: 0
tx_xon_frames: 0
tx_xoff_frames: 0
rx_mac_ctrl_frames: 0
rx_filtered_packets: 1183
rx_discards: 0
rx_fw_discards: 95

I really don't know why packets are dropped with such low network usage...

Regards
Even with low overall usage, if you have a burst of traffic (which is likely with a mailserver) you will need to make sure you can handle those bursts when they happen. I would recommend increasing your ring buffer and leaving it higher.
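Note for readers: a minimal sketch of how the ring increase recommended above could be applied with the standard `ethtool -g` (show ring parameters) and `ethtool -G` (set ring parameters) options. The helper names `max_rx_ring` and `raise_rx_ring` are hypothetical, and going straight to the pre-set maximum rather than an intermediate value is an assumption made for illustration, not a recommendation from this bug.

```shell
#!/bin/sh
# Print the pre-set maximum RX ring size from `ethtool -g` output on stdin.
# Only the "RX:" line inside the "Pre-set maximums" section is used;
# "RX Mini:" and "RX Jumbo:" have a different first field and are skipped.
max_rx_ring() {
    awk '/^Pre-set maximums/          { in_max = 1; next }
         /^Current hardware settings/ { in_max = 0 }
         in_max && $1 == "RX:"        { print $2; exit }'
}

# Raise the RX ring of interface $1 to the maximum the driver reports.
# Needs root, and the setting is not kept across a reboot or driver reload.
raise_rx_ring() {
    dev=$1
    max=$(ethtool -g "$dev" | max_rx_ring)
    if [ -z "$max" ]; then
        echo "cannot read ring limits for $dev" >&2
        return 1
    fi
    ethtool -G "$dev" rx "$max"
}
```

Calling `raise_rx_ring eth0` as root applies the change. Because the value does not persist, it would have to be re-applied at boot by whatever mechanism the site already uses, and changing the ring size may briefly interrupt traffic on the interface.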
(In reply to comment #46)
> Even with low overall usage, if you have a burst of traffic (which is likely
> with a mailserver) you will need to make sure you can handle those bursts when
> they happen. I would recommend increasing your ring buffer and leaving it
> higher.

That's OK for me. So, will your last patch be committed? It works fine here.

Thanks a lot.
Yes, the patch in comment #34 will be included in RHEL 4.8. Thank you for all of your work testing and debugging these patches.
*** Bug 488749 has been marked as a duplicate of this bug. ***
------- Comment From 2009-03-06 04:08 EDT-------
(In reply to comment #36)
> Hello IBM,
> Please let me know if you would be able to test the kernel from:
> http://people.redhat.com/agospoda/

Yes, I tested kernel-smp-2.6.9-81.EL.gtest.60.x86_64.rpm, and with it I was able to take a dump over the network. With this kernel, netdump works on the ls21 machine.

Thanks

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech
This event sent from IssueTracker by jkachuck
issue 268900
Committed in 83.EL. RPMs are available at http://people.redhat.com/vgoyal/rhel4/
*** Bug 480693 has been marked as a duplicate of this bug. ***
Patch is in -89.EL kernel.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1024.html