Bug 448353

Summary: Problem with net card: D-Link DGE-550T Gigabit Ethernet Adapter
Product: Red Hat Enterprise Linux 4
Reporter: I. Piasecki <irekpias>
Component: kernel
Assignee: Michal Schmidt <mschmidt>
Status: CLOSED WONTFIX
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: low
Version: 4.6
CC: vgoyal
Target Milestone: rc
Target Release: ---
Hardware: i386
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-06-20 16:04:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description I. Piasecki 2008-05-26 07:04:20 UTC
Description of problem:
I have an HP ML110 server with a PCI-X network card, a D-Link DGE-550T Gigabit
Ethernet Adapter, which is handled by the dl2k module. After inserting the
module I see:

D-Link DL2000-based linux driver v1.17a 2002/10/04
ACPI: PCI Interrupt 0000:02:02.0[A] -> GSI 29 (level, low) -> IRQ 217
divert: allocating divert_blk for eth1
eth1: D-Link DGE-550T Gigabit Ethernet Adapter, 00:13:46:6d:91:95, IRQ 217
tx_coalesce:    16 packets
rx_coalesce:    10 packets
rx_timeout:     128000 ns
Badness in mii_wait_link at drivers/net/dl2k.c:1500
 [<e0a747b2>] mii_wait_link+0x65/0x7b [dl2k]
 [<e0a73b60>] rio_error+0x21/0x193 [dl2k]
 [<e0a73200>] rio_interrupt+0x7f/0xac [dl2k]
 [<c0107f00>] handle_IRQ_event+0x25/0x4f
 [<c01088ce>] do_IRQ+0x18a/0x2bf
 =======================
 [<c031e99c>] common_interrupt+0x18/0x20
 [<e0a72b11>] rio_open+0x20e/0x216 [dl2k]
 [<c02c0b15>] dev_open+0x2d/0x6b
 [<c02c237d>] dev_change_flags+0x48/0xed
 [<c02fec03>] devinet_ioctl+0x2b2/0x61d
 [<c0300786>] inet_ioctl+0x79/0xa5
 [<c02b83b3>] sock_ioctl+0x2dd/0x38b
 [<c0181b85>] sys_ioctl+0x297/0x336
 [<c031dfc3>] syscall_call+0x7/0xb
eth1: Link off

The network works with this card, but today I saw the following in dmesg:

eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out (0000), is buffer full?

Network is unreachable.

I removed the dl2k module and inserted it again, and the network works again.
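
The workaround above (spot the Tx timeouts in dmesg, then remove and re-insert the module) could be sketched roughly as follows. This is only an illustration and not part of the original report: the regular expression is derived solely from the log lines quoted above, the function names `count_tx_timeouts` and `reload_commands` are hypothetical, and the commands are returned rather than executed because reloading a network module needs root and interrupts connectivity.

```python
import re

# Pattern based only on the dmesg lines quoted in this report;
# other driver versions may word the message differently.
TX_TIMEOUT = re.compile(r"^(?P<iface>\w+): Tx timed out \([0-9a-fA-F]+\)")


def count_tx_timeouts(log_text):
    """Return {interface: count} for driver 'Tx timed out' lines."""
    counts = {}
    for line in log_text.splitlines():
        match = TX_TIMEOUT.match(line.strip())
        if match:
            iface = match.group("iface")
            counts[iface] = counts.get(iface, 0) + 1
    return counts


def reload_commands(module="dl2k"):
    """Commands equivalent to removing and re-inserting the module.

    They are only returned here; running them requires root.
    """
    return [["modprobe", "-r", module], ["modprobe", module]]


SAMPLE = """\
eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
eth1: Tx timed out (0000), is buffer full?
"""

if __name__ == "__main__":
    print(count_tx_timeouts(SAMPLE))
    for command in reload_commands():
        print(" ".join(command))
```

Only the driver's own message is counted; the NETDEV WATCHDOG lines report the same events and would otherwise double the count.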

Version-Release number of selected component (if applicable):
Kernel version: 2.6.9-67.0.15.EL on CentOS 4.6

How reproducible:
Always, whenever I insert the module for this card.

Steps to Reproduce:
1.
2.
3.
  
Actual results:
Strange errors always appear after inserting the dl2k module.

Expected results:
Working without problems.

Additional info:
Yes, it is a CentOS 4.6 box, not native Red Hat 4.6, but ...

Comment 1 Michal Schmidt 2008-06-18 15:16:07 UTC
Thanks for the report. Would you be willing to test it with a current Fedora
LiveCD? The issue may have been fixed upstream already and that would make it
easier.

Comment 2 I. Piasecki 2008-06-18 20:58:44 UTC
Sorry, I added another card on the PCI bus. It has a Realtek chipset handled by
the kernel module 8139too and works without problems. I cannot test with a
Fedora LiveCD because this is a production server with many services running
non-stop, and I can't experiment with it. But the problem still exists; it is
just that this card is, sadly, no longer in use, even though it is still
installed in our server.

Comment 3 Michal Schmidt 2009-05-22 15:06:47 UTC
The warning in mii_wait_link() is printed because it calls mdelay() from an interrupt handler. This is considered bad, because interrupt handlers should in general be as quick as possible. But it is not a serious problem and can be ignored. I checked the current upstream driver and it does the same thing. The only difference is that the upstream mdelay() implementation prints no warning.

The actual issue (the Tx timeout) is not related to the badness warning. The driver's error handling in the Tx timeout case needs to be improved. It should reset the card (reloading the module does that too, which is why you got connectivity back).

I can improve the error handling, but first we need to be able to reproduce the problem.

Comment 4 I. Piasecki 2009-05-23 11:11:09 UTC
Can I help in any way? This is a production server, and I can experiment only after working hours (when no employees are using it). I have installed the latest kernel for CentOS 4.7 and the problem persists - errors are printed, but, as I wrote before, this card isn't in use; it just sits in a PCI-X slot of our server.

You must use this card in Red Hat 4.7 :) to reproduce this error.

Regards,
I.Piasecki

Comment 5 Michal Schmidt 2009-05-23 17:48:33 UTC
(In reply to comment #4)
> problem persists - errors are printed

Which errors? As I tried to explain, there are two separate issues with very different severity. The Badness backtrace can be safely ignored. The Tx timeout is more serious. Can you reproduce the Tx timeout?
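
To keep the two issues apart when reading a log, a rough sketch like the one below could sort lines by severity. It is an illustration only: the substrings come from the dmesg output quoted in this report, the labels "ignorable" and "serious" simply mirror the assessment above, and the names `classify` and `summarize` are hypothetical.

```python
def classify(line):
    """Label a kernel log line according to the two issues in this report."""
    if "Badness in mii_wait_link" in line:
        return "ignorable"  # mdelay() warning; can be safely ignored
    if "Tx timed out" in line or "transmit timed out" in line:
        return "serious"    # the actual connectivity problem
    return "other"


def summarize(log_text):
    """Count non-empty log lines per label."""
    summary = {"ignorable": 0, "serious": 0, "other": 0}
    for line in log_text.splitlines():
        if line.strip():
            summary[classify(line)] += 1
    return summary


SAMPLE_LOG = """\
Badness in mii_wait_link at drivers/net/dl2k.c:1500
eth1: Link off
eth1: Tx timed out (0000), is buffer full?
NETDEV WATCHDOG: eth1: transmit timed out
"""

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

A non-zero "serious" count would mean the Tx timeout has been reproduced, which is the information asked for here.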

Comment 6 I. Piasecki 2009-05-23 18:56:02 UTC
Errors: Badness in mii_wait_link at drivers/net/dl2k.c:1500 - this message is still printed.

Can I reproduce the Tx timeout? I suppose I can. I only need to plug an Ethernet cable into this D-Link card and use it normally. Then I must wait some time, maybe 1 day, maybe 5 days; I don't know exactly when I will see the Tx timeout problem again:

eth1: Tx timed out (0000), is buffer full?

I don't know when this occurs or under which conditions.

I must say that I haven't tested this card with the latest kernel, 2.6.9-78.0.22, which I have installed on this machine. On Monday, May 25, I will try to use this card again. I will post the results.

Comment 7 Jiri Pallich 2012-06-20 16:04:31 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.