Bug 119664

Summary: Hard lock with r8169 NIC module.
Product: [Fedora] Fedora
Reporter: Alejandro Mota <mota>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED WORKSFORME
Severity: high
Priority: medium
Version: rawhide
CC: peterm, romieu
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-04-13 23:14:54 UTC
Bug Blocks: 114963

Description Alejandro Mota 2004-04-01 07:51:42 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6)
Gecko/20040211 Firefox/0.8

Description of problem:
A hard lock occurs when the r8169 NIC module is in use and outbound
traffic is high. No lock is observed with other NICs or when all
traffic is inbound. The same behavior is observed in kernel-2.6.4-1.298.

A patch was recently applied to this module:

  http://bugzilla.kernel.org/show_bug.cgi?id=2123

Perhaps this is the source of the problem?

Version-Release number of selected component (if applicable):
kernel-2.6.4-1.300

How reproducible:
Always

Steps to Reproduce:
Any process that generates a large amount of outbound traffic triggers
the problem. For instance:
1. cd /usr/src/
2. scp -r linux-2.6.4-1.300/ othermachine:/tmp

Actual Results:  Hard lock occurs shortly after the copying starts.

Expected Results:  Normal transfer of files.
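
If scp is not convenient, the same kind of sustained outbound load can probably be produced with a plain TCP sender. The following is only a minimal sketch, not something that was run for this report: the function names `pump` and `flood` are made up for illustration, and it assumes some TCP sink is already listening on the receiving machine.

```python
import socket


def pump(sock, total_bytes, chunk_size=64 * 1024):
    """Send total_bytes of zero-filled data over an already connected socket.

    Returns the number of bytes handed to the socket.
    """
    payload = bytes(chunk_size)  # zero-filled buffer, reused for every chunk
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        sock.sendall(payload[:n])  # blocks until the kernel accepts all n bytes
        sent += n
    return sent


def flood(host, port, total_bytes):
    """Connect to a TCP sink on the other machine and pump data to it.

    host and port are placeholders; the sink itself is assumed to exist.
    """
    with socket.create_connection((host, port)) as sock:
        return pump(sock, total_bytes)
```

For example, `flood("othermachine", 5001, 1 << 30)` would try to push 1 GiB out through whichever interface routes to othermachine; port 5001 is an arbitrary choice here.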

Additional info:

From /var/log/messages:

Mar 31 22:49:38 xx kernel: r8169 Gigabit Ethernet driver 1.2 loaded
Mar 31 22:49:39 xx kernel: eth1: RealTek RTL8169 Gigabit Ethernet at 0x4284a800, 00:90:f5:27:01:e1, IRQ 217
Mar 31 22:49:39 xx kernel: eth1: Auto-negotiation Enabled.
Mar 31 22:49:39 xx kernel: eth1: 100Mbps Full-duplex operation.

Comment 1 Alejandro Mota 2004-04-02 10:07:20 UTC
Further testing shows that the problem happens on the SMP kernels only.
The single-processor kernels apparently are not affected by it.

Comment 2 Alejandro Mota 2004-04-13 23:14:54 UTC
Francois Romieu provided a patch that, when applied to
kernel-source-2.6.5-1.319, solves the problem. The patch is available at:
http://www.fr.zoreil.com/people/francois/misc/20040407-2.6.5-r8169.c-stable.patch

Comment 3 Fredrik Noring 2004-09-20 19:06:48 UTC
I too have a Realtek RTL-8169, and it fails every 2-4 days (Fedora
Core 2, kernel 2.6.8-1.521). On one of these occasions I saw the console
flooded with the message:

   eth0: Too much work on interrupt

The kernel appears to be locked up completely when this happens.
Network traffic has not been very high.

Should I open a new ticket for this?
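
To see whether this message also reaches the system log before a lockup, something like the sketch below could be used to count it per interface. It assumes syslog-style lines of the form shown in the original report ("... kernel: ethN: ...") and the message text exactly as quoted above; the wording in an actual log may differ, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> <host> kernel: eth0: Too much work on interrupt"
PATTERN = re.compile(r"kernel: (\w+): Too much work on interrupt")


def count_interrupt_floods(lines):
    """Count matching log lines, keyed by interface name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It can be fed an open log file directly, for example `count_interrupt_floods(open("/var/log/messages"))`. If the machine locks up hard, the last messages may never be written to disk, so an empty result does not rule the problem out.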

Comment 4 Alejandro Mota 2004-09-20 19:23:03 UTC
I had this problem too on a Pentium 4 HT machine running SMP kernels.
I worked around it by adding the noapic option to the kernel command
line at boot; the problem has not recurred since. I never experienced
this bug when running UP kernels.
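
When comparing machines, it may help to confirm that a box really is running an SMP build and whether noapic is in effect. A minimal sketch, assuming a Linux system where /proc/cmdline holds the boot command line and the kernel version string contains the word "SMP" on SMP builds; the helper names are made up for illustration.

```python
def has_boot_option(cmdline, option):
    """Return True if option appears as a whole word in the kernel command line."""
    return option in cmdline.split()


def is_smp_build(version_string):
    """Return True if the kernel version string (as from `uname -v`) mentions SMP."""
    return "SMP" in version_string.split()
```

Typical use would be `has_boot_option(open("/proc/cmdline").read(), "noapic")` together with `is_smp_build(platform.version())` after `import platform`.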

Comment 5 Francois Romieu 2004-09-25 22:20:26 UTC
Please use the patch referenced below to sync the r8169 driver in your
kernel with a recent vanilla kernel. Among the many changes, NAPI could
make a difference. If the symptoms do not disappear, consider opening
a new ticket and CCing me.
 
Btw, assuming people are not hitting issues unrelated to r8169, the
driver has already proven to be quite stable on real SMP systems.
 
Patch available at: 
http://www.fr.zoreil.com/linux/kernel/2.6.x/2.6.8-1.521/r8169.c-2.6.8-1.521-to-2.6.9-rc2-dac.patch