Bug 460747 - r8169 network driver broken on some systems
Summary: r8169 network driver broken on some systems
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 10
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 468560 470408 474761
Depends On:
Blocks: F11Target
 
Reported: 2008-08-30 21:18 UTC by Jerry Williams
Modified: 2013-01-10 04:47 UTC (History)
22 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 06:20:54 UTC
Type: ---
Embargoed:
josep.puigdemont: fedora_requires_release_note+


Attachments
stacktrace after kernel error (1.96 KB, text/plain)
2008-10-20 06:44 UTC, Josep
dmesg output after boot (26.79 KB, application/octet-stream)
2008-10-26 07:47 UTC, Josep
Kernel messages (13.36 KB, text/plain)
2008-10-26 08:03 UTC, Josep
Merge PHY init from Realtek's driver 6.008.00 (4.54 KB, patch)
2008-12-21 19:14 UTC, Francois Romieu
More PHY access changes from Realtek's 6.008.00 driver (2.62 KB, patch)
2008-12-23 13:56 UTC, Francois Romieu
prevent late irq events during init (704 bytes, patch)
2009-03-31 22:04 UTC, Francois Romieu
linux kernel v2.6.30 r8169 driver (95.98 KB, text/plain)
2009-06-17 20:49 UTC, Francois Romieu
Makefile for out-of-tree build (131 bytes, application/octet-stream)
2009-06-17 21:18 UTC, Francois Romieu
complete dmesg (41.24 KB, text/plain)
2009-06-20 12:08 UTC, Manfred Knick

Description Jerry Williams 2008-08-30 21:18:11 UTC
Description of problem: When I boot the Fedora 10 Alpha boot.iso in rescue mode and start networking with DHCP, eth0 doesn't get an address.


Version-Release number of selected component (if applicable):
Kernel 2.6.27-0.166.rc0.git8.f10.i586

How reproducible: Always


Steps to Reproduce:
1. boot cd
2. select rescue mode
3. english
4. us
5. start networking
6. start eth0
7. select ipv4
8. select dhcp
9. skip mount disks
10. ifconfig eth0 shows no IP address.
  
Actual results: Doesn't get an IP address


Expected results: Should get an IP address


Additional info: I have a 10/100 switch, so I can't run at 1000.
I have speed issues with Fedora 9 as well. This is a dual-boot system with Vista 64 and Fedora 9 x86_64. Sometimes the network works and sometimes it doesn't with the Fedora 9 2.6.25.14-108.fc9.x86_64 kernel.
It does seem to work with Fedora 10 Alpha when I give it an address manually.

Vista says I have an RTHL8168C/8111C network card. In Vista I set speed & duplex to 100 full to make it work.
When I try to set speed and duplex in Fedora 9, it doesn't seem to take effect.
Fedora 10 says RTL8111/8168B

Comment 1 Jerry Williams 2008-08-31 04:05:46 UTC
I did some more testing and "How reproducible: Always" isn't really true.
I did change my dhcp server to have a reserved IP and that seems to help.
So now I would say it happens about 50 percent of the time.
I did burn a DVD with Fedora 10 Alpha x86_64 and I didn't see the problem.

I decided to install the Fedora 10 kernel and it works better than the Fedora 9 one. 
I installed kernel-2.6.27-0.166.rc0.git8.fc10.x86_64

Comment 2 Josep 2008-09-06 09:18:22 UTC
I have a similar problem since F8 and after 2.6.24 kernels. Could you take a look at the following bugs: Bug #438046 (F8), Bug #444966 (F9), Bug #449094, and see if they are related.

In my case I never get the network to work unless, as root, I do:
# rmmod r8169; modprobe r8169

For me the problem is present in F8, F9 and F10 rawhide.
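
The reload workaround above can be wrapped in a small script. This is only a sketch: `reload_r8169`, `run` and the `DRY_RUN` variable are names made up for illustration, and with `DRY_RUN=1` (the default here) the script only prints what it would do. For real use it would have to run as root with `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch of the rmmod/modprobe workaround described above.
# DRY_RUN, run and reload_r8169 are illustrative names, not part of any tool.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reload_r8169() {
    # Unload the driver, then load it again; stop if the unload fails.
    run rmmod r8169 && run modprobe r8169
}

reload_r8169
```

Stopping after a failed `rmmod` is a deliberate choice in this sketch, so that a busy module is reported instead of being silently ignored.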

Comment 3 Aaron Clark 2008-10-03 01:29:51 UTC
I was seeing this issue as well and had been sticking on the kernel-xen
packages instead while it persisted.  It appears to be fixed on the following
(recent) kernels:
2.6.26.5-28.fc8 (Fedora 8 64bit)
2.6.27-0.352.rc7.git1.fc10.x86_64 (Fedora 10 Beta LiveCD 64bit)

Comment 4 Josep 2008-10-05 17:16:25 UTC
This is still an issue on 2.6.27-0.391.rc8.git7.fc10 (32 bits kernel).

Comment 5 Erik P. Olsen 2008-10-05 17:43:59 UTC
Same here. A nasty problem which so far is a show stopper for F10.

Comment 6 Josep 2008-10-07 20:04:55 UTC
(In reply to comment #3)
> I was seeing this issue as well and had been sticking on the kernel-xen
> packages instead while it persisted.  It appears to be fixed on the following
> (recent) kernels:
> 2.6.26.5-28.fc8 (Fedora 8 64bit)
> 2.6.27-0.352.rc7.git1.fc10.x86_64 (Fedora 10 Beta LiveCD 64bit)

Just for the sake of testing I also tried the x86_64 live CD of F10 (beta), which includes Linux kernel 2.6.27-0.352.rc7.git1.fc10.x86_64 (as mentioned in comment #3), but in my case this does not fix the problem, contrary to what is mentioned above. The rmmod/modprobe trick does work, though.
This is my Smolt profile:
http://www.smolts.org/client/show/pub_c348ef55-d532-4197-afac-be8e3690c35e


about comment #5: has this really been marked as a showstopper for F10?

Comment 7 Erik P. Olsen 2008-10-07 21:16:37 UTC
(In reply to comment #6)
> (In reply to comment #3)
> > I was seeing this issue as well and had been sticking on the kernel-xen
> > packages instead while it persisted.  It appears to be fixed on the following
> > (recent) kernels:
> > 2.6.26.5-28.fc8 (Fedora 8 64bit)
> > 2.6.27-0.352.rc7.git1.fc10.x86_64 (Fedora 10 Beta LiveCD 64bit)
> 
> Just for the sake of testing I also tried the x86_64 live CD of F10 (beta),
> which includes Linux kernel 2.6.27-0.352.rc7.git1.fc10.x86_64 (as mentioned in
> comment #3), but in my case this does not fix the problem, contrary to what is
> mentioned above. The rmmod/modprobe trick does work, though.
> This is my Smolt profile:
> http://www.smolts.org/client/show/pub_c348ef55-d532-4197-afac-be8e3690c35e
> 
> 
> about comment #5: has this really been marked as a showstopper for F10?

For me it is a showstopper. Maybe it's just me, but if I can't prepare the F10 installation using rawhide I won't be able to install F10 in due time. I'll probably have to wait for F11.

Comment 8 Adam Pribyl 2008-10-19 16:50:45 UTC
Is this really only a problem with getting an IP from DHCP, or is this possibly caused by bug #438046?

Comment 9 Josep 2008-10-20 06:44:54 UTC
Created attachment 320848 [details]
stacktrace after kernel error

I think this is not just a DHCP problem, as I don't get the network to work either even when configuring a static IP address. When I did that, and pinged another computer on the same network, after a while I got a kernel error (see attached file). This is kernel 2.6.27.2-23.rc1.fc10.i686.

Notice that everything works OK after I manually do "rmmod r8169; modprobe r8169".

Comment 10 Chuck Ebbert 2008-10-20 18:38:54 UTC
An updated r8169 driver taken from 2.6.28-rc is in 2.6.27.3-32.rc1 and later kernels.

Comment 11 Josep 2008-10-20 19:40:33 UTC
I'm now running with rawhide kernel 2.6.27.3-27.rc1.fc10.i686, and for me the problem unfortunately persists.

Since I could not get an IP with DHCP, I decided to test a bit more by configuring the interface manually. It didn't work either: all pings showed "destination unreachable", but after a while the kernel got an error (the backtrace was the same as in Comment #9, some addresses a bit different though).
What then happened is that I started getting pings back.

Could the kernel error have had the same effect as "rmmod r8169; modprobe r8169"?

Comment 12 Chuck Ebbert 2008-10-20 22:41:09 UTC
The updates from 2.6.28-rc are now in the Fedora 10 kernel. Please test a kernel version 2.6.27.3-32.rc1 or later.

Comment 13 Jerry Williams 2008-10-21 00:39:24 UTC
I tried Fedora 10 Snap 2 DVD x86_64 and it works.
But the box that you have to either use DHCP or set the network is so big that I can't see what I am really doing.
So I am kind of guessing that DHCP worked, since I couldn't really see what I was selecting.

Comment 14 Josep 2008-10-21 06:43:39 UTC
(In reply to comment #12)
> The updates from 2.6.28-rc are now in the Fedora 10 kernel. Please test a
> kernel version 2.6.27.3-32.rc1 or later.

Chuck, this is the last version I tested (see Comment #11). Unfortunately it didn't solve the problem on this HW:
http://www.smolts.org/client/show/pub_c348ef55-d532-4197-afac-be8e3690c35e

Comment 15 Chuck Ebbert 2008-10-21 17:29:36 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > The updates from 2.6.28-rc are now in the Fedora 10 kernel. Please test a
> > kernel version 2.6.27.3-32.rc1 or later.
> 
> Chuck, this is the last version I tested (see Comment #11). Unfortunately it
> didn't solve the problem on this HW:

You reported testing 2.6.27.3-27.rc1, not 2.6.27.3-32.rc1

Comment 16 Josep 2008-10-21 18:43:55 UTC
Oops, My bad, you're right.

Now I just upgraded rawhide, which installed kernel 2.6.27.3-30.rc1.fc10.i686.

It still presents problems here, and whenever I log in I get a kernel failure message pop up (the same stack trace as in comment #9, but all addresses changed).

Comment 17 Erik P. Olsen 2008-10-21 19:55:39 UTC
(In reply to comment #16)
> Oops, My bad, you're right.
> 
> Now I just upgraded rawhide, which installed kernel 2.6.27.3-30.rc1.fc10.i686.
> 
> It still presents problems here, and whenever I log in I get a kernel failure
> message pop up (the same stack trace as in comment #9, but all addresses
> changed).

I think he asked for kernel 2.6.27.3-32.rc1.fc10.i686 which is not in rawhide yet.

Comment 18 Chuck Ebbert 2008-10-23 00:53:12 UTC
-34 is in rawhide today, -39 will be in tomorrow

Comment 19 Josep 2008-10-23 06:49:11 UTC
I finally tried -34 (2.6.27.3-34.rc1.fc10.i686), same problem as before, except that I don't get the kernel error now.

I'll try -39 as soon as possible.

Don't know if this can serve as a clue, but I noticed that the interrupts for the eth0 are very numerous before I do "rmmod r8169; modprobe r8169"...

Comment 20 Josep 2008-10-23 18:44:42 UTC
Hi again,
I now tried 2.6.27.3-39.fc10.i686, the problem is still there, but I did nevertheless notice three things.

First is that udev renames eth0 to eth1, this is an extract of dmesg:
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:00:0b.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:00:0b.0: no PCI Express capability
eth0: RTL8110s at 0xf8926f00, 00:11:09:ce:00:11, XID 04000000 IRQ 16
[...few lines here...]
udev: renamed network interface eth0 to eth1
device-mapper: multipath: version 1.0.5 loaded


The second thing is that I manually assigned an IP to the interface (eth1), and started pinging another computer on the net, but never got a ping reply back, only the "Destination Host Unreachable" message.
On the other computer I ran tcpdump and could see the ping requests received and replies sent.
After a while of pinging, I got a popup indicating there had been a kernel failure. It showed the same stack trace as comment #9, except that the last call before "warn_slowpath" was "restore_nocheck_notrace":
 [<c042ba36>] warn_slowpath+0x4b/0x6c
 [<c0403cc3>] ? restore_nocheck_notrace+0x0/0xe
 [<c044179c>] ? getnstimeofday+0x3c/0xc9

After this error, I started getting ping replies back.


The third thing is that with the network up I then tried to reload the kernel module with "rmmod r8169; modprobe r8169". After that NetworkManager normally detects that the device is there and requests an IP with DHCP, but this time it didn't notice the device and never requested an address. This worked after I restarted the NetworkManager service.

Comment 21 Josep 2008-10-25 21:09:14 UTC
Two more things.

First is that I realized that the MAC address for my device has been rewritten. Although the first three bytes are the same, the other three have changed. Is that a known issue with the r8169 driver?

The other thing is that I tried with the latest rawhide kernel (2.6.27.3-44.fc10.i686) with the same results as before, no luck :(

Comment 22 Francois Romieu 2008-10-25 21:39:09 UTC
Josep, can you attach some dmesg from boot to illustrate your point ?

Thanks in advance.

-- 
Ueimor

Comment 23 Chuck Ebbert 2008-10-26 05:35:35 UTC
Applied the commits suggested in bug #468360 to kernel 2.6.27.4-50.rc3.
(The previous patch has been reverted.)

Comment 24 Josep 2008-10-26 07:47:06 UTC
Created attachment 321542 [details]
dmesg output after boot

This is dmesg output after boot.

Not much can be seen here, except that udev changes the device name from eth0 to eth1. This is, I guess, because a few kernels back the device's MAC address was rewritten. It must have been something in the driver because I don't recall doing it myself (wouldn't even know how to do it ;-))...
I mentioned that in Comment #21.

Comment 25 Josep 2008-10-26 08:03:34 UTC
Created attachment 321543 [details]
Kernel messages

The attached file is for the kernel messages with some comments I added indicating the actions I was taking:

1) The log starts just after boot. It can be seen that NetworkManager is unable to obtain an IP with DHCP.
2) I assign one IP manually, as well as a default gateway.
3) I start pinging a computer on my LAN, but I only get "Destination unreachable".
4) There is a kernel error, a watchdog fires (causing the device to restart maybe?)
5) I start getting ping replies back.
6) I tell NetworkManager to handle the device, and now it is able to get an IP with DHCP.


I don't know if this is relevant, but as I mentioned before, I noticed that the device issued a lot of interrupts in very little time (~50 million in just 3 minutes of uptime):

[josep@localhost ~]$ uptime; cat /proc/interrupts | grep eth
 08:23:42 up 3 min,  3 users,  load average: 0.83, 1.05, 0.47
 16:   49566997   IO-APIC-fasteoi   eth1

The situation stabilizes after the kernel error in point 4 above, very few interrupts are issued then.
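
Taking the uptime above as roughly 180 seconds, 49566997 interrupts works out to something on the order of 275,000 per second. A way to measure the rate directly, rather than averaging over uptime, might look like the sketch below. The helper names `irq_total` and `irq_rate` are made up for illustration, and the parsing assumes the device name is the last field on its /proc/interrupts line.

```shell
#!/bin/sh
# Sketch: estimate a NIC's interrupt rate from /proc/interrupts.
# irq_total and irq_rate are illustrative helper names.

# Sum the per-CPU counters on the line whose last field is the device name.
# Reads /proc/interrupts-formatted text on stdin.
irq_total() {
    awk -v dev="$1" '$NF == dev {
        total = 0
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^[0-9]+$/) total += $i
        }
        print total
    }'
}

# Sample the counter twice and print interrupts per second.
irq_rate() {
    dev="$1"
    interval="${2:-5}"
    first=$(irq_total "$dev" < /proc/interrupts)
    sleep "$interval"
    second=$(irq_total "$dev" < /proc/interrupts)
    if [ -z "$first" ] || [ -z "$second" ]; then
        echo "no interrupt line found for $dev" >&2
        return 1
    fi
    echo $(( (second - first) / interval ))
}

# Demo on the line quoted above (no hardware needed); prints 49566997.
printf ' 16:   49566997   IO-APIC-fasteoi   eth1\n' | irq_total eth1
```

On an affected machine, `irq_rate eth1 5` before and after the module reload would show whether the interrupt storm really stops.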

Comment 26 David Cantrell 2008-10-27 07:35:40 UTC
*** Bug 468560 has been marked as a duplicate of this bug. ***

Comment 27 Dave Jones 2008-10-27 16:31:26 UTC
kernel-2.6.27.4-51.fc10 has two r8169 patches which might fix this bug.

Comment 28 Josep 2008-10-27 18:47:04 UTC
Hi,
just tried 2.6.27.4-51.fc10.i686, unfortunately I have to report that it didn't solve the problem here. I could also reproduce the same behavior as in comment #25.

Comment 29 Josep 2008-10-28 19:14:48 UTC
Although the last tests have not gone very well, I think we can at least say that, to some degree, the driver does actually work. Proof of that is that removing and reinserting the module brings the interface up, and it works normally and very well after that. So not everything is ruined.

The question that puzzles me is why the driver is not working from the start. And what are all those interrupts?
Maybe the problem is not in the driver itself (it does work after all), but in the way it is initialized at boot time?

I also wonder if the tests I conducted provide all the help and information you need for debugging this issue, or if they are just confusing ;-)
Do you think the information is enough/good? Do you want me to test or try something else?

Comment 30 Chuck Ebbert 2008-11-08 21:16:47 UTC
Another r8169 update is in 2.6.27.5-88. Please test that or a later one...

Comment 31 Josep 2008-11-10 21:45:22 UTC
2.6.27.5-92.fc10.i686 from koji doesn't fix the problem.

Comment 32 Jesse Keating 2008-11-10 22:23:38 UTC
A number of models have been fixed already, and the current reporter does have a workaround.  I'm going to move this over to F10Target unless we get more evidence that this is still a more wide spread issue.

Comment 33 Josep 2008-11-11 01:21:49 UTC
Now with the latest kernel from koji, 2.6.27.5-94.fc10.i686, I see this new message in dmesg at boot:

eth0: interrupt 0025 in poll


This is an extract of dmesg output (grep'ed for r8169 and eth0):

r8169 0000:00:0b.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:00:0b.0: no PCI Express capability
eth0: RTL8110s at 0xf88c6f00, 00:11:09:ce:a6:ca, XID 04000000 IRQ 16
r8169: eth0: link up
eth0: interrupt 0025 in poll
r8169: eth0: link up
eth0: no IPv6 routers present
r8169 0000:00:0b.0: PCI INT A disabled
r8169 0000:00:0b.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
r8169 0000:00:0b.0: no PCI Express capability
eth0: RTL8110s at 0xf88c6f00, 00:11:09:ce:a6:ca, XID 04000000 IRQ 16
r8169: eth0: link up
r8169: eth0: link up
eth0: no IPv6 routers present

Comment 34 A.J. Werkman 2008-11-11 08:48:11 UTC
I encounter this problem on an Asus M2A-VM board.

This is a dual-boot system with Windows. In my testing I found that the problem only appears after I have been running Windows. Rebooting and a hardware reset do not solve the problem. Only after I have physically disconnected the power cord, waited a few seconds and reconnected the power does the NIC come up and get a lease through DHCP again.

As long as I do not boot into Windows again I have no problem. After rebooting Fedora the NIC comes up fine. But immediately after I have booted Windows, the problem is there again when I bring up Fedora.

The "rmmod r8169; modprobe r8169" workaround mentioned here also makes the NIC usable again.

Comment 35 Josep 2008-11-11 09:47:12 UTC
I power cycled the NIC and the computer, disconnected the network cable, etc., but it didn't help.
This is also a dual boot but only with Fedora 7 and Rawhide. The NIC works fine in F7.
The motherboard is a "Micro-Star International Co., Ltd. K8T NEO 2 motherboard"

Comment 36 Kasper Pedersen 2008-11-11 21:37:31 UTC
The RTL8102EL (10/100) on the Intel D945GCLF seems to be affected by this too, to a lesser degree: about 20% of the time with 2.26.4-79. The success rate has improved since the F10 alphas, but it does not start reliably.

Comment 37 Francois Romieu 2008-11-11 22:50:28 UTC
Kasper, have you checked 2.6.27.5-94 ?

The 8102 should perform better with a (late) 2.6.27 based kernel or a
post 2.6.28-rc2 one.

-- 
Ueimor

Comment 38 Kasper Pedersen 2008-11-18 19:46:38 UTC
(In reply to comment #37)
> Kasper, have you checked 2.6.27.5-94 ?

-94 and -113 i686 are perfect on D945GCLF, no failures in ~80 boots. I can't say for x86_64 since it won't boot (id 471098).

Comment 39 Josep 2008-11-18 22:03:56 UTC
Just tried -113 on i686 here, but no luck yet.
The XID line is:
eth0: RTL8110s at 0xf88d2f00, 00:11:09:ce:a6:ca, XID 04000000 IRQ 16

Comment 40 Michal Hlavinka 2008-11-19 12:03:47 UTC
I'm experiencing the same problem.

HW: Asus A6T - Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

using module r8169
no kernel parameter specified in grub
no extra module (nvidia,... etc) used 

Fedora is the only one system I've installed, so no vista or other system can cause suspend or anything else

connected to 10/100Mb switch

everything works fine with low traffic, if traffic is higher this problem occurs:
At the beginning the network works fine (FTP to a NAS at about 9 MB/s), but after a few seconds all traffic drops to zero. The system reports that a kernel oops has been sent; the log shows:
WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x15d()
NETDEV WATCHDOG: eth0 (r8169): transmit timed out

Call Trace:
 <IRQ>  [<ffffffff81041623>] warn_slowpath+0x8c/0xb5
 [<ffffffff810310c2>] ? resched_task+0x52/0x8c      
 [<ffffffff813325a4>] ? _spin_unlock_irqrestore+0x27/0x3e
 [<ffffffff8103a5a6>] ? try_to_wake_up+0x26f/0x281       
...

this takes about 10 sec. After that traffic goes back to about 9 MB/s (for next 10-30 sec) and then it breaks again.

Every time it gets this "paralysis", a new line is appended to the log:
r8169: eth0: link up
it looks like this:
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
...
(no "link down")

This is extremely annoying, so if I can help with debugging or anything else, please let me know.
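
To put a number on the repeated "link up" lines with no matching "link down", the kernel log can be tallied with a short filter. This is a sketch only: `link_events` is an illustrative name, and the patterns assume the message format quoted above.

```shell
#!/bin/sh
# Sketch: count r8169 "link up" / "link down" messages in kernel log text.
# link_events is an illustrative helper name; it reads log text on stdin.
link_events() {
    awk '
        /r8169: .*link up/   { up++ }
        /r8169: .*link down/ { down++ }
        END { printf "up=%d down=%d\n", up, down }
    '
}

# Demo on lines in the format quoted above; prints "up=3 down=0".
link_events <<'EOF'
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
EOF
```

On a live system the same function could be fed from `dmesg` or from /var/log/messages.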

Comment 41 Michal Hlavinka 2008-11-20 06:13:42 UTC
problem still exists with kernel-2.6.27.5-113.fc10.x86_64
WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x15d()                                                                                  
NETDEV WATCHDOG: eth0 (r8169): transmit timed out                                                                                                  
Modules linked in: fuse rfcomm bridge stp bnep sco l2cap rfkill_input autofs4 sunrpc nf_conntrack_netbios_ns cpufreq_ondemand powernow_k8 freq_table xfs dm_multipath kvm_amd kvm uinput snd_hda_intel snd_seq_dummy arc4 snd_seq_oss snd_seq_midi_event snd_seq ecb snd_seq_device snd_pcm_oss crypto_blkcipher snd_mixer_oss snd_pcm snd_timer b43 rfkill snd_page_alloc mac80211 sdhci_pci firewire_ohci cfg80211 input_polldev snd_hwdep sdhci firewire_core r8169 mmc_core btusb snd crc_itu_t yenta_socket bluetooth dm9601 serio_raw soundcore i2c_nforce2 k8temp stkwebcam compat_ioctl32 rsrc_nonstatic pcspkr asus_laptop battery hwmon ssb usbnet mii videodev v4l1_compat i2c_core joydev video output ac wmi ata_generic pata_acpi pata_amd       
Pid: 0, comm: swapper Not tainted 2.6.27.5-113.fc10.x86_64 #1                                                                                      

Call Trace:
 <IRQ>  [<ffffffff81041623>] warn_slowpath+0x8c/0xb5
 [<ffffffff810597fa>] ? sched_clock_cpu+0x10f/0x120 
 [<ffffffff8105990b>] ? sched_clock_tick+0x8f/0x98  
 [<ffffffff8105a18b>] ? getnstimeofday+0x3a/0x96    
 [<ffffffff813323aa>] ? _spin_lock+0x9/0xc          
 [<ffffffff812b7833>] dev_watchdog+0xfe/0x15d       
 [<ffffffff813323aa>] ? _spin_lock+0x9/0xc          
 [<ffffffff81016683>] ? pit_next_event+0x3c/0x45    
 [<ffffffff812b7735>] ? dev_watchdog+0x0/0x15d
 [<ffffffff8104ad22>] run_timer_softirq+0x19c/0x222
 [<ffffffff8105a900>] ? update_wall_time+0x411/0x41c
 [<ffffffff81046b22>] __do_softirq+0x7e/0x10c
 [<ffffffff813325a4>] ? _spin_unlock_irqrestore+0x27/0x3e
 [<ffffffff81011bcc>] call_softirq+0x1c/0x28
 [<ffffffff81012dd2>] do_softirq+0x4d/0xb0
 [<ffffffff810466f7>] irq_exit+0x4e/0x9d
 [<ffffffff810130ee>] do_IRQ+0x147/0x169
 [<ffffffff81010933>] ret_from_intr+0x0/0x2e
 <EOI>  [<ffffffff8102570a>] ? native_safe_halt+0x6/0x8
 [<ffffffff810172cb>] ? need_resched+0x1e/0x28
 [<ffffffff810173b0>] ? default_idle+0x2a/0x4c
 [<ffffffff810174d2>] ? c1e_idle+0xf2/0x127
 [<ffffffff81335490>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
 [<ffffffff8131f33d>] ? rest_init+0x61/0x63

---[ end trace 686e7f68fe28b470 ]---


in mail for root:

 --------------------- Kernel Begin ------------------------
 WARNING:  Kernel Errors Present
    ACPI Error (psloop-0136): F ...:  8 Time(s)
    asus-laptop: Error calling BSTS ...:  2 Time(s)
 ---------------------- Kernel End -------------------------
don't know if it can be somehow related


Also, based on comment #9, I assume we are all hitting the same r8169 problem, since "NETDEV WATCHDOG: eth0 (r8169): transmit timed out" appears in everyone's logs. So bug #436841 is probably the same as this one (even if the Fedora versions are different).

Comment 42 Bug Zapper 2008-11-26 02:54:40 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 43 Mace Moneta 2008-11-26 03:07:12 UTC
I just experienced this several times, kernel-2.6.27.5-120.fc10.x86_64:

Nov 25 21:19:09 slayer kernel:------------[ cut here ]------------
Nov 25 21:19:09 slayer kernel:WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0xfe/0x15d()
Nov 25 21:19:09 slayer kernel:NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Nov 25 21:19:09 slayer kernel:Modules linked in: rfcomm fuse i915 drm bridge stp bnep sco netconsole configfs l2cap bluetooth w83627ehf hwmon_vid coretemp hwmon ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm firewire_ohci snd_timer snd_page_alloc firewire_core crc_itu_t snd_hwdep pata_it8213 i2c_i801 i2c_core r8169 snd soundcore zaurus cdc_ether usbnet mii usb_storage ata_generic pata_acpi [last unloaded: nf_nat]
Nov 25 21:19:09 slayer kernel:Pid: 0, comm: swapper Not tainted 2.6.27.5-120.fc10.x86_64 #1
Nov 25 21:19:09 slayer kernel:
Nov 25 21:19:09 slayer kernel:Call Trace:
Nov 25 21:19:09 slayer kernel: <IRQ>  [<ffffffff81041623>] warn_slowpath+0x8c/0xb5
Nov 25 21:19:09 slayer kernel: [<ffffffff81331fe4>] ? _spin_unlock_irqrestore+0x27/0x3e
Nov 25 21:19:09 slayer kernel: [<ffffffff810597fa>] ? sched_clock_cpu+0x10f/0x120
Nov 25 21:19:09 slayer kernel: [<ffffffff8105a18b>] ? getnstimeofday+0x3a/0x96
Nov 25 21:19:09 slayer kernel: [<ffffffff81331dea>] ? _spin_lock+0x9/0xc
Nov 25 21:19:09 slayer kernel: [<ffffffff812b7273>] dev_watchdog+0xfe/0x15d
Nov 25 21:19:09 slayer kernel: [<ffffffff812b7175>] ? dev_watchdog+0x0/0x15d
Nov 25 21:19:09 slayer kernel: [<ffffffff8104ad22>] run_timer_softirq+0x19c/0x222
Nov 25 21:19:09 slayer kernel: [<ffffffff81046b22>] __do_softirq+0x7e/0x10c
Nov 25 21:19:09 slayer kernel: [<ffffffff81011bcc>] call_softirq+0x1c/0x28
Nov 25 21:19:09 slayer kernel: [<ffffffff81012dd2>] do_softirq+0x4d/0xb0
Nov 25 21:19:09 slayer kernel: [<ffffffff810466f7>] irq_exit+0x4e/0x9d
Nov 25 21:19:09 slayer kernel: [<ffffffff810209fa>] smp_apic_timer_interrupt+0x8f/0xa8
Nov 25 21:19:09 slayer kernel: [<ffffffff810113d8>] apic_timer_interrupt+0x88/0x90
Nov 25 21:19:09 slayer kernel: <EOI>  [<ffffffff811bc6ca>] ? acpi_idle_enter_simple+0x175/0x1b4
Nov 25 21:19:09 slayer kernel: [<ffffffff811bc6c2>] ? acpi_idle_enter_simple+0x16d/0x1b4
Nov 25 21:19:09 slayer kernel: [<ffffffff81285aa3>] ? cpuidle_idle_call+0x95/0xc9
Nov 25 21:19:09 slayer kernel: [<ffffffff8100f279>] ? cpu_idle+0xb2/0x10b
Nov 25 21:19:09 slayer kernel: [<ffffffff8131ed7d>] ? rest_init+0x61/0x63
Nov 25 21:19:09 slayer kernel:
Nov 25 21:19:09 slayer kernel:---[ end trace 2f371b0bd9a80359 ]---
Nov 25 21:19:09 slayer kernel:r8169: eth0: link up
Nov 25 21:19:21 slayer kernel:r8169: eth0: link up
Nov 25 21:19:33 slayer kernel:r8169: eth0: link up
Nov 25 21:19:45 slayer kernel:r8169: eth0: link up

Comment 44 Mace Moneta 2008-11-28 18:12:20 UTC
Now on kernel 2.6.27.7-130.fc10.x86_64, when the problem occurred I tried to remove the r8169 module:

# lsmod | grep 81
usbnet                 23816  2 zaurus,cdc_ether
r8169                  40964  0 
mii                    13056  2 usbnet,r8169

# modprobe -r r8169 

Message from syslogd@slayer at Nov 28 12:42:39 ...
 kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Message from syslogd@slayer at Nov 28 12:42:39 ...
 kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1
...

For some reason, modprobe -r acted as if --wait had been specified.  There is no alias:

# which modprobe
/sbin/modprobe

The use count was reported as zero on lsmod, but modprobe reported it as 1, an obvious error.  I needed to reboot to get eth0 back, making this a more severe issue.
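
The two counts compared above may not be the same counter: lsmod shows the module's reference count, while the unregister_netdevice message appears to refer to references still held on the eth0 network device itself. That reading is an assumption, not something confirmed in this report. A small sketch for pulling the module count out of lsmod output (`module_refcount` is an illustrative name):

```shell
#!/bin/sh
# Sketch: extract a module's "Used by" count from lsmod-style output.
# module_refcount is an illustrative helper name; it reads text on stdin.
module_refcount() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Demo on the lsmod output quoted above; prints 0.
module_refcount r8169 <<'EOF'
usbnet                 23816  2 zaurus,cdc_ether
r8169                  40964  0
mii                    13056  2 usbnet,r8169
EOF
```

On a live system this would be used as `lsmod | module_refcount r8169`.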

Comment 45 Chuck Ebbert 2008-12-09 21:06:04 UTC
*** Bug 470408 has been marked as a duplicate of this bug. ***

Comment 46 Chuck Ebbert 2008-12-09 21:06:40 UTC
*** Bug 474761 has been marked as a duplicate of this bug. ***

Comment 47 Chuck Ebbert 2008-12-09 21:10:31 UTC
Can we start collecting the PCI device IDs of the broken adapters? People who have hit this bug should run the command 'lspci -nn' and post just the line of output for the network adapter.

Comment 48 Mace Moneta 2008-12-09 21:23:14 UTC
02:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 02)

Comment 49 Francois Romieu 2008-12-09 21:46:06 UTC
cebbert :
[...]
> People who have hit this bug should run the command 'lspci -nn' and post
> just the line of output for the network adapter.

May I add two lines ?
1) dmesg | grep XID
2) mii-tool -v | grep product

lspci -nn may not be specific enough (see r8169.c::rtl_chip_info) and the
PHYs are not exactly the same among the r8169 family.

-- 
Ueimor
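
The three commands requested above could be gathered in one go along the following lines. This is a sketch under assumptions: `collect_nic_info` and `extract_xid` are illustrative names, the `grep -i ethernet` filter is just one way to pick the adapter line, and mii-tool typically needs root.

```shell
#!/bin/sh
# Sketch: gather the identification lines requested in this thread.
# collect_nic_info and extract_xid are illustrative helper names.

# Pull the hexadecimal XID value out of r8169's probe line (text on stdin).
extract_xid() {
    sed -n 's/.* XID \([0-9a-fA-F]*\).*/\1/p'
}

# Print the PCI ID line, the XID line and the PHY product line.
collect_nic_info() {
    lspci -nn | grep -i ethernet
    dmesg | grep XID
    mii-tool -v | grep product   # typically needs root
}

# Demo on a probe line posted earlier in this report; prints 04000000.
echo 'eth0: RTL8110s at 0xf8926f00, 00:11:09:ce:00:11, XID 04000000 IRQ 16' | extract_xid
```

Running `collect_nic_info` on an affected machine would produce the three pieces of information in a form ready to paste into a comment.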

Comment 50 Mace Moneta 2008-12-09 21:52:26 UTC
OK, here you go:

$ dmesg | grep XID
eth0: RTL8168c/8111c at 0xffffc20000c5c000, 00:30:48:b0:96:f0, XID 3c4000c0 IRQ 17

$ sudo mii-tool -v | grep product
  product info: vendor 00:07:32, model 17 rev 2

Comment 51 Josep 2008-12-09 22:04:13 UTC
# mii-tool -v | grep product
  product info: vendor 00:07:32, model 17 rev 0

# dmesg | grep XID
eth0: RTL8110s at 0xf8830f00, 00:11:09:ce:a6:ca, XID 04000000 IRQ 16

# lspci -nn | grep Ethernet
00:0b.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)

Comment 52 An. N 2008-12-09 23:08:28 UTC
dmesg output:
eth0: RTL8168b/8111b at 0xf8870000, 00:18:f3:43:d8:a3, XID 38000000 IRQ 19

lspci output:
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)

mii-tool output:
eth0: negotiated 100baseTx-FD, link ok
  product info: vendor 00:07:32, model 17 rev 2
  basic mode:   autonegotiation enabled
  basic status: autonegotiation complete, link ok
  capabilities: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD
  advertising:  100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD flow-control
  link partner: 100baseTx-FD 100baseTx-HD 10baseT-FD 10baseT-HD

Comment 53 Dale Ogilvie 2008-12-15 06:21:07 UTC
lspci -nn
00:0b.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)

dmesg | grep XID
eth0: RTL8110s at 0xf8822f00, 00:0c:76:56:54:57, XID 04000000 IRQ 16

mii-tool -v | grep product
product info: vendor 00:07:32, model 17 rev 0

motherboard is a MSI K8TNEO

Comment 54 Michal Hlavinka 2008-12-15 08:44:24 UTC
dmesg | grep XID
eth0: RTL8168b/8111b at 0xffffc2000030a000, 00:22:15:86:c2:45, XID 38000000 IRQ 11


mii-tool -v | grep product
product info: vendor 00:07:32, model 17 rev 2


lspci -nn | grep Ethernet
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)



btw, If I use just nolapic parameter in grub, I get no oopses.

other suspicious thing:

for 8 blackouts I got 8 "r8169: eth0: link up" and also 
"ACPI Error (psloop-0136): F ...:  8 Time(s)" in mail for root (see my comment #41).

Comment 55 An. N 2008-12-15 22:41:21 UTC
Is anybody working on this issue? This bug is 4 months old and every time I boot I need to wait several minutes for the NIC to come up (until the kernel oops is generated). No network until then...

Comment 56 Dale Ogilvie 2008-12-16 09:34:19 UTC
Well, dunno what this means, but I just went to the manufacturer site, downloaded the latest source for r8169 (2008-10-21), built the module following the readme on my F10 2.6.27.7-134, copied the new module over the F10 one, rebooted.

Well, well. Networking works properly! No oops. Was this a good idea? Seems to work but I wonder what special Fedora sauce (other than the oops) I'm missing.

http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=4&Level=5&Conn=4&DownTypeID=3&GetDown=false&Downloads=true#5,7,8,10,982

Comment 57 An. N 2008-12-17 22:16:59 UTC
I built the module from realtek on my system and it didn't work with -134. It couldn't even find eth0.

So to recap, when my kernel boots, it freezes twice for 10 seconds (separate bugs filed), and then it can't even bring up the network card. No support for weeks. Way to go, Fedora...

Comment 58 Dale Ogilvie 2008-12-18 01:22:20 UTC
Sorry it doesn't work for you.

The module I linked to above is specific to my PCI 8169 card.

My file was r8169-6.008.00.tar.bz2, yours would presumably be r8168-8.009.00.tar.bz2

Did you try the r8168 module?

http://www.realtek.com.tw/downloads/downloadsView.aspx?Langid=1&PNid=13&PFid=5&Level=5&Conn=4&DownTypeID=3&GetDown=false#2

Comment 59 Josep 2008-12-20 11:31:45 UTC
I can also confirm that the issue disappears with the driver from the manufacturer. See comment #51 for more details on my hardware.

Comment 60 Francois Romieu 2008-12-21 19:14:20 UTC
Created attachment 327590 [details]
Merge PHY init from Realtek's driver 6.008.00

Josep :
[...]
> I can also confirm that the issue disappears with the driver from the
> manufacturer. See comment #51 for more details on my hardware.

Thanks for the report Josep.

Can you try the attached patch against mainline ?

Note to others: the patch will not improve the situation for the
8168 based chipsets.

-- 
Ueimor

Comment 61 Josep 2008-12-22 00:32:13 UTC
Hi François,

I applied the patch on the sources for 2.6.27.7-134.fc10.i686, but I still seem to have the same issue.
Reinserting the module does still work, though.


Just to check that I built the module the right way, this is how I did it:
* I downloaded the source rpm for my kernel (yumdownloader --source kernel), and installed it (rpm -Uvh kernel-...).
* Then I copied the drivers/net/r8169.c file to another directory, and applied the patch provided in comment #60,
* I built the module as explained here:
http://fedoraproject.org/wiki/Docs/CustomKernel#Building_Only_Kernel_Modules
(I had to create the Makefile)
* Copied the r8169.ko file to /lib/modules/$(uname -r)/kernel/drivers/net
* Finally I ran "depmod -a" and restarted
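
For readers following the steps in comment 61, a minimal sketch of such an out-of-tree build is shown below. It assumes that the kernel headers for the running kernel are installed under /lib/modules/<release>/build, that the patched r8169.c sits in the build directory, and that a one-line kbuild Makefile is enough. The helper names (module_dir, write_makefile, build_and_install) are hypothetical, and the Makefile content is an assumption, not necessarily what the wiki page describes.

```shell
#!/bin/sh
# Sketch only: rebuild r8169.ko outside the kernel tree and install it.
# Assumes kernel headers are available under /lib/modules/<release>/build.

# Print the directory the stock r8169.ko lives in for a given kernel release.
module_dir() {
    echo "/lib/modules/$1/kernel/drivers/net"
}

# Write a minimal kbuild Makefile into directory $1 (assumed content).
write_makefile() {
    printf 'obj-m := r8169.o\n' > "$1/Makefile"
}

# Build in directory $1 and install for the running kernel (needs root).
build_and_install() {
    rel=$(uname -r)
    write_makefile "$1" &&
    make -C "/lib/modules/$rel/build" M="$1" modules &&
    cp "$1/r8169.ko" "$(module_dir "$rel")/" &&
    depmod -a
}

# Usage (as root, from the directory holding the patched r8169.c):
#   build_and_install "$PWD"
```

Nothing runs until build_and_install is called, so the functions can be pasted into a shell and inspected first.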

Comment 62 Francois Romieu 2008-12-23 13:56:49 UTC
Created attachment 327753 [details]
More PHY access changes from Realtek's 6.008.00 driver

Thanks for testing Josep.

Can you try the attached patch on top of the previous one ?

-- 
Ueimor

Comment 63 Josep 2008-12-28 17:58:30 UTC
Hi François, I wasn't able to test the new patch until today.

With kernel 2.6.27.7-134.fc10.i686 it still doesn't work (reinserting the module does work, though).

Can I use the same set of patches to try with the new kernel 2.6.27.9-159.fc10?

Comment 64 Josep 2008-12-28 18:34:25 UTC
I just tried with 2.6.27.9-159.fc10.i686, with the same results as before.

Maybe you are interested in this line from dmesg:
r8169 0000:00:0b.0: PHY ID2 c910

Comment 65 Dale Ogilvie 2009-01-05 11:09:14 UTC
Hi François, I also tried a *modified* version of your patch without success. Based on my reading of the Realtek driver code I modified two lines in your patches:

In rtl8169_set_speed_xmii:

	if ((tp->mac_version >= RTL_GIGA_MAC_VER_02) &&
	    (tp->mac_version <= RTL_GIGA_MAC_VER_03)) {

and in rtl8169_xmii_reset_enable:

       mdio_write(ioaddr, MII_BMCR, val & 0xffff);

For my test I replaced these with:

        /* 8169 drv logic is <= RTL_GIGA_MAC_VER_02 || <= RTL_GIGA_MAC_VER_03 */
	if ((tp->mac_version == RTL_GIGA_MAC_VER_02) ||
	    (tp->mac_version == RTL_GIGA_MAC_VER_03)) {

and
       /* 8169 drv does not have & 0xffff */
       mdio_write(ioaddr, MII_BMCR, val);

r8169 driver from realtek still works fine with the 2.6.27.9-159.fc10.i686 kernel.
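
A possible reason the modified test in comment 65 behaved the same: if RTL_GIGA_MAC_VER_02 and RTL_GIGA_MAC_VER_03 are adjacent integer values (assumed here to be 2 and 3, which is not taken from the driver source), the range test and the pair of equality tests select exactly the same versions. Likewise, "& 0xffff" only changes val when bits above the low 16 are set. The sketch below models both in shell arithmetic; the function names are hypothetical.

```shell
#!/bin/sh
# Model of the two conditions from comment 65, assuming VER_02=2 and VER_03=3.
VER_02=2
VER_03=3

# Original patch: version within [VER_02, VER_03]. Prints 1 if true, else 0.
in_range() {
    echo $(( $1 >= VER_02 && $1 <= VER_03 ))
}

# Modified test: version equals VER_02 or VER_03. Prints 1 if true, else 0.
is_either() {
    echo $(( $1 == VER_02 || $1 == VER_03 ))
}

# Keep only the low 16 bits, as "val & 0xffff" does.
mask16() {
    echo $(( $1 & 0xffff ))
}

# Compare both tests over a few version numbers.
for v in 1 2 3 4 5; do
    echo "version $v: range=$(in_range "$v") either=$(is_either "$v")"
done
```

Under these assumptions the two columns always agree, so any difference in behaviour would have to come from elsewhere in the driver.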

Comment 66 Pierre Ossman 2009-01-08 06:53:09 UTC
In case it is still interesting to map cards seeing this:

03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)

Comment 67 Andreas Mayer 2009-01-21 20:31:00 UTC
Have the same problem, network after boot works only when I do
$ rmmod r8169; modprobe r8169
and then select "cable" in the network manager.

Which hardware information do you need to make this a useful bug report?

Comment 68 Andreas Mayer 2009-01-24 13:11:10 UTC
Workaround:

Append

rmmod r8169
modprobe r8169

to your /etc/rc.local
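
A slightly more defensive variant of the same workaround is sketched below: it reloads the driver only when r8169 is actually listed as loaded. The helper names and the optional file argument (there only to make the check easy to try out) are assumptions for illustration, not part of any Fedora script.

```shell
#!/bin/sh
# Sketch of the rmmod/modprobe workaround with a guard.

# Succeeds if module $1 is listed in a /proc/modules-style file
# ($2, defaulting to /proc/modules).
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# Reload r8169 only when it is currently loaded (needs root).
reload_r8169() {
    if module_loaded r8169; then
        rmmod r8169 && modprobe r8169
    fi
}

# In /etc/rc.local one would then call:
#   reload_r8169
```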

Comment 69 Dale Ogilvie 2009-01-29 09:05:58 UTC
r8169 in 2.6.27.12-170.2.5.fc10.i686 still fails to load properly. No oops this time, but no network via dhcp either.

Driver from realtek still works well.

Any plan to fix this for F11?

Comment 70 Josep 2009-02-21 21:14:49 UTC
I can confirm this issue is still present in latest rawhide (F11) kernel (2.6.29-0.137.rc5.git4.fc11.i586).

Comment 71 Michel Lind 2009-03-05 19:09:40 UTC
On my Dell Mini 9:

lspci -nn:
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8101E PCI Express Fast Ethernet controller [10ec:8136] (rev 02)

dmesg | grep XID
eth0: RTL8101e at 0xe0066000, 00:21:70:df:3b:9f, XID 24a00000 IRQ 220

mii-tool -v | grep product
SIOCGMIIPHY on 'eth0' failed: Operation not permitted

The Fedora 11 alpha (i386) network boot image failed to initialize the card (DHCP time-out), likewise with the latest (2009-03-05) Rawhide boot.iso

Using the preinstalled Ubuntu (kernel 2.6.24-19-lpia), I'd managed to get a DHCP connection *once*, and after that, it's the same time-out.

Comment 72 Dale Ogilvie 2009-03-11 10:21:31 UTC
With kernel 2.6.27.19-170.2.35.fc10.i686, the workaround of replacing the r8169 kernel module with one built from the realtek sources no longer works. Only 'rmmod r8169, modprobe r8169' gives a working network.

Comment 73 Dale Ogilvie 2009-03-11 10:35:09 UTC
Booting into the previous kernel, 2.6.27.15-170.2.24.fc10.i686, the r8169 module from realtek works fine.

"Not working" with kernel 2.6.27.19-170.2.35.fc10.i686 manifests itself in a failure to start eth0 after hanging at the "Determining IP information for eth0..." prompt for some seconds, maybe up to a minute.

Comment 74 Michal Hlavinka 2009-03-11 10:49:24 UTC
I don't know if I have *exactly* the same problem, but adding 'pci=msi' works for me without any negatives (at least I don't know them).

Comment 75 Mace Moneta 2009-03-21 21:25:40 UTC
On the rawhide kernel 2.6.29-0.255.rc8.git2.fc11.x86_64, an oops is generated but the network comes right back up successfully (ssh sessions don't even drop):

Mar 21 17:18:11 slayer kernel:------------[ cut here ]------------
Mar 21 17:18:11 slayer kernel:WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xcf/0x13a() (Not tainted)
Mar 21 17:18:11 slayer kernel:Hardware name: C2SEA
Mar 21 17:18:11 slayer kernel:NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Mar 21 17:18:11 slayer kernel:Modules linked in: nls_utf8 fuse rfcomm bridge stp llc bnep sco configfs l2cap w83627ehf hwmon_vid coretemp hwmon ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq freq_table dm_multipath kvm_intel kvm uinput snd_hda_codec_intelhdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm iTCO_wdt snd_timer firewire_ohci snd iTCO_vendor_support firewire_core soundcore i2c_i801 crc_itu_t tulip pata_it8213 r8169 btusb snd_page_alloc zaurus bluetooth cdc_ether usbnet mii usb_storage ata_generic pata_acpi i915 drm i2c_algo_bit i2c_core video output [last unloaded: netconsole]
Mar 21 17:18:11 slayer kernel:Pid: 0, comm: swapper Not tainted 2.6.29-0.255.rc8.git2.fc11.x86_64 #1
Mar 21 17:18:11 slayer kernel:Call Trace:
Mar 21 17:18:11 slayer kernel: <IRQ>  [<ffffffff8104bae3>] warn_slowpath+0xbc/0xf0
Mar 21 17:18:11 slayer kernel: [<ffffffff811a26e5>] ? debug_object_activate+0x38/0xf7
Mar 21 17:18:11 slayer kernel: [<ffffffff81071064>] ? print_lock_contention_bug+0x1b/0xe1
Mar 21 17:18:11 slayer kernel: [<ffffffff8130c66c>] ? netif_tx_lock+0x4d/0x81
Mar 21 17:18:11 slayer kernel: [<ffffffff811a1a03>] ? _raw_spin_unlock+0x8e/0x94
Mar 21 17:18:11 slayer kernel: [<ffffffff81396821>] ? _spin_unlock+0x2b/0x30
Mar 21 17:18:11 slayer kernel: [<ffffffff8130c6f7>] ? dev_watchdog+0x0/0x13a
Mar 21 17:18:11 slayer kernel: [<ffffffff8130c7c6>] dev_watchdog+0xcf/0x13a
Mar 21 17:18:11 slayer kernel: [<ffffffff8106fab8>] ? trace_hardirqs_on_caller+0x1f/0x153
Mar 21 17:18:11 slayer kernel: [<ffffffff8106fbf9>] ? trace_hardirqs_on+0xd/0xf
Mar 21 17:18:11 slayer kernel: [<ffffffff81396791>] ? _spin_unlock_irq+0x30/0x3d
Mar 21 17:18:11 slayer kernel: [<ffffffff81055e0a>] run_timer_softirq+0x182/0x1fd
Mar 21 17:18:11 slayer kernel: [<ffffffff81068298>] ? getnstimeofday+0x5f/0xb3
Mar 21 17:18:11 slayer kernel: [<ffffffff81051353>] __do_softirq+0x94/0x179
Mar 21 17:18:11 slayer kernel: [<ffffffff810127ac>] call_softirq+0x1c/0x30
Mar 21 17:18:11 slayer kernel: [<ffffffff8101393e>] do_softirq+0x52/0xb9
Mar 21 17:18:11 slayer kernel: [<ffffffff81050f76>] irq_exit+0x53/0x90
Mar 21 17:18:11 slayer kernel: [<ffffffff81023b53>] smp_apic_timer_interrupt+0x8e/0xa7
Mar 21 17:18:11 slayer kernel: [<ffffffff81012183>] apic_timer_interrupt+0x13/0x20
Mar 21 17:18:11 slayer kernel: <EOI>  [<ffffffff811fe3ba>] ? acpi_idle_enter_simple+0x14f/0x192
Mar 21 17:18:11 slayer kernel: [<ffffffff8106fbf9>] ? trace_hardirqs_on+0xd/0xf
Mar 21 17:18:11 slayer kernel: [<ffffffff811fe3c2>] ? acpi_idle_enter_simple+0x157/0x192
Mar 21 17:18:11 slayer kernel: [<ffffffff811fe3ba>] ? acpi_idle_enter_simple+0x14f/0x192
Mar 21 17:18:11 slayer kernel: [<ffffffff812d6471>] ? cpuidle_idle_call+0x8d/0xc4
Mar 21 17:18:11 slayer kernel: [<ffffffff810102c7>] ? cpu_idle+0x68/0xb3
Mar 21 17:18:11 slayer kernel: [<ffffffff8139014e>] ? start_secondary+0x199/0x19e
Mar 21 17:18:11 slayer kernel:---[ end trace ff1ecc1657656b68 ]---
Mar 21 17:18:11 slayer kernel:r8169: eth0: link up

Comment 76 Josep 2009-03-31 20:59:51 UTC
Just to report that this is still a problem in rawhide kernel 2.6.29-21.fc11.i686.PAE.

I also tried adding pci=msi to the boot command line as mentioned in comment #74, but without success; the only way to get the network back is still the "rmmod; modprobe" trick.

The kernel module from realtek doesn't compile at the moment, so I couldn't try it.

Will this issue be tagged as a blocker for F11 as it was for F10, or will it be dismissed because of the workaround as it was for F10? (although nothing was mentioned in the release notes)

Comment 77 Francois Romieu 2009-03-31 22:04:47 UTC
Created attachment 337414 [details]
prevent late irq events during init

Josep, can you rebuild the r8169 module with the attached patch ?

If it does not work, please check it again : judging from the XID
of your device and the "interrupt 0025 in poll", the patch may
fix the bug for you.

-- 
Ueimor

Comment 78 Josep 2009-04-01 00:24:00 UTC
Hi François, just to confirm that the patch you just posted (comment #77) works fine here, and that so far I don't see any regressions.
Thanks!

Comment 79 Josep 2009-04-22 06:49:37 UTC
Do you know if the above fix will be included in F11? The latest rawhide kernel (2.6.29.1-102.fc11.i686.PAE) still has the issue.

Comment 80 Chuck Ebbert 2009-04-23 22:18:33 UTC
This fix is in kernel 2.6.29.2, which ought to go into f11.

Comment 81 Manfred Knick 2009-06-17 17:48:30 UTC
(In reply to comment #80)
> This fix is in kernel 2.6.29.2, which ought to go into f11.  

I just installed Fedora 11 / amd64

The kernel provided is 2.6.29.4-167.

None of the above hints helps:
changing to manual configuration,
the Network gets loaded,
 ifconfig / netstat -r show the correct values,
but every ping even on local network yields
"Destination Host unreachable",
even with permissive SElinux.

On the connected Switch's LED,
rmmod / modprobe lights off / on both LED (= GigaBit, correctly);
"ping" gives _some_ correlation to them,
but after some while of pings the LEDs don't flicker any more at all ...

Comment 82 Francois Romieu 2009-06-17 20:10:45 UTC
Manfred, can you try 2.6.30 and, if it does not work better, send
a complete dmesg of the booted kernel ?

Thanks.

-- 
Ueimor

Comment 83 Manfred Knick 2009-06-17 20:30:03 UTC
(In reply to comment #82)

> Manfred, can you try 2.6.30 

Unfortunately, as long as no network is working,
there is no "upgrade" ;)

Please be so kind as to point me to the exact build you would like tested, as a download link ( rpm ? ).
I will move it onto the machine of concern via USB stick.

> ... and, if it does not work better, send
> a complete dmesg of the booted kernel ?

Yes, sure;
you're welcome.

I've also found out that in 2.6.30 (mainline),
Linus has integrated some more patches for r8169,
so let's give it a test.

I've already tried to use the 2.6.30 from Fedora 12,
but unfortunately I got errors extracting / installing that rpm
( " ... invalid ... missing ... " ).

Comment 84 Francois Romieu 2009-06-17 20:49:54 UTC
Created attachment 348346 [details]
linux kernel v2.6.30 r8169 driver

Can you compile this driver with your current kernel tree ?

-- 
Ueimor

Comment 85 Manfred Knick 2009-06-17 21:02:51 UTC
(In reply to comment #84)
 
> ... your current kernel tree

As mentioned above, there is none yet!

( Plain install from DVD - 
  no network available during install -
  none afterwards:
  ==> no Download of kernel sources packages, ... )

Thus I need a complete source tree.

I fear if I just take the original 2.6.30 from www.eu.kernel.org,
I might well solve my problem, but it won't help you Fedora guys a lot.
Thus I asked which version _you_ prefer
that I should test for you / for Fedora 11 ;)

Kind regards
Manfred

Comment 86 Francois Romieu 2009-06-17 21:18:02 UTC
Created attachment 348350 [details]
Makefile for out-of-tree build

Manfred, create an empty directory, add the aforementioned r8169 driver and
the Makefile to it, and run make. You do not need a whole tree.

-- 
Ueimor

Comment 87 Adam Pribyl 2009-06-18 09:55:39 UTC
Manfred, does this mean that even the workaround with rmmod and modprobe of the driver does not work anymore?

Comment 88 Manfred Knick 2009-06-20 06:05:27 UTC
(In reply to comment #87)
> Manfred, does this mean that even the workaround with rmmod and modprobe of the
> driver does not work anymore?  

    Yesss !

Comment 89 Manfred Knick 2009-06-20 06:39:29 UTC
(In reply to comment #86)

> Makefile for out-of-tree build

Thanks!

RESULT: Same.

Moreover: after letting ping run for some minutes,
it produced error messages saying that no buffer space was available!
Nota bene: this machine has 8 GiB of main memory!

But: There's hope in sight :)

I've quickly investigated with a Gentoo system on a parallel set of partitions
and compiled hand-crafted kernels.

RESULT: ... searching under the wrong set of bushes ... ;)

- Works beginning with 2.6.27
- Works with 2.6.28.*
- Broken throughout 2.6.29 (at least up to 2.6.29.4 today)
- Works with 2.6.30 again

Moreover:
- The ping errors were reproducible here with 2.6.29.4 too :(

ERGO:
In Fedora, I also dropped the 2.6.29.* kernel,
built a completely fresh 2.6.30 from www.eu.kernel.org
and adapted grub: VOILA!

Afterwards, normal upgrade procedures worked as expected ...

Hope this helps!

Kind regards
Manfred
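
The per-version results listed in comment 89 can be captured in a small lookup, sketched below. These results were reported for one machine with hand-built kernels, so the mapping is anecdotal; the function name r8169_status is hypothetical.

```shell
#!/bin/sh
# Classify a kernel release string according to the results in comment 89.
# These results were reported for one machine only; treat them as anecdotal.
r8169_status() {
    case "$1" in
        2.6.27|2.6.27[.-]*|2.6.28|2.6.28[.-]*) echo works ;;
        2.6.29|2.6.29[.-]*)                    echo broken ;;
        2.6.30|2.6.30[.-]*)                    echo works ;;
        *)                                     echo untested ;;
    esac
}

# Example: check the running kernel.
echo "$(uname -r): $(r8169_status "$(uname -r)")"
```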

Comment 90 Manfred Knick 2009-06-20 06:59:21 UTC
Cross-Reference: 

     http://bugs.gentoo.org/show_bug.cgi?id=274765

Comment 91 Francois Romieu 2009-06-20 10:38:47 UTC
Manfred Knick :
[...]
> - Works with 2.6.30 again

This is the expected result.

Could you attach a complete dmesg before the issue is closed ?

Thanks.

-- 
Ueimor

Comment 92 Manfred Knick 2009-06-20 12:08:23 UTC
Created attachment 348743 [details]
complete dmesg

Comment 93 Manfred Knick 2009-06-20 12:26:31 UTC
(In reply to comment #91)

> Manfred Knick :
> [...]
> > - Works with 2.6.30 again
 
> This is the expected result.

Well, not exactly -
because your proposed approach of just compiling that 2.6.30-r8169-module out-of-tree still (besides being non-functional) ran into that horrible memory leakage, as reported above, right?

> Could you attach a complete dmesg 

Sure, you're welcome; cf. above.

> before the issue is closed ?

I can't see how this can be closed before
- the new 2.6.30 r8169 is being backported into Fedora-2.6.29 *and*
- that leakage between r8169 and the rest of the kernel is being discovered and sorted out *and*
- a new kernel version is provided via the regular update mechanisms
*OR*
- Fedora 11 completely upgrades to a (functional) 2.6.30.

Although being the usual common procedure for typical Gentoo users,
as far as I understand, Fedora does _not_ expect its users
to build hand-crafted kernels themselves;
moreover: especially not to identify and download the appropriate version from kernel.org and configure && make && install from scratch ...

Yours respectfully
Manfred

Comment 94 Manfred Knick 2009-06-20 12:30:04 UTC
(In reply to comment #92)

> Created an attachment (id=348743) [details]

> complete dmesg  

Sorry, unexpectedly this BUG reporting suite ate and did not report the original filename; this is

     "Fedora-2-6-29.dmesg"

Comment 95 Francois Romieu 2009-06-20 13:42:50 UTC
Manfred Knick :
> Romieu:
> > Manfred Knick :
> > [...]
> > > - Works with 2.6.30 again
> 
> > This is the expected result.
> 
> Well, not exactly -
> because your proposed approach of just compiling that 2.6.30-r8169-module
> out-of-tree still (besides being non-functional) ran into that horrible memory
> leakage, as reported above, right?

"ping sendmsg no buffer space available" keywords with a non-functional
driver does not mean that the memory is exhausted.

You saw these words, right ? :o)

[...]
> I can't see how this can be closed before
> - the new 2.6.30 r8169 is being backported into Fedora-2.6.29 *and*

I do not say the opposite.

-- 
Ueimor

Comment 96 Manfred Knick 2009-06-22 08:10:51 UTC
(In reply to comment #95)


> does not mean that the memory is exhausted.

Sure! Obviously, because the system as a whole keeps going, nevertheless ...
{
  Sorry for not stating "black humor" explicitly enough ;(
}


> I do not say the opposite.

Whenever you like to have an updated kernel tree (be it source.tar.bz2, be it rpm) tested, I volunteer to compile and run it for you upon my machine ...
you're welcome!

Kind regards
Manfred

Comment 97 Chuck Ebbert 2009-06-24 02:39:28 UTC
All of the just-released Fedora kernels (F-9, F-10 and F-11) have r8169 updates.

Comment 98 Manfred Knick 2009-06-24 07:03:36 UTC
(In reply to comment #97)

> All ... just-released ...

With amd64, "yum check-update" does not offer anything newer than 2.6.29.4-167 yet. Perhaps it's just the mirrors needing some time (? up to 24 hours ?) to sync ...

Please, could you be so kind to do us the favour and 
specify the exact version numbers which are "just released" ?

THANKS a lot!

Comment 99 Manfred Knick 2009-06-25 09:08:06 UTC
(In reply to comment #97)
> All of the just-released Fedora kernels (F-9, F-10 and F-11) have r8169
> updates.  

Today, -r5 became available as an update.

It does *NOT* solve the problems I reported above: still same result.

Already seconds after logging in, without doing anything by any user,
the kernel throws an error:


Kernel failure message 1:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0xcf/0x12c() (Not tainted)
Hardware name: To Be Filled By O.E.M.
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: fuse mga drm ipt_MASQUERADE iptable_nat nf_nat bridge stp llc bnep sco l2cap bluetooth sunrpc ipv6 cpufreq_ondemand powernow_k8 freq_table ext2 dm_multipath kvm_amd kvm uinput r8169 snd_usb_audio snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device snd_hwdep snd i2c_nforce2 i2c_core ppdev mii sata_nv shpchp parport_pc pcspkr pata_jmicron soundcore parport ata_generic pata_acpi pata_amd matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.29.5-191.fc11.x86_64 #1
Call Trace:
 <IRQ>  [<ffffffff8104883f>] warn_slowpath+0xbc/0xf0
 [<ffffffff813abe44>] ? _spin_unlock_irqrestore+0x2c/0x42
 [<ffffffff81039604>] ? task_rq_unlock+0x11/0x13
 [<ffffffff8104056b>] ? try_to_wake_up+0x25b/0x26d
 [<ffffffff8104058f>] ? default_wake_function+0x12/0x14
 [<ffffffff8105c8ed>] ? autoremove_wake_function+0x16/0x39
 [<ffffffff810379ac>] ? __wake_up_common+0x4e/0x84
 [<ffffffff813abbfa>] ? _spin_lock+0xe/0x11
 [<ffffffff8132196d>] dev_watchdog+0xcf/0x12c
 [<ffffffff8103d3b1>] ? resched_cpu+0xa4/0xad
 [<ffffffff813abf5f>] ? _spin_lock_irq+0x27/0x2a
 [<ffffffff81051e67>] run_timer_softirq+0x19e/0x224
 [<ffffffff81063094>] ? getnstimeofday+0x5f/0xb3
 [<ffffffff8104df6f>] __do_softirq+0x94/0x155
 [<ffffffff8101274c>] call_softirq+0x1c/0x30
 [<ffffffff810138ce>] do_softirq+0x52/0xb9
 [<ffffffff8104db92>] irq_exit+0x53/0x90
 [<ffffffff81022464>] smp_apic_timer_interrupt+0x8e/0xa7
 [<ffffffff81012123>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81029424>] ? native_safe_halt+0xb/0xd
 [<ffffffff81017d30>] ? default_idle+0x51/0x7c
 [<ffffffff81017e92>] ? c1e_idle+0x124/0x12b
 [<ffffffff810102a1>] ? cpu_idle+0x68/0xb3
 [<ffffffff81397937>] ? rest_init+0x6b/0x6d
---[ end trace 21d970b403765fce ]---


Running "ping" to a host on the local Ethernet yields:

$ ping xxx.xxx.xxx.xxx
PING xxx.xxx.xxx.xxx (xxx.xxx.xxx.xxx) 56(84) bytes of data.
From xxx.xxx.xxx.xxx icmp_seq=1 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=2 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=5 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=6 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=10 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=12 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=13 Destination Host Unreachable
From xxx.xxx.xxx.xxx icmp_seq=14 Destination Host Unreachable
64 bytes from xxx.xxx.xxx.xxx: icmp_seq=15 ttl=64 time=12080 ms
64 bytes from xxx.xxx.xxx.xxx: icmp_seq=16 ttl=64 time=11080 ms
64 bytes from xxx.xxx.xxx.xxx: icmp_seq=17 ttl=64 time=10081 ms
64 bytes from xxx.xxx.xxx.xxx: icmp_seq=18 ttl=64 time=9081 ms
64 bytes from xxx.xxx.xxx.xxx: icmp_seq=19 ttl=64 time=8081 ms
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
^C
--- 10.31.101.99 ping statistics ---
36 packets transmitted, 5 received, +8 errors, 86% packet loss, time 55666ms
rtt min/avg/max/mdev = 8081.128/10081.000/12080.729/1414.071 ms, pipe 12

Which is 
- 8 x "fails"
- 5 x "works"
- then "breaks" :  Strange !

(With all the updates provided so far, 2.6.30 still works.)
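
The numbers in the ping output above can be cross-checked with a little arithmetic. The sketch below recomputes the loss percentage from the summary line and, assuming ping's default one-second interval with the first request sent at t=0, estimates when each reply arrived. The function names are hypothetical.

```shell
#!/bin/sh
# Integer packet loss in percent: (transmitted - received) * 100 / transmitted.
loss_percent() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

# Estimated arrival time in ms of reply icmp_seq=$1 with round-trip time $2 ms,
# assuming one request per second starting at t=0.
arrival_ms() {
    echo $(( ($1 - 1) * 1000 + $2 ))
}

echo "loss: $(loss_percent 36 5)%"
# icmp_seq and rtt pairs taken from the output in comment 99.
for pair in 15:12080 16:11080 17:10081 18:9081 19:8081; do
    n=${pair%%:*}
    rtt=${pair##*:}
    echo "icmp_seq=$n arrived at about $(arrival_ms "$n" "$rtt") ms"
done
```

Under that assumption all five replies land within about a millisecond of each other, which would be consistent with packets being queued and then released in a single burst rather than with normal round trips.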

Comment 100 Manfred Knick 2009-06-25 11:08:19 UTC
(In reply to comment #99)
> (In reply to comment #97)

> Today, -r5 became available as an update.

To be precise: make this

   2.6.29.5-191.fc11.x86_64

(Sorry, I have to write these reports from a machine in a different location than the one of concern.)

Comment 101 Dale Ogilvie 2009-06-28 08:43:27 UTC
Just to let you know, after F11 upgrade (taking me to 2.6.29.5-191.fc11.i586), networking on my previously broken F10 system seems to be working fine. I can load network service on startup without incident.

lspci -nn
00:0b.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)

dmesg | grep XID
eth0: RTL8110s at 0xfa664f00, 00:0c:76:56:54:57, XID 04000000 IRQ 16

mii-tool -v | grep product
product info: vendor 00:07:32, model 17 rev 0

Comment 102 Manfred Knick 2009-06-28 15:50:55 UTC
(In reply to comment #101)
> Just to let you know, after F11 upgrade (taking me to 2.6.29.5-191.fc11.i586),
> networking on my previously broken F10 system seems to be working fine. I can
> load network service on startup without incident.

  Well, it seems we can narrow this down to x86 versus amd64 kernel versions ...
 
> lspci -nn
> 00:0b.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169
> Gigabit Ethernet [10ec:8169] (rev 10)

  Mine is 10ec:8168, which means "produced for ASUStek"
 
> dmesg | grep XID
> eth0: RTL8110s at 0xfa664f00, 00:0c:76:56:54:57, XID 04000000 IRQ 16
> 
> mii-tool -v | grep product
> product info: vendor 00:07:32, model 17 rev 0  

  Mine is 00:07:32 , 17 rev 2

Comment 103 Manfred Knick 2009-06-28 15:52:57 UTC
(In reply to comment #102)
> (In reply to comment #101)
 
>   Mine is 10ec:8168, which means "produced for ASUStek"

    Yours means "produced for Gigabyte", as far as I could identify.

Comment 104 Manfred Knick 2009-06-28 16:02:58 UTC
http://pci-ids.ucw.cz/v2.2/pci.ids:

10ec  Realtek Semiconductor Co., Ltd.
   ...
	8168  RTL8111/8168B PCI Express Gigabit Ethernet controller
		1043 11f5  A6J-Q008
		1043 16d5  U6V laptop
		1043 81aa  P5B
		1458 e000  GA-EP45-DS5 Motherboard
		1462 238c  Onboard RTL8111b on MSI P965 Platinum Mainboard
		1462 368c  K9AG Neo2
		1849 8168  Motherboard (one of many)
	8169  RTL-8169 Gigabit Ethernet
		1025 0079  Aspire 5024WLMi
		10bd 3202  EP-320G-TX1 32-bit PCI Gigabit Ethernet Adapter
		1259 c107  CG-LAPCIGT
		1371 434e  ProG-2000L
		1385 311a  GA311
		1458 e000  GA-8I915ME-G Mainboard
		1462 030c  K8N Neo-FSR v2.0 mainboard
		1462 702c  K8T NEO 2 motherboard
		1462 7094  K8T Neo2-F V2.0
		16ec 011f  USR997903
		1734 1091  D2030-A1
		a0a0 0449  AK86-L motherboard
   ...
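
A note on reading the excerpt above: in the pci.ids format an unindented line is a vendor, a line indented by one tab is a device of that vendor, and a line indented by two tabs is a subsystem (subvendor and subdevice IDs). The bracketed pair that "lspci -nn" prints, such as [10ec:8168], is vendor:device, so the board maker would have to be read from the subsystem IDs rather than from that pair. A small sketch that splits the pair out of an lspci line follows; the function name pci_ids is hypothetical.

```shell
#!/bin/sh
# Extract "vendor device" from the [vvvv:dddd] pair in an `lspci -nn` line.
pci_ids() {
    printf '%s\n' "$1" |
        sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/\1 \2/p'
}

line='01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 01)'
pci_ids "$line"
```

Running lspci with -v in addition to -nn also prints a "Subsystem:" line for each device, which is where the subvendor and subdevice IDs appear.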

Comment 105 Bug Zapper 2009-11-18 08:20:17 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 106 Keith Wilkinson 2009-11-19 07:12:53 UTC
  The RTL8111C chip is widely used on the most popular AMD motherboards (such as the Gigabyte GA-MA78GM series and GA-MA78GPM), and (as mentioned in my post for bug 448712) I confirmed that my RTL8111C was not recognized by Fedora 10, Fedora 11, Fedora 12, RHEL5.3 or SLES11, so I considered this to be a major problem. 
  I was able to use the update on the Realtek web site (mentioned in my post for bug 448712) to get the RTL8111C chip to work with Fedora 11, so I thought that there was a workaround for the problem, but the Realtek site's update seems to fail with Fedora 12 (have to plug in another Ethernet controller).
  Because Realtek seems to be the major network chipset provider for AMD
motherboards, I would consider this to be a major problem for AMD, not just
Realtek. Someone should impress on them (Realtek) the importance of getting
backward-compatible driver code approved and into the LINUX kernel as early as
possible.

Comment 107 Keith Wilkinson 2009-11-19 07:15:28 UTC
Sorry I neglected to mention, but I am using x86_64.

Comment 108 Keith Wilkinson 2009-12-02 12:01:40 UTC
Realtek RTL8111C now working with Fedora 12.  Fixed by Kernel update?

Comment 109 Manfred Knick 2009-12-02 12:42:55 UTC
(In reply to comment #108)
> Realtek RTL8111C now working with Fedora 12.  Fixed by Kernel update?  

(In REMEMBRANCE OF comment #89)
> ...
> - Works beginning with 2.6.27
> - Works with 2.6.28.*
> - Broken throughout 2.6.29 (at least up to 2.6.29.4 today)
> - Works with 2.6.30 again
> ...

Comment 110 Keith Wilkinson 2009-12-10 06:48:24 UTC
It's ironic, but the RTL8169/RTL8111C driver on the Fedora 11 LiveCD works OK with RTL8111C, yet if you select "install to disk" then the driver on the hard disk installation does not work. I will try to find time to check if the same is true for Fedora 12 (i.e. the RTL8169/RTL8111C driver works OK on the LiveCD but does not work if you Install to Disk from the LiveCD).

Comment 111 Bug Zapper 2009-12-18 06:20:54 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

