Bug 252955

Summary:	[r8169?] Total system freeze during "heavy" disk load
Product:	[Fedora] Fedora	Reporter:	Micke <dukhat2259+BUGZILLA>
Component:	kernel	Assignee:	Kernel Maintainer List <kernel-maint>
Status:	CLOSED WONTFIX	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	7	CC:	alberto.gonzalez.b, chris.brown, romieu, triage
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2008-06-17 02:10:26 UTC	Type:	---

Description Micke 2007-08-16 09:43:26 UTC


Comment 1 Micke 2007-08-16 14:49:29 UTC

(%$*& report got submitted when I pressed enter in the component version field...)

Anyway this is roughly what I've done:
0. Replaced old eth1 with a Realtek 8169 and at the same time added a new 400GB
Samsung PATA drive.
1. Set up LVM on the new drive and copied everything from a 250GB drive onto it.
Added 250 GB drive to VG.
2. Began copying VS 2005 from work via laptop on internal network and back to
the server (job -> eth0 -> eth1 -> laptop -> eth1 -> smb share) on one of the LVs.
3. After a while (say 2GB) the server froze. No mouse movement, no network, no
nothing.
4. Used Power switch to restart server.
5. A few days later I tried the same action. Same freeze at roughly the same point.
6. This time I unplugged and re-inserted the eth1 cable. System unfroze!
7. Tried to copy a third time. This time the system was left frozen during the
afternoon. Touched cable and system unfroze.

8. Later tried to expand one of the LVs and got information about *corruption*
(before actually expanding)! The VS 2005 copy shared blocks with previously
existing files! Some other minor(?) inconsistencies in the directory structure.

9. Repaired file system. Got info on shared blocks being duplicated.
10. Tried to move data off another 400GB drive in order to add it to the VG.
Lots and lots of strange PATA messages in the log file. System went into a
sluggish state where it no longer could be shut down. Hard reset necessary.
11. Tried moving data off the drive several times. Same result. 

12. Moved the old 400GB disk to the same cable as the new 400GB disk. Problem
went away!

13. Experienced another "unpluggable" system freeze while copying lots of data
from a smb share. Probably froze after 2GB.


I'd say there are two or three major problems hidden in here: System freeze,
disk corruption and PATA problems. Some problems might of course be related.


Notable logs:
(7) System frozen over long time (somewhere after 13:27 until 21:04 when the
cable was unplugged and re-attached):
Aug  7 13:27:06 faluserver kernel: PRER:IN=eth0 OUT= MAC=00:08:c7:73:81:9c:00:a0:c5:b9:d4:57:08:00 SRC=59.124.31.175 DST=192.168.100.64 LEN=50 TOS=0x00 PREC=0x00 TTL=105 ID=54067 PROTO=UDP SPT=9751 DPT=22109 LEN=30
Aug  7 21:04:51 faluserver kernel: r8169: eth1: link down
Aug  7 21:04:52 faluserver kernel: PRER:IN=eth0 OUT= MAC=00:08:c7:73:81:9c:00:a0:c5:b9:d4:57:08:00 SRC=195.20.20.130 DST=192.168.100.64 LEN=50 TOS=0x00 PREC=0x00 TTL=113 ID=33210 PROTO=UDP SPT=61964 DPT=22109 LEN=30
< seven hours of timeout messages snipped >
Aug  7 21:05:03 faluserver kernel: REJECT:IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:a0:c5:b9:d4:57:08:00 SRC=192.168.100.1 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=38779 PROTO=2
Aug  7 21:05:03 faluserver kernel: r8169: eth1: link up
Aug  7 21:05:03 faluserver kernel: REJECT:IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:a0:c5:b9:d4:57:08:00 SRC=192.168.100.1 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=38914 PROTO=2
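
The length of the freeze in (7) can be worked out from the last timestamp before the gap and the first one after the cable was re-plugged. A minimal Python sketch, assuming the year is 2007 (syslog lines carry no year) and an English locale for the month name; `syslog_time` is a hypothetical helper name, not part of any tool mentioned here:

```python
from datetime import datetime


def syslog_time(line, year=2007):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Syslog omits the year, so the caller supplies it (2007 is an assumption
    based on the date of this comment).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


# Last line logged before the freeze and first line logged after the re-plug.
last_before_freeze = "Aug  7 13:27:06 faluserver kernel: PRER:IN=eth0 OUT="
first_after_replug = "Aug  7 21:04:51 faluserver kernel: r8169: eth1: link down"

gap = syslog_time(first_after_replug) - syslog_time(last_before_freeze)
print(gap)  # 7:37:45
```

So the log is silent for a little over seven and a half hours, which is an upper bound on the freeze since the exact moment it started is not logged.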


(10) PATA messages in the log (lots of them!) 
Aug 12 21:39:32 faluserver kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 12 21:39:32 faluserver kernel: ata3.01: cmd c8/00:08:9f:66:9b/00:00:00:00:00/f6 tag 0 cdb 0x0 data 4096 in
Aug 12 21:39:32 faluserver kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 12 21:39:32 faluserver kernel: ata3: soft resetting port
Aug 12 21:39:32 faluserver kernel: ata3.01: configured for UDMA/100
Aug 12 21:39:32 faluserver kernel: ata3: EH complete
Aug 12 21:39:34 faluserver smbd[6037]: [2007/08/12 21:39:34, 0] lib/util_tdb.c:tdb_log(662)
Aug 12 21:39:34 faluserver smbd[6037]:   tdb(/var/lib/samba/printing/ml1610.tdb): tdb_rec_read bad magic 0xd9fee666 at offset=21328
Aug 12 21:39:34 faluserver kernel: ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 12 21:39:34 faluserver kernel: ata4.01: cmd 35/00:00:0f:49:68/00:01:2a:00:00/f0 tag 0 cdb 0x0 data 131072 out
Aug 12 21:39:34 faluserver kernel:          res 40/00:00:0e:4a:68/00:00:2a:00:00/f0 Emask 0x4 (timeout)
Aug 12 21:39:34 faluserver kernel: ata4: soft resetting port
Aug 12 21:39:35 faluserver kernel: ata4.00: configured for UDMA/100
Aug 12 21:39:35 faluserver kernel: ata4.01: configured for UDMA/100
Aug 12 21:39:35 faluserver kernel: sd 3:0:1:0: [sdd] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Aug 12 21:39:35 faluserver kernel: sd 3:0:1:0: [sdd] Sense Key : Aborted Command [current] [descriptor]
Aug 12 21:39:35 faluserver kernel: Descriptor sense data with sense descriptors (in hex):
Aug 12 21:39:35 faluserver kernel:         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Aug 12 21:39:35 faluserver kernel:         2a 68 4a 0e
Aug 12 21:39:35 faluserver kernel: sd 3:0:1:0: [sdd] Add. Sense: No additional sense information
Aug 12 21:39:35 faluserver kernel: end_request: I/O error, dev sdd, sector 711477519

Aug 13 13:25:39 faluserver kernel: ata3.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 13 13:25:39 faluserver kernel: ata3.01: cmd c8/00:08:9f:f8:4c/00:00:00:00:00/f6 tag 0 cdb 0x0 data 4096 in
Aug 13 13:25:39 faluserver kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 13 13:25:39 faluserver kernel: ata3: soft resetting port
Aug 13 13:25:39 faluserver kernel: ata3.01: configured for UDMA/66
Aug 13 13:25:39 faluserver kernel: ata3: EH complete
Aug 13 13:25:46 faluserver kernel: ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Aug 13 13:25:46 faluserver kernel: ata4.01: cmd 35/00:08:a7:54:3a/00:00:2a:00:00/f0 tag 0 cdb 0x0 data 4096 out
Aug 13 13:25:46 faluserver kernel:          res 40/00:00:de:3d:3c/00:00:2a:00:00/f0 Emask 0x4 (timeout)
Aug 13 13:25:46 faluserver kernel: ata4: soft resetting port
Aug 13 13:25:46 faluserver kernel: ata4.00: configured for UDMA/100
Aug 13 13:25:46 faluserver kernel: ata4.01: configured for UDMA/66
Aug 13 13:25:46 faluserver kernel: ata4: EH complete
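
The `cmd` field in the libata lines above encodes which sectors the timed-out command was addressing. A rough Python sketch of the decoding, assuming the usual libata layout `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device` with the field rejoined onto one line; `decode_taskfile` is a hypothetical name, and the caller has to say whether the command is a 48-bit one (0x35, WRITE DMA EXT) or a 28-bit one (0xc8, READ DMA):

```python
def decode_taskfile(cmd_field, lba48=True):
    """Decode the start LBA and transfer size from a libata 'cmd' field.

    cmd_field looks like '35/00:00:0f:49:68/00:01:2a:00:00/f0'.
    """
    command, low, high, device = cmd_field.split("/")
    _feat, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hfeat, hnsect, hlbal, hlbam, hlbah = (int(b, 16) for b in high.split(":"))

    if lba48:
        # 48-bit addressing: the "hob" (high order byte) registers hold LBA bits 24..47.
        lba = (hlbah << 40) | (hlbam << 32) | (hlbal << 24) | (lbah << 16) | (lbam << 8) | lbal
        sectors = ((hnsect << 8) | nsect) or 65536  # a count of 0 means 65536
    else:
        # 28-bit addressing: LBA bits 24..27 live in the low nibble of the device register.
        lba = ((int(device, 16) & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
        sectors = nsect or 256  # a count of 0 means 256

    return {"command": int(command, 16), "lba": lba, "sectors": sectors, "bytes": sectors * 512}


write = decode_taskfile("35/00:00:0f:49:68/00:01:2a:00:00/f0")
print(write["lba"], write["bytes"])  # 711477519 131072
```

The decoded write starts at sector 711477519 and covers 131072 bytes, which agrees with the `end_request: I/O error, dev sdd, sector 711477519` line and the `data 131072 out` annotation in the same log.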


Regarding (10) drives were sitting on a Promise Ultra100 controller.
Configuration before:
A-master  none
A-slave   new 400GB
B-master  250GB     (destination)
B-slave   400GB     (source)
After:
A-master  400GB     (source)
A-slave   new 400GB
B-master  250GB     (destination)
B-slave   none


That's about it.

Comment 2 Chuck Ebbert 2007-08-16 15:00:41 UTC

(In reply to comment #1)
> Regarding (10) drives were sitting on a Promise Ultra100 controller.
> Configuration before:
> A-master  none
> A-slave   new 400GB
> B-master  250GB     (destination)
> B-slave   400GB     (source)
> After:
> A-master  400GB     (source)
> A-slave   new 400GB
> B-master  250GB     (destination)
> B-slave   none
> 

A slave drive by itself on a cable will cause problems.

And, what kernel version was running?

Comment 3 Micke 2007-08-16 15:07:21 UTC

The strange thing was that the B-cable had the problems. But perhaps that's to
be expected.

kernel-2.6.22.1-41.fc7

Comment 4 Micke 2007-09-20 13:13:08 UTC

Continuing with the problem, a new phase has been entered:
* No more disk corruption seen (Changed the title). Must have been due to the
missing PATA master disk.
* No more "unpluggable" system freezes. Which one might think is good, but...
* The system now hangs completely in the same situations as before (perhaps
the fix of the 8169 problem causes a different detour in the code?)

Short summary of the incidents:
* Hung when writing data via samba to server's disks. FREQUENTLY! A single 300MB
file is now enough to choke the system.
* Hung when reading data via samba from server's disks. Once.
* Hung when moving data off a physical disk in an LVM. Twice during night.
* Hung when adding a disk to a software Raid. Once.

* Absolutely nothing in the log files!
* Currently at kernel-2.6.22.5-76.fc7

In my eyes, the system is old (450MHz) and appears to choke on the data. I'm
considering switching back to the old 100Mbit/s NIC to stay sane.

Comment 5 Chuck Ebbert 2007-09-20 20:03:08 UTC

Seems there are still bugs in the 8169 driver. And a long list of patches went
into 2.6.23 to fix the problems, so it will be hard to figure out what to apply
to 2.6.22. I would replace the card. (And can you post output of 'lspci -n' and
'lspci' for that device so we can see exactly what model it is?)

Comment 6 Chuck Ebbert 2007-09-20 21:26:06 UTC

If you can, try a different network card so we can see if it is really r8169's
fault.

Comment 7 Micke 2007-09-21 09:01:54 UTC

Point being that I'm not sure where the problem is any longer. Moving data
between disks shouldn't interfere with the NIC. But then again, what do I know...

# lspci -n
00:00.0 0600: 8086:7190 (rev 03)
00:01.0 0604: 8086:7191 (rev 03)
00:0d.0 0200: 8086:1229 (rev 05)
00:0e.0 0200: 10ec:8169 (rev 10)
00:0f.0 0c03: 1106:3038 (rev 61)
00:0f.1 0c03: 1106:3038 (rev 61)
00:0f.2 0c03: 1106:3104 (rev 63)
00:10.0 0180: 105a:4d30 (rev 02)
00:14.0 0601: 8086:7110 (rev 02)
00:14.1 0101: 8086:7111 (rev 01)
00:14.2 0c03: 8086:7112 (rev 01)
00:14.3 0680: 8086:7113 (rev 02)
01:00.0 0300: 1002:4742 (rev 5c)

# lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge
(rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX AGP bridge
(rev 03)
00:0d.0 Ethernet controller: Intel Corporation 82557/8/9 [Ethernet Pro 100] (rev 05)
00:0e.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit
Ethernet (rev 10)
00:0f.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 61)
00:0f.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
(rev 61)
00:0f.2 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 63)
00:10.0 Mass storage controller: Promise Technology, Inc. PDC20267
(FastTrak100/Ultra100) (rev 02)
00:14.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 02)
00:14.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
00:14.2 USB Controller: Intel Corporation 82371AB/EB/MB PIIX4 USB (rev 01)
00:14.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP 1X/2X
(rev 5c)
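
For anyone wanting to pick the NICs out of a listing like this programmatically, here is a small Python sketch that parses `lspci -n` lines into slot, class, vendor and device IDs. It assumes the one-line-per-device format shown above; `parse_lspci_n` is a hypothetical helper, and class `0200` is the Ethernet controller class as the plain `lspci` output confirms for both cards:

```python
import re

# slot, 4-digit class code, vendor:device, optional "(rev xx)"
LSPCI_N = re.compile(
    r"^(?P<slot>\S+) (?P<cls>[0-9a-f]{4}): "
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?: \(rev (?P<rev>[0-9a-f]+)\))?"
)


def parse_lspci_n(text):
    """Return one dict per device line found in `lspci -n` output."""
    devices = []
    for line in text.splitlines():
        match = LSPCI_N.match(line.strip())
        if match:
            devices.append(match.groupdict())
    return devices


# Three lines copied from the listing above.
sample = """\
00:0d.0 0200: 8086:1229 (rev 05)
00:0e.0 0200: 10ec:8169 (rev 10)
00:10.0 0180: 105a:4d30 (rev 02)
"""

nics = [d for d in parse_lspci_n(sample) if d["cls"] == "0200"]
for nic in nics:
    print(nic["slot"], f'{nic["vendor"]}:{nic["device"]}', "rev", nic["rev"])
```

On this sample it lists the Intel card (8086:1229) and the Realtek card (10ec:8169, rev 10) and skips the Promise controller.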

Comment 8 Micke 2007-09-24 16:01:23 UTC

OK, I've downgraded the NIC.

I've sent large files back and forth through the NIC. I've copied files internally
between a separate disk and the Raid. No freeze yet.

The old controller reads 10MB/s from the disk, and I can get 30MB/s off the
Raid. If it is a data choking problem, the 100M NIC probably receives data at a
safe rate compared to the disks. The 1G NIC should pile up data rapidly at the
gates of the disks. I distinctly remember I had to patch the 8139 drivers a few
years ago when they barfed at the data rate. If on the other hand it is a goofy
implementation of the r8169 driver, I should be on the safe side.
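
The rate argument can be put into numbers. A back-of-the-envelope Python sketch, assuming raw line rate (8 bits per byte, no protocol overhead) and using only the 10MB/s and 30MB/s figures quoted above; the function names are made up for illustration:

```python
def wire_rate_mb_per_s(link_mbit_per_s):
    """Raw line rate in MB/s (8 bits per byte, protocol overhead ignored)."""
    return link_mbit_per_s / 8


# Throughput figures quoted in the comment, in MB/s.
STORAGE_MB_PER_S = {"single disk, old controller": 10, "Raid": 30}


def oversubscription(link_mbit_per_s, storage_mb_per_s):
    """How many times faster the link can deliver data than storage absorbs it."""
    return wire_rate_mb_per_s(link_mbit_per_s) / storage_mb_per_s


for link in (100, 1000):
    for name, rate in STORAGE_MB_PER_S.items():
        print(f"{link} Mbit/s vs {name}: {oversubscription(link, rate):.2f}x")
```

At line rate the 100M NIC tops out at 12.5MB/s, below what the Raid sustains, while the 1G NIC can in principle deliver 125MB/s, roughly four times the Raid figure. That is consistent with the choking theory but does not by itself rule out a driver bug.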

Comment 9 Chuck Ebbert 2007-09-24 18:46:46 UTC

There is a patch queued for the 8169 driver that might fix this. Will queue for
the next Fedora kernel.

Comment 10 Chuck Ebbert 2007-09-28 17:31:06 UTC

Potential fix in kernel 2.6.22.9-61.fc6. It will be in the
updates-testing repository soon.

Comment 11 Chuck Ebbert 2007-09-28 17:35:28 UTC

Oops, that is in kernel 2.6.22.9-91.fc7

Comment 12 Micke 2007-10-03 07:24:29 UTC

Updated to kernel 2.6.22.9-91.fc7 and inserted r8169 NIC yesterday. Copied some
files. Seemed to work. Today I moved a 4.3GB file from the server and when I
came back to check: total freeze on the server. The file had been moved
completely off the server before the freeze.

Whatever the problem is, it is not gone.

Comment 13 Christopher Brown 2008-01-11 17:05:41 UTC

Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.

Comment 14 Micke 2008-01-15 09:34:35 UTC

The bug is really annoying so I've continued to use my old 100M NIC until there
is some sign/message of improvement. I can't use a server that stops dead every
so often.

If there is anything new to test, I can do that. Otherwise I'll be moving off to
new hardware quite soon... (moving from 450MHz to 2x2.1GHz - won't that be an
improvement)

Comment 15 Alberto Gonzalez 2008-01-15 19:59:31 UTC

I have a similar problem with kernel 2.6.23.12-52.fc7.
My RTL-8169 NIC is a PCMCIA model.
When I copy large files using NFS, the system locks up.
But, as the NIC is a PCMCIA one, I can unplug it and plug it in again.
This operation unlocks the system, showing this error:

Jan 13 09:54:00 salon kernel: BUG: soft lockup - CPU#0 stuck for 11s! [nfsd:2332]
Jan 13 09:54:00 salon kernel: 
Jan 13 09:54:00 salon kernel: Pid: 2332, comm:                 nfsd
Jan 13 09:54:00 salon kernel: EIP: 0060:[<deaf4b31>] CPU: 0
Jan 13 09:54:00 salon kernel: EIP is at rtl8169_interrupt+0x15/0x21e [r8169]
Jan 13 09:54:00 salon kernel:  EFLAGS: 00000282    Tainted: P         (2.6.23.12-52.fc7 #1)
Jan 13 09:54:00 salon kernel: EAX: 00000001 EBX: c1698760 ECX: 00000000 EDX: dccd5000
Jan 13 09:54:00 salon kernel: ESI: dccd5000 EDI: dccd5500 EBP: de84c000 DS: 007b ES: 007b FS: 00d8
Jan 13 09:54:00 salon kernel: CR0: 8005003b CR2: 08c4e30c CR3: 1d9fd000 CR4: 00000080
Jan 13 09:54:00 salon kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jan 13 09:54:00 salon kernel: DR6: ffff0ff0 DR7: 00000400
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085954893/-1073741824] handle_IRQ_event+0x23/0x51
Jan 13 09:54:00 salon kernel:  [<c045a4b3>] handle_IRQ_event+0x23/0x51
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085949625/-1073741824] handle_level_irq+0x74/0xb9
Jan 13 09:54:00 salon kernel:  [<c045b947>] handle_level_irq+0x74/0xb9
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085949741/-1073741824] handle_level_irq+0x0/0xb9
Jan 13 09:54:00 salon kernel:  [<c045b8d3>] handle_level_irq+0x0/0xb9
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086293205/-1073741824] do_IRQ+0x8c/0xb9
Jan 13 09:54:00 salon kernel:  [<c0407b2b>] do_IRQ+0x8c/0xb9
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086167914/-1073741824] __wake_up+0x32/0x43
Jan 13 09:54:00 salon kernel:  [<c0426496>] __wake_up+0x32/0x43
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086299789/-1073741824] common_interrupt+0x23/0x30
Jan 13 09:54:00 salon kernel:  [<c0406173>] common_interrupt+0x23/0x30
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084323776/-1073741824] __tcp_select_window+0x1/0x10f
Jan 13 09:54:00 salon kernel:  [<c05e8840>] __tcp_select_window+0x1/0x10f
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084331155/-1073741824] __tcp_ack_snd_check+0x24/0x6c
Jan 13 09:54:00 salon kernel:  [<c05e6b6d>] __tcp_ack_snd_check+0x24/0x6c
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084325775/-1073741824] tcp_rcv_established+0x4fd/0x7e0
Jan 13 09:54:00 salon kernel:  [<c05e8071>] tcp_rcv_established+0x4fd/0x7e0
Jan 13 09:54:00 salon kernel:  [<de85e3fd>] ata_qc_issue+0x464/0x4ba [libata]
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084302952/-1073741824] tcp_v4_do_rcv+0x2d2/0x605
Jan 13 09:54:00 salon kernel:  [<c05ed998>] tcp_v4_do_rcv+0x2d2/0x605
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084409937/-1073741824] ip_route_input+0x3a/0xc72
Jan 13 09:54:00 salon kernel:  [<c05d37af>] ip_route_input+0x3a/0xc72
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084538439/-1073741824] skb_checksum+0x4f/0x29a
Jan 13 09:54:00 salon kernel:  [<c05b41b9>] skb_checksum+0x4f/0x29a
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084292887/-1073741824] tcp_v4_rcv+0x846/0x8d1
Jan 13 09:54:00 salon kernel:  [<c05f00e9>] tcp_v4_rcv+0x846/0x8d1
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085812004/-1073741824] __slab_alloc+0x41a/0x466
Jan 13 09:54:00 salon kernel:  [<c047d2dc>] __slab_alloc+0x41a/0x466
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084397623/-1073741824] ip_local_deliver+0x189/0x230
Jan 13 09:54:00 salon kernel:  [<c05d67c9>] ip_local_deliver+0x189/0x230
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084398073/-1073741824] ip_rcv+0x481/0x4ba
Jan 13 09:54:00 salon kernel:  [<c05d6607>] ip_rcv+0x481/0x4ba
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086056642/-1073741824] getnstimeofday+0x30/0xbe
Jan 13 09:54:00 salon kernel:  [<c044173e>] getnstimeofday+0x30/0xbe
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084516288/-1073741824] netif_receive_skb+0x2e1/0x346
Jan 13 09:54:00 salon kernel:  [<c05b9840>] netif_receive_skb+0x2e1/0x346
Jan 13 09:54:00 salon kernel:  [<deaf47cd>] rtl8169_rx_interrupt+0x49a/0x4a9 [r8169]
Jan 13 09:54:00 salon kernel:  [<deaf64e2>] rtl8169_poll+0x36/0x1f8 [r8169]
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084507772/-1073741824] net_rx_action+0x9a/0x196
Jan 13 09:54:00 salon kernel:  [<c05bb984>] net_rx_action+0x9a/0x196
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086120466/-1073741824] __do_softirq+0x66/0xd3
Jan 13 09:54:00 salon kernel:  [<c0431dee>] __do_softirq+0x66/0xd3
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086293443/-1073741824] do_softirq+0x6c/0xce
Jan 13 09:54:00 salon kernel:  [<c0407a3d>] do_softirq+0x6c/0xce
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085949741/-1073741824] handle_level_irq+0x0/0xb9
Jan 13 09:54:00 salon kernel:  [<c045b8d3>] handle_level_irq+0x0/0xb9
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086120783/-1073741824] irq_exit+0x38/0x6b
Jan 13 09:54:00 salon kernel:  [<c0431cb1>] irq_exit+0x38/0x6b
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086293186/-1073741824] do_IRQ+0x9f/0xb9
Jan 13 09:54:00 salon kernel:  [<c0407b3e>] do_IRQ+0x9f/0xb9
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084115722/-1073741824] __sched_text_start+0x5be/0x638
Jan 13 09:54:00 salon kernel:  [<c061b4f6>] __sched_text_start+0x5be/0x638
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086299789/-1073741824] common_interrupt+0x23/0x30
Jan 13 09:54:00 salon kernel:  [<c0406173>] common_interrupt+0x23/0x30
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084115603/-1073741824] __sched_text_start+0x635/0x638
Jan 13 09:54:00 salon kernel:  [<c061b56d>] __sched_text_start+0x635/0x638
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084113947/-1073741824] schedule_timeout+0x70/0x8f
Jan 13 09:54:00 salon kernel:  [<c061bbe5>] schedule_timeout+0x70/0x8f
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1085906113/-1073741824] __alloc_pages+0x64/0x2a2
Jan 13 09:54:00 salon kernel:  [<c046633f>] __alloc_pages+0x64/0x2a2
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086109714/-1073741824] process_timeout+0x0/0x5
Jan 13 09:54:00 salon kernel:  [<c04347ee>] process_timeout+0x0/0x5
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1084113952/-1073741824] schedule_timeout+0x6b/0x8f
Jan 13 09:54:00 salon kernel:  [<c061bbe0>] schedule_timeout+0x6b/0x8f
Jan 13 09:54:00 salon kernel:  [<df18c40b>] svc_recv+0x232/0x393 [sunrpc]
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086161825/-1073741824] default_wake_function+0x0/0xc
Jan 13 09:54:00 salon kernel:  [<c0427c5f>] default_wake_function+0x0/0xc
Jan 13 09:54:00 salon kernel:  [<df1f26f2>] nfsd+0xd8/0x282 [nfsd]
Jan 13 09:54:00 salon kernel:  [<df1f261a>] nfsd+0x0/0x282 [nfsd]
Jan 13 09:54:00 salon kernel:  [phys_startup_32+-1086299113/-1073741824] kernel_thread_helper+0x7/0x10
Jan 13 09:54:00 salon kernel:  [<c0406417>] kernel_thread_helper+0x7/0x10
Jan 13 09:54:00 salon kernel:  =======================
...
...
Jan 13 09:56:38 salon kernel: pccard: card ejected from slot 0
Jan 13 09:56:38 salon kernel: ACPI: PCI interrupt for device 0000:02:00.0 disabled
Jan 13 09:56:41 salon kernel: cs: pcmcia_socket0: unable to apply power.
Jan 13 09:56:42 salon kernel: pccard: CardBus card inserted into slot 0
Jan 13 09:56:42 salon kernel: r8169 Gigabit Ethernet driver 2.2LK-NAPI loaded
Jan 13 09:56:42 salon kernel: PCI: Enabling device 0000:02:00.0 (0000 -> 0003)
Jan 13 09:56:42 salon kernel: ACPI: PCI Interrupt 0000:02:00.0[A] -> Link [LNKA] -> GSI 5 (level, low) -> IRQ 5
Jan 13 09:56:42 salon kernel: eth0: RTL8169sb/8110sb at 0xde84c000, 00:10:60:26:07:87, XID 10000000 IRQ 5
Jan 13 09:56:43 salon kernel: r8169: eth0: link down
Jan 13 09:56:43 salon kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Jan 13 09:56:44 salon kernel: r8169: eth0: link up
Jan 13 09:56:44 salon kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
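
A trace this long is easier to read once it is reduced to the resolved frames and the modules they belong to. A minimal Python sketch, assuming each log record sits on a single line (wrapped lines would have to be rejoined first) and keeping only the `[<address>] function+offset/size [module]` entries, since the `phys_startup_32` lines repeat the same frames; `FRAME` and `frames` are names invented for this example:

```python
import re

# Matches e.g. "[<deaf64e2>] rtl8169_poll+0x36/0x1f8 [r8169]"; the module tag is optional.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]{8})>\] (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def frames(log_text):
    """Return (function, module) pairs in trace order; module is None for core kernel code."""
    return [(m.group("func"), m.group("module")) for m in FRAME.finditer(log_text)]


# Four consecutive frames copied from the trace above.
sample = """\
Jan 13 09:54:00 salon kernel:  [<c05b9840>] netif_receive_skb+0x2e1/0x346
Jan 13 09:54:00 salon kernel:  [<deaf47cd>] rtl8169_rx_interrupt+0x49a/0x4a9 [r8169]
Jan 13 09:54:00 salon kernel:  [<deaf64e2>] rtl8169_poll+0x36/0x1f8 [r8169]
Jan 13 09:54:00 salon kernel:  [<c05bb984>] net_rx_action+0x9a/0x196
"""

driver_frames = [func for func, module in frames(sample) if module == "r8169"]
print(driver_frames)  # ['rtl8169_rx_interrupt', 'rtl8169_poll']
```

Run over the full trace, a filter like this makes it quick to see that the CPU was inside r8169 code (the EIP is in `rtl8169_interrupt`) while `rtl8169_poll` was already on the stack.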

Comment 16 Christopher Brown 2008-01-16 01:53:36 UTC

(In reply to comment #14)
> The bug is really annoying so I've continued to use my old 100M NIC until there
> is some sign/message of improvement. I can't use a server that stops dead every
> so often.
> 
> If there is anything new to test, I can do that. Otherwise I'll be moving off to
> new hardware quite soon... (moving from 450MHz to 2x2.1GHz - won't that be an
> improvement)
> 

The two things to do would be:

1) Test 2.6.24 when it arrives (or install it now from the development repository)
2) File a bug upstream at bugzilla.kernel.org and reference this bug if you like

I'll change the status to ASSIGNED, though it's difficult to know who best to ask to
take a look at it...

Comment 17 Bug Zapper 2008-05-14 14:00:27 UTC

This message is a reminder that Fedora 7 is nearing the end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 7. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '7'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 7's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 7 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. If possible, it is recommended that you try the newest available Fedora distribution to see if your bug still exists.

Please read the Release Notes for the newest Fedora distribution to make sure it will meet your needs:
http://docs.fedoraproject.org/release-notes/

The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 18 Bug Zapper 2008-06-17 02:10:24 UTC

Fedora 7 changed to end-of-life (EOL) status on June 13, 2008. 
Fedora 7 is no longer maintained, which means that it will not 
receive any further security or bug fix updates. As a result we 
are closing this bug. 

If you can reproduce this bug against a currently maintained version 
of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.