Bug 206416 - Complete IO system hangs under NFS stress
Summary: Complete IO system hangs under NFS stress
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-09-14 09:21 UTC by Thomas Boerkel
Modified: 2015-01-04 22:28 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-11-24 22:59:28 UTC
Type: ---
Embargoed:

Description Thomas Boerkel 2006-09-14 09:21:48 UTC
Description of problem:
Complete IO system hangs under NFS stress.

Version-Release number of selected component (if applicable):
Kernel 2.6.17-1.2174_FC5

How reproducible:
I am copying large files (each > 1 GB) to a remote NFS share (8k block size,
TCP). Using dstat, I see that FC5 alternately reads about 45 MB from disk (XFS)
and then writes 45 MB to the network.
While copying, I start 2 DVB recordings to the same disk (if this is not DVB
related, any 2 other writing processes might have the same effect).
After 5-10 minutes, the complete IO system hangs. WaitIO is at 99%. The DVB
driver can't write to disk anymore, and the copy makes no further progress. The
hang is not limited to this filesystem: when I try to copy a file from and to
another disk, it copies a few MB (1-18) and then hangs as well. So the hang is
not partition or disk specific.
When I CTRL-C the large NFS copy process (this works most of the time),
everything goes back to normal and continues.
I think it once happened even with no recordings running, but with the
recordings it is easily reproducible.
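
The WaitIO figure that dstat reports can also be sampled directly from the
kernel counters. The following is a minimal sketch, not part of the original
report, assuming the standard Linux /proc/stat layout in which the fifth value
on the aggregate "cpu" line is iowait. The function names are hypothetical.

```python
import time


def parse_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Use at most the first eight counters (user, nice, system, idle, iowait,
    # irq, softirq, steal); guest time is already included in user time.
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is iowait in the /proc/stat field order.
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval seconds apart, and return iowait %."""
    with open(path) as f:
        before = parse_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_cpu_times(f.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() repeatedly while the copy runs would show whether the
value climbs toward the 99% described here.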

Steps to Reproduce:
1. Initiate the copy of the large files to the remote NFS share
2. Start 2 DVB recordings (or maybe other write processes)
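
The two steps above could be scripted roughly as follows. This is only a sketch
under assumptions, not the reporter's actual setup: ordinary file writers stand
in for the DVB recordings (the report only says other write processes might
suffice), and every path, size, and function name here is hypothetical.

```python
import os
import shutil
import threading


def background_writer(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes of zeros to path in chunks, then fsync the file."""
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    return written


def run_stress(src_file, nfs_dest_dir, local_dir, writers=2,
               bytes_per_writer=1 << 30):
    """Copy src_file into nfs_dest_dir while local writers run concurrently."""
    threads = []
    for i in range(writers):
        # Each writer stands in for one recording writing to the local disk.
        path = os.path.join(local_dir, f"writer{i}.bin")
        t = threading.Thread(target=background_writer,
                             args=(path, bytes_per_writer))
        t.start()
        threads.append(t)
    # Step 1 of the report: copy the large file to the NFS-mounted directory.
    copied = shutil.copy(src_file, nfs_dest_dir)
    for t in threads:
        t.join()
    return copied
```

In a real run, nfs_dest_dir would be the mount point of the remote share and
src_file one of the files larger than 1 GB. If the hang occurs, the call simply
would not return.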
  
Actual results:
IO system hangs.

Expected results:
IO system should not hang.

Additional info:
I tried to reproduce this under 2.6.16, but my DVB card is not recognized
there.

It does not happen if I instead run the cp command on the remote computer, with
this computer acting as the NFS server for it.

Also, it does not seem to happen if I set the NFS block size to 1k.

I also tried another router, with the same result.

This computer has a Netgear 100 Mbit network card:
02:0e.0 Ethernet controller: National Semiconductor Corporation DP83815
(MacPhyter) Ethernet Controller
        Subsystem: Netgear FA311 / FA312 (FA311 with WoL HW)

Comment 1 Thomas Boerkel 2006-09-14 09:38:56 UTC
I just managed to create a similar situation simply by copying large files to a
remote NFS share (1k block size; the remote is SUSE 10.1, kernel 2.6.16) until
the remote share is 100% full.
The local cp process does not notice this and hangs (not killable). After a
while (while a third remote computer copies data from this one over NFS), the
complete IO system hangs as described above. So it has nothing to do with the
DVB drivers.

Comment 2 Dave Jones 2006-10-17 00:25:48 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem,
please check that you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 3 Dave Jones 2006-11-24 22:59:28 UTC
This bug has been mass-closed along with all other bugs that
have been in NEEDINFO state for several months.

Due to the large volume of inactive bugs in bugzilla, this
is the only method we have of cleaning out stale bug reports
where the reporter has disappeared.

If you can reproduce this bug after installing all the
current updates, please reopen this bug.

If you are not the reporter, you can add a comment requesting
it be reopened, and someone will get to it asap.

Thank you.
