Red Hat Bugzilla – Bug 206416
Complete IO system hangs under NFS stress
Last modified: 2015-01-04 17:28:38 EST
Description of problem:
Complete IO system hangs under NFS stress.
Version-Release number of selected component (if applicable):
I am copying large files (each > 1GB) to a remote NFS share (8k block size,
tcp). Using dstat, I see that FC5 reads about 45 MB from disk (XFS) and then
writes 45 MB to network and so on.
While copying, I start 2 DVB recordings to the same disk (if this is not DVB
related, any 2 other writing processes might have the same effect).
After 5-10 minutes, the entire IO system hangs. I/O wait is at 99%. The DVB
driver can no longer write to disk, and the copy makes no further progress. The
hang is not limited to this filesystem: when I try to copy a file from and to
another disk, it copies a few MB (1-18) and then hangs as well. So, the hang is
not partition or
When I CTRL-C the large NFS copy process (this works most of the time),
everything goes back to normal and continues.
One time, I think, it happened even with no recordings running, but with the
recordings it is easily reproducible.
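For anyone who wants to confirm the 99% I/O wait figure without dstat, here is a
minimal sketch that computes the iowait share from two samples of the aggregate
"cpu" line in /proc/stat. The function names are made up for illustration, and
it assumes the usual field order (user, nice, system, idle, iowait, ...).

```python
import time


def parse_cpu_line(line):
    """Return the numeric fields of the aggregate 'cpu' line from /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(v) for v in fields[1:]]


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    deltas = [y - x for x, y in zip(b, a)]
    # Assumed field order: user nice system idle iowait irq softirq steal ...
    # Only the first 8 fields are summed, because the guest counters that
    # newer kernels append are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval)
    with open(path) as f:
        second = f.readline()
    return iowait_percent(first, second)
```

Calling sample_iowait() in a loop while the copy and the recordings run should
show the value climbing toward 100 when the hang sets in, if the hang is the
same one described here.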
Steps to Reproduce:
1. Initiate the copy of the large files to remote NFS share
2. Start 2 DVB recordings (or possibly other write processes)
Actual results:
IO system hangs.
Expected results:
IO system should not hang.
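For testers without a DVB card, the load pattern could be approximated as
follows. This is only a sketch under the assumption that plain file writers are
an adequate stand-in for the recordings, which the report does not confirm. The
names steady_writer and run_load and all paths passed to them are hypothetical.

```python
import os
import shutil
import threading

CHUNK = 1024 * 1024  # 1 MiB per write call


def steady_writer(path, total_bytes, chunk=CHUNK):
    """Stand-in for one recording: write `total_bytes` to `path` in chunks."""
    block = b"\0" * chunk
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # force the data out to the local disk
    return written


def run_load(source, nfs_dest, local_dir, writers=2,
             bytes_per_writer=64 * CHUNK):
    """Copy `source` to `nfs_dest` while `writers` threads write locally."""
    threads = [
        threading.Thread(
            target=steady_writer,
            args=(os.path.join(local_dir, f"writer{i}.bin"), bytes_per_writer),
        )
        for i in range(writers)
    ]
    for t in threads:
        t.start()
    # The large copy; `nfs_dest` is expected to live on the NFS mount.
    shutil.copyfile(source, nfs_dest)
    for t in threads:
        t.join()
```

To resemble the report, source would be a file larger than 1 GB on the local
XFS disk, nfs_dest a path on the NFS share, and local_dir a directory on the
same local disk as the source.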
I tried to reproduce this under 2.6.16, but there my DVB card is not being
It does not happen if I copy the files by running the cp command on the remote
computer, using this computer as the remote share for the other one.
Also, it does not seem to happen if I set the NFS block size to 1k.
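Since the block size seems to matter, it may help to check which transfer sizes
a mount actually ended up with. The sketch below parses text in the format of
/proc/mounts and returns the options of each NFS mount. It assumes that "block
size" here refers to the rsize/wsize mount options, which the report does not
state explicitly. The function name is made up, and the sample line in the usage
note is invented for illustration.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options."""
    result = {}
    for line in mounts_text.splitlines():
        parts = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, ...
        if len(parts) < 4 or parts[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in parts[3].split(","):
            key, sep, value = opt.partition("=")
            # Flags without a value (e.g. "hard") are stored as True.
            options[key] = value if sep else True
        result[parts[1]] = options
    return result


def read_nfs_mount_options(path="/proc/mounts"):
    """Convenience wrapper that reads the live mount table."""
    with open(path) as f:
        return nfs_mount_options(f.read())
```

Given a hypothetical line such as
"server:/export /mnt/share nfs rw,rsize=8192,wsize=8192,hard,proto=tcp 0 0",
the result for "/mnt/share" would contain rsize and wsize as the strings "8192"
and proto as "tcp".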
I even tried with another router with the same effect.
This computer has a Netgear 100 MBit network card:
02:0e.0 Ethernet controller: National Semiconductor Corporation DP83815
(MacPhyter) Ethernet Controller
Subsystem: Netgear FA311 / FA312 (FA311 with WoL HW)
I just managed to create a similar situation simply by copying large files to a
remote NFS share (1k block size, remote is Suse 10.1, Kernel 2.6.16) until the
remote share is 100% full.
The local cp process does not notice this and hangs (it cannot be killed). After
a while (during which a third remote computer copies data from this one over
NFS), the entire IO system hangs as described above. So, it has nothing to do
with the
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.
Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.
This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.
Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.
In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed. See bug 207474 for further details.
If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.
If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.
This bug has been mass-closed along with all other bugs that
have been in NEEDINFO state for several months.
Due to the large volume of inactive bugs in bugzilla, this
is the only method we have of cleaning out stale bug reports
where the reporter has disappeared.
If you can reproduce this bug after installing all the
current updates, please reopen this bug.
If you are not the reporter, you can add a comment requesting
it be reopened, and someone will get to it asap.