Red Hat Bugzilla – Bug 198565
NFS/udp Data Corruption
Last modified: 2008-08-02 19:40:34 EDT
Description of problem:
Data in binary files gets corrupted when transferring files
between machines over NFS using UDP when the network is under heavy load.
MTU: 1500 everywhere
Version-Release number of selected component (if applicable): kernel 2.6.15 and later
Steps to Reproduce:
Copy a binary file from/to an NFS mount over a busy Gigabit
network multiple times, and diff it with the original each time.
(to busy up the network either copy multiple files or use 'sudo ping -f -s 60000')
***Script to copy file and diff:
#!/bin/csh
# Script to test I/O on 2.6.15 kernel
# Give it a filename and an optional argument 's' to stop on error
set stop = 0
set file = $argv[1]
if ($#argv > 1) then
    if ($argv[2] == 's') set stop = 1
endif
while (1)
    \cp $file $file.copy
    diff $file $file.copy
    if ($status) then
        echo "ERROR AT "`date`
        if ($stop) exit 1
    endif
end
Actual results: Binary files differ
Expected results: Files should not differ
Was not fixed in kernel 2.6.17-1.2139_FC4
Created attachment 132270 [details]
script to copy files and diff them repeatedly
Could you please post the oops output?
Created attachment 132943 [details]
simple script to test with
Created attachment 132944 [details]
original binary file
Created attachment 132945 [details]
differing binary file after copy on NFS filesystem
Also, I'm removing one of the 'script' attachments because they are the same.
Just curious... Does the same problem happen with TCP mounts?
copying files on TCP mounts works fine; we tried it
With busy networks you really want to use TCP, since it knows how to
deal with congestion much, much better than RPC/UDP will....
Now I'm a bit surprised that TCP is not comparable to UDP, since
with UDP I'm sure you're getting tons and tons of retransmits, which
in turn just adds even more congestion to an already busy
network... to prove this, simply do a 'nfsstat -rc' using both UDP
and TCP. You will see the number of 'retrans' will be much, much
smaller with TCP than with UDP...
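As a quick way to compare the two transports, the 'retrans' counter can be pulled out of the 'nfsstat -rc' output with a short pipeline. This is a sketch assuming the usual two-line "Client rpc stats" layout; a captured sample is fed in via a here-document so the snippet runs anywhere, but on a live client you would pipe the real 'nfsstat -rc' output in instead:

```shell
# Pull the 'retrans' column out of `nfsstat -rc`-style output.
# The here-document below is an illustrative sample, not live data.
retrans=$(awk '/^calls/ {getline; print $2; exit}' <<'EOF'
Client rpc stats:
calls      retrans    authrefrsh
120034     1543       0
EOF
)
echo "retrans: $retrans"
```

Run once per transport and compare the two numbers; a UDP mount on a congested network should show a dramatically higher count.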
This problem seems to be related to our parallel processing system. When nodes
(~8 usually) are done processing they copy the data back to an NFS-mounted filesystem.
We are now using mount options:
on the client nodes.
When using a low timeout of 1 (timeo=1) this bug can typically be reproduced
in under 10 minutes. It happens even when using TCP mounts.
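For reference, the kind of client mount being discussed here can be expressed as an /etc/fstab entry like the one below. The server name, export path, and mount point are placeholders, not the actual values from this setup; only the soft/udp/timeo options come from the thread:

```
# Illustrative fstab entry for the quick-failing configuration
server:/export  /mnt/data  nfs  soft,udp,timeo=1  0  0
```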
(In reply to comment #7)
> copying files on TCP mounts works fine; we tried it
> slower tho..
This actually FAILS, although there is more success with TCP
Try turning off soft mounts...
I'm Toby's supervisor and we thought it would help if I weighed in at this
point since we are feeling rather desperate.
To answer your latest question: we tried hard mounts between two machines and
ran our standard copy test with timeo=1 to try to make it fail. It ran
successfully for at least an hour, whereas soft mounts would fail within 5 minutes.
But when we switched all of our machines to hard mounts a few days ago, with
the setting timeo=25, our users still got data corruption. It's hard to say if
this was any more or less frequent than with soft mounts, but one occurrence
was quite severe with over 30 glitches in a 2 GB file.
During this time there were a number of messages like this in the server log:
Jan 4 11:24:50 simba kernel: RPC: bad TCP reclen 0x08020703 (large)
Jan 4 11:24:50 simba kernel: RPC: bad TCP reclen 0x7902dc02 (large)
Jan 4 11:24:50 simba kernel: rpc-srv/tcp: nfsd: got error -104 when sending 140
bytes - shutting down socket
Jan 4 11:24:50 simba last message repeated 3 times
Jan 4 11:25:01 simba kernel: RPC: bad TCP reclen 0x40038d02 (non-terminal)
Jan 4 11:25:01 simba kernel: RPC: bad TCP reclen 0x5302b302 (non-terminal)
Jan 4 11:25:01 simba kernel: RPC: bad TCP reclen 0x0b02ff02 (large)
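For context on the 'bad TCP reclen' messages: each RPC record on a TCP stream is prefixed with a 4-byte marker whose top bit is the last-fragment flag and whose low 31 bits are the fragment length (RFC 1831 record marking). A sketch decoding the first logged marker shows why the server rejects it: the value decodes to an absurdly large fragment length, consistent with garbage bytes landing where a record marker should be:

```shell
# Decode an RPC-over-TCP record marker (RFC 1831 record marking):
# bit 31 = last-fragment flag, bits 0-30 = fragment length.
marker=$((0x08020703))   # first "bad TCP reclen" value from the server log
last=$(( (marker >> 31) & 1 ))
len=$(( marker & 0x7fffffff ))
echo "last-fragment=$last length=$len bytes"
```

A claimed fragment of roughly 134 MB is far beyond anything nfsd will accept, hence the "(large)" tag and the socket shutdown.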
We are currently trying some parameter changes to see if they help this:
echo '8388608' > /proc/sys/net/core/rmem_default
echo '8388608' > /proc/sys/net/core/rmem_max
echo '8388608' > /proc/sys/net/core/wmem_default
echo '8388608' > /proc/sys/net/core/wmem_max
echo '32768 65536 8388608' > /proc/sys/net/ipv4/tcp_rmem
echo '32768 65536 8388608' > /proc/sys/net/ipv4/tcp_wmem
echo '8388608 8388608 8388608' > /proc/sys/net/ipv4/tcp_mem
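If these values turn out to help, the same settings can be made persistent across reboots via /etc/sysctl.conf (loaded with 'sysctl -p'); this is simply the sysctl.conf form of the echo commands above:

```
# /etc/sysctl.conf equivalents of the echo commands above
net.core.rmem_default = 8388608
net.core.rmem_max = 8388608
net.core.wmem_default = 8388608
net.core.wmem_max = 8388608
net.ipv4.tcp_rmem = 32768 65536 8388608
net.ipv4.tcp_wmem = 32768 65536 8388608
net.ipv4.tcp_mem = 8388608 8388608 8388608
```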
We can reproduce this bug within 5 minutes with an NFS connection between two
workstations with two intervening gigabit switches, by running the test script
while continuously copying a large file or directory tree (the copy
occasionally gives Input/Output errors as well). The parameters that give
rapid failure are:
The mounts are all done with automount.
The failures occur with any kernel past 2.6.14.
The failures occur with tcp as well as udp.
Increasing the timeo to 10 or higher greatly reduces the failure rate: the
simple test will not fail but our users still get data corruption if the
network is busy.
The test also does not fail quickly with hard mounts, but there is still
corruption at times.
As I said, we're getting desperate. We're trying a few more things (including
a new main switch) but will then have to go back to the 2.6.14 kernel, which
may mean we are effectively stuck at Fedora 4 until this is resolved.
Current nfs mount options are
The corruption could be due to lower-level network corruption. So would it
be possible to get a packet trace when the corruption happens? Something like:
"tethereal -w /tmp/bz198565.pcap host <server> ; bzip2 /tmp/bz198565.pcap"
What I'm looking for is TCP checksum error or TCP retransmissions or other
TCP errors. If these type of errors are indeed happening, then your network
is dropping packets which could be the cause of the corruption...
Created attachment 145104 [details]
Image showing glitch of inserted bytes then real image data out of register
That image definitely looks messed up... but without the packet
trace as described in Comment #15 it's hard to tell what is happening...
We've discerned that the default nfs value for protocol is TCP instead of UDP.
The manpage states that it's UDP, and that's why we thought all along that we
were using UDP mounts, but instead we were using TCP! Oops.
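A quick way to see which transport a mount actually negotiated, rather than what the manpage suggests, is the options column of /proc/mounts (on kernels that report it). A sketch, with a sample line piped in so it runs anywhere; on a real client you would replace the sample with 'grep nfs /proc/mounts':

```shell
# Extract the negotiated transport (proto=tcp or proto=udp) from a
# /proc/mounts-style line. The sample line is illustrative, not live data.
line='server:/export /mnt nfs rw,v3,proto=tcp,timeo=600 0 0'
proto=$(printf '%s\n' "$line" | sed -n 's/.*proto=\([a-z]*\).*/\1/p')
echo "negotiated transport: $proto"
```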
It looks like UDP with these options has been successful in avoiding data corruption.
Now we get lots of retrans in 'nfsstat -rc' with _some_ of our machines--things
aren't perfect, and we would like to run NFS over TCP.
Another thing we need to correct from our earlier statements is the one about
hard versus soft mounts. When we first tested the hard mounts we forced it to
mount UDP, thinking we were testing the worst case, and it didn't show
corruption simply because it was UDP.
Attached are two tethereal outputs taken while using the 'tcp' option. In each
case the corruption occurred during the last 1-2 seconds of the tethereal output.
They show the following TCP errors consistently throughout the tethereal output:
-[Unreassembled Packet [incorrect TCP checksum]]
-NFS [TCP ACKed lost segment] [TCP Previous segment lost]
-[TCP ZeroWindow] [TCP ACKed lost segment] [TCP Previous segment lost]
Created attachment 145845 [details]
tethereal output using nfs over TCP
This report targets the FC3 or FC4 products, which have now been EOL'd.
Could you please check that it still applies to a current Fedora release, and
either update the target product or close it ?
This problem has occurred with every release kernel past 2.6.14 that we have
tested. It occurred in Fedora 5 and it occurs in Fedora 6 with the current kernel.
Congratulations. The problem with data corruption under TCP appears to be
solved with the latest Fedora kernel, 2.6.19-1.2911.6.5.fc6. The standard test
that fails in 5-10 minutes ran for 4 hours without a problem. I did not test
the previous 2.6.19 kernels.
Can we close this bug?
I've tested it for 50 minutes under the 2.6.20 kernel and it is OK there too.
So yes, you can close the bug.