Red Hat Bugzilla – Bug 454235
NFSv4 broken renames / deletes when write load on server
Last modified: 2009-10-29 14:47:43 EDT
Description of problem:
A couple/few months ago, something changed in one of the applicable packages
that has broken renames (and maybe more) on NFS mounts. On very large files
(2GB+), renames by two programs I use often (rsync & rdiff-backup) will
sometimes fail and the program will exit with errors. This never used to happen
and was 100% reliable even with 4GB+ files.
Verbose rdiff-backup log excerpt (see attached fuller log):
Renaming /data/Netbak/z/firewall/_/shared/shared/NETBAK/rdiff-backup.tmp.1 to
OSError: [Errno 2] No such file or directory
Rsync will also do the same. It will rsync the big file to a temp file and it
will be 100% and issue the rename and die with an error. However, the rename
appears to have occurred successfully! A subsequent rsync will make no changes
(and no errors).
It seems to me that the program is issuing the rename and something in nfs is
saying error even though the rename is occurring. Maybe a race condition or nfs
My hunch is this is an NFS issue, not an rsync issue, but I may be wrong.
We are using only NFSv4.
I have switched to sync mounts to see if that has anything to do with this and
will report back. The errors occurred in async mode, which we use(d) for
Version-Release number of selected component (if applicable):
Happens every few days.
Steps to Reproduce:
1. mount some nfs
2. run rsync/rdiff-backup on client on some big files
Rename error, program dies.
Rename succeeds, program keeps going as normal
I should note the rsync/rdiff-backup source is a remote (internet 2Mb or so
upload link) connection. So for these big 2GB+ files you'll have a long wait
(10-30mins) between the open / writing and renaming.
The rsync errors look like (swiped from an rsync list post as I hosed my rsync
debug output by accident):
-> "RX_LVL_HH_Inc_Amt10_14_9MM.dat.0": No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(702)
Created attachment 311130 [details]
more of the rdiff-backup verbose log showing failure
Here's what rsync is doing:
"shared/NETBAK/ORTHO2BU.dmp": No such file or directory (2)
But the rename IS succeeding and everything is ok (a rerun of rsync returns
almost immediately confirming a successful sync).
Sound like there is some type of race condition in the rename code...
I'll try to reproduce this locally...
Every once in a while I would see a strange glitch on the cmd line or in
Nautilus also, while accessing the NFS share. Always on renames or deletes
(though I'm not 100% sure on the deletes).
Just 2 days ago I switched my NFS exports to be sync instead of async. I'm
hoping this will help. So far it seems ok, but it may take a week or 2 to show up.
I can now confirm this error occurs if the NFSv4 export is exported in sync
mode. So sync/async mode does not affect this bug. Now I'm really starting to
worry as I really needed a workaround.
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '8'.
Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 8's end of life.
Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 8 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.
Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.
The process we are following is described here:
Is this happening in a F-9 or F-10 kernel?
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.
If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.
Thank you for reporting this bug and we are sorry it could not be fixed.
I just upgraded from F8 to F10. I will report back and reopen if I see this again.
Since my F10 upgrade I have not seen the problem yet. It may have been fixed in F10. I will reopen if it pops up.
Spoke too soon. This just hit me right now during a nautilus GUI file delete. Deleting 8 files. It got to "4 left" and stopped:
"Error while deleting. There was an error deleting <filename>. Error removing file: No such file or directory"
I still have the error up and looked at the state of the files and it has already deleted <filename> but not the ones after it. I just hit "skip" and it completed ok and all files were deleted as desired.
I was doing a big (200MB) file copy at the same time onto the NFS fs (ie: big writes). I'm pretty sure that's why it hit now. Perhaps the trick is: load NFS server down with a big write; then try to delete smaller files in batches.
Anyhow, this confirms the bug still exists in F10
And, there is nothing interesting (ie: related) at all in the client or server dmesg/messages.
Just to be clear, the error is happen from a F-10 client to a
Also what arguments are you giving to rdiff-backup?
This bug has occurred on F8->F8 and on F10 client -> F8 server. I have not upgraded my NFS server yet. If this bug is server-side then I suppose I'll have to do so to confirm it's present in F10. If no one else can reproduce this in F9 or 10 then perhaps it is server-side and since fixed.
As I recall it is simple to reproduce, just hammer the heck out of the NFS server with *local* writes, and at the same time try to delete a dir of tons of small files over NFS.
Maybe best to forget about rdiff-backup, the bug is a general bug occurring in all programs and shell, as per comment #2 and #11. I've moved my rdiff-backup jobs directly onto the NFS server as the bug was unbearable.
F8 was a bit "challenged" when it came to its NFSv4 stability :-\
So would suggest upgrading the sever to see if the problem persists.
For now I'll close this bug but please feel free reopen it if the
problem does indeed come back...
Thanks you for using Fedora...
I upgraded the file server to F10. This problem still exists in the latest F10 (all rpms updated as of last week).
It's the same issue. Run rdiff-backup using a NFS share as the destination, or anything that does a large number of renames under load to a NFS share. It will randomly bomb on renames. This bug is not in rdiff-backup, as it can be reproduced from the shell or things like galeon. If required, I can easily write a simple bash test case loop.
Exception '[Errno 17] File exists: '/data/Bak/ThriceDailyOnsite/Baks/rdiff-backup-data/increments/home/trevor/.wine/drive_c/windows/system32/spool'' raised of class '<type 'exceptions.OSError'>':
File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 304, in error_check_Main
File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 324, in Main
File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 280, in take_action
elif action == "backup": Backup(rps, rps)
File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 346, in Backup
File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 38, in Mirror
File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 232, in patch
File "/usr/lib/python2.5/site-packages/rdiff_backup/rorpiter.py", line 284, in __call__
File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 625, in start_process
File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 205, in get_mirror_inc_rps
alt_inc, inc_rp = find_inc_pair(index, mirror_rp, alt_mirror, alt_inc)
File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 191, in find_inc_pair
if check_new_index(inc_root, trial_inc_index, make_dirs = 1):
File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 130, in check_new_index
parent_result = wrap_call(make_parent, rp)
File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 111, in wrap_call
try: result = func(*args)
File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 124, in make_parent
File "/usr/lib/python2.5/site-packages/rdiff_backup/rpath.py", line 988, in makedirs
File "/usr/lib/python2.5/os.py", line 171, in makedirs
Exception '[Errno 2] No such file or directory: '/data/Bak/ThriceDailyOnsite/Baks/bak/Diskinfo/rdiff-backup.tmp.407'' raised of class '<type 'exceptions.OSError'>':
Always a different file it craps out on.
My NFS is mounted nfs4 rw,bg,hard,intr,noatime,nosuid,proto=tcp,timeo=15,rsize=8192,wsize=8192
My rdiff-backup args:
/bin/nice -19 rdiff-backup [bunch of --excludes] [bunch of --includes] --include /home --exclude / / /data/Bak/ThriceDailyOnsite/Baks
I've had a similar problem on a Suse SLES 9 system.
I think it is because the NFS protocol requires file operations (rename, delete, etc.) be written to disk before the reply is sent back to the client.
I was writing 300MB to disk locally (in a test, copying from /dev/urandom), and then trying to rename a (different) file remotely from an NFS client. The rename failed on the client, but in fact worked on the server.
Doing a tcpdump showed the Windows client sent 3 NFS rename requests, 4 seconds apart (it timed out shortly after, and reported a failure). The (single) "reply ok" from the server didn't get sent until about 100 seconds later!
As far as I could tell the delay was caused by the server flushing the 300MB from cache to disk (I guess the updates have to be in order), before it would tell the client the rename was done. I got around it by doing a "sync" on the server before trying the rename on the client.
I would have thought exporting the filesystem "async" would also fix it. You say "sync" doesn't fix it, but that's the default! (at least on SLES). Have you tried exporting the filesystem async? You do risk data lose in some circumstances, but that would allow the NFS server to lie and tell the client the rename/delete has been done and committed to disk, when it hasn't really.
Yes, I was originally using explicit async. I have not tried async in a long while though. I am changing back to async and will test after my next reboot (might be a little while). I probably have not tried async with F10 yet.
Your theory and packet traces make sense, though you'd think the protocol would have some method to get around write starvation on the server. This ties in with the various write starvation bugs I am following as well. Linux behaves quite badly when doing massive writes and trying to get little ones done in a timely manner.
Did going async completely solve the problem for you?
I just tried it, and NFS exporting the filesystem with the async option did fix it, at least for my test case (which has been 100% reliable at reproducing the problem so far).
We have gone with the work-around of running "sync" to force changed blocks to disk after the rsync, and before creating the file that the NFS client renames. This isn't a general solution to the problem, obviously.
I'm not sure who is to blame, here. It seems to be a disconnect between the semantics of NFS (when certain operations are said to be complete, it means the changes have made it to permanent storage), and the semantics of Linux I/O (when the same operations are said to be complete, it just means they've made it to the cache and have not necessarily been written to disk).
You could also try tuning the cache, so it gets written to disk in a more timely manner. Here's what some random web page say about it:
"Tuning Recommendations for write-heavy operations
The usual issue that people who are writing heavily encouter is that Linux buffers too much information at once, in its attempt to improve efficiency. This is particularly troublesome for operations that require synchronizing the filesystem using system calls like fsync. If there is a lot of data in the buffer cache when this call is made, the system can freeze for quite some time to process the sync."
There's lots of details there about the various kernel parameters for tuning the cache.
I am now running async again. I will see if the problem comes up again.
Just had the problem hit again. I am doing a 750GB (single) file delete on a 5TB md RAID6 array that was 86% full. The file delete took over 20 minutes. During that time doing file deletes/moves with nautilus over NFS I got multiple "file does not exist" errors. As usual, the operations were performed anyhow, and the error was bogus.
So that seems like a good way to easily reproduce this problem. Delete a multi-hundred GB file on a nearly-full fs. Not sure if the RAID6 is making it worse or not. There was still idle time that top showed, so the system wasn't fully loaded.
PS: what does CLOSED DEFERRED mean?