Bug 454235 - NFSv4 broken renames / deletes when write load on server
Summary: NFSv4 broken renames / deletes when write load on server
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 10
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-07-07 05:18 UTC by Trevor Cordes
Modified: 2009-10-29 18:47 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2009-09-29 20:09:23 UTC
Type: ---
Embargoed:


Attachments
more of the rdiff-backup verbose log showing failure (2.70 KB, text/plain), 2008-07-07 05:18 UTC, Trevor Cordes

Description Trevor Cordes 2008-07-07 05:18:33 UTC
Description of problem:
A couple/few months ago, something changed in one of the applicable packages
that broke renames (and maybe more) on NFS mounts.  On very large files
(2GB+), renames by two programs I use often (rsync & rdiff-backup) will
sometimes fail and the program will exit with errors.  This never used to happen;
renames were 100% reliable even with 4GB+ files.

Verbose rdiff-backup log excerpt (see attached fuller log):
Renaming /data/Netbak/z/firewall/_/shared/shared/NETBAK/rdiff-backup.tmp.1 to
/data/Netbak/z/firewall/_/shared/shared/NETBAK/ORTHO2BU.dmp
...
OSError: [Errno 2] No such file or directory

Rsync does the same thing.  It transfers the big file to a temp file, reaches
100%, issues the rename, and then dies with an error.  However, the rename
appears to have occurred successfully!  A subsequent rsync makes no changes
(and reports no errors).

It seems to me that the program issues the rename and something in NFS reports
an error even though the rename actually takes place.  Maybe a race condition
or an NFS caching issue.

My hunch is this is an NFS issue, not an rsync issue, but I may be wrong.
We are using only NFSv4.

I have switched to sync mounts to see if that has anything to do with this and
will report back.  The errors occurred in async mode, which we use(d) for
performance reasons.

Version-Release number of selected component (if applicable):
kernel-PAE-2.6.25.4-10.fc8
nfs-utils-lib-1.1.0-4.fc8
nfs-utils-1.1.1-2.fc8
rdiff-backup-1.0.5-5.fc8
librsync-0.9.7-11.fc8
rsync-2.6.9-5.fc8

How reproducible:
Happens every few days.

Steps to Reproduce:
1. mount some nfs
2. run rsync/rdiff-backup on client on some big files
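
A rough example of the client-side commands (server name, export path, and local paths are placeholders, not my actual setup):

  # mount the NFSv4 export (options are illustrative)
  mount -t nfs4 server:/data /mnt/nfs -o rw,hard,intr,proto=tcp

  # push some multi-GB files at it
  rsync -av --progress /local/bigfiles/ /mnt/nfs/bigfiles/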
  
Actual results:
Rename error, program dies.

Expected results:
Rename succeeds, program keeps going as normal

Additional info:
I should note that the rsync/rdiff-backup source is remote (an internet link
with roughly 2Mb upload).  So for these big 2GB+ files there is a long wait
(10-30 minutes) between the open/write and the rename.

The rsync errors look like this (swiped from an rsync list post, as I hosed my
own rsync debug output by accident):
rsync: rename "/opt/pivot/pivotlink3/virtual/sanvita/http/data/.RX_LVL_HH_Inc_Amt10_14_9MM.dat.0.P9Iy08" -> "RX_LVL_HH_Inc_Amt10_14_9MM.dat.0": No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(702)

Comment 1 Trevor Cordes 2008-07-07 05:18:33 UTC
Created attachment 311130 [details]
more of the rdiff-backup verbose log showing failure

Comment 2 Trevor Cordes 2008-07-25 09:36:31 UTC
Here's what rsync is doing:

shared/NETBAK/ORTHO2BU.dmp
rsync: rename "/data/Netbak/zzz/firewall/_/shared/shared/NETBAK/.ORTHO2BU.dmp.njyXED" -> "shared/NETBAK/ORTHO2BU.dmp": No such file or directory (2)

But the rename IS succeeding and everything is ok (a rerun of rsync returns
almost immediately confirming a successful sync).

Comment 3 Steve Dickson 2008-07-25 13:09:33 UTC
Sounds like there is some type of race condition in the rename code... 

I'll try to reproduce this locally... 

Comment 4 Trevor Cordes 2008-07-26 17:39:45 UTC
Every once in a while I would see a strange glitch on the cmd line or in
Nautilus also, while accessing the NFS share.  Always on renames or deletes
(though I'm not 100% sure on the deletes).

Just 2 days ago I switched my NFS exports to be sync instead of async.  I'm
hoping this will help.  So far it seems ok, but it may take a week or 2 to show up.


Comment 5 Trevor Cordes 2008-07-29 13:06:45 UTC
I can now confirm this error occurs even when the NFSv4 export is exported in
sync mode.  So sync/async mode does not affect this bug.  Now I'm really
starting to worry, as I really need a workaround.



Comment 6 Bug Zapper 2008-11-26 10:58:18 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 7 Steve Dickson 2008-12-02 16:09:19 UTC
Is this happening with an F-9 or F-10 kernel?

Comment 8 Bug Zapper 2009-01-09 07:50:58 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 9 Trevor Cordes 2009-02-23 13:28:19 UTC
I just upgraded from F8 to F10.  I will report back and reopen if I see this again.

Comment 10 Trevor Cordes 2009-03-31 14:38:11 UTC
Since my F10 upgrade I have not seen the problem yet.  It may have been fixed in F10.  I will reopen if it pops up.

Comment 11 Trevor Cordes 2009-04-09 04:37:06 UTC
Spoke too soon.  This just hit me right now during a nautilus GUI file delete.  Deleting 8 files.  It got to "4 left" and stopped:

"Error while deleting. There was an error deleting <filename>.  Error removing file: No such file or directory"

I still have the error dialog up.  Looking at the state of the files, it had already deleted <filename> but not the ones after it.  I just hit "skip", it completed OK, and all files were deleted as desired.

I was doing a big (200MB) file copy onto the NFS fs at the same time (i.e. big writes).  I'm pretty sure that's why it hit now.  Perhaps the trick is: load the NFS server down with a big write, then try to delete smaller files in batches.

Anyhow, this confirms the bug still exists in F10.

kernel-PAE-2.6.27.19-170.2.35.fc10.i686
nfs-utils-1.1.4-8.fc10.i386

Comment 12 Trevor Cordes 2009-04-09 04:40:31 UTC
And there is nothing interesting (i.e. related) at all in the client or server dmesg/messages.

Comment 13 Steve Dickson 2009-06-11 14:07:06 UTC
Just to be clear, is the error happening from an F-10 client to an
F-10 server?

Also what arguments are you giving to rdiff-backup?

Comment 14 Trevor Cordes 2009-06-11 14:42:14 UTC
This bug has occurred F8 client -> F8 server and F10 client -> F8 server.  I have not upgraded my NFS server yet.  If this bug is server-side then I suppose I'll have to do so to confirm it's present in F10.  If no one else can reproduce this in F9 or F10 then perhaps it is server-side and has since been fixed.

As I recall it is simple to reproduce: just hammer the heck out of the NFS server with *local* writes, and at the same time try to delete a directory full of small files over NFS.
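
Something like this should exercise it (an untested sketch; /export/data is a placeholder for the exported directory on the server and /mnt/nfs for the client mount):

  # on the NFS server: generate heavy *local* write load on the exported filesystem
  dd if=/dev/zero of=/export/data/loadfile bs=1M count=4096 &

  # on the NFS client: make a directory of lots of small files, then delete them
  mkdir -p /mnt/nfs/deltest
  for i in $(seq 1 1000); do echo x > /mnt/nfs/deltest/f$i; done
  rm -v /mnt/nfs/deltest/f*     # watch for spurious "No such file or directory" errors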

Maybe best to forget about rdiff-backup; the bug is a general one that occurs in all programs and from the shell, as per comments #2 and #11.  I've moved my rdiff-backup jobs directly onto the NFS server because the bug was unbearable.

Comment 15 Steve Dickson 2009-06-11 15:22:07 UTC
F8 was a bit "challenged" when it came to its NFSv4 stability :-\
So I would suggest upgrading the server to see if the problem persists.
For now I'll close this bug, but please feel free to reopen it if the
problem does indeed come back... 

Thank you for using Fedora...

Comment 16 Trevor Cordes 2009-07-27 21:15:01 UTC
I upgraded the file server to F10.  This problem still exists in the latest F10 (all rpms updated as of last week).

It's the same issue.  Run rdiff-backup with an NFS share as the destination, or anything that does a large number of renames to an NFS share under load, and it will randomly bomb on renames.  This bug is not in rdiff-backup, as it can be reproduced from the shell or from things like galeon.  If required, I can easily write a simple bash test-case loop.

Run 1:
Exception '[Errno 17] File exists: '/data/Bak/ThriceDailyOnsite/Baks/rdiff-backup-data/increments/home/trevor/.wine/drive_c/windows/system32/spool'' raised of class '<type 'exceptions.OSError'>':
  File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 304, in error_check_Main
    try: Main(arglist)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 324, in Main
    take_action(rps)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 280, in take_action
    elif action == "backup": Backup(rps[0], rps[1])
  File "/usr/lib/python2.5/site-packages/rdiff_backup/Main.py", line 346, in Backup
    backup.Mirror(rpin, rpout)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 38, in Mirror
    DestS.patch(dest_rpath, source_diffiter)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 232, in patch
    ITR(diff.index, diff)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/rorpiter.py", line 284, in __call__
    branch.start_process(*args)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/backup.py", line 625, in start_process
    self.CCPP.get_rorps(index), self.basis_root_rp)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 205, in get_mirror_inc_rps
    alt_inc, inc_rp = find_inc_pair(index, mirror_rp, alt_mirror, alt_inc)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 191, in find_inc_pair
    if check_new_index(inc_root, trial_inc_index, make_dirs = 1):
  File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 130, in check_new_index
    parent_result = wrap_call(make_parent, rp)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 111, in wrap_call
    try: result = func(*args)
  File "/usr/lib/python2.5/site-packages/rdiff_backup/longname.py", line 124, in make_parent
    parent.makedirs()
  File "/usr/lib/python2.5/site-packages/rdiff_backup/rpath.py", line 988, in makedirs
    self.conn.os.makedirs(self.path)
  File "/usr/lib/python2.5/os.py", line 171, in makedirs
    mkdir(name, mode)

Run 2:
Exception '[Errno 2] No such file or directory: '/data/Bak/ThriceDailyOnsite/Baks/bak/Diskinfo/rdiff-backup.tmp.407'' raised of class '<type 'exceptions.OSError'>':
...

Always a different file it craps out on.

My NFS is mounted nfs4 rw,bg,hard,intr,noatime,nosuid,proto=tcp,timeo=15,rsize=8192,wsize=8192

Comment 17 Trevor Cordes 2009-07-27 21:17:59 UTC
My rdiff-backup args:

/bin/nice -19 rdiff-backup [bunch of --excludes] [bunch of --includes] --include /home --exclude / / /data/Bak/ThriceDailyOnsite/Baks

Comment 18 Sandor 2009-08-26 15:25:36 UTC
I've had a similar problem on a Suse SLES 9 system.

I think it is because the NFS protocol requires that file operations (rename, delete, etc.) be written to disk before the reply is sent back to the client.

I was writing 300MB to disk locally (in a test, copying from /dev/urandom), and then trying to rename a (different) file remotely from an NFS client. The rename failed on the client, but in fact worked on the server.

Doing a tcpdump showed the Windows client sent 3 NFS rename requests, 4 seconds apart (it timed out shortly after, and reported a failure). The (single) "reply ok" from the server didn't get sent until about 100 seconds later!
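
For reference, a capture along these lines is enough to see the rename requests and the delayed reply (eth0 is a placeholder interface; 2049 is the standard NFS port):

  # capture full packets of NFS traffic for later inspection in wireshark/tshark
  tcpdump -i eth0 -s 0 -w nfs-rename.pcap port 2049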

As far as I could tell the delay was caused by the server flushing the 300MB from cache to disk (I guess the updates have to be in order), before it would tell the client the rename was done. I got around it by doing a "sync" on the server before trying the rename on the client.

I would have thought exporting the filesystem "async" would also fix it.  You say "sync" doesn't fix it, but that's the default! (at least on SLES).  Have you tried exporting the filesystem async?  You do risk data loss in some circumstances, but it would allow the NFS server to lie and tell the client the rename/delete has been done and committed to disk when it hasn't really.
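
For example, an export line like the following switches to async (the path and client spec are placeholders):

  # /etc/exports -- async lets the server reply before the data actually hits disk
  /export/data  192.168.1.0/24(rw,async,no_subtree_check)

  # then re-read /etc/exports and apply the change without restarting the NFS server
  exportfs -ra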

Comment 19 Trevor Cordes 2009-08-26 17:45:25 UTC
Yes, I was originally using explicit async.  I have not tried async in a long while though.  I am changing back to async and will test after my next reboot (might be a little while).  I probably have not tried async with F10 yet.

Your theory and packet traces make sense, though you'd think the protocol would have some method to get around write starvation on the server.  This ties in with the various write starvation bugs I am following as well.  Linux behaves quite badly when doing massive writes and trying to get little ones done in a timely manner.

Did going async completely solve the problem for you?

Comment 20 Sandor 2009-08-27 08:42:04 UTC
I just tried it, and NFS exporting the filesystem with the async option did fix it, at least for my test case (which has been 100% reliable at reproducing the problem so far).

We have gone with the work-around of running "sync" to force changed blocks to disk after the rsync, and before creating the file that the NFS client renames. This isn't a general solution to the problem, obviously.
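
In script form the workaround is roughly this (a sketch; the paths and the final file name are placeholders for our setup):

  # on the server, after the bulk data has been written locally:
  rsync -a /srcdata/ /export/data/dest/
  sync                              # flush dirty blocks so the client's later rename isn't queued behind them
  touch /export/data/dest/READY     # the file the NFS client will subsequently rename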

I'm not sure who is to blame here.  It seems to be a disconnect between the semantics of NFS (when certain operations are said to be complete, it means the changes have made it to permanent storage) and the semantics of Linux I/O (when the same operations are said to be complete, it just means they've made it to the cache and have not necessarily been written to disk).

You could also try tuning the cache so it gets written to disk in a more timely manner. Here's what some random web page says about it:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

"Tuning Recommendations for write-heavy operations

The usual issue that people who are writing heavily encounter is that Linux buffers too much information at once, in its attempt to improve efficiency. This is particularly troublesome for operations that require synchronizing the filesystem using system calls like fsync. If there is a lot of data in the buffer cache when this call is made, the system can freeze for quite some time to process the sync."

There's lots of details there about the various kernel parameters for tuning the cache.
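
The knobs that page is talking about are the vm.dirty_* sysctls. Something like the following (values purely illustrative, not a recommendation) makes the kernel start writeback earlier and limits how much dirty data can accumulate:

  # apply at runtime; add the same settings to /etc/sysctl.conf to persist
  sysctl -w vm.dirty_background_ratio=5     # start background writeback at 5% of RAM dirty
  sysctl -w vm.dirty_ratio=10               # throttle writers once 10% of RAM is dirty
  sysctl -w vm.dirty_expire_centisecs=1000  # write out dirty data older than 10 seconds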

Comment 21 Trevor Cordes 2009-09-26 14:04:43 UTC
I am now running async again.  I will see if the problem comes up again.

Comment 22 Trevor Cordes 2009-10-29 18:47:43 UTC
Just had the problem hit again.  I was doing a 750GB (single) file delete on a 5TB md RAID6 array that was 86% full.  The file delete took over 20 minutes.  During that time, doing file deletes/moves with nautilus over NFS, I got multiple "file does not exist" errors.  As usual, the operations were actually performed, and the error was bogus.

So that seems like a good way to reproduce this problem easily: delete a multi-hundred-GB file on a nearly-full fs.  Not sure whether the RAID6 is making it worse or not.  top still showed idle time, so the system wasn't fully loaded.
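
A sketch of that repro (paths and sizes are placeholders; the real case was a 750GB file on an 86%-full array):

  # on the server: create a very large file on the exported fs, then delete it
  dd if=/dev/zero of=/export/data/hugefile bs=1M count=102400   # ~100GB; scale to taste
  rm /export/data/hugefile &

  # on the client, while the delete is running: small deletes/moves over NFS
  for i in $(seq 1 50); do touch /mnt/nfs/t$i; done
  rm -v /mnt/nfs/t*     # spurious "No such file or directory" errors indicate the bug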

PS: what does CLOSED DEFERRED mean?

