Bug 113636 - NFS: bad interaction between rsync and NFS
NFS: bad interaction between rsync and NFS
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel (Show other bugs)
3.0
All Linux
medium Severity medium
: ---
: ---
Assigned To: Steve Dickson
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2004-01-15 19:29 EST by Ben Woodard
Modified: 2007-11-30 17:07 EST (History)
4 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-09-02 00:31:02 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
NFS packet trace (8.14 KB, text/plain)
2004-02-04 16:40 EST, Ben Woodard
no flags Details
2nd NFS packet trace (8.14 KB, text/plain)
2004-02-04 16:41 EST, Ben Woodard
no flags Details
tcpdump capture file to go with my comments above (70.83 KB, application/octet-stream)
2004-02-26 11:42 EST, Neil Horman
no flags Details
Current patch (4.96 KB, patch)
2004-05-19 21:11 EDT, Ben Woodard
no flags Details | Diff

  None (edit)
Description Ben Woodard 2004-01-15 19:29:42 EST
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1)
Gecko/20030225

Description of problem:
There seems to be some sort of problem with NFS interacts with rsync
on RHEL3. If you rsync a directory from one host to another, then rm
-rf the directory tree, then rsync it again, another machine gets
stale file handles.

Version-Release number of selected component (if applicable):
kernel-2.4.21-7EL

How reproducible:
Always

Steps to Reproduce:
/tftpboot is a local filesystem on tdevi.  
It is NFS mounted on /mnt on tdev0:

tdevi:/tftpboot on /mnt type nfs (rw,addr=134.9.36.120)

1) Create files

tdevi /tftpboot/garlick > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  29440.67 bytes/sec
total size is 43854  speedup is 0.99

2) See them on NFS client

tdev0 /mnt/garlick > ls -l storage
total 52
-r--r--r--    1 garlick  garlick       374 Jun  3  2003 ftpaccess.ocf
-r--r--r--    1 garlick  garlick       359 Jun  3  2003 ftpusers.ocf
-r--r--r--    1 garlick  garlick     43121 Jun  3  2003 verifiertable.ocf

3) Remove them and rsync them again

tdevi /tftpboot/garlick > rm -f storage/*
tdevi /tftpboot/garlick > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99

4) Try to see them on NFS client

Interestingly, if I replace the rsync with a cp, I don't see the problem.

Actual Results:  # ls -l storage
ls: storage/verifiertable.ocf: Stale NFS file handle
ls: storage/ftpaccess.ocf: Stale NFS file handle
ls: storage/ftpusers.ocf: Stale NFS file handle
total 0

Expected Results:  You see the files.

Additional info:

This has been verified on both ia32 and ia64. (At first we thought it
was ia64 only.)

The command that we used to gather the log was:
ethereal -i eth1 -R '((ip.src == 192.168.20.1 && ip.dst ==
192.168.20.198) || (ip.src == 192.168.20.198 && ip.dst == 192.168.20.1))'
Comment 1 Ben Woodard 2004-01-15 19:32:33 EST
Here is some ethereal output for the NFS interchange before and after
the remove/rsync.  20.1 is the client, 20.198 is the server.

successful ls -l

 74.176978 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x2e2f18ee
 74.177017 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x2e2f18ee
 74.177284 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x2f2f18ee
 74.177302 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x2f2f18ee
 74.177532 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x302f18ee
 74.177551 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x302f18ee
 74.177782 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x312f18ee
 74.177799 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x312f18ee
 74.178018 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x322f18ee
 74.178070 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x322f18ee
 74.178265 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x332f18ee
 74.178281 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x332f18ee
 74.178513 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x342f18ee
 74.178530 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x342f18ee
 74.178765 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x352f18ee
 74.178781 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x352f18ee
 74.179013 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x362f18ee
 74.179071 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x362f18ee
 74.179262 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x372f18ee
 74.179314 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x372f18ee
 74.179511 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x382f18ee
 74.179528 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x382f18ee
 74.179765 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x392f18ee
 74.179780 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x392f18ee

stale ls -l

 91.032865 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x3a2f18ee
 91.033120 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x3a2f18ee
 91.033341 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x3b2f18ee
 91.033410 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x3b2f18ee
 91.033611 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x3c2f18ee
 91.033662 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x3c2f18ee
 91.033855 192.168.20.1 -> 192.168.20.198 NFSACL V3 GETACL Call XID
0x3d2f18ee
 91.033877 192.168.20.198 -> 192.168.20.1 NFSACL V3 GETACL Reply XID
0x3d2f18ee
 91.034104 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x3e2f18ee
 91.034125 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x3e2f18ee
 91.034358 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x3f2f18ee
 91.034387 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x3f2f18ee
 91.034590 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x402f18ee
 91.034611 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x402f18ee
 91.034845 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x412f18ee
 91.034865 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x412f18ee
 91.035150 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x422f18ee
 91.035192 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x422f18ee
 91.035339 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x432f18ee
 91.035396 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x432f18ee
 91.035594 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x442f18ee
 91.035614 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x442f18ee
 91.035859 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x452f18ee
 91.035914 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x452f18ee
 91.036096 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x462f18ee
 91.036132 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x462f18ee
 91.036347 192.168.20.1 -> 192.168.20.198 NFS V3 GETATTR Call XID
0x472f18ee
 91.036369 192.168.20.198 -> 192.168.20.1 NFS V3 GETATTR Reply XID
0x472f18ee

Comment 2 Ben Woodard 2004-01-15 19:42:14 EST
reread the notes they sent me. It doesn't involve netapps anymore.
Comment 3 Steve Dickson 2004-01-16 08:00:39 EST
Ben,

Could you make the binary files (i.e. ethereal -w /tmp/data) of these 
traces available?
Comment 5 Ben Woodard 2004-02-04 12:22:14 EST
From: 	Makia Minich <makia@llnl.gov>
To: 	Jim Garlick <garlick@llnl.gov>, bwoodard@llnl.gov
Cc: 	braby1@llnl.gov
Subject: 	Re: update on Bugzilla #113636
Date: 	Tue, 03 Feb 2004 16:11:00 -0800	
Jim / Ben,

OK, here's the result of my testing:

First, to get a heads up, I attempted to re-create what was on the
bugzilla 
report with toadi and toad8:

# toadi /root > mkdir /tftpboot/makia
# toadi /root > cd /tftpboot/makia
# toadi /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadi /tftpboot/makia > 


# toad8 /root > mount etoadi:/tftpboot/makia /mnt
# toad8 /root > ls -l /mnt/storage/
total 52
-r--r--r--    1 root     root          374 Jun  3  2003 ftpaccess.ocf
-r--r--r--    1 root     root          359 Jun  3  2003 ftpusers.ocf
-r--r--r--    1 root     root        43121 Jun  3  2003 verifiertable.ocf
# toad8 /root > 

# toadi /tftpboot/makia > rm -rf storage/*
# toadi /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadi /tftpboot/makia > 

# toad8 /root > ls -l /mnt/storage/ ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 15:48:08 PST 2004
# toad8 /root > ls -l /mnt/storage/ ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 15:49:10 PST 2004
# toad8 /root > 


This backs up Jim's claim that it persists indefinetely (at least
longer than 60 seconds... I tried many times after that but to no
avail).  For my second test, I used toadi running the
2.4.21-p4smp-27chaos kernel and toad8 running the 2.4.21-9.EL kernel:

# toad8 /root > mount etoadi:/tftpboot/makia /mnt
# toad8 /root > ls -l /mnt/storage
total 52
-r--r--r--    1 root     root          374 Jun  3  2003 ftpaccess.ocf
-r--r--r--    1 root     root          359 Jun  3  2003 ftpusers.ocf
-r--r--r--    1 root     root        43121 Jun  3  2003 verifiertable.ocf
# toad8 /root > 

# toadi /tftpboot/makia > rm -rf storage/*
# toadi /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadi /tftpboot/makia > 

# toad8 /root > ls -l /mnt/storage ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 15:58:25 PST 2004
# toad8 /root > ls -l /mnt/storage ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 16:00:02 PST 2004
# toad8 /root > 

Looks like a client side NFS isn't the problem (not that it was in
question).  So, for the next test, I will use toad8 (client) and toadj
(server) both running with the 2.4.21-9.EL kernels:

# toadj /root > mkdir /tftpboot/makia
# toadj /root > cd /tftpboot/makia/
# toadj /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadj /tftpboot/makia > 

# toad8 /root > mount etoadj:/tftpboot/makia /mnt
# toad8 /root > ls -l /mnt/storage
total 52
-r--r--r--    1 root     root          374 Jun  3  2003 ftpaccess.ocf
-r--r--r--    1 root     root          359 Jun  3  2003 ftpusers.ocf
-r--r--r--    1 root     root        43121 Jun  3  2003 verifiertable.ocf
# toad8 /root > 

# toadj /tftpboot/makia > rm -rf storage/*
# toadj /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadj /tftpboot/makia > 

# toad8 /root > ls -l /mnt/storage ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 16:07:12 PST 2004
# toad8 /root > ls -l /mnt/storage ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Tue Feb  3 16:08:26 PST 2004
# toad8 /root > 

Unless someone can see something that I missed, it looks like the
problem is still occuring.

Comment 6 Ben Woodard 2004-02-04 16:38:40 EST
From: 	Makia Minich <makia@llnl.gov>
To: 	Ben Woodard <bwoodard@llnl.gov>
Cc: 	Jim Garlick <garlick@llnl.gov>, Ryan Braby <braby1@llnl.gov>
Subject: 	Re: update on Bugzilla #113636
Date: 	Wed, 04 Feb 2004 11:19:11 -0800	
OK, here's the testing with ethereal going.  Attached are the ethereal
outputs 
(the first line in each file is the ethereal command that I used):

# toadj /root > mkdir /tftpboot/makia
# toadj /root > cd /tftpboot/makia/
# toadj /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadj /tftpboot/makia > 

# toad8 /root > mount etoadj:/tftpboot/makia /mnt
# toad8 /root > ls /mnt/storage 
ftpaccess.ocf  ftpusers.ocf  verifiertable.ocf
# toad8 /root > ls -l /mnt/storage
total 52
-r--r--r--    1 root     root          374 Jun  3  2003 ftpaccess.ocf
-r--r--r--    1 root     root          359 Jun  3  2003 ftpusers.ocf
-r--r--r--    1 root     root        43121 Jun  3  2003 verifiertable.ocf
# toad8 /root > 

<SEE SECTIONS TITLED "MOUNT" AND "LS">

# toadj /tftpboot/makia > rm -rf storage/*
# toadj /tftpboot/makia > rsync -av /var/dist/storage .
building file list ... done
storage/
storage/ftpaccess.ocf
storage/ftpusers.ocf
storage/verifiertable.ocf
wrote 44093 bytes  read 68 bytes  88322.00 bytes/sec
total size is 43854  speedup is 0.99
# toadj /tftpboot/makia > 

# toad8 /root > ls -l /mnt/storage ; date
ls: /mnt/storage/verifiertable.ocf: Stale NFS file handle
ls: /mnt/storage/ftpaccess.ocf: Stale NFS file handle
ls: /mnt/storage/ftpusers.ocf: Stale NFS file handle
total 0
Wed Feb  4 11:13:23 PST 2004
# toad8 /root > 

<SEE SECTIONS TITLED "LS AFTER RSYNC" AND "LS AGAIN AFTER >60secs">

Let me know if there's anything else you need.  As for updating bugzilla 
reports, I don't know how to do that...
Comment 7 Ben Woodard 2004-02-04 16:40:24 EST
Created attachment 97473 [details]
NFS packet trace
Comment 8 Ben Woodard 2004-02-04 16:41:08 EST
Created attachment 97474 [details]
2nd NFS packet trace
Comment 10 Neil Horman 2004-02-26 11:41:07 EST
I've also managed to recreate this problem, and I think my trace shows
the problem fairly well.  I took this trace from the NFS server in my
configuration, and preformed the following operations:

1) populated the test NFS mount directory with three files from the
NFS client

2) issued an ls -l from the client (which properly showed the files in
the directory)

3) removed the files from the NFS server and recreated them with rsync.

4) issued an ls -l from the client (which returned stale NFS file handles)

5) removed the files from the NFS server

6) issued an ls -l from the client (which showed zero files present)

7) re created the files on the server with rsync

8) issued an ls -l from the client (which showed the files present as
expected).

The trace indicates that at first we are able to successfully grab the
attributes of all the files (as expected), but after we remove and
repopulate the directory, whatever code determines the validity of the
directory entry gets confused, as it thinks the contents of the
directory have not changed.  Consequently when we preform the second
ls -l in step 4, we wind up issuing getattr requests with the same
file id's in them, but are returned ERR_STALE becuse "those files" no
longer exist. In the subsequent steps we remove the files on the
server again, and then force the NFS client to look at the directory
entry, which it then realizes has changed, and consequently another
READDIRPLUS command is issued and we correctly see zero files persent.
 A subsequent rsync of the directory from the NFS server causes
another  READDIRPLUS packet to be issued from the client on the final
ls -l and the result is the files are displayed properly.  As an
example frames 100 and 249 illustrate this progression.  We start at
frame 100 by issuing getattr requests on file id 7127438, and are
successful.  Following the file removal on the server and
repopulation, we see in frame 249 another getattr request with the
same file id.  It no longer exists on the server, but the client
doesn't know that because it hasn't invalidated its dentry cache entry
for this directory.  So teh server corerctly responds with ERR_STALE.
 Later we see the readdirplus comands issued when we force the client
to recognize the directory contents have changed, and we start using a
new set of file ids which return successful results on their getattr
requests.

It would seem to me that the appropriate thing to do here is when we
encounter a ERR_STALE on a given file handle in a directory we should
issue another readdirplus command to refresh the directory contents,
and retry the getattr request.  There may well be unintended
consequence of this that I am unaware of, but it seems like the thing
to do for this particular problem.
Comment 11 Neil Horman 2004-02-26 11:42:03 EST
Created attachment 98075 [details]
tcpdump capture file to go with my comments above

the tcpdump file with the test I described above.
Comment 12 Neil Horman 2004-02-26 14:29:29 EST
I should also note that I repeated the above test, but instead of
removing the files from the directory I added a new file to the
directory in step 5.   The results were the same, which further
suggests to me that the NFS client is not correctly invalidating
entries in the dentry cache.
Comment 13 William Shubert 2004-03-19 20:33:52 EST
I have what might be a similar bug. Kernel 2.4.21-9.0.1.ELsmp on both
NFS server and client (1 cpu w/HT on each, 512MB RAM on server, 1GB on
client). All standard NFS options, other than "no_root_quash" is
added. Mounted directory is EXT3, again all defaults other than I have
tried both with and without ACLs on (same results each way). I have a
repeatable failure. Do these on the NFS client:
   - "cd /path/to/nfs/dir"
   - "sudo bash" (ie, keep the shell in the NFS mounted dir)
   - "cd /path/to/non-nfs/dir"
   - start up a demon that does heavy non-NFS disk work on the client
(I start up a program that uses MySQL to do a 5-way join across tables
with millions of rows, returning about a million rows in the result
set. This takes 30secs to complete and will obviously put a lot of
pressure on the block buffer cache and the local disk).
   - When the heavy workload is done, exit from sudo shell
   - ls
Every time, ls says ".: stale NFS handle". If I then do "cd
/path/to/nfs/dir" again, everything is back to normal.

I have no idea why doing a heavy local disk I/O would make NFS do
this. I was getting intermittent NFS problems (stale handles, "ld"
reporting files not there when they clearly were, etc.), and was
suspecting hardware, but this is so repeatable that flaky memory
and/or network doesn't seem likely. I have tried both with and without
ACLs on the server (I was using ACLs, but thought maybe they were
confusing NFS, so I tried without them), same result. Rebooted both
machines several times, same result.

Does this sound like the same problem? Or totally unrelated? Should I
open up a separate bug report for it? And, what can I do to help get
more information on this? It is very frustrating, but when I searched
bugzilla I found this bug that sounds quite similar, in that it forces
NFS to give stale handle errors.
Comment 14 William Shubert 2004-03-20 20:08:26 EST
More from wms - I was able to isolate my problem down. Still not sure
if it is related to this one or not, but I have found a *very*
repeatable, *very* simple test case. I have submitted that as a
separate bug, complete with the ethereal binary dump of the traffic,
as bug #118816.
Comment 16 Steve Coile 2004-05-19 16:47:39 EDT
We are seeing the same problem.  RHEL 3 NFS server, current with
patches as of 5/17/04.  RHEL 3 client, newly-installed (not yet
updated).  Here's the scenerio:

- Update files under /home/someuser, local to NFS server.

- rsync files from /home/someuser to /shared/mountpoint/someuser, NFS
mounted on the NFS server from itself.

- ls /shared/mountpoint/someuser on the client box, get "Stale NFS
file handle" error only on the file(s) directly affected by the rsync.
 All other files displayed properly.

Problem also occurs on client if I rsync files to the local, exported
filesystem (/shared/export/mountpoint/someuser) instead of the NFS
mount on the NFS server.

During testing, I also got the NFS server itself to exhibit the
problem through the local mount (the server is a client to itself).  

Have to unmount and mount the NFS filesystem to eliminate the error,
which obviously isn't acceptable in production.

Repeating the process using cp instead of rsync does not result in the
error.

Problem does not occur with RHEL 2.1 client.
Comment 17 Ben Woodard 2004-05-19 21:11:55 EDT
Created attachment 100363 [details]
Current patch

This is the patch that we are currently running here at LLNL. It seems to work
but we are still evaluating it. We did find a bad interaction with the NFS>16
groups patch.
Comment 18 Ernie Petrides 2004-06-24 06:13:54 EDT
A fix for this problem has just been committed to the RHEL3 U3
patch pool this evening (in kernel version 2.4.21-15.17.EL).
Comment 19 John Flanagan 2004-09-02 00:31:02 EDT
An errata has been issued which should help the problem 
described in this bug report. This report is therefore being 
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files, 
please follow the link below. You may reopen this bug report 
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-433.html

Note You need to log in before you can comment on or make changes to this bug.