Below is a bug report that I'd previously (last spring) sent to unfsd.de. I later tried sending it directly to the maintainer. I have not received any response, and my patch for this problem has not appeared in any later release of the nfs server. I'm now running RedHat 5.2 and the problem still exists. I've adapted my patch for nfs-server-2.2beta37, which I'll send to BugZilla once a bug ID has been assigned...

....

I have found and fixed some interoperability problems between Solaris 2.5 and Linux.

I'm running a heavily upgraded Slackware 3.0 system w/ kernel 2.0.33. My rpc.nfsd and rpc.mountd are from nfs-server-2.2beta29. (Until very recently I was running an older version, but I am not sure which.) The Linux machine is called redrock and the Solaris machine is called saguaro. The Solaris machine is running Solaris 2.5 with the recommended patches.

The two distinct problems I've found are related to the caching of file handles. Here is a sample session illustrating these problems. (I can also provide debugging output from nfsd if necessary, but I believe that I will be able to adequately summarize the problem.)

The following session is run from the Solaris box. The machine redrock is my Linux box.

    # mount redrock:/redrock1 /redrock1
    # exit
    saguaro:kev$ cd /redrock1/netstuff
    saguaro:netstuff$ mkdir test
    saguaro:netstuff$ cd test
    saguaro:test$ echo 'This is the foo file!' > foo
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    -rw-r--r-- 1 kev staff 22 Apr 1 11:31 foo
    saguaro:test$ cd ..
    saguaro:netstuff$ mv test test2
    saguaro:netstuff$ cd test2
    saguaro:test2$ cat foo
    cat: cannot open foo
    saguaro:test2$ cd ..
    saguaro:netstuff$ mv test2 test
    saguaro:netstuff$ cd test
    saguaro:test$ cat foo
    This is the foo file!

So the problem is that we were unable to open 'foo' after the directory containing 'foo' was renamed from test to test2. Yet when we renamed the directory back, 'foo' could be opened again.

The reason for the bug is as follows...
The Linux nfsd has a file handle cache. (In NFS V2, file handles are 32-byte opaque objects, i.e., the guts have meaning to the server, but not to the client.) This cache associates file handles with information about the actual file, including the path name. After the rename of the containing directory (i.e., test -> test2), the path name associated with foo is still /redrock1/netstuff/test/foo, not /redrock1/netstuff/test2/foo. This causes fhc_getattr to fail when attempting the lstat() call, because it is being called with a path name which no longer exists.

My solution to this problem is to attempt to rebuild the path name when the lstat() call in fhc_getattr fails. The lstat() call is then retried. If lstat() still gives an error condition, we return as before.

The second problem is more subtle and concerns the client-side cache. Continuing the above session...

    saguaro:test$ ln -s foo bar
    saguaro:test$ cat bar
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    lrwxrwxrwx 1 kev staff 3 Apr 1 11:32 bar -> foo
    -rw-r--r-- 1 kev staff 22 Apr 1 11:31 foo
    saguaro:test$ rm bar
    saguaro:test$ echo 'This is the bar file' >bar
    saguaro:test$ ls -l bar
    -rw-r--r-- 1 kev staff 0 Apr 1 11:32 bar
    saguaro:test$ cat bar
    cat: cannot open bar
    saguaro:test$ cat foo
    This is the foo file!

I believe what is happening above is that the Solaris side is doing caching of its own. I don't know the specifics, but it appears that at the very least it is associating the file handle with information about the file's type. (I don't know this for certain, since I have not seen the Solaris code.) In any event, the inode that Solaris reports via 'ls -i' is the same for bar both when it is a symbolic link and when it is a normal file. (If the inodes are different, the problem doesn't arise.)

I solved this problem by encoding the file type in the file handle. This way the Solaris machine is given distinct file handles for different file types even if the inode numbers and file names are the same.
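To make the file-type idea concrete, here is a minimal sketch in Python. It is not the patch and it does not reproduce the real nfsd handle layout: the field order, the function name make_handle, and the constant FH_SIZE are all made up for illustration. The only point it shows is that mixing the file type into the handle yields different handles for a symlink and a regular file that share a pseudo inode and hash path.

```python
import stat
import struct

FH_SIZE = 32  # NFS V2 file handles are 32-byte opaque objects


def make_handle(pseudo_inode, hash_path, mode):
    """Build an illustrative 32-byte handle (hypothetical layout).

    Layout assumed here: 4 bytes pseudo inode, 4 bytes file type,
    then the hash path, zero-padded to FH_SIZE.
    """
    file_type = stat.S_IFMT(mode)  # keep only the type bits, drop permissions
    header = struct.pack("!II", pseudo_inode, file_type)
    body = hash_path[: FH_SIZE - len(header)]
    return (header + body).ljust(FH_SIZE, b"\0")


# Same pseudo inode and hash path, different file types.
as_symlink = make_handle(556210367, b"\x01\x02", stat.S_IFLNK | 0o777)
as_regular = make_handle(556210367, b"\x01\x02", stat.S_IFREG | 0o644)
print(as_symlink != as_regular)
```

Because only the type bits go into the handle, a chmod on the file leaves its handle unchanged in this sketch; only a change of type (symlink to regular file, say) produces a new one.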
So be warned! The code which I'm submitting in the patch doesn't look like it's doing much, but it is! It's making sure that filehandles with the same pseudo inodes and hash paths are different if the file type is different.

BTW, I first noticed these problems when attempting to build gcc-2.7.2.3 on an NFS mounted partition on my Linux box from Solaris. That is to say, I was building gcc on Solaris, for Solaris, but with my cwd set to a directory on my Linux machine. I was getting a failure part way through the stage 2 build resembling the symbolic link problem illustrated above. I have built gcc in this fashion twice with my patches installed without incident.

With my patches installed for both nfsd and mountd, the above examples work properly:

    # umount /redrock1
    # mount redrock:/redrock1 /redrock1
    # exit
    saguaro:kev$ cd /redrock1/netstuff
    saguaro:netstuff$ mkdir test
    saguaro:netstuff$ cd test
    saguaro:test$ echo 'This is the foo file!' > foo
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    -rw-r--r-- 1 kev staff 22 Apr 1 12:11 foo
    saguaro:test$ cd ..
    saguaro:netstuff$ mv test test2
    saguaro:netstuff$ cd test2
    saguaro:test2$ cat foo
    This is the foo file!
    saguaro:test2$ cd ..
    saguaro:netstuff$ mv test2 test
    saguaro:netstuff$ cd test
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ln -s foo bar
    saguaro:test$ cat bar
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    lrwxrwxrwx 1 kev staff 3 Apr 1 12:12 bar -> foo
    -rw-r--r-- 1 kev staff 22 Apr 1 12:11 foo
    saguaro:test$ ls -i bar
    556210367 bar
    saguaro:test$ rm bar
    saguaro:test$ echo 'This is the bar file' >bar
    saguaro:test$ ls -l bar
    -rw-r--r-- 1 kev staff 21 Apr 1 12:12 bar
    saguaro:test$ ls -i bar
    556210367 bar
    saguaro:test$ cat bar
    This is the bar file
    saguaro:test$
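For anyone who wants to see the shape of the first fix (retrying lstat() after rebuilding a stale path name), here is a rough sketch in Python. It is not the patch. The report does not say how nfsd rebuilds the path, so this sketch simply assumes the cache entry remembers the file's device and inode and walks the export looking for them; CacheEntry, rebuild_path and getattr_with_retry are hypothetical names.

```python
import os


class CacheEntry:
    """Hypothetical stand-in for one entry in the file handle cache."""

    def __init__(self, path):
        st = os.lstat(path)
        self.path = path
        self.dev = st.st_dev
        self.ino = st.st_ino


def rebuild_path(entry, export_root):
    """Search the export for the cached (device, inode) pair.

    Assumption: a full tree walk stands in for whatever the real
    server does. Returns the first matching path, or None.
    """
    for dirpath, dirnames, filenames in os.walk(export_root):
        for name in dirnames + filenames:
            candidate = os.path.join(dirpath, name)
            try:
                st = os.lstat(candidate)
            except OSError:
                continue  # entry vanished while we were walking
            if (st.st_dev, st.st_ino) == (entry.dev, entry.ino):
                return candidate
    return None


def getattr_with_retry(entry, export_root):
    """lstat the cached path; on failure rebuild the path and retry once."""
    try:
        return os.lstat(entry.path)
    except OSError:
        new_path = rebuild_path(entry, export_root)
        if new_path is None:
            raise  # nothing found: report the error as before
        entry.path = new_path
        return os.lstat(entry.path)  # a second failure propagates as before
```

With an entry created for .../test/foo, renaming test to test2 and then calling getattr_with_retry(entry, export_root) updates entry.path to .../test2/foo and returns the attributes, which is the behaviour shown in the patched session above.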
I've been having very similar problems with Solaris 2.5.1 (with recommended patches) as a server and Red Hat 5.2 as a client. It appears that these problems don't occur with vanilla Red Hat 5.0, but they do occur once the nfs updates are installed. For the moment, I'm living with Red Hat 5.0.
unfsd is long retired