845 – Linux / Solaris NFS interoperability problems

Summary: Linux / Solaris NFS interoperability problems

Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	nfs-server
Version:	5.2
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Michael K. Johnson

Reported:	1999-01-15 23:04 UTC by kev
Modified:	2008-05-01 15:37 UTC
CC List:	3 users
Last Closed:	2002-12-15 04:11:12 UTC

Description kev 1999-01-15 23:04:46 UTC

Below is a bug report that I previously (last spring) sent to unfsd.de.
I later tried sending it directly to the maintainer.  I have not
received any response, and my patch for this problem has not appeared in
any later release of the nfs server.

I'm now running Red Hat 5.2 and the problem still exists.  I've adapted
my patch for nfs-server-2.2beta37, which I'll send to Bugzilla once a
bug ID has been assigned...

....

I have found and fixed some interoperability problems between Solaris
2.5 and Linux.

I'm running a heavily upgraded Slackware 3.0 system with kernel 2.0.33.
My rpc.nfsd and rpc.mountd are from nfs-server-2.2beta29.  (Until very
recently I was running an older version, but I am not sure which.)  The
Linux machine is called redrock and the Solaris machine is called
saguaro.

The Solaris machine is running Solaris 2.5 with the recommended patches.

The two distinct problems I've found are both related to the caching of
file handles.  Here is a sample session illustrating them.  (I can also
provide debugging output from nfsd if necessary, but I believe I can
summarize the problems adequately.)

The following session is run from the Solaris box (saguaro).  redrock is
my Linux box.

    # mount redrock:/redrock1 /redrock1
    # exit
    saguaro:kev$ cd /redrock1/netstuff
    saguaro:netstuff$ mkdir test
    saguaro:netstuff$ cd test
    saguaro:test$ echo 'This is the foo file!' > foo
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    -rw-r--r--   1 kev      staff          22 Apr  1 11:31 foo
    saguaro:test$ cd ..
    saguaro:netstuff$ mv test test2
    saguaro:netstuff$ cd test2
    saguaro:test2$ cat foo
    cat: cannot open foo
    saguaro:test2$ cd ..
    saguaro:netstuff$ mv test2 test
    saguaro:netstuff$ cd test
    saguaro:test$ cat foo
    This is the foo file!

So the problem is that we were unable to open 'foo' after the directory
containing it was renamed from test to test2.  Yet when we renamed the
directory back, the file could be opened again.

The reason for the bug is as follows...

The Linux nfsd has a file handle cache.  (In NFS V2, file handles are
32-byte opaque objects, i.e., the contents have meaning to the server
but not to the client.)  This cache associates file handles with
information about the actual file, including the path name.  After the
containing directory is renamed (i.e., test -> test2), the path name
associated with foo is still /redrock1/netstuff/test/foo, not
/redrock1/netstuff/test2/foo.  This causes fhc_getattr to fail when
attempting the lstat() call, because it is being called with a path
name which no longer exists.

My solution to this problem is to attempt to rebuild the path name when
the lstat() call in fhc_getattr fails.  The lstat() call is then
retried.  If lstat() still fails, we return the error as before.
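
The retry described above can be sketched roughly as follows.  This is a
Python illustration, not the C code from the patch: the class
HandleCache, its method names, and the way the path is rebuilt (walking
the export tree for a matching device/inode pair) are all assumptions
made for the example.  How the actual patch rebuilds the path is not
shown in this report.

```python
import os


class HandleCache:
    """Toy model of a server-side file handle cache (hypothetical, not unfsd code)."""

    def __init__(self, export_root):
        self.export_root = export_root
        # handle -> {"path": cached path name, "ident": (st_dev, st_ino)}
        self.entries = {}

    def add(self, handle, path):
        st = os.lstat(path)
        self.entries[handle] = {"path": path, "ident": (st.st_dev, st.st_ino)}

    def _rebuild_path(self, ident):
        # Assumption: locate the object again by scanning the export tree
        # for the same (device, inode) pair.  os.walk does not follow
        # symbolic links by default, and lstat() does not either.
        for dirpath, dirnames, filenames in os.walk(self.export_root):
            for name in dirnames + filenames:
                candidate = os.path.join(dirpath, name)
                try:
                    st = os.lstat(candidate)
                except OSError:
                    continue
                if (st.st_dev, st.st_ino) == ident:
                    return candidate
        return None

    def getattr_retry(self, handle):
        entry = self.entries[handle]
        try:
            return os.lstat(entry["path"])
        except OSError:
            # The cached path is stale (e.g. a parent directory was renamed).
            new_path = self._rebuild_path(entry["ident"])
            if new_path is None:
                raise  # nothing found: report the original error, as before
            entry["path"] = new_path
            # Retry once; if this still fails, the error propagates.
            return os.lstat(new_path)
```

With a cache entry for test/foo, renaming test to test2 makes the first
lstat() fail; the rebuild step then finds test2/foo and the retry
succeeds.  If the file is really gone, the caller sees the same error it
would have seen without the retry.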

The second problem is more subtle and concerns the client-side cache.
Continuing the above session...

    saguaro:test$ ln -s foo bar
    saguaro:test$ cat bar
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    lrwxrwxrwx   1 kev      staff           3 Apr  1 11:32 bar -> foo
    -rw-r--r--   1 kev      staff          22 Apr  1 11:31 foo
    saguaro:test$ rm bar
    saguaro:test$ echo 'This is the bar file' >bar
    saguaro:test$ ls -l bar
    -rw-r--r--   1 kev      staff           0 Apr  1 11:32 bar
    saguaro:test$ cat bar
    cat: cannot open bar
    saguaro:test$ cat foo
    This is the foo file!

I believe what is happening above is that the Solaris side is doing
caching of its own.  I don't know the specifics, but it appears that at
the very least it is associating the file handle with information about
the file's type.  (I don't know this for certain, since I have not seen
the Solaris code.)  In any event, the inode that Solaris reports via
'ls -i' is the same for bar both as a symbolic link and as a normal
file.  (If the inodes are different, the problem doesn't arise.)

I solved this problem by encoding the file type in the file handle.
This way the Solaris machine is given distinct file handles for
different file types even if the inode numbers and file names are the
same.  So be warned!  The code which I'm submitting in the patch
doesn't look like it's doing much, but it is!  It's making sure that
file handles with the same pseudo inodes and hash paths are different
if the file type is different.
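
To make the idea concrete, here is a small Python sketch of a 32-byte
handle that folds the file type in.  The field layout, the function
name make_handle, and the include_type switch are invented for the
illustration; the real nfs-server handle layout is not given in this
report and is certainly different.

```python
import stat
import struct

FH_SIZE = 32       # NFS V2 file handles are 32 opaque bytes
HEADER_SIZE = 8    # two 32-bit fields in this hypothetical layout


def make_handle(pseudo_inode, path_hash, mode, include_type=True):
    """Build a toy file handle from a pseudo inode, a hash path and a mode.

    Hypothetical layout: 4-byte pseudo inode, 4-byte file type, then the
    hash path bytes, zero-padded to FH_SIZE.
    """
    if len(path_hash) > FH_SIZE - HEADER_SIZE:
        raise ValueError("hash path does not fit in the handle")
    # S_IFMT() keeps only the file-type bits of st_mode (regular file,
    # symbolic link, directory, ...), dropping the permission bits.
    ftype = stat.S_IFMT(mode) if include_type else 0
    body = struct.pack("!II", pseudo_inode, ftype) + path_hash
    return body.ljust(FH_SIZE, b"\0")
```

With include_type=False, a symbolic link and a regular file that share
a pseudo inode and hash path get byte-identical handles, which is the
situation described above for bar.  With the type included, the two
handles differ, so a client that caches per-handle information sees a
new object.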

BTW, I first noticed these problems when attempting to build
gcc-2.7.2.3 from Solaris on an NFS-mounted partition on my Linux box.
That is to say, I was building gcc on Solaris, for Solaris, but with my
cwd set to a directory on my Linux machine.  I was getting a failure
part way through the stage 2 build resembling the symbolic link problem
illustrated above.  With my patches installed, I have built gcc in this
fashion twice without incident.

With my patches installed for both nfsd and mountd, the above examples
work properly:

    # umount /redrock1
    # mount redrock:/redrock1 /redrock1
    # exit
    saguaro:kev$ cd /redrock1/netstuff
    saguaro:netstuff$ mkdir test
    saguaro:netstuff$ cd test
    saguaro:test$ echo 'This is the foo file!' > foo
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    -rw-r--r--   1 kev      staff          22 Apr  1 12:11 foo
    saguaro:test$ cd ..
    saguaro:netstuff$ mv test test2
    saguaro:netstuff$ cd test2
    saguaro:test2$ cat foo
    This is the foo file!
    saguaro:test2$ cd ..
    saguaro:netstuff$ mv test2 test
    saguaro:netstuff$ cd test
    saguaro:test$ cat foo
    This is the foo file!
    saguaro:test$ ln -s foo bar
    saguaro:test$ cat bar
    This is the foo file!
    saguaro:test$ ls -l
    total 1
    lrwxrwxrwx   1 kev      staff           3 Apr  1 12:12 bar -> foo
    -rw-r--r--   1 kev      staff          22 Apr  1 12:11 foo
    saguaro:test$ ls -i bar
    556210367 bar
    saguaro:test$ rm bar
    saguaro:test$ echo 'This is the bar file' >bar
    saguaro:test$ ls -l bar
    -rw-r--r--   1 kev      staff          21 Apr  1 12:12 bar
    saguaro:test$ ls -i bar
    556210367 bar
    saguaro:test$ cat bar
    This is the bar file
    saguaro:test$

Comment 1 borchers 1999-01-21 01:04:59 UTC

I've been having very similar problems with Solaris 2.5.1 (with
recommended patches) as a server and Red Hat 5.2 as a client.  It
appears that these problems don't occur with vanilla Red Hat 5.0, but
they do occur when I've installed the nfs updates.  For the moment,
I'm living with Red Hat 5.0.

Comment 2 Alan Cox 2002-12-15 04:11:12 UTC

unfsd is long retired
