Bug 160851 - NFSv4 locks appear broken -- kernel: lease broken & NFSD: preprocess_seqid_op: no stateowner or nfs4_stateid
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Ben Levenson
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-06-17 20:04 UTC by Clint Goudie
Modified: 2007-11-30 22:11 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-09-06 12:47:52 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
bzipped tethereal -w (551.96 KB, application/octet-stream)
2005-06-17 23:17 UTC, Clint Goudie

Description Clint Goudie 2005-06-17 20:04:27 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
I'm trying to set up a distributed system where multiple machines running the Sun Java 1.5.0_03 JDK talk to a central NFSv4 export.

Server side my export contains: 

/usr/local/somedir           *(rw,async,fsid=0,no_subtree_check)

Client side, it is mounted like so: 

mount -t nfs4 -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp someipnumber:/ /opt/somedir

Everything appears to start up fine, but shortly into the execution of my app on the client, when I call tryLock on a file, I see this:

java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)

Usually one or two tryLocks on different files succeed. It seems to be the third tryLock that runs into this, even on a file that no one has previously attempted to lock.

Server-side, I see this in /var/log/messages:

Jun 17 13:48:46 g5 kernel: NFSD: preprocess_seqid_op: no stateowner or nfs4_stateid!

also sometimes I see this:

Jun 17 11:06:07 g5 kernel: lease broken - owner pid = 3067

I've tried an assortment of client side changes to the mount command and I've also tried a number of different tweaks to /etc/exports... nothing seems to help.

This exact same code executes flawlessly on NFSv3, but I'd really like to use NFSv4 for the improved locking.

I've tried this on a clean FC3 box, up2dated, and also on a clean FC4 box, up2dated; in both scenarios I run into this problem.

Also, each of the client machines has a unique hostname, and they resolve their hostnames to unique IP numbers.

This happens even if only one computer has the NFS export mounted, and with only one JVM interacting with that mount.

Is there something obvious I'm missing here or is something broken?

Thanks for your help,

Clint

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Set up an NFSv4 export.
2. Mount the NFSv4 export on another machine.
3. Use Java to lock and unlock files a few times.
  

Additional info:

Comment 1 Clint Goudie 2005-06-17 23:17:46 UTC
Created attachment 115646 [details]
bzipped tethereal -w

This is an Ethereal dump of the dialog that ends up giving me the I/O errors on
the NFS server.

Comment 2 Clint Goudie 2005-06-20 16:24:04 UTC
This Java code will reproduce the issue every time:

import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// First lock/read/release cycle on the file -- this succeeds.
File fa = new File("/mnt/nfs4/a/b/c/file.txt");
RandomAccessFile rafa = new RandomAccessFile(fa, "r");
FileChannel fca = rafa.getChannel();
FileLock locka = fca.tryLock(0, Long.MAX_VALUE, true); // shared read lock

FileInputStream fisa = new FileInputStream(fa);
byte[] data = new byte[(int) fa.length()];
fisa.read(data);

try {
    fisa.close();
} catch (Throwable t) {
    System.err.println("Unable to close fis");
    t.printStackTrace();
}

try {
    locka.release();
} catch (Throwable t) {
    System.err.println("Unable to release lock");
    t.printStackTrace();
}

try {
    rafa.close();
} catch (Throwable t) {
    System.err.println("Unable to close RandomAccessFile");
    t.printStackTrace();
}

// Second attempt on the same path.
File fb = new File("/mnt/nfs4/a/b/c/file.txt");
RandomAccessFile rafb = new RandomAccessFile(fb, "r");
FileChannel fcb = rafb.getChannel();
FileLock lockb = fcb.tryLock(0, Long.MAX_VALUE, true); // The error happens here.


Comment 3 Clint Goudie 2005-06-20 16:25:28 UTC
Forgot to mention, this happens with either the Sun 1.5.0_03 JDK or the Sun
1.4.2_08 JDK.

Comment 4 Clint Goudie 2005-06-20 17:20:18 UTC
Here's some more weird fruit to add to the mix. 

If /etc/hosts on the client looks like:

127.0.0.1       apollo.thtoolbox.com apollo     localhost.localdomain   localhost

the above code works _without errors_; however, the NFS export can only be
mounted by the first box. When mounted by a second box in this mode, the
client shows:

mount: File exists

on the nfs server you see:

kernel: NFSD: setclientid: string in use by client(clientid 42b3483c/00000004)

-- Note: Is this a separate bug? Should it be filed as such?

If I change the /etc/hosts on the client to look like:

192.168.0.149       apollo.thtoolbox.com apollo     
127.0.0.1       localhost.localdomain   localhost

as suggested by
http://groups-beta.google.com/group/linux.debian.bugs.dist/msg/86844812715c2ab2?hl=en

I can now mount the NFS export on multiple machines, but I get the locking error
described in this bug (even when it's only mounted on the one box).

Comment 5 Clint Goudie 2005-06-21 23:59:02 UTC
As an additional comment on this, I've discovered that if I open the
RandomAccessFile in "rw" mode instead of "r" mode, this code works. It seems
really odd that I would need read and write access to a file just to take a
shared read lock. This workaround will probably do for me, but it's less than
ideal.
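A minimal sketch of that workaround (the class and helper names here are illustrative, not from the original report, and this runs against a local temp file rather than the NFS mount):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class SharedLockWorkaround {
    // Open in "rw" mode even though only a shared read lock is wanted;
    // per the comment above, "r" mode triggers the NFSv4 I/O error.
    public static boolean trySharedLock(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            FileChannel ch = raf.getChannel();
            FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true); // shared = true
            if (lock == null) {
                return false; // another process holds a conflicting lock
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("locktest", ".txt");
        f.deleteOnExit();
        System.out.println(trySharedLock(f));
    }
}
```

Note that java.nio shared locks require a channel that is open for reading, which "rw" mode satisfies, so the workaround does not change the lock semantics the code asks for.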

Hopefully this info will lend itself to fixing the issue.

Comment 6 Steve Dickson 2005-09-06 12:47:52 UTC
This should be fixed in FC4; please upgrade, since it's
not likely we will backport the needed patches.

