Bug 160851

Summary: NFSv4 locks appear broken -- kernel: lease broken & NFSD: preprocess_seqid_op: no stateowner or nfs4_stateid
Product: [Fedora] Fedora Reporter: Clint Goudie <clint>
Component: nfs-utils Assignee: Steve Dickson <steved>
Status: CLOSED RAWHIDE QA Contact: Ben Levenson <benl>
Severity: high Docs Contact:
Priority: medium    
Version: 3   
Target Milestone: ---   
Target Release: ---   
Hardware: i386   
OS: Linux   
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2005-09-06 08:47:52 EDT Type: ---

Description Clint Goudie 2005-06-17 16:04:27 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
I'm trying to set up a distributed system where multiple machines running the Sun Java 1.5.0_03 JDK talk to a central NFSv4 export.

Server side my export contains: 

/usr/local/somedir           *(rw,async,fsid=0,no_subtree_check)

Client side, it is mounted like so: 

mount -t nfs4 -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp someipnumber:/ /opt/somedir

Everything appears to start up fine, but shortly after my app starts executing on the client, a call to tryLock on a file fails with:

java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)

Usually one or two tryLock calls on different files succeed; it seems to be the third tryLock call that runs into this, even when it is on a file that has not been locked before.

Serverside I see this in /var/log/messages:

Jun 17 13:48:46 g5 kernel: NFSD: preprocess_seqid_op: no stateowner or nfs4_stateid!

Sometimes I also see this:

Jun 17 11:06:07 g5 kernel: lease broken - owner pid = 3067

I've tried an assortment of client-side changes to the mount command and a number of different tweaks to /etc/exports; nothing seems to help.

This exact same code executes flawlessly on NFSv3, but I'd really like to use NFSv4 for the improved locking.

I've tried this on a clean, up2dated FC3 box and on a clean, up2dated FC4 box; in both scenarios I run into this problem.

Also, each of the client machines has a unique hostname, and each resolves its hostname to a unique IP address.

This happens even if only one computer has the NFS export mounted, and with only one JVM interacting with that mount.

Is there something obvious I'm missing here or is something broken?

Thanks for your help,


Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Set up an NFSv4 export
2. Mount the NFSv4 export on another machine
3. Use Java to lock and unlock files a few times
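The steps above can be sketched as a small Java program. This is a hypothetical helper, not code from the report: the class name LockCycleRepro and the command-line usage are illustrative, and it assumes the paths passed in live on the NFSv4 mount. It repeats the open / shared tryLock / release / close cycle and stops at the first failure. FileChannel.tryLock returns null when another program holds an overlapping lock, so an IOException such as the "Input/output error" above indicates something other than ordinary lock contention. The sketch uses try-with-resources and java.nio.file, so it needs Java 7 or later, unlike the 1.4/1.5 JDKs in the report.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that repeats the lock/unlock cycle from the steps above.
class LockCycleRepro {

    /**
     * For each path: open read-only, take a shared whole-file lock, release it,
     * close the file. Returns the number of cycles that succeeded and stops at the
     * first IOException or at the first lock held by another program.
     */
    static int lockCycle(List<Path> paths) {
        int succeeded = 0;
        for (Path p : paths) {
            int attempt = succeeded + 1;
            try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r")) {
                FileChannel ch = raf.getChannel();
                // Shared lock over the whole file, as in the report.
                FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true);
                if (lock == null) {
                    // null means another program holds an overlapping lock (contention, not an I/O error).
                    System.err.println("attempt " + attempt + ": lock held elsewhere on " + p);
                    break;
                }
                lock.release();
                succeeded++;
            } catch (IOException e) {
                // The report shows "Input/output error" at this point on the NFSv4 mount.
                System.err.println("attempt " + attempt + " failed on " + p + ": " + e);
                break;
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<Path> paths = new ArrayList<>();
        for (String a : args) {
            paths.add(Paths.get(a));
        }
        int ok = lockCycle(paths);
        System.out.println(ok + " of " + paths.size() + " lock cycles succeeded");
    }
}
```

For example, after compiling, `java LockCycleRepro /opt/somedir/a.txt /opt/somedir/b.txt /opt/somedir/c.txt` (file names hypothetical) would, per the description above, be expected to fail on the third attempt on the affected setup, while all cycles succeed on a local filesystem.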

Additional info:
Comment 1 Clint Goudie 2005-06-17 19:17:46 EDT
Created attachment 115646
bzipped tethereal -w

This is an Ethereal dump of the exchange that ends up giving me the I/O errors
on the NFS server.
Comment 2 Clint Goudie 2005-06-20 12:24:04 EDT
This Java code reproduces the issue every time:

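import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class Nfs4LockRepro {
    public static void main(String[] args) throws Exception {
        // First pass: shared (read) lock on a file opened read-only, then read it.
        File fa = new File("/mnt/nfs4/a/b/c/file.txt");
        RandomAccessFile rafa = new RandomAccessFile(fa, "r");
        FileChannel fca = rafa.getChannel();
        FileLock locka = fca.tryLock(0, Long.MAX_VALUE, true);

        FileInputStream fisa = new FileInputStream(fa);
        byte[] data = new byte[(int) fa.length()];
        new DataInputStream(fisa).readFully(data);

        // Clean up: stream first, then the lock, then the RandomAccessFile.
        try { fisa.close(); } catch (Throwable t) { System.err.println("Unable to close fis"); }
        try { if (locka != null) locka.release(); } catch (Throwable t) { System.err.println("Unable to release lock"); }
        try { rafa.close(); } catch (Throwable t) { System.err.println("Unable to close RandomAccessFile"); }

        // Second pass: reopen the same file read-only and lock it again.
        File fb = new File("/mnt/nfs4/a/b/c/file.txt");
        RandomAccessFile rafb = new RandomAccessFile(fb, "r");
        FileChannel fcb = rafb.getChannel();
        FileLock lockb = fcb.tryLock(0, Long.MAX_VALUE, true); // The error happens here.
    }
}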
Comment 3 Clint Goudie 2005-06-20 12:25:28 EDT
Forgot to mention: this happens with either the Sun 1.5.0_03 JDK or the Sun
1.4.2_08 JDK.
Comment 4 Clint Goudie 2005-06-20 13:20:18 EDT
Here's some more weird fruit to add to the mix. 

If /etc/hosts on the client looks like:

      apollo.thtoolbox.com apollo     localhost.localdomain   localhost

the above code works _without errors_; however, the NFS export can only be
mounted by the first box. When a second box mounts it in this configuration, the
client shows:

mount: File exists

On the NFS server you see:

kernel: NFSD: setclientid: string in use by client(clientid 42b3483c/00000004)

-- Note: Is this a separate bug? Should it be logged as such?

If I change /etc/hosts on the client to look like

      apollo.thtoolbox.com apollo       localhost.localdomain   localhost

as suggested by

I can now mount the NFS export on multiple machines, but I get the locking error
described in this bug (even when it is mounted on only one box).
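Both symptoms in this comment appear to depend on how the client's own hostname is mapped in /etc/hosts. As a hypothetical diagnostic (not part of the report; the class name HostnameCheck is illustrative), the Java sketch below checks whether a hostname resolves to a loopback address, one way to double-check the earlier statement that each client resolves its hostname to a unique IP. It assumes the JVM's resolver consults /etc/hosts the same way the rest of the system does, and it does not inspect what the NFS client actually sends to the server.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical diagnostic: does a hostname resolve to a loopback address?
class HostnameCheck {

    /** Returns true if any address the resolver returns for host is a loopback address. */
    static boolean resolvesToLoopback(String host) throws UnknownHostException {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (addr.isLoopbackAddress()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Default to this machine's own hostname, as the JVM sees it.
        String host = args.length > 0 ? args[0] : InetAddress.getLocalHost().getHostName();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + addr.getHostAddress());
        }
        System.out.println(resolvesToLoopback(host)
                ? "warning: " + host + " resolves to a loopback address"
                : host + " resolves only to non-loopback addresses");
    }
}
```

Running it with no arguments on each client prints what that client's hostname resolves to; passing a name such as apollo.thtoolbox.com checks that name instead.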
Comment 5 Clint Goudie 2005-06-21 19:59:02 EDT
As a further data point, I've discovered that if I open the RandomAccessFile
in "rw" mode instead of "r" mode, this code works. It seems odd that I need
read and write access to a file just to take a shared read lock on it. This
workaround will probably work for me, but it's less than ideal.

Hopefully this information helps in fixing the issue.
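The workaround described in this comment can be sketched as follows, assuming Java 7 or later. The class and method names are hypothetical, and reading through the same RandomAccessFile (instead of a separate FileInputStream, as in Comment 2) is a choice made here for brevity. Opening in "rw" mode requires write permission on the exported file and, unlike "r", creates the file if it does not exist.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical sketch of the "rw" workaround: only the open mode differs from Comment 2.
class SharedLockViaRw {

    /** Reads the whole file under a shared lock, opening it read-write as a workaround. */
    static byte[] readUnderSharedLock(File f) throws IOException {
        // "rw" instead of "r": per this comment, this avoided the I/O error on the reporter's setup.
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            FileChannel ch = raf.getChannel();
            // Still a shared (read) lock over the whole file.
            FileLock lock = ch.tryLock(0, Long.MAX_VALUE, true);
            if (lock == null) {
                throw new IOException("lock held by another program: " + f);
            }
            try {
                byte[] data = new byte[(int) raf.length()];
                raf.readFully(data);
                return data;
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = readUnderSharedLock(new File(args[0]));
        System.out.println("read " + data.length + " bytes under a shared lock");
    }
}
```

The lock request itself is unchanged, which is what makes the behavior surprising: the file's open mode, not the lock type, decides whether the call succeeds on this setup.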
Comment 6 Steve Dickson 2005-09-06 08:47:52 EDT
This should be fixed in FC4. Please upgrade, since it's
not likely we will backport the needed patches.