From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050513 Fedora/1.0.4-1.3.1 Firefox/1.0.4

Description of problem:
I'm trying to set up a distributed system where multiple machines running the Sun Java 1.5.0_03 JDK talk to a central NFSv4 export.

Server side, my export contains:

    /usr/local/somedir *(rw,async,fsid=0,no_subtree_check)

Client side, it is mounted like so:

    mount -t nfs4 -o rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp someipnumber:/ /opt/somedir

Everything appears to start up fine, but shortly into the execution of my app on the client, when I call tryLock on a file, I see this:

    java.io.IOException: Input/output error
        at sun.nio.ch.FileChannelImpl.lock0(Native Method)
        at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:822)

Usually one or two tryLocks on different files will work. It seems to be the third tryLock executed that runs into this, even when it is on a file I have not tried to lock before.

Server side, I see this in /var/log/messages:

    Jun 17 13:48:46 g5 kernel: NFSD: preprocess_seqid_op: no stateowner or nfs4_stateid!

Sometimes I also see this:

    Jun 17 11:06:07 g5 kernel: lease broken - owner pid = 3067

I've tried an assortment of client-side changes to the mount command, and I've also tried a number of different tweaks to /etc/exports. Nothing seems to help. This exact same code executes flawlessly on NFSv3, but I'd really like to use NFSv4 for the improved locking.

I've tried this on a clean FC3 box, up2dated, and also on a clean FC4 box, up2dated. In both scenarios I run into this problem. Each of the client machines has a unique hostname, and each resolves its hostname to a unique IP number.

This happens even if only one computer has the NFS export mounted, and with only one JVM interacting with that mount.

Is there something obvious I'm missing here, or is something broken?
Thanks for your help,
Clint

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Set up an NFSv4 export
2. Mount the NFSv4 export on another machine
3. Use Java to lock and unlock files a few times

Additional info:
Created attachment 115646 [details]
bzipped tethereal -w

This is an Ethereal dump of the exchange that ends up giving me the I/O errors on the NFS server.
This Java code will duplicate the issue every time:

    File fa = new File("/mnt/nfs4/a/b/c/file.txt");
    RandomAccessFile rafa = new RandomAccessFile(fa, "r");
    FileChannel fca = rafa.getChannel();
    FileLock locka = null;
    locka = fca.tryLock(0, Long.MAX_VALUE, true);
    FileInputStream fisa = new FileInputStream(fa);
    byte[] data = new byte[(int)fa.length()];
    fisa.read(data);
    try {
        fisa.close();
    } catch(Throwable t) {
        System.err.println("Unable to close fis");
        t.printStackTrace();
    }
    try {
        locka.release();
    } catch(Throwable t) {
        System.err.println("Unable to release lock");
        t.printStackTrace();
    }
    try {
        rafa.close();
    } catch(Throwable t) {
        System.err.println("Unable to close RandomAccessFile");
        t.printStackTrace();
    }

    File fb = new File("/mnt/nfs4/a/b/c/file.txt");
    RandomAccessFile rafb = new RandomAccessFile(fb, "r");
    FileChannel fcb = rafb.getChannel();
    FileLock lockb = null;
    fcb.tryLock(0, Long.MAX_VALUE, true); // The error happens here.
Forgot to mention: this happens with both the Sun 1.5.0_03 JDK and the Sun 1.4.2_08 JDK.
Here's some more weird fruit to add to the mix. If /etc/hosts on the client looks like:

    127.0.0.1 apollo.thtoolbox.com apollo localhost.localdomain localhost

the above code will work _without errors_. However, the NFS export can then only be mounted by the first box. When a second box in this mode mounts it, on the client you see:

    mount: File exists

and on the NFS server you see:

    kernel: NFSD: setclientid: string in use by client(clientid 42b3483c/00000004)

-- Note: Is this a separate bug? Should it be logged as such?

If I change /etc/hosts on the client to look like:

    192.168.0.149 apollo.thtoolbox.com apollo
    127.0.0.1 localhost.localdomain localhost

as suggested by
http://groups-beta.google.com/group/linux.debian.bugs.dist/msg/86844812715c2ab2?hl=en

I can now mount the NFS export on multiple machines, but I get the locking error described in this bug (even when it is mounted on only the one box).
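For anyone comparing the two /etc/hosts layouts, here is a rough sketch of how to check from Java what the client's own hostname resolves to. The class name HostnameCheck and the helper resolvesToLoopback are made up for illustration. It is also an assumption that the JVM's resolver result reflects what the NFS client uses, so treat the output as a hint only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of the original report.
class HostnameCheck {
    /** Returns true if the address is a loopback address such as 127.0.0.1. */
    public static boolean resolvesToLoopback(InetAddress addr) {
        return addr.isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // Resolves the local hostname through the system resolver (/etc/hosts, DNS, ...).
        InetAddress local = InetAddress.getLocalHost();
        System.out.println(local.getHostName() + " -> " + local.getHostAddress());
        if (resolvesToLoopback(local)) {
            System.out.println("Hostname resolves to a loopback address (first /etc/hosts layout).");
        } else {
            System.out.println("Hostname resolves to a non-loopback address (second /etc/hosts layout).");
        }
    }
}
```

Running this on each client should show which of the two layouts that machine is effectively using.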
As an additional note on this, I've discovered that if I open the RandomAccessFile in "rw" mode instead of "r" mode, this code works. It seems really odd that I would need read and write access to a file just to take a shared read lock on it. This workaround will probably do for me, but it's less than ideal. Hopefully this info helps in fixing the issue.
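A minimal sketch of the workaround described above, with the open mode passed in so that "r" and "rw" can be compared on the same mount. The class name SharedLockRead and the method readWithSharedLock are hypothetical. The try-with-resources syntax assumes Java 7 or later, which is newer than the JDKs in this report.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical helper, not part of the original report.
class SharedLockRead {
    /**
     * Reads the whole file while holding a shared lock on it.
     * The mode is passed straight to RandomAccessFile: in this report,
     * "r" triggered the I/O error on NFSv4 and "rw" did not.
     */
    public static byte[] readWithSharedLock(File file, String mode) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, mode);
             FileChannel channel = raf.getChannel()) {
            // Third argument true requests a shared (read) lock over the whole file.
            FileLock lock = channel.tryLock(0, Long.MAX_VALUE, true);
            if (lock == null) {
                throw new IOException("File is locked by another process: " + file);
            }
            try {
                byte[] data = new byte[(int) raf.length()];
                raf.readFully(data);
                return data;
            } finally {
                // Release the lock before the channel and file are closed.
                lock.release();
            }
        }
    }
}
```

Calling readWithSharedLock(new File("/mnt/nfs4/a/b/c/file.txt"), "rw") twice in a row mirrors the lock, read, unlock, lock sequence from the reproducer. Switching the mode to "r" should show whether the error still occurs on a given kernel.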
This should be fixed in FC4. Please upgrade, since it's not likely we will backport the needed patches.