Our application creates large binary files. To create these files we first create a zero-length file and lock it (fcntl(... SETLK)). Then we write a second data file. When the write is complete we fsync the data file and then close both file descriptors. This should release the lock. When a large amount of data is written quickly, the lock is not released (about 100 MB is enough). I've reproduced this bug on Linux boxes running kernels 2.2.12-20 and 2.2.12-32. I've used data servers on Solaris 2.5.1, Solaris 2.7, and HP-UX 11.00. By the way, our Solaris boxes do have patch 105299-02 installed (SunSolve ID 4071076), so this is not a repeat of that problem (thank you for referencing that in the other bug reports; it did solve another problem I was having). Please contact me at lawrence and I will provide a full testcase via ftp. Jay Lawrence
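A minimal sketch of the sequence described above, assuming POSIX fcntl locking over NFS; the file names, buffer size, and error handling are illustrative and not taken from the actual test case:

/* Sketch of the reported sequence: lock a zero-length file, write a large
 * data file, fsync it, then close both descriptors (which should release
 * the lock). All names and sizes here are assumptions for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    struct flock fl;
    char buf[8192];
    size_t total = 100UL * 1024 * 1024;   /* ~100 MB is enough to trigger it */
    size_t written = 0;

    /* 1. Create a zero-length file and take a write lock on it. */
    int lockfd = open("data.lock", O_CREAT | O_RDWR, 0644);
    if (lockfd < 0) { perror("open lock"); exit(1); }

    memset(&fl, 0, sizeof(fl));
    fl.l_type   = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;                      /* lock the whole file */
    if (fcntl(lockfd, F_SETLK, &fl) < 0) { perror("fcntl SETLK"); exit(1); }

    /* 2. Write the large data file. */
    int datafd = open("data.bin", O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (datafd < 0) { perror("open data"); exit(1); }

    memset(buf, 0, sizeof(buf));
    while (written < total) {
        ssize_t n = write(datafd, buf, sizeof(buf));
        if (n < 0) { perror("write"); exit(1); }
        written += (size_t)n;
    }

    /* 3. fsync the data file, then close both descriptors.
     *    Closing lockfd should release the fcntl lock. */
    if (fsync(datafd) < 0) { perror("fsync"); exit(1); }
    close(datafd);
    close(lockfd);
    return 0;
}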
Created attachment 1814 [details] tar file of the test case; please read the README file, modify the Makefile macro, and type make
Further testing has been done. If the remote directory is on HP-UX 10.20 it also fails. If the remote directory is on another Linux 2.2.12 machine or a Network Appliance file server, the problem does not occur. The NetApp might be an anomaly because it is on a gigabit network and may be consuming the data VERY fast coming off my 100TX Linux machines.
assigned to johnsonm
I tried to append this previously but it somehow disappeared.... A customer of mine had his remote file systems mounted with mount -o rw,nolock .... In this configuration the problem did NOT occur. In my opinion this is a crazy way to mount a file system, since it bypasses NFS locking altogether and could lead to corruption problems, but it did serve to isolate the bug to NFS locking. With 'nolock' the lock is maintained locally on the client, and the problem did not occur. Jay
Hello, Has there been any progress on this issue? Thanks, Michael
Could you try our 2.2.19 errata kernel? It has heavily revamped NFS code, including NFSv3 support.