When trying to lock files across NFS (Linux kernel 2.2.12-20), I noticed that Linux reads out of its own cache and does not explicitly fetch the data from the source file across NFS. This is evident in the snoop output.

In this test, both pugsley (10.161.6.56) and munster (10.161.4.63) are Red Hat Linux 6.1 workstations. mrcoffee is a NetApp filer running an old release of Data ONTAP. The symptoms outlined in Network Appliance case 102681 are being experienced, and this is the snoop trace.

The following is just one example of what is going on. There are many sequences like this in the snoop trace, but this is only one occurrence. Both Linux systems mount with the "noac" option to prevent attribute caching from hiding updated mtimes on the file.

Here, munster has done a write request. 10.161.6.56 is attempting to get a lock on the file.

408 0.00066 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NFS C WRITE2 FH=9AAA at 0 for 4
409 0.00094 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NFS R WRITE2 OK

You will see a lot of lock contention, since the clients are each running 5 processes, and the processes are not blocking (which would be preferable) but are running in wait loops instead.
410 0.00068 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C UNLOCK1 OH=D8D7 FH=9AAA PID=1466 Region=0:1
411 0.00221 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=C730 denied
412 0.00029 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CA30 FH=126F PID=32710 Region=0:1
413 0.00280 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CB30 FH=126F PID=32713 Region=0:1
414 0.00185 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=C830 denied
415 0.00436 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=C930 denied
416 0.00079 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DCD7 denied
417 0.00076 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=D9D7 FH=126F PID=1467 Region=0:1
418 0.00013 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DDD7 denied
419 0.00066 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=DAD7 FH=126F PID=1464 Region=0:1
420 0.00013 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DED7 denied
421 0.00075 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=DBD7 FH=126F PID=1463 Region=0:1
422 0.00013 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DFD7 denied
423 0.00039 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R UNLOCK1 OH=D8D7 granted
424 0.00022 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E4D7 FH=126F PID=1465 Region=0:1
425 0.00048 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C UNLOCK1 OH=E5D7 FH=126F PID=1466 Region=0:1
426 0.00441 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=CA30 denied
427 0.00104 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CC30 FH=126F PID=32710 Region=0:1
428 0.00020 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CD30 FH=126F PID=32711 Region=0:1
429 0.00026 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CE30 FH=126F PID=32712 Region=0:1
430 0.00359 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=CB30 denied
431 0.00080 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=D9D7 denied
432 0.00073 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E6D7 FH=126F PID=1467 Region=0:1
433 0.00014 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DAD7 denied
434 0.00069 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E7D7 FH=126F PID=1464 Region=0:1
435 0.00013 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=DBD7 denied
436 0.00077 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E0D7 FH=126F PID=1463 Region=0:1
437 0.00018 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R LOCK1 OH=E4D7 denied
438 0.00054 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NLM R UNLOCK1 OH=E5D7 granted

munster just released its lock on the file. pugsley is about to get the lock.

439 0.00023 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E1D7 FH=126F PID=1465 Region=0:1
440 0.00038 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NFS C LOOKUP2 FH=3283 .foo
441 0.00066 mrcoffee.hq.netapp.com -> munster.hq.netapp.com NFS R LOOKUP2 OK FH=126F
442 0.00069 munster.hq.netapp.com -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=E2D7 FH=126F PID=1466 Region=0:1
443 0.00545 10.161.6.56 -> mrcoffee.hq.netapp.com NLM C LOCK1 OH=CF30 FH=126F PID=32713 Region=0:1
444 0.00015 mrcoffee.hq.netapp.com -> 10.161.6.56 NLM R LOCK1 OH=CC30 granted

Now pugsley has the lock. But wait, what does it do next? Where is the READ request?

445 0.00035 10.161.6.56 -> mrcoffee.hq.netapp.com NFS C WRITE2 FH=9AAA at 0 for 4

pugsley (10.161.6.56) just trusted its own buffering of the file to do the write. It did not issue a read request to get the correct contents of the file before proceeding.

If you compile the test program on Solaris, you find that it works flawlessly. You also find that in situations such as this, a READ request is always issued by the client that has obtained the lock.

One last note.
The correct way to do something like this is not to share a file. That is dangerous, considering all of the NFS client-side issues that still exist in this sort of situation. The correct way is to keep a single process listening on a TCP socket for requests to increment the number or to read the current number. Such a server would also prevent all of the obvious network spam caused by the lock contention we see in the network trace.
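As a rough illustration of the single-process approach suggested above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names CounterServer, CounterHandler and request, and the one-word line protocol ("INC" to increment, "GET" to read) are hypothetical and do not come from the customer's program or from any NetApp or Red Hat code.

```python
import socket
import socketserver
import threading


class CounterServer(socketserver.ThreadingTCPServer):
    """Keeps the shared number in one process (hypothetical design)."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, addr):
        super().__init__(addr, CounterHandler)
        self.value = 0
        # Serializes updates between handler threads; no NFS locks involved.
        self.value_lock = threading.Lock()


class CounterHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One command per line; the connection may carry several commands.
        for raw in self.rfile:
            cmd = raw.strip().upper()
            with self.server.value_lock:
                if cmd == b"INC":
                    self.server.value += 1
                elif cmd != b"GET":
                    self.wfile.write(b"ERR\n")
                    continue
                reply = self.server.value
            self.wfile.write(b"%d\n" % reply)


def request(addr, cmd):
    """Send one command (assumed protocol: 'INC' or 'GET') and return the reply."""
    with socket.create_connection(addr) as s, s.makefile("rb") as f:
        s.sendall(cmd.encode() + b"\n")
        return f.readline().decode().strip()
```

To try it, create CounterServer(("127.0.0.1", 0)), run its serve_forever() in a thread, and call request(server.server_address, "INC") from each client. Each client then sends one request and gets one reply, instead of retrying denied NLM LOCK calls in a loop.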
Created attachment 155
This is the program that the customer is running across NFS to exhibit the problem.
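The attachment is not reproduced in this report. For readers without access to it, the following is only a sketch of the kind of lock / read / increment / write / unlock cycle the report describes, not the customer's code. The function name locked_increment and the decimal text format of the counter are assumptions; the one-byte lock region at offset 0 mirrors the Region=0:1 seen in the trace.

```python
import fcntl
import os


def locked_increment(path):
    """Lock byte 0, re-read the counter, add one, write it back, unlock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocking exclusive lock on the 1-byte region at offset 0.
        # Without LOCK_NB the call sleeps until the lock is granted,
        # rather than spinning in a retry loop.
        fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
        try:
            # Read after the lock is held. This is the step that should
            # show up as a READ on the wire before the WRITE.
            data = os.pread(fd, 64, 0)
            value = int(data.strip() or b"0") + 1
            # Assumed format: decimal text followed by a newline.
            os.pwrite(fd, b"%d\n" % value, 0)
            os.fsync(fd)
            return value
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
    finally:
        os.close(fd)
```

Run concurrently from several processes on two clients against the same NFS file, a loop over locked_increment() should never lose an update if each client really re-reads the file after the lock is granted. A lost update would correspond to the WRITE-without-READ sequence at packet 445.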
This bug was opened March 3, 2000. Not a single comment or action appears to have been taken on this issue. Will someone read this bug report and take some sort of action please?
Stuff did happen, but the bugs in question were handled in the community rather than just by Red Hat. Newer Linuxen are aware of the fcntl() file locking/buffering conventions SunOS introduced